Problems with new Google news bot

34016, modified 16 Years ago. New Member Posts: 24 Join Date: 7/26/07
Hello,

We have just started seeing these requests in our web server log files:

66.249.65.4 - - [07/Dec/2009:15:49:41 +0000] "GET /somepage/somechildpage;!-1835184563!1087628637!1260200943826 HTTP/1.1" 404 818 "-" "Googlebot-News"

It seems Google have started using a different bot for crawling news content (http://googlewebmastercentral.blogspot.com/2009/12/new-user-agent-for-news.html). The problem appears to be that Liferay treats the ';!-1835...' suffix as part of the page name. No page with that name exists, so a 404 is returned.

How can I tell Liferay to ignore everything from the ';' so it just looks for a page called '/somepage/somechildpage'?
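
To make it concrete, the behaviour I am after is roughly the sketch below. `PathParamStripper` and `stripPathParameters` are names made up for illustration, not Liferay APIs, and I am assuming something like this would run on the request path before the page lookup (for example from a servlet filter).

```java
// Hypothetical helper, not part of Liferay: shows the path handling I am asking for.
final class PathParamStripper {

    private PathParamStripper() {
    }

    /**
     * Returns the path with everything from the first ';' onwards removed.
     * Paths without a ';' are returned unchanged; null is passed through.
     */
    static String stripPathParameters(String path) {
        if (path == null) {
            return null;
        }
        int semicolon = path.indexOf(';');
        return semicolon < 0 ? path : path.substring(0, semicolon);
    }
}
```

With the request from the log above, `stripPathParameters("/somepage/somechildpage;!-1835184563!1087628637!1260200943826")` would give `/somepage/somechildpage`. Note that it cuts at the first ';', so any path segments after it are dropped as well, which is what I want here.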


Many thanks,

Mike.