Problems with new Google news bot
Hello,
We have just started seeing these requests in our web server log files:
66.249.65.4 - - [07/Dec/2009:15:49:41 +0000] "GET /somepage/somechildpage;!-1835184563!1087628637!1260200943826 HTTP/1.1" 404 818 "-" "Googlebot-News"
It seems Google has started using a different bot to crawl news content (http://googlewebmastercentral.blogspot.com/2009/12/new-user-agent-for-news.html). The problem appears to be that Liferay treats the ';!-1835...' suffix as part of the page name. No such page exists, so a 404 is returned.
How can I tell Liferay to ignore everything from the ';' onwards, so that it just looks for a page called '/somepage/somechildpage'?
Many thanks,
Mike.
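
To make the requested behaviour concrete, here is a minimal sketch in plain Java of dropping everything from the first ';' in the path while leaving any query string alone. The class and method names (PathParameterStripper, stripPathParameters) are hypothetical and are not part of Liferay's API. Where such logic would be hooked in (for example a servlet filter or a rewrite rule on the web server in front of Liferay) is an assumption that this thread does not settle.

```java
// Hypothetical helper, not a Liferay class: illustrates the string handling only.
final class PathParameterStripper {

    private PathParameterStripper() {
    }

    // Returns the request URI with everything from the first ';' in the
    // path removed. A query string (from '?' onwards) is kept unchanged.
    static String stripPathParameters(String requestUri) {
        int queryStart = requestUri.indexOf('?');
        String path = queryStart >= 0 ? requestUri.substring(0, queryStart) : requestUri;
        String query = queryStart >= 0 ? requestUri.substring(queryStart) : "";

        int semicolon = path.indexOf(';');
        if (semicolon >= 0) {
            path = path.substring(0, semicolon);
        }
        return path + query;
    }

    public static void main(String[] args) {
        String logged = "/somepage/somechildpage;!-1835184563!1087628637!1260200943826";
        // prints "/somepage/somechildpage"
        System.out.println(stripPathParameters(logged));
    }
}
```

The sketch cuts at the first ';' rather than parsing each path segment separately, which matches the "ignore everything from the ';'" wording above but would also discard any later path segments that follow a semicolon.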