RE: What's the proper robots.txt configuration for disallowing access to bots?

Jamie Sammons, modified 1 Year ago.

What's the proper robots.txt configuration for disallowing access to bots?

New Member Posts: 2 Join Date: 6/16/23

Dear Support Team,

We are seeing suspicious traffic to the website that appears to originate from various msn/bing bots trying to index different parts and subpages of the site.

I've updated the robots.txt configuration of the site's Public Pages to the following rules:

User-agent: *
Disallow:

User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /

Sitemap: [$PROTOCOL$]://[$HOST$]:[$PORT$]/sitemap.xml

This should disallow access to the site's pages for any agent whose User-Agent string contains 'bingbot' or 'msnbot'.
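One way to sanity-check what a compliant crawler would conclude from these rules is Python's standard urllib.robotparser. This is only a sketch: the example.com URL is a placeholder, the Sitemap line is left out, and the standard-library parser matches on the product token (e.g. "bingbot"), which may differ in detail from how a given crawler matches.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, without the Liferay-specific Sitemap placeholder.
RULES = """\
User-agent: *
Disallow:

User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if a compliant crawler with this product token may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/web/guest/home"  # placeholder URL
    for agent in ("bingbot", "msnbot", "Googlebot"):
        print(agent, is_allowed(agent, page))
```

If the rules evaluate as expected here but the crawling continues, the rules themselves are probably not the problem; the file may not be served where crawlers look for it, or the client may simply not honor it.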

Since this doesn't seem to have stopped the bots from crawling the website, do I need to add anything else to these rules, or apply the configuration somewhere else?

Kind Regards,

Antonis

configuration

Olaf Kock, modified 1 Year ago.

RE: What's the proper robots.txt configuration for disallowing access to bots?

Liferay Legend Posts: 6441 Join Date: 9/23/08

robots.txt needs to be served from the root directory of your server, e.g. example.com/robots.txt. If you're configuring this in a secondary site without declaring a virtual host, this particular robots.txt might instead appear under example.com/web/sitename/robots.txt, where crawlers won't look for it. In that case you might want to edit the robots.txt of your default site (typically /web/guest), as that's what appears in the root.

Also note that robots.txt is only a "recommendation". Robots typically honor it, but there are also rogue robots that don't care about your recommendation.
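To illustrate the point about the root location, here is a small sketch that derives the only robots.txt URL a crawler would consult for a given page. The function name and the example.com URLs are placeholders of mine, not anything Liferay-specific.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the root-level robots.txt URL that applies to page_url.

    Crawlers look only at /robots.txt on the same scheme, host and port;
    any path, query or fragment of the page itself is ignored.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A page in a secondary site still maps to the root robots.txt.
    print(robots_url_for("https://example.com/web/sitename/home"))
```

Opening that root URL in a browser (or with curl) and comparing it with the rules you edited shows quickly whether the changes ended up in the file that crawlers actually read.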

Aravinth Kumar, modified 1 Year ago.

RE: What's the proper robots.txt configuration for disallowing access to bots?

Regular Member Posts: 152 Join Date: 6/26/13

Hi Antonis,

There are many ways to prevent bot attacks. One way is to put a web application firewall (WAF) in front of the site.

Check whether a WAF can block the bad bot traffic in your setup.
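To give a rough idea of what such a rule does, here is a minimal sketch of User-Agent based blocking written as Python WSGI middleware. It is purely illustrative and not how Liferay or any particular WAF implements it; the names BLOCKED_TOKENS, is_blocked and block_bots are made up for this example. Keep in mind that the User-Agent header is trivially spoofed, so a real WAF would normally combine it with other signals.

```python
# Hypothetical list of substrings to reject, taken from the robots.txt rules.
BLOCKED_TOKENS = ("bingbot", "msnbot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the blocked tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def block_bots(app):
    """Wrap a WSGI app and answer 403 for blocked User-Agents."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, a check like this is enforced on the server side, so it also affects clients that ignore the recommendation.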

Regards,

Aravinth
