<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>What's the proper robots.txt configuration for disallowing access to bots?</title>
  <link rel="self" href="https://liferay.dev/c/message_boards/find_thread?p_l_id=119785294&amp;threadId=122152235" />
  <subtitle>What's the proper robots.txt configuration for disallowing access to bots?</subtitle>
  <id>https://liferay.dev/c/message_boards/find_thread?p_l_id=119785294&amp;threadId=122152235</id>
  <updated>2026-04-05T12:05:32Z</updated>
  <dc:date>2026-04-05T12:05:32Z</dc:date>
  <entry>
    <title>RE: What's the proper robots.txt configuration for disallowing access to bots?</title>
    <link rel="alternate" href="https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122210321" />
    <author>
      <name>Aravinth Kumar</name>
    </author>
    <id>https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122210321</id>
    <updated>2023-10-31T09:28:18Z</updated>
    <published>2023-10-31T09:19:05Z</published>
    <summary type="html">&lt;p&gt;Hi Antonis, &lt;/p&gt;
&lt;p&gt;There are several ways to defend against bot traffic. One is to put a web application firewall (WAF) in front of the site.&lt;/p&gt;
&lt;p&gt;Consider a WAF to block bad bots before they reach the portal.&lt;/p&gt;
&lt;p&gt;Regards,&lt;/p&gt;
&lt;p&gt;Aravinth&lt;/p&gt;</summary>
    <dc:creator>Aravinth Kumar</dc:creator>
    <dc:date>2023-10-31T09:19:05Z</dc:date>
  </entry>
  <entry>
    <title>RE: What's the proper robots.txt configuration for disallowing access to bots?</title>
    <link rel="alternate" href="https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122175401" />
    <author>
      <name>Olaf Kock</name>
    </author>
    <id>https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122175401</id>
    <updated>2023-10-26T10:24:58Z</updated>
    <published>2023-10-26T10:24:55Z</published>
    <summary type="html">&lt;p&gt;robots.txt needs to be served from the root directory of your server
  - e.g. example.com/robots.txt - in case you're configuring this in a
  secondary site, without declaring a virtual host, &lt;em&gt;this
  particular&lt;/em&gt;​​​​​​​ robots.txt might appear under
  example.com/web/sitename/robots.txt - you might want to edit the
  robots.txt of your default site (typically /web/guest), as that's what
  appears in the root.&lt;/p&gt;
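&lt;p&gt;As a rough illustration of why only the root copy matters, here is a minimal Python sketch. The helper name robots_url is made up for this example, and it assumes a crawler derives the robots.txt location purely from the scheme and host of whatever page it visits.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt location for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page inside a secondary site still maps to the root file,
# not to example.com/web/sitename/robots.txt.
print(robots_url("https://example.com/web/sitename/home"))
```

&lt;p&gt;Whatever file is served at the printed address is the one to check and edit.&lt;/p&gt;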
&lt;p&gt;Also note that robots.txt is only a &amp;quot;recommendation&amp;quot;:
  robots &lt;em&gt;typically&lt;/em&gt; honor it, but there are also rogue robots that
  don't care about your recommendation.&lt;/p&gt;</summary>
    <dc:creator>Olaf Kock</dc:creator>
    <dc:date>2023-10-26T10:24:55Z</dc:date>
  </entry>
  <entry>
    <title>What's the proper robots.txt configuration for disallowing access to bots?</title>
    <link rel="alternate" href="https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122152234" />
    <author>
      <name>Antonio Papadakis - Pesaresi</name>
    </author>
    <id>https://liferay.dev/c/message_boards/find_message?p_l_id=119785294&amp;messageId=122152234</id>
    <updated>2023-10-23T23:04:52Z</updated>
    <published>2023-10-23T11:32:44Z</published>
    <summary type="html">&lt;p&gt;Dear Support Team,&lt;/p&gt;
&lt;p&gt;We are facing an issue with suspicious traffic to the website which
  seems to be originating from various msn/bing bots trying to index
  various parts/subpages of the website.&lt;/p&gt;
&lt;p&gt;I've updated the robots.txt configuration of the Public Pages of the
  site, to the following rules:&lt;/p&gt;
&lt;p&gt;**&lt;/p&gt;
&lt;p&gt;User-Agent: *&lt;br&gt; Disallow:&lt;br&gt; User-agent: bingbot&lt;br&gt; Disallow:
  /&lt;br&gt; User-agent: msnbot&lt;br&gt; Disallow: /&lt;br&gt; Sitemap: [$PROTOCOL$]://[$HOST$]:[$PORT$]/sitemap.xml&lt;/p&gt;
&lt;p&gt;**&lt;/p&gt;
&lt;p&gt;This should disallow access to the site pages for any agent whose
  User-Agent string contains 'bingbot' or 'msnbot'.&lt;/p&gt;
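&lt;p&gt;A quick local sanity check of these rules, as a sketch only: Python's standard urllib.robotparser can evaluate them without touching the live site. The is_allowed helper and the RULES constant are names made up for this example. The blank lines between groups are added for readability, and the Sitemap line is left out because it does not affect access decisions. This only shows how one well-behaved parser reads the file, not what a given crawler actually does.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, one directive per line (Sitemap line omitted).
RULES = """\
User-agent: *
Disallow:

User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /
"""


def is_allowed(rules, user_agent, path="/"):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


for agent in ("bingbot", "msnbot", "Googlebot"):
    print(agent, is_allowed(RULES, agent, "/web/guest"))
```

&lt;p&gt;If the check reports bingbot and msnbot as blocked, the rules themselves read as intended, and the remaining question is whether the crawlers actually fetch and honor this file.&lt;/p&gt;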
&lt;p&gt;Since this doesn't seem to have stopped the bots from crawling the
  website, do I need to add anything else to these rules or somehow
  add/re-apply anything else?&lt;/p&gt;
&lt;p&gt;Kind Regards,&lt;/p&gt;
&lt;p&gt;Antonis&lt;/p&gt;</summary>
    <dc:creator>Antonio Papadakis - Pesaresi</dc:creator>
    <dc:date>2023-10-23T11:32:44Z</dc:date>
  </entry>
</feed>
