Prevent search engines from crawling web pages when the site is reachable by both IP address and domain name

Amit Sharma, modified 6 Years ago. Junior Member Posts: 35 Join Date: 10/17/18
Hi,
I am using Liferay CE 7.1.2 GA3.
I need to prevent the website's pages from being indexed by any search engine.

I have set up a virtual host with my domain name.

I have configured the domain name in the instance settings.

I also set the robots.txt content for the pages under
Build -> Pages -> Advanced Settings, as below:
User-Agent: *
Disallow:/

When I access http://mydomain.com/robots.txt
I see:

User-Agent: *
Disallow:/

But when I access the same file using the IP address,
http://10.0.0.1/robots.txt
the content I see is as below:

User-Agent: *
Disallow:

How can I set up the Liferay server so that pages are not indexed by search engines?
The robots.txt setting should work with both the domain name and the IP address; right now Liferay seems to support only one of the two.

Thanks in advance
-Amit Sharma
Christoph Rabel, modified 6 Years ago. Liferay Legend Posts: 1555 Join Date: 9/24/09
I usually do things like that on a reverse proxy in front of Liferay. It's one of the many perks of having a reverse proxy.

If you need to do this in Liferay, you have to write a filter that intercepts the requests and returns a robots.txt that fits your needs depending on the host header.
https://portal.liferay.dev/docs/7-1/tutorials/-/knowledge_base/t/servlet-filters
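
To illustrate the filter idea, here is a minimal sketch of only the decision logic such a filter would need: given the Host header, pick the robots.txt body to return. The class name RobotsTxtChooser and the idea of a set of "indexable" host names are assumptions made for this example, not part of Liferay; registering the filter itself is covered in the tutorial linked above and is not shown here.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: chooses a robots.txt body based on the Host header.
final class RobotsTxtChooser {

    // Tells every crawler to stay away from the whole site.
    public static final String DISALLOW_ALL = "User-agent: *\nDisallow: /\n";

    // An empty Disallow rule permits crawling of everything.
    public static final String ALLOW_ALL = "User-agent: *\nDisallow:\n";

    // Host names (lower case, without port) that may be indexed.
    // Assumption for this sketch: any other host, including a raw IP, is blocked.
    private final Set<String> indexableHosts;

    public RobotsTxtChooser(Set<String> indexableHosts) {
        this.indexableHosts = indexableHosts;
    }

    // Lower-cases the Host header value and strips an optional ":port" suffix.
    public static String normalizeHost(String hostHeader) {
        if (hostHeader == null) {
            return "";
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        if (host.startsWith("[")) {
            // IPv6 literal such as "[::1]:8080": keep the bracketed part.
            int end = host.indexOf(']');
            return end > 0 ? host.substring(0, end + 1) : host;
        }
        int colon = host.indexOf(':');
        return colon >= 0 ? host.substring(0, colon) : host;
    }

    // Returns the robots.txt body to serve for the given Host header.
    public String bodyFor(String hostHeader) {
        return indexableHosts.contains(normalizeHost(hostHeader))
                ? ALLOW_ALL
                : DISALLOW_ALL;
    }
}
```

For the case in this thread, where nothing should be indexed under either the domain name or the IP address, the set would simply be empty, so every host gets the disallow-all body. Inside a servlet filter handling /robots.txt, the result of bodyFor(request.getHeader("Host")) would be written to the response as text/plain instead of passing the request down the chain.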
Amit Sharma, modified 6 Years ago. Junior Member Posts: 35 Join Date: 10/17/18
Thanks, this solved my problem.