Avoid crawl of web pages by search engine using IP and Domain name.

Amit Sharma, modified 6 Years ago.

Junior Member Posts: 35 Join Date: 10/17/18 Recent Posts

Hi,
I am using Liferay CE 7.1.2 GA3.
I need to prevent the website's pages from being indexed by any search engine.

I have set up a virtual host with my domain name.

I have configured the domain name in the instance settings.

I also updated the content of robots.txt using
Build -> Pages -> Advanced Settings -> Set the robots.txt for pages, as below:
User-Agent: *
Disallow:/

When I access http://mydomain.com/robots.txt
I see:

User-Agent: *
Disallow:/

But when I access the same file using the IP address
http://10.0.0.1/robots.txt
the content I see is as below:

User-Agent: *
Disallow:

How can I set up the Liferay server so that pages are not indexed by search engines?
The robots setting should work with both the domain name and the IP; right now Liferay applies it to only one of them.

Thanks in advance
-Amit Sharma

Christoph Rabel, modified 6 Years ago.

RE: Avoid crawl of web pages by search engine using IP and Domain name. (Answer)

Liferay Legend Posts: 1555 Join Date: 9/24/09 Recent Posts

I usually do things like that on a reverse proxy in front of Liferay. It's one of the many perks of having a reverse proxy.

If you need to do this in Liferay, you have to write a filter that intercepts the requests and returns a robots.txt that fits your needs depending on the host header.
https://portal.liferay.dev/docs/7-1/tutorials/-/knowledge_base/t/servlet-filters
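
The host-dependent part of such a filter could look roughly like the sketch below. It is plain Java so it runs without Liferay or the servlet API on the classpath; the class name RobotsByHost, the method names, and the host public.example.com are made up for illustration and are not part of Liferay. In a real filter you would presumably pass in request.getServerName() (or the raw Host header) for requests to /robots.txt, set the content type to text/plain, and write the returned string to the response instead of continuing the filter chain.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

public class RobotsByHost {

    // Blocks every crawler from every path.
    public static final String BLOCK_ALL = "User-agent: *\nDisallow: /\n";

    // Allows every crawler everywhere (an empty Disallow disallows nothing).
    public static final String ALLOW_ALL = "User-agent: *\nDisallow:\n";

    // Hypothetical per-host overrides; any host not listed here,
    // including a bare IP address, falls back to BLOCK_ALL.
    private static final Map<String, String> OVERRIDES =
            Collections.singletonMap("public.example.com", ALLOW_ALL);

    // Lower-cases the host and drops an optional ":port" suffix.
    public static String normalizeHost(String hostHeader) {
        if (hostHeader == null) {
            return "";
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        if (host.startsWith("[")) {
            // Bracketed IPv6 literal such as "[::1]:8080": keep "[::1]".
            int end = host.indexOf(']');
            return end > 0 ? host.substring(0, end + 1) : host;
        }
        int colon = host.lastIndexOf(':');
        return colon >= 0 ? host.substring(0, colon) : host;
    }

    // Picks the robots.txt body for the host the client asked for.
    public static String robotsFor(String hostHeader) {
        return OVERRIDES.getOrDefault(normalizeHost(hostHeader), BLOCK_ALL);
    }

    public static void main(String[] args) {
        String[] hosts = {"mydomain.com", "10.0.0.1", "10.0.0.1:8080"};
        for (String host : hosts) {
            System.out.println(host + " ->");
            System.out.print(robotsFor(host));
        }
    }
}
```

Defaulting to the blocking variant means a request by IP, or by any host name nobody thought of, is still told not to crawl; only hosts listed explicitly get something else.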

Amit Sharma, modified 6 Years ago.

RE: Avoid crawl of web pages by search engine using IP and Domain name.

Junior Member Posts: 35 Join Date: 10/17/18 Recent Posts

Thanks, this solved my problem.
