How to Index External Sites for Use in Liferay Searches
Mik Cantrell, modified 6 Years ago.
New Member Posts: 4 Join Date: 4/9/09
I have a project that I would love to use Liferay for. Here are the requirements:
1 - Must be able to store, index and search documents
2 - Must be able to programmatically load files to be indexed and searched
3 - Must be able to index and search external sites
I think Liferay does the first one pretty well right out of the box, and I've seen documentation suggesting the 2nd one is possible. The part that is missing, or that I'm missing, is the 3rd one. Is there some way to search external web sites that have been indexed/crawled by 3rd party solutions such as Manifold/Lucene/Solr, or is there already something in Liferay for this that I'm not aware of? If so, I would greatly appreciate any guidance you could point me to.
Thanks,
Michael
Jorge Díaz, modified 6 Years ago.
RE: How to Index External Sites for Use in Liferay Searches
Liferay Master Posts: 753 Join Date: 1/9/14
Hi Mik,
You will have to implement this yourself. Some ideas:
See https://web.liferay.com/es/community/forums/-/message_boards/message/87242969
Another option is to create a new ServiceBuilder entity that stores the external URL of a page (similar to the Bookmarks entity).
That entity would have an Indexer that retrieves the content at the external URL and sends it to the search index.
As a final step, you also have to integrate a crawler. It would retrieve all the pages of the site and create them inside Liferay using the new ServiceBuilder entity.
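To make the crawler step a bit more concrete, here is a rough sketch in plain Java. It deliberately uses no Liferay API: ExternalPage is only a hypothetical stand-in for the new ServiceBuilder entity, and the fetcher function is an assumption that lets you plug in whatever HTTP client you prefer. Link and title extraction are done with simple regular expressions, which is a simplification; a real crawler would probably use an HTML parser and respect robots.txt.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExternalSiteCrawler {

    /** Hypothetical stand-in for the ServiceBuilder entity (not a Liferay class). */
    static final class ExternalPage {
        final String url;
        final String title;
        final String content;

        ExternalPage(String url, String title, String content) {
            this.url = url;
            this.title = title;
            this.content = content;
        }
    }

    // Simplified patterns: double-quoted href values (fragment dropped) and <title>.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"#]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Breadth-first crawl that stays on the host of startUrl.
     * The fetcher maps a URL to its HTML, or returns null if the page is unavailable.
     */
    static List<ExternalPage> crawl(String startUrl, Function<String, String> fetcher, int maxPages) {
        String startHost = URI.create(startUrl).getHost();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<ExternalPage> pages = new ArrayList<>();
        queue.add(startUrl);
        seen.add(startUrl);

        while (!queue.isEmpty() && pages.size() < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            pages.add(new ExternalPage(url, extractTitle(html), stripTags(html)));

            Matcher links = HREF.matcher(html);
            while (links.find()) {
                try {
                    URI target = URI.create(url).resolve(links.group(1).trim());
                    String host = target.getHost();
                    // Queue only unseen links on the same host.
                    if (startHost != null && startHost.equalsIgnoreCase(host)
                            && seen.add(target.toString())) {
                        queue.add(target.toString());
                    }
                } catch (IllegalArgumentException e) {
                    // Skip links that are not valid URIs.
                }
            }
        }
        return pages;
    }

    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Crude text extraction: drop tags and collapse whitespace. */
    static String stripTags(String html) {
        return html.replaceAll("(?s)<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // In-memory "site" so the sketch runs without network access.
        Map<String, String> site = new HashMap<>();
        site.put("https://example.com/",
            "<title>Home</title><a href=\"/about\">About</a>");
        site.put("https://example.com/about",
            "<title>About</title><a href=\"/\">Home</a>");
        for (ExternalPage page : crawl("https://example.com/", site::get, 10)) {
            System.out.println(page.url + " | " + page.title + " | " + page.content);
        }
    }
}
```

In a real plugin, each ExternalPage returned by crawl would be saved through the new entity's service, and the entity's Indexer would hand the url, title and content to the search index. The maxPages limit and the same-host check are just there to keep the crawl bounded.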