Updating Document Library Hooks

A common problem is the need to change how documents are stored in the Document Library.

For example, you may start out storing documents using Jackrabbit hooked to a database. However, as time goes on and you find yourself running more Liferay deployments, the number of database connections reserved for Jackrabbit alone gets dangerously close to the maximum number of connections supported by your database vendor.

So you want to switch from JCRHook to FileSystemHook and store documents on a SAN.

If your migration uses two different hooks and you have access to the portal class loader (for example, you're running in the EXT environment, or you can update the context class loader for your web application, as in Tomcat 5.5), the solution is straightforward.

// Read one version of one document from the hook you are migrating away from.
InputStream exportStream =
	sourceHook.getFileAsStream(
		companyId, folderId, fileName, versionNumber);

// Write that same version to the hook you are migrating to.
targetHook.updateFile(
	companyId, portletId, groupId, folderId, fileName,
	versionNumber, fileName, fileEntryId, properties, modifiedDate,
	tagsCategories, tagsEntries, exportStream);

In summary, you instantiate the hook objects, pull data from your source hook and push it to your target hook. Then, update your configurations and restart your server to use the new document library hook. (In case you're not sure how to change document library hooks, see the comments for the dl.hook.impl property in portal.properties.)
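To make the pull-and-push loop concrete, here is a minimal sketch that runs without a portal. Everything in it is an assumption made for illustration: `SimpleHook` is a hypothetical, cut-down stand-in for the two Hook methods shown above, `InMemoryHook` exists only so the example can run, and `DocumentVersion` is a made-up holder for the values that identify one stored file version. In a real migration you would use the actual hook classes and the full `updateFile` argument list.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HookMigrationSketch {

	// Copies every listed version from source to target and returns the count.
	static int migrate(
			SimpleHook source, SimpleHook target, List<DocumentVersion> versions)
		throws IOException {

		int migrated = 0;

		for (DocumentVersion v : versions) {
			// try-with-resources closes the source stream after each copy.
			try (InputStream in = source.getFileAsStream(
					v.companyId, v.folderId, v.fileName, v.versionNumber)) {

				target.updateFile(
					v.companyId, v.folderId, v.fileName, v.versionNumber, in);
			}

			migrated++;
		}

		return migrated;
	}

	public static void main(String[] args) throws IOException {
		InMemoryHook source = new InMemoryHook();
		InMemoryHook target = new InMemoryHook();

		source.updateFile(
			1L, 10L, "DLFE-1.pdf", 1.0,
			new ByteArrayInputStream("example".getBytes(StandardCharsets.UTF_8)));

		int migrated = migrate(
			source, target,
			Arrays.asList(new DocumentVersion(1L, 10L, "DLFE-1.pdf", 1.0)));

		System.out.println("Migrated " + migrated + " file version(s)");
	}
}

// Hypothetical stand-in for the two Hook methods used in this post.
interface SimpleHook {
	InputStream getFileAsStream(
			long companyId, long folderId, String fileName, double versionNumber)
		throws IOException;

	void updateFile(
			long companyId, long folderId, String fileName, double versionNumber,
			InputStream is)
		throws IOException;
}

// In-memory hook so the sketch is runnable; not a real Liferay class.
class InMemoryHook implements SimpleHook {
	private final Map<String, byte[]> files = new HashMap<>();

	private static String key(
		long companyId, long folderId, String fileName, double versionNumber) {

		return companyId + "/" + folderId + "/" + fileName + "/" + versionNumber;
	}

	@Override
	public InputStream getFileAsStream(
			long companyId, long folderId, String fileName, double versionNumber)
		throws IOException {

		byte[] bytes = files.get(
			key(companyId, folderId, fileName, versionNumber));

		if (bytes == null) {
			throw new IOException("No such file: " + fileName);
		}

		return new ByteArrayInputStream(bytes);
	}

	@Override
	public void updateFile(
			long companyId, long folderId, String fileName, double versionNumber,
			InputStream is)
		throws IOException {

		files.put(
			key(companyId, folderId, fileName, versionNumber), is.readAllBytes());
	}
}

// Made-up holder for the values that identify one stored file version.
class DocumentVersion {
	final long companyId;
	final long folderId;
	final String fileName;
	final double versionNumber;

	DocumentVersion(
		long companyId, long folderId, String fileName, double versionNumber) {

		this.companyId = companyId;
		this.folderId = folderId;
		this.fileName = fileName;
		this.versionNumber = versionNumber;
	}
}
```

The loop copies one version at a time and closes each source stream before moving on, which keeps memory use flat no matter how many documents you migrate.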

If you'd rather see this in code than in prose, these document links might help. Note that the sample Plugins SDK portlets use the portal class loader to access the classes in portal-impl.jar, so you must run them on Tomcat 5.5 or another servlet container that supports a similar mechanism.

However, it's not always so easy. Another common situation is that you start out using Jackrabbit backed by a file system (perhaps from an earlier version of Liferay where JCRHook was the default), but you want to move to a clustered environment and you do not have access to a SAN.

Therefore, you want to migrate over to using Jackrabbit hooked to a database.

This differs from the previous problem because you're using the same hook in both cases. Because of the way Liferay handles Jackrabbit inside this hook, only one Jackrabbit workspace configuration (specified in your portal.properties file) can be active at a time, so simply instantiating two different hooks is not possible.

The solution here is to run the migration twice.

First, with the original JCRHook configuration, export the data to an intermediate hook (for example, the Liferay FileSystemHook) and shut down your Liferay instance. Then, you update portal.properties and/or repository.xml to reflect the desired JCRHook configuration, and import from your intermediate hook back to JCRHook.
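The two passes can be sketched as follows. This is only an illustration under loud assumptions: a plain `Map` stands in for whichever single JCRHook configuration is active at the time, a temporary directory stands in for the intermediate hook, and the names `exportAll` and `importAll` are hypothetical. The point is the ordering: export under the old configuration, restart, then import under the new one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

class TwoPassMigrationSketch {

	// Pass 1: run while the ORIGINAL configuration is the active one.
	static int exportAll(Map<String, byte[]> activeStore, Path intermediateDir)
		throws IOException {

		Files.createDirectories(intermediateDir);

		int count = 0;

		for (Map.Entry<String, byte[]> entry : activeStore.entrySet()) {
			Files.write(intermediateDir.resolve(entry.getKey()), entry.getValue());
			count++;
		}

		return count;
	}

	// Pass 2: run after the restart, when the NEW configuration is active.
	static int importAll(Path intermediateDir, Map<String, byte[]> activeStore)
		throws IOException {

		int count = 0;

		try (Stream<Path> files = Files.list(intermediateDir)) {
			for (Path file : (Iterable<Path>) files::iterator) {
				activeStore.put(
					file.getFileName().toString(), Files.readAllBytes(file));
				count++;
			}
		}

		return count;
	}

	public static void main(String[] args) throws IOException {
		// Stand-in for Jackrabbit backed by the file system.
		Map<String, byte[]> oldConfiguration = new TreeMap<>();
		oldConfiguration.put(
			"DLFE-1.pdf", "first".getBytes(StandardCharsets.UTF_8));
		oldConfiguration.put(
			"DLFE-2.pdf", "second".getBytes(StandardCharsets.UTF_8));

		Path intermediate = Files.createTempDirectory("dl-intermediate");

		int exported = exportAll(oldConfiguration, intermediate);

		// Here you would shut down, change the configuration, and restart.

		// Stand-in for Jackrabbit backed by the database.
		Map<String, byte[]> newConfiguration = new TreeMap<>();

		int imported = importAll(intermediate, newConfiguration);

		System.out.println(exported + " exported, " + imported + " imported");
	}
}
```

Comparing the exported and imported counts after the second pass is a cheap sanity check before you delete the intermediate copy.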

If you do not have access to the portal class loader, the migration is less straightforward, because you won't have access to the hook classes themselves.

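// Pass 1 (original configuration): read the document through the service
// layer and save it to an intermediate location on disk.
InputStream exportStream =
	DLLocalServiceUtil.getFileAsStream(
		companyId, folderId, fileName, versionNumber);

FileUtil.write(intermediateFileLocation, exportStream);

// Pass 2 (new configuration, after a restart): read the saved file back
// and store it through the service layer again.
InputStream importStream =
	new FileInputStream(intermediateFileLocation);

DLLocalServiceUtil.updateFile(
	companyId, portletId, groupId, folderId, fileName,
	versionNumber, fileName, fileEntryId, properties, modifiedDate,
	tagsCategories, tagsEntries, importStream);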

In summary, you'll have to read the documents using DLLocalServiceUtil with the original configuration and save them to disk in a way that lets you parse all the data needed to call updateFile (perhaps mirroring the way FileSystemHook works). Then, after updating your configuration and restarting the server, you re-import those exported documents using DLLocalServiceUtil.
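One way to keep the exported files parseable is to encode the identifying values in the path itself. The sketch below assumes a layout of `<root>/<companyId>/<folderId>/<fileName>/<versionNumber>`; that layout, the `ExportedDocument` class, and the `toPath` and `fromPath` helpers are all hypothetical choices for illustration, not a description of what FileSystemHook actually does. It also covers only four of the `updateFile` arguments. The others (portletId, groupId, fileEntryId, properties, modifiedDate, and the tag arrays) would have to be stored alongside the file or looked up again at import time.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ExportLayoutSketch {

	// Encodes the identifying values as path segments under root.
	static Path toPath(Path root, ExportedDocument doc) {
		return root
			.resolve(Long.toString(doc.companyId))
			.resolve(Long.toString(doc.folderId))
			.resolve(doc.fileName)
			.resolve(Double.toString(doc.versionNumber));
	}

	// Recovers the identifying values from a path produced by toPath.
	static ExportedDocument fromPath(Path root, Path file) {
		Path relative = root.relativize(file);

		if (relative.getNameCount() != 4) {
			throw new IllegalArgumentException("Unexpected layout: " + relative);
		}

		return new ExportedDocument(
			Long.parseLong(relative.getName(0).toString()),
			Long.parseLong(relative.getName(1).toString()),
			relative.getName(2).toString(),
			Double.parseDouble(relative.getName(3).toString()));
	}

	public static void main(String[] args) {
		Path root = Paths.get("export");

		Path path = toPath(
			root, new ExportedDocument(1L, 10L, "DLFE-1.pdf", 1.0));

		ExportedDocument parsed = fromPath(root, path);

		System.out.println(
			path + " is version " + parsed.versionNumber + " of " +
				parsed.fileName);
	}
}

// Made-up holder for the values encoded in the export path.
class ExportedDocument {
	final long companyId;
	final long folderId;
	final String fileName;
	final double versionNumber;

	ExportedDocument(
		long companyId, long folderId, String fileName, double versionNumber) {

		this.companyId = companyId;
		this.folderId = folderId;
		this.fileName = fileName;
		this.versionNumber = versionNumber;
	}
}
```

Using the file name as a directory and the version number as the leaf keeps every version of a document side by side, so the import pass can walk the tree and replay versions without any extra index.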

Very interesting! So using these hooks, would it be possible to substitute Jackrabbit for Alfresco transparently, so that the DL portlet continues to just work as is?
That is correct, assuming you've created (or are in the process of creating) a document library Hook which interacts with Alfresco.

I don't believe we officially support this level of integration out of the box, since we're still waiting for the CMIS specification to be finalized.
Hi Minhchau, nice work. It would be nice if a document library Hook that interacts with Alfresco were supported natively. Then integrating Liferay and Alfresco would be simple. By the way, when will the CMIS specification be finalized? Thanks.
Minhchau, I've modified your sample for 5.1.1, in addition to some simple 'down and dirty' multi-threaded modifications. (I've got a lot of documents to migrate). I'd be happy to share, any way I can upload or get them to you?
Of course! That's the main benefit of sharing things via open source, right?

I've added a new share folder to my documents page which you should be able to upload to (I've given your user the ability to add and update documents).
So here's my variant, which has been modified to work with 5.1.1 and uses a simple threaded model. I wouldn't normally recommend that anyone manage their own threads this way, but for something that's most likely going to be run in a controlled environment, it should be fine. I'm assuming one knows how to hook up an Action to the main servlet. The Code

On my HW, the original version was migrating about 10 files / minute (most of that time spent pulling files out of the DB via JCR). My threaded version, running at 30 threads, is migrating approx 50 files / minute on the same hardware. Enjoy!
What about the attachments on the wiki pages? Is it possible to do something similar? Any ideas on where to start?

Thanks!
By default, wiki page attachments and forum message attachments are stored in the document library in a hidden system folder.

Therefore, as a side-effect (or as an added bonus, depending on how you view it), the process described in this blog entry will also change how both wiki attachments and forum message attachments are stored by Liferay.
I'm testing the tool to migrate the documents from JCR (database persistence) to the file system. The Document Library documents were updated fine, but unfortunately the wiki attachments were not. After the upgrade, the wiki page I was using for tests has 0 attachments; it originally had 2. I'm guessing I would have to add the hidden folder to the code that migrates the files. Do you know how I can identify that folder? My other question is whether there is a hidden folder for every wiki/company.

Thanks!
There is a folder in every company that has an id of zero, and that's the 'system' folder for that company. However, now that I think about it, it's a virtual folder, so there probably isn't a formal entry in the DLFolder table, and it probably wouldn't be returned by DLFolderLocalServiceUtil. That was an oversight on my part.

So to migrate it, call DLFileEntryLocalServiceUtil.getFileEntries(0), which should return all the system folder entries across all companies. If it works, the DLFileEntry objects will contain the appropriate companyId, which should be the only other piece of information you need to provide to the hook.

I'll try this myself, and if it works, I'll update the linked samples. =)
Sounds great! I think I'll wait, you will have it a lot faster than me. Thanks again.
DLFileEntry objects aren't created for system folder entries either, so that strategy didn't work. There may be a cleaner way to get a list of the files in the hidden system folder, but I haven't come up with one yet.

I've updated the sample code with an example of how wiki attachments and message board posts can be migrated using the getAttachmentsFiles() methods of the wiki page and message board post objects, but anything else that's put in the system folder won't be migrated unless you explicitly iterate over it.
Hi Minhchau, good job. This portlet throws an error in EE 5.1.6. Thanks.

Exception in thread "Thread-86" java.lang.NoSuchMethodError: com.liferay.documentlibrary.util.Hook.updateFile(JLjava/lang/String;JJLjava/lang/String;DLjava/lang/String;JLjava/lang/String;Ljava/util/Date;[Ljava/lang/String;[Ljava/lang/String;Ljava/io/InputStream;)V
The sample portlet was written against 5.2.x, so if you wish to use it against 5.1.x (like Jim Klo), you'll need to do some rewriting to use the correct Hook API. I'll go ahead and clarify the original blog post.
Hi Minhchau, Thank you. Could you please provide the sample portlet against 5.1.x? This would be useful ...
The following hooks remain in portal-impl in versions 5.1 and 5.2:
com.liferay.documentlibrary.util.FileSystemHook;
com.liferay.documentlibrary.util.Hook;
com.liferay.documentlibrary.util.JCRHook;

Thus the sample portlet you provided in the Plugins SDK has to refer to portal-impl.jar. How do I configure that in the Plugins SDK?

In 5.3, com.liferay.documentlibrary.util.Hook has been moved to portal-service. Is it a good idea to move FileSystemHook and JCRHook to portal-service?

Thanks

Jonas Yuan
Hi Dang, it works fine in 5.2.* for JCR<->File, but File->S3 failed with:

21:52:57,421 WARN [RestS3Service:317] Response '/,/' - Unexpected response code 403, expected 200
21:52:57,484 WARN [RestS3Service:317] Response '/,/' - Unexpected response code 403, expected 200
21:52:57,484 WARN [RestS3Service:324] Response '/,/' - Received error response with XML message
21:52:57,500 ERROR [S3Hook:80] S3 PUT failed for '/,/'
21:52:57,609 ERROR [DocumentLibraryMigrator:340] migrate of DLFE-1.zip failed
From the log4j error message, it looks like it's failing to properly construct the S3Hook (since line 80 is in the middle of the S3Hook constructor).

I don't personally have an Amazon WS account to test, but to verify, have you set the portal.properties corresponding to S3Hook? According to the source, you should have values for dl.hook.s3.access.key, dl.hook.s3.secret.key, and dl.hook.s3.bucket.name.

Or were you just trying it out because it was an available drop-down option?
Hello Minhchau I hope you are doing fine,

I deployed sample 5.1.x on my local env and I'm constantly getting a ClassCastException when trying to migrate content.

I am running version 5.1.5 on a Tomcat 6 bundle.

This is the test case:

Clean Install of environment (Fresh DB (Mysql) and Server Instance v. 5.1.5 )
I have JCRHook as default.
I have added some documents to Document Library and Image Gallery
Browsed Document Library Portlet to see documents and it worked.
Created a Journal Article that links to these documents. (It Works).

Uploaded the Migration Portlet
Selected source Hook: JCRHook
Selected target Hook: FileSystemHook

run...

... Errors (ClassCastExceptions) All Docs failed.

20:13:09,712 ERROR [DocumentLibraryMigrator:339] migrate of DLFE-1.pdf failed
java.lang.ClassCastException:

Any thoughts on this would be really appreciated.

Thanks! Great post, by the way.
Hi Hugo,

I have deployed the Migration portlet in 5.2.3. When I try to migrate documents from the file system to the JCR tables, it gives the same error you described above:

[#|2009-12-02T00:13:25.986-0500|INFO|sun-appserver2.1|javax.enterprise.system.stream.out|_ThreadID=23;_ThreadName=Thread-13843;|00:13:25,986 ERROR [DocumentLibraryMigrator:340] migrate of DLFE-203.jpg failed
java.lang.ClassCastException:

Any help on this?

Thanks
Hi Hugo,
I am also getting this same exception
'java.lang.ClassCastException: com.liferay.portal.jcr.jackrabbit.JCRFactoryImpl'

I am using version 5.2.2 on a Tomcat 6 bundle.

Were you able to resolve this problem?

Thanks!
FYI, I just committed this migration code to trunk.

http://issues.liferay.com/browse/LPS-6534