Pre-Upgrade Scripting (6.2 > DXP) - Pt 2 - Document Version Cleanup

Do you have a lot of documents in your Liferay instance? It may be worth it to clean out old document versions... Let's take a look:

mysql> select count(*) from dlfileentry; Returns 12,300.
mysql> select count(*) from dlfileversion; Returns 13, 678.

Looks worth it to remove historic versions. I decided to explore one of our sites that has a lot of documents in their repository, using Mitesh Panchal's excellent script for targeting document version deletions by site and folderId. I added a preview mode option and some logging and exception catches and figured out that if you use "0" for the folderId, the script will crawl the entire site's set of folders. Awesome start.

Upon reviewing the results of this first site's log of traced transactions, I discovered some flawed data..:

Folder Name: Supplemental Course Links...
Deleting >>1.4 with Title=Testing Success and Remediation Plan.pdf
Deleting >>1.2 with Title=Testing Success and Remediation Plan.pdf
Deleting >>1.1 with Title=Testing Success and Remediation Plan.pdf
Folder Name: Professio...

It looks like version 1.4 would be deleted and 1.3 is being retained. Indeed, Liferay's database table dlfileentry version=  record is not the true *latest* version for the same file as listed in the dlfileversion table. Time to see how pervasive this problem is. After some inquiries, I learned there were more than 50 cases across a variety of sites where this problem scenario recurred. In this case we will use DLFileVersionLocalServiceUtil.getLatestFileVersion() to get the accurate latest version for each file and delete the rest.

For this use case, this variation on Mitesh's script works well for any given site and or specific folder. But we want to clean them all out. Here is a version that is reworked to accomplish that with a dynamic query to improve performance...:

But this script did not run, due to the errors being caught for every dlfileentry record that had *no* corresponding dlfileversion record. It turns out that after installing XTivia's "Documents and Media File System Checker" portlet that these same records were also showing up in that portlet's output as having no file existing in the file system. The entries are useless to end users and do not even list out in the UI. Makes sense, but does make this harder...

Enter a script to delete a targeted list of fileEntryIds that are the offending / missing documents. Here is a mysql statement that identifies file entries with no version records...:

select fileEntryId from dlfileentry where fileEntryId NOT IN (select dlfe.fileEntryId from dlfileentry dlfe, dlfileversion dlfv where dlfe.fileEntryId=dlfv.fileEntryId and dlfv.version=dlfe.version);

Here is a quick script to target-delete a list of specific fileEntryIds you want to remove from the system. Once this script is run, the full document version cleanup script will run.