The reason we moved to 7zip bundles

As some of you may have already discovered, 7.1 GA2 was released as a 7zip bundle instead of the typical zip bundle. This probably caused a ton of issues for you. Even our own Dev Tools are not yet equipped to handle 7z files, since all the events took up our time.

I will explain why we had to make this move. Hopefully, everyone will come to the conclusion that this was the best decision, although the communication could have been handled significantly better.

The original goal that led to providing 7z bundles was to improve startup times. We discovered that if we prepopulate the OSGi state, startup becomes 2-3 times faster. As we began our testing, we found that our zip bundles were not preserving timestamps correctly. They were rounding our timestamps to the nearest second, which invalidated our OSGi state. We also found that our bundles had grown to 1.2 gigabytes!

This improvement had 2 consequences for packaging:
  • the bundle must maintain the original timestamps
  • the number of duplicate files in the bundle increased significantly.
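
The timestamp problem is easy to reproduce on a small scale. The sketch below uses Python's standard zipfile and tarfile modules purely as an illustration; it is not the tooling used to build the bundles, and the entry name and timestamp are made up. The classic zip entry header stores modification times in the MS-DOS format, which has a two-second resolution, while a tar archive written in the pax format can carry fractional seconds.

```python
import io
import tarfile
import zipfile

PAYLOAD = b"example bundle content"
# Hypothetical modification time: 2018-11-05 12:16:45.556 UTC
MTIME = 1541420205.556


def zip_roundtrip_seconds() -> int:
    """Store an entry stamped 12:16:45 and return the seconds value read back."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("bundle.jar", date_time=(2018, 11, 5, 12, 16, 45))
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, PAYLOAD)
    with zipfile.ZipFile(buf) as zf:
        # The DOS time field keeps seconds divided by two, so odd seconds are lost.
        return zf.getinfo("bundle.jar").date_time[5]


def tar_roundtrip_mtime() -> float:
    """Store an entry with a fractional mtime in a pax tar and read it back."""
    buf = io.BytesIO()
    info = tarfile.TarInfo("bundle.jar")
    info.size = len(PAYLOAD)
    info.mtime = MTIME
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        tf.addfile(info, io.BytesIO(PAYLOAD))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return tf.getmember("bundle.jar").mtime


if __name__ == "__main__":
    print("zip seconds after round trip:", zip_roundtrip_seconds())
    print("tar mtime after round trip:  ", tar_roundtrip_mtime())
```

Any cache that compares stored modification times against the extracted files would see a mismatch after the zip round trip, which is the kind of invalidation described above. Other zip and tar tools may behave differently depending on the extra fields and formats they write.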

We began to look for solutions. Naturally, tar.gz was the first solution that came to mind. It would easily preserve the timestamps, but it did not solve the file size issue. While some people may find a large download acceptable, we did not believe that it would be appropriate for some of our use cases. As a result, someone suggested that we investigate 7zip, because 7zip will actually detect duplicate files and treat them as a single file during compression. This brought the file size down significantly, from 1.2 gigabytes to 400 megabytes. It was the perfect solution for us. So this is why we ultimately decided to use 7zip instead of zip.
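
To see why the archive format matters so much when a bundle contains duplicate files, here is a small experiment using only Python's standard library. It is a sketch under assumptions: the file names and the 200 KB random blob are invented, and tar.xz stands in for 7z because both are LZMA-based and compress the files as one continuous stream. Zip compresses every entry separately, and gzip only looks back 32 KB, so neither can notice that a whole file has appeared before.

```python
import io
import os
import tarfile
import zipfile


def archive_sizes(files: dict[str, bytes]) -> dict[str, int]:
    """Pack the same files as zip, tar.gz and tar.xz and return each size in bytes."""
    sizes = {}

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    sizes["zip"] = len(zip_buf.getvalue())

    for label, mode in (("tar.gz", "w:gz"), ("tar.xz", "w:xz")):
        tar_buf = io.BytesIO()
        with tarfile.open(fileobj=tar_buf, mode=mode) as tf:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tf.addfile(info, io.BytesIO(data))
        sizes[label] = len(tar_buf.getvalue())

    return sizes


# Two copies of the same incompressible blob, standing in for a duplicated jar.
blob = os.urandom(200_000)
sizes = archive_sizes({"a/lib.jar": blob, "b/lib.jar": blob})

if __name__ == "__main__":
    for label, size in sizes.items():
        print(f"{label:7s} {size:>8d} bytes")
```

With this input, the zip and tar.gz results should come out at roughly the size of both copies together, while tar.xz should be close to the size of a single copy.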

Since our initial development, we have also fixed the duplicate file issue. This means that tar.gz is also a viable solution (though the bundles are slightly larger at 600 megabytes). From now on, we will be providing both 7zip and tar.gz bundles. Internally we will be using 7zip because that 200-megabyte difference is still too significant for our use cases, but everyone else can decide what works best for them.

Sounds great as long as Ansible's unarchive module can handle that (I haven't tested it yet). Otherwise, we have a problem :)

According to the docs, 7z isn't supported: https://docs.ansible.com/ansible/latest/modules/unarchive_module.html. This module uses unzip for zip files, and it shouldn't be hard to extend it to 7z. Seems like you're up for some Python development and a pull request for Ansible :)

Yep, it is a problem. If you need them, I already have some hacky Ansible tasks that install p7zip and run it when a "7z" archive extension is detected ;)
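
For anyone scripting this outside Ansible, the same "pick the tool by extension" idea might look roughly like the Python sketch below. This is not the commenter's actual tasks; the function name `extract_command` is hypothetical, and it assumes the `7z` (p7zip), `tar` and `unzip` commands are installed and on the PATH.

```python
from pathlib import Path


def extract_command(archive: str, dest: str) -> list[str]:
    """Return the command line that would unpack `archive` into `dest`."""
    name = Path(archive).name.lower()
    if name.endswith(".7z"):
        # 7z expects the output directory attached directly to -o; -y answers prompts.
        return ["7z", "x", archive, f"-o{dest}", "-y"]
    if name.endswith((".tar.gz", ".tgz", ".tar.xz")):
        # GNU tar and bsdtar detect the compression themselves when extracting.
        return ["tar", "-xf", archive, "-C", dest]
    if name.endswith(".zip"):
        return ["unzip", "-q", archive, "-d", dest]
    raise ValueError(f"unsupported archive type: {archive}")


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run(cmd, check=True) to execute.
    print(extract_command("liferay-ce-portal-tomcat-7.1.1-ga2.7z", "/opt/liferay"))
```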

In recent Linux distributions (less than 2 years old), the tar command supports xz compression, which uses LZMA, like 7zip.

I have re-compressed the liferay-ce-portal-tomcat-7.1.1-ga2-20181105121645556.7z bundle using tar+xz (tar --xz -cf liferay-ce-portal-tomcat-7.1.1-ga2.tar.xz liferay-ce-portal-7.1.1-ga2).

7z is 426 MiB, tar.xz is 453 MiB.

Ansible's unarchive module supports .tar.xz.

In fact, if you use maximum compression you can go down to 446M with tar.xz. That said, apparently a 40 MB difference is not worth the switch from `tar.gz` to `tar.xz`. And for Windows users, 7zip is apparently the preferred solution.

Just checked, and "tar.gz" is the way to go for Ansible users. The file is ~60MB bigger than the "7z" one, but it works seamlessly with "unarchive".

Kudos for decreasing startup time so drastically, that's awesome! But I don't see the problem with the bundle file size.

- I've downloaded the 7.1.1 7Z file. It is 446MB.

- Expanded (before first startup), the 7.1.1 bundle is about 641MB. So not the 1.2GB that is mentioned in the blog post.

- If I then use normal ZIP to compress it again, the file size is 518MB. This is only 72MB more than the 7Z variant.

Am I missing something here?

The main issue is preserving the timestamps which `zip` apparently does not do properly. The file size is not that much of an issue anymore as we managed to get rid of some of the duplicated files that initially caused the mentioned 1.2GB size.