Getting Started with Building Liferay from Source

When a new intern onboards in the LAX office, their first step is to build Liferay from source. As a side effect, the person who handles intern onboarding will usually see that the reported ant all time is in hours rather than in minutes, and then start asking people, "Is it normal for it to take this long to build Liferay for the first time?"

It happens frequently enough that sometimes my hyperactive imagination supposes that a UI intern's entire first day must be spent cloning the Liferay GitHub repository and building it from source, and then going home.

In hindsight, this must be what the experience is like for a developer in the community looking to contribute back to Liferay as well. Though, rather than go home in a literal sense, they might stare at the source code they've downloaded and built (assuming they got that far) and think, "Wow, if it took this long just to get started, it must be really terrible to try to do more than that," and choose to go home in a less literal way, but with more dramatic flair.

One of the many community-related discussions we have been having internally is how we can make things better, both internally and externally, when it comes to working with Liferay at the source code level. Questions like, "How do we make it easier to compile Liferay?" or "How do we make it easier to debug Liferay?" After all, just how open source are you if it's an uphill battle to compile from source in order to find out if things are already fixed in branch?

We don't have great answers to these problems yet, but I believe that we can, at a minimum, provide a little more transparency about what we are trying internally to make things better for ourselves. Sharing that information might give all of us a better path forward, because if nothing else, it lets us ask important questions about the pain points rather than bikeshed color ones.

Step 1: Upgrade Your Build System

Let's say I were to create a survey asking the question, "Which of the following numbers best describes your build time on master in minutes (please round up)?" and gave people a list of options, ranging from 5 minutes and going all the way up to two hours.

This question makes the unstated assumption that you are able to successfully build master from source, because none of the options is, "I can't build master from source." Granted, it may seem strange that I call that an "assumption", because why would you not be able to build an open source product from source?

Trick question.

If you've seen me at past Liferay North America Symposiums, and if you were really knowledgeable about Dell computers in the way that many people are knowledgeable about shoes or cars, you'd know that I've been sporting a Dell Latitude E6510 for a very long time.

It's a nice machine, sporting a mighty 8 GB of RAM. Since memory is one of the more common bottlenecks, this made it at least on-par with some of the machines I saw developers using when I visited Liferay clients as a consultant. However, to be completely honest, a machine with those specifications has no hope of building the current master branch of Liferay without intimate knowledge of Liferay build process internals. Whenever I attempted to build master from source without customizing the build process, my computer was guaranteed to spontaneously reboot itself in the middle.

So why was this not really a problem for other Liferay developers?

Liferay has a policy of asking its developers to accept upgrades to their hardware once every two to three years. The idea is that if new hardware increases your productivity, it's such a low cost investment that it's always worthwhile to make. A handful of people resist upgrades (inertia, emotional attachment to Home and End keys, etc.), but since almost everyone chooses to upgrade, Liferay has an ivory tower problem, where much of Liferay has no idea what it's like to even start up Liferay on an older machine, not to even discuss what it's like to compile Liferay on those older machines.

  • Liferay tries to do parallel builds, which consumes a lot of memory. To successfully build Liferay from source, a dedicated build system needs 8 GB of memory, while a developer machine with an IDE running needs at least 16 GB of memory.
  • Liferay writes an additional X GB every time you build, a lot of it being just copies of JARs and node_modules folders. While it will succeed on platter drives, if you care about build time, you'll want Liferay source code to live on a solid-state drive to handle the mass file creation.
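As a rough sketch of the memory thresholds above, the following shell snippet classifies a machine by its RAM. The function name classify_memory_gb and its three labels are made up for illustration, and the detection step assumes a Linux machine that exposes /proc/meminfo; on other systems you would supply the number yourself.

```shell
#!/bin/sh
# Classify a machine by installed RAM (in whole GB), using the thresholds
# from the list above: 8 GB for a dedicated build box, 16 GB with an IDE.
# The function name and labels are illustrative, not part of Liferay's tooling.
classify_memory_gb() {
    if [ "$1" -ge 16 ]; then
        echo "developer"      # enough to build with an IDE running
    elif [ "$1" -ge 8 ]; then
        echo "build-only"     # enough for a dedicated build system
    else
        echo "insufficient"
    fi
}

# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.
# The kernel reserves some memory, so round up to the next whole GB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    total_gb=$(( (total_kb + 1048575) / 1048576 ))
    echo "Detected ${total_gb} GB: $(classify_memory_gb "$total_gb")"
fi
```

Rounding up matters here because a machine sold as 16 GB typically reports slightly less than 16 GB of usable memory.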

Eventually, I ran into a separate problem that required a computer upgrade: I needed to run a virtual machine that itself wanted 4 GB, and that, combined with running Liferay alongside an IDE, meant my machine wasn't up to the task. After upgrading, the experience of building Liferay is substantially different from how it used to be. While I have other problems like an oversensitive mousepad, building Liferay no longer makes me wonder what else could possibly go wrong.

If you weren't planning on upgrading your computer in the near future, it doesn't make sense to upgrade your computer just to build Liferay. Instead, consider spinning up a virtual machine in a cloud computing environment that has the minimum requirements, such as something in your company's internal cloud infrastructure, or even a spot instance on Amazon EC2. Then you can use those servers to perform the build and you can download the result to your local computer.

Step 2: Clone Central Repository

So let's assume you've got a computer or virtual machine satisfying the requirements listed above. The next step is to get the source code so you can use this machine to build Liferay from source. This is the command you would use to do it:

git clone git@github.com:liferay/liferay-portal.git

However, the first step that interns get hung up on is waiting for this clone to complete. If you've ever tried to do that, you'll find that Liferay has violated one of the best practices of version control and we've committed a large folder full of binary files, .gradle. As a result of having this massive folder, GitHub sends us angry emails and, of course, cloning our repository takes hours.

How does Liferay make this better internally? Well, in the LAX office, the usual answer is to plug in the ethernet cable. Liferay invested heavily in fast internet, and so simply plugging in the ethernet cable makes the multi-hour process finish in 30 minutes.

However, it turns out that there is actually a better answer, even in the LAX office. Each office has a mirror that holds archives of various GitHub repositories, including liferay/liferay-portal. We suspect the original being mirrored is maintained by Quality Assurance, because we have heard that keeping all of our thousands of automated testing servers in sync used to result in angry emails from GitHub. Since it's an internal mirror, this means that downloading X GB and unzipping it takes a few minutes, even over WiFi, and it's on the order of seconds if you plug in your ethernet cable.

So, in order to improve our internal processes, we've been trying to get the people who manage our new hires and new interns to recognize that such a mirror exists and to use it during their onboarding process to save a lot of time for new hires on their first day.

So what does this mean for you?

Essentially, if you plan to clone the code directly onto your computer for simplicity, you'll need to make sure that it's during a time where you won't shut down the computer for a few hours and when you don't need it for anything (maybe run it overnight), because it's a time-consuming process.

Alternately, have a remote server perform the clone, and then download an archive of the .git folder to your local computer, similar to what Liferay is trying to do internally. This will free up your machine to do useful things, and even spinning up Amazon EC2 spot instances (like an m1.small) and bringing things down with either SCP or an S3 bucket as an intermediate point may be beneficial.
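Here is a minimal sketch of the archive-and-restore approach, assuming the remote server has tar and git available. The function names pack_git_dir and unpack_git_dir and the archive name in the usage comment are hypothetical; how you move the archive between machines (SCP, an S3 bucket) is up to you and is only hinted at in a comment.

```shell
#!/bin/sh
# On the remote server: archive only the .git folder of an existing clone.
# $1: path to the clone, $2: archive file to create
pack_git_dir() {
    tar -C "$1" -czf "$2" .git
}

# On your local machine: unpack the archive and rebuild the working tree.
# The archive holds history only, so "git reset --hard" checks the files
# back out from the restored .git folder.
# $1: archive created by pack_git_dir, $2: directory to create
unpack_git_dir() {
    mkdir -p "$2" &&
    tar -C "$2" -xzf "$1" &&
    git -C "$2" reset --hard
}

# Hypothetical usage (host name and paths are placeholders):
#   remote$ pack_git_dir liferay-portal portal-git.tar.gz
#   local$  scp build-host:portal-git.tar.gz .
#   local$  unpack_git_dir portal-git.tar.gz liferay-portal
```

Archiving only .git keeps the transfer smaller than archiving the whole checkout, since the working tree can always be regenerated from the history.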

Step 3: Clone the Binaries Cache

We mentioned the very large .gradle folder, but something else we noticed over time is that master and 7.0.x share a lot of libraries, and these were constantly getting rewritten as you switched between branches. To make this situation slightly more tolerable, we've created a separate repository just for binaries. When building Liferay, you will also want a copy of this binaries cache. For convenience, make it a sibling folder of the portal source folder you cloned in the previous step.

git clone git@github.com:liferay/liferay-binaries-cache-2017.git

Step 4: Build Central Repository

The next step is your first build from source. This is done with a single command that theoretically handles everything. However, before you run this single command, you might need to do things to reduce the number of resources it consumes.

  • Liferay issues a lot of requests to the NPM registry in parallel builds. You can cap this by checking build.properties for nodejs.npm.args, and taking the commented out line and adding it to your own build.USERNAME.properties.
  • Liferay includes a lot of extra things most people never need. You can remove these by checking build.properties for build.include.dirs and using its commented out value in your build.USERNAME.properties, or adjusting it for your needs if you want more than what it tries by default.
  • If you're on Windows, disable Windows Defender (or at least disable it on specific folders or drives). The ongoing scan drastically slows down Liferay builds.
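To make the first two items concrete, here is a small sketch that pulls a commented-out property out of build.properties so it can be appended to your own override file. It assumes the commented-out line looks like "#key=value" on a single line; the helper name commented_default is made up, and a value that continues across several lines with trailing backslashes would need to be copied by hand instead.

```shell
#!/bin/sh
# Print the commented-out "#key=value" line for a property, with the leading
# "#" removed. Assumes a single-line value; multi-line values are not handled.
# $1: property key, $2: properties file
commented_default() {
    # Escape dots so "nodejs.npm.args" matches literally.
    key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n -E "s/^[[:space:]]*#[[:space:]]*(${key_re}=.*)\$/\\1/p" "$2"
}

# Hypothetical usage from the root of liferay-portal, assuming USERNAME in
# build.USERNAME.properties is your operating system login name:
#   commented_default nodejs.npm.args build.properties >> "build.$USER.properties"
```

Review the line before appending it, since the commented-out value in your copy of build.properties may differ from what you expect.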

After you've thought through all of the above, you're almost ready for the command itself. When Liferay introduced the liferay-binaries-cache-2017, it also introduced another way for the build to fail: Liferay's build process tries to auto-update this cache at build time, but since it's constantly synchronizing this folder, you might actually run into a situation where the cache cannot update because it already contains the files it wants to add! So you'll need to add extra commands to clean up the folder before you build.

cd liferay-binaries-cache-2017
git clean -xdf
git reset --hard
cd ../liferay-portal

At this point, you are now ready for the command itself, which requires that you download and install Apache Ant. After knowing that this is what I'm asking you to download, you might also realize that this means that the entry point for everything is build.xml.

ant all
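If you find yourself repeating these steps, they can be wrapped in one function. This is only a sketch under the assumption that the binaries cache is a sibling folder of the portal source, as suggested in Step 3; the function name build_portal is made up, and it stops early if the cache is missing rather than letting the build fail partway through.

```shell
#!/bin/sh
# Clean the binaries cache, then run the full build.
# $1: path to the liferay-portal clone
build_portal() {
    portal_dir=$1
    # Assumption: the cache was cloned next to the portal source.
    cache_dir="$(dirname "$portal_dir")/liferay-binaries-cache-2017"

    if [ ! -d "$cache_dir/.git" ]; then
        echo "Missing binaries cache at $cache_dir" >&2
        return 1
    fi

    # Reset the cache so the build's auto-update does not collide with
    # files left over from a previous build, then build from the portal root.
    git -C "$cache_dir" clean -xdf &&
    git -C "$cache_dir" reset --hard &&
    (cd "$portal_dir" && ant all)
}

# Hypothetical usage:
#   build_portal "$HOME/source/liferay-portal"
```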

So now you've built the latest Liferay source code, right?

Another trick question!

What's in the master branch of liferay-portal is actually not the latest code. Liferay has started moving things into subrepositories, which you can see from the hundreds of strangely named repositories that have popped up under the Liferay GitHub account.

However, a lot of these repositories are just placeholders. These placeholders are in what's called "push" mode, where code from the liferay-portal repository is pushed to the subrepository. A handful of them (five at the time of this writing) are actually active, in what's called "pull" mode, where code is pulled from the subrepository into the liferay-portal repository on demand. You can tell the difference by looking at the .gitrepo file in each subrepository and checking the line describing the mode.

However, because all of those files are actually also on the central repository, after you've cloned the liferay-portal repository, you can use the files there to find out which subrepositories are active with git, grep, and xargs magic run from the root of the repository.

git ls-files modules | grep -F .gitrepo | xargs grep -Fl 'mode = pull' | xargs grep -h 'remote = ' | cut -d'=' -f 2
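To unpack what that pipeline does: it lists every tracked .gitrepo file, keeps the ones containing "mode = pull", and prints the value of their "remote = " line. The variant below is a sketch of the same idea that walks a directory with find instead of asking Git, and trims the whitespace around the remote URL. The function name list_pull_remotes is made up, and the sample file layout in the comment is an assumption based only on the two lines the original command searches for.

```shell
#!/bin/sh
# Print the remote URL of every subrepository that is in pull mode.
# Assumed .gitrepo contents (only these two lines matter here):
#     mode = pull
#     remote = git@github.com:liferay/<subrepository>.git
# $1: directory to search, for example "modules"
list_pull_remotes() {
    find "$1" -name .gitrepo -type f -exec grep -Fl 'mode = pull' {} + |
        while IFS= read -r gitrepo; do
            # Keep only the value after "remote =", without padding.
            sed -n 's/^[[:space:]]*remote[[:space:]]*=[[:space:]]*//p' "$gitrepo"
        done
}

# Hypothetical usage from the root of liferay-portal:
#   list_pull_remotes modules
```

Unlike git ls-files, find also picks up untracked .gitrepo files, which should not matter in a clean clone.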

I will dive into more detail on the subrepositories in a later entry when we talk about submitting fixes, but for now, they're not relevant to getting off the ground running other than an awareness that they exist, and an awareness that additional wrinkles exist in the fix submission process as a side-effect of their existence.

Step 5: Choose an IDE

At this point, you've built Liferay, and the next thing you might want to do is point an IDE with a debugger to the artifact you've built, so that you can see what it's doing after you start it up. However, if you point an IDE to the Liferay source code in order to load the source files, whether it's Netbeans, Eclipse, or IntelliJ, you'll notice that while Liferay has a lot of default configurations populated, these default files are missing about 90% of Liferay's source folders.

If you're using Netbeans, given the people who have forked the Liferay Source Netbeans Project Builder overlap exactly with the team I know for sure uses Netbeans, this tool will help you hit the ground running. Since there are recent commits to the repository, I have confidence that the Netbeans users team actively maintains it, though I can't say with equal confidence how they'll react to the news that I'm telling other people about it.

If you're using Eclipse or Liferay IDE, then Jorge Diaz has you covered with his generate-modules-classpath script, which he has blogged about in the past. His blog post explains its capabilities much more clearly than I would be able to in a mini-section at the end of this getting started guide.

If you're using IntelliJ IDEA Ultimate, you can take advantage of the liferay-intellij project and leave any suggestions. It was originally written as a streams tutorial for Java 7 developers rather than as a tool, and I still try to keep it as a streams tutorial even as I make improvements to it, but I'm open to any improvement ideas that make people's lives easier in interacting with Liferay core code.

Step 6: Bend Liferay to Your Will

So now that everything is set up for your first build, and you're able to at least attach a debugger to Liferay, the next thing is to explain what you can do with this newly discovered power.

However, that's going to require walking through a non-toy example for it to make sense, so I'll do that in my next post so that this one can stay as a "Getting Started" guide.

Thanks for the introduction. I am waiting for the next blog in the series.
Should I start by cloning the .gradle folder from https://github.com/liferay/liferay-portal/tree/master/.gradle/caches/modules-2/files-2.1 ?
Unlike SVN, you won't be able to grab specific folders with Git, so you'll be cloning the root of the repository. I believe SSH is faster than HTTPS, so unless you have a company firewall that only allows HTTPS, you should use SSH.

git clone git@github.com:liferay/liferay-portal.git
Can I contribute to one module? For example, I would like to contribute to the *com-liferay-portal-remote module*, in this case should I fork this module and then pull-request or make the fork of liferay-portal?
The Liferay source in the repo is growing day by day. When we clone it, Git fetches the entire history, so the clone is very large. Is there a way to clone only the latest history? That would make it easier for community members to contribute. Waiting for a blog that addresses this concern.
Thanks
Well, "git clone" allows for a "depth" parameter that lets you specify how many commits you want to retain, and a "shallow-since" that lets you specify a time range.

https://git-scm.com/docs/git-clone

You would also want to exclude tags if you want a small repository (not sure if that's implicit when it's a single branch clone; the documentation makes it sound like it needs to be explicit).

git clone --depth=1 --branch master --no-tags git@github.com:liferay/liferay-portal.git

I tested from home, and the clone takes about 6 minutes and the resulting repository uses 4 GB, with 2 GB inside of the .git folder.

However, a shallow clone means that you have to use the GitHub UI in order to navigate history, because your local copy has no history at all. So, while the resulting file size is smaller and it does take less time to clone, the tedium of not being able to use local tooling makes it less desirable.

But maybe that's just my biased opinion, since my day to day work requires navigating the repository history.
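If you start with a shallow clone and later find you need local history after all, Git can fetch the rest without a fresh clone. The sketch below wraps both steps in functions; the names shallow_clone and restore_full_history are made up, while --depth, --no-tags, and fetch --unshallow are standard Git options. Restoring the history still downloads everything the shallow clone skipped, so it only postpones the wait.

```shell
#!/bin/sh
# Clone only the newest commit of one branch, without tags.
# $1: repository URL, $2: target directory, $3: branch name
shallow_clone() {
    git clone --depth=1 --branch "$3" --no-tags "$1" "$2"
}

# Convert an existing shallow clone into one with full history
# for the branch it tracks.
# $1: path to the shallow clone
restore_full_history() {
    git -C "$1" fetch --unshallow
}

# Hypothetical usage:
#   shallow_clone git@github.com:liferay/liferay-portal.git liferay-portal master
#   restore_full_history liferay-portal
```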
Thanks
it's a great help.
Once the Liferay source build is complete, is it feasible to build only a particular module, say the shopping module?

Regards
Yes. If you've built the whole thing from source, you can deploy individual modules after that. The second blog in this series has an example where it redeploys the dynamic-data-mapping-data-provider-instance module.

https://web.liferay.com/web/minhchau.dang/blog/-/blogs/troubleshooting-liferay-from-source#step-6-test-your-changes-part-1

What exactly are the prerequisites on the development machine?

If you want to build Liferay from source after cloning the Git repository, you will want 16+ GB RAM and 200+ GB of hard disk space.

Sorry for being too vague. I'd rather know which software/versions need to be installed to make a successful build.

The CONTRIBUTING guide is a little outdated in terms of hardware (it states that you can build Liferay with 8 GB of RAM, which is no longer true), but the software listed in the System Requirements section is still accurate.

https://github.com/liferay/liferay-portal/blob/master/CONTRIBUTING.markdown

As for versions, it's odd that we didn't list them, but you need Ant 1.9.x (there are some cryptic errors when you use Ant 1.10.x) and the latest JDK 1.8 (I don't remember exactly when it was fixed, but there was a JDK bug that prevented Liferay from building). The build process will invoke the Gradle wrapper to download the version of Gradle it needs.

For environment variables, you need your ANT_OPTS set to allow at least 4 GB of memory to be used (-Xmx4g -Xms4g). If you have a slower machine, you'll also want to set ANT_OPTS to limit the number of Gradle workers (-Dorg.gradle.workers.max=1).
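As a sketch of those settings in shell form: the two helper functions below check the version lines that ant -version and java -version print, and the export sets the memory options mentioned above. The function names are made up, and the matching is deliberately loose (it only looks for "version 1.9." and version "1.8.), so treat it as a quick sanity check rather than a full validation.

```shell
#!/bin/sh
# Succeeds if the given "ant -version" line reports Ant 1.9.x.
# $1: a line such as "Apache Ant(TM) version 1.9.16 compiled on ..."
ant_version_supported() {
    case "$1" in
        *"version 1.9."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the given "java -version" line reports a 1.8 JDK.
# $1: a line such as: java version "1.8.0_202"
java_version_supported() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Allow Ant to use 4 GB of memory.
export ANT_OPTS="-Xmx4g -Xms4g"

# On a slower machine, also limit the number of Gradle workers:
#   export ANT_OPTS="$ANT_OPTS -Dorg.gradle.workers.max=1"

# Hypothetical usage (java -version writes to stderr, hence 2>&1):
#   ant_version_supported "$(ant -version)" || echo "Use Ant 1.9.x"
#   java_version_supported "$(java -version 2>&1 | head -n 1)" || echo "Use JDK 1.8"
```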

Any idea how to compile the Liferay NetBeans source project? I have downloaded the source from Liferay's website and tried opening it as a NetBeans project, but I have no clue how to compile it. Link for the download:

https://sourceforge.net/projects/lportal/files/Liferay%20Portal/7.1.1%20GA2/liferay-ce-portal-src-7.1.1-ga2-20181101125651026.zip/download