Creating your own analytics platform within Liferay: A distributed commit log
In my last entry I gave a quick overview of the proposed solution to the "problem" of building an analytics platform within the Liferay platform. In this entry I will go deeper into the log data structure, introduce the Apache Kafka project, and analyse how we can connect Liferay and Kafka to each other.
As a quick reminder, I previously said that a log data structure is a perfect fit when you have a data workflow problem.
A log is a very simple data structure (possibly one of the simplest). It is just an ordered, append-only sequence of records. For those of you who are familiar with database internals, log data structures have been widely used to implement ACID support in relational databases, and their usage has evolved over time: today logs are also used to implement replication among databases (you can take a look at the many implementations available out there).
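To make the idea concrete, here is a minimal in-memory sketch of the structure described above (illustrative only; real logs like Kafka's are persistent and partitioned): records can only be appended at the end, existing records are never modified, and each record is addressed by its offset in the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal in-memory append-only log. Records are only ever added at the
// end; each record is identified by its offset (its position in the sequence).
class Log<T> {
    private final List<T> records = new ArrayList<>();

    // Appends a record and returns the offset assigned to it.
    long append(T record) {
        records.add(record);
        return records.size() - 1;
    }

    // Reads are by offset; records already written are immutable.
    T read(long offset) {
        return records.get((int) offset);
    }

    long size() {
        return records.size();
    }
}
```

The ordering guarantee is exactly what makes the log useful for replication: two readers that consume the same offsets see the same records in the same order.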
Ordering and data distribution become even more important when we move into the distributed systems world; you can take a look at protocols like ZAB (the protocol used by ZooKeeper), Raft (a consensus algorithm designed to be easy to understand) or Viewstamped Replication. Sadly, distributed systems theory is beyond the scope of this blog post :)
Let's move on to some more practical details and see how we can model all the different streams of information we already have.
It is not the goal of this post to cover Kafka's internals, so, in case you are interested, good documentation is available on their web page.
While building the first prototype of the communication channel between the two systems I had a few goals in mind:
I've built a small OSGi plugin that bridges our Liferay portal installation with a Kafka broker through the Message Bus API. A general overview of how this integration works is shown in the next figure.
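The core of the bridge can be sketched as follows. This is a simplified illustration, not the plugin's actual code: class, topic, and parameter names are my own, and the Kafka producer is abstracted behind a pluggable sender so the flow is visible without a running broker. In the real plugin, the listener implements Liferay's `MessageListener` interface and the sender wraps a Kafka producer's `send()` call.

```java
import java.util.function.BiConsumer;

// Illustrative sketch of the Message Bus -> Kafka data flow. A listener
// registered on a Liferay Message Bus destination receives each message's
// payload and hands it to a sender; in the real plugin the sender is a
// Kafka producer publishing to the configured topic.
class MessageBusToKafkaBridge {

    private final String topic;
    private final BiConsumer<String, String> sender; // (topic, payload)

    MessageBusToKafkaBridge(String topic, BiConsumer<String, String> sender) {
        this.topic = topic;
        this.sender = sender;
    }

    // In the real plugin this would be MessageListener.receive(Message),
    // extracting the payload from the Message Bus message object.
    void receive(String payload) {
        sender.accept(topic, payload);
    }
}
```

Keeping the sender pluggable also makes the bridge easy to unit-test without a Kafka broker.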
The data flow depicted in the previous picture is extremely simple:
You can find all the source code of the Kafka bridge in my GitHub repo.
Let's write a small example where we publish all the blog ratings to our Kafka broker.
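A hedged sketch of what such an example could look like (the topic name, JSON field names, and class names here are assumptions, not the plugin's actual API): whenever a blog entry is rated, we serialize the rating into a small JSON record and publish it to a Kafka topic. As above, the Kafka send is abstracted behind a pluggable sender; in Liferay, the hook point could be a model listener on the ratings entity or a Message Bus destination.

```java
import java.util.function.BiConsumer;

// Illustrative publisher for blog rating events. Field names and the topic
// name are assumptions for the sake of the example.
class BlogRatingPublisher {

    private static final String TOPIC = "blog-ratings"; // assumed topic name

    // (topic, json) — in the real plugin this would wrap producer.send(...).
    private final BiConsumer<String, String> kafkaSend;

    BlogRatingPublisher(BiConsumer<String, String> kafkaSend) {
        this.kafkaSend = kafkaSend;
    }

    // Serializes the rating fields into a small JSON record.
    String toJson(long entryId, long userId, double score) {
        return String.format("{\"entryId\":%d,\"userId\":%d,\"score\":%s}",
            entryId, userId, Double.toString(score));
    }

    // Invoked whenever a blog entry gets rated.
    void onRating(long entryId, long userId, double score) {
        kafkaSend.accept(TOPIC, toJson(entryId, userId, score));
    }
}
```

Once events land in the topic, any downstream consumer (the analytics side of the platform) can read the ratings stream in order, at its own pace.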