Liferay now has a batch mode to support batch inserts and updates.
Introduction
My friend and coworker Eric Chin recently posed a question on our internal Slack channels asking if anyone had experience with Liferay's Batch mode, since he wasn't finding much supporting documentation on it.
Although I hadn't yet worked with Liferay Batch, I kind of took it as a challenge to not only give it a try, but more importantly to document it in a new blog post, so here we are...
Liferay introduced Batch support in 7.3 to add bulk data export and import into Liferay. Batch can be invoked through the Headless API, through Java code, and even by dropping a special zip file into the Liferay deploy folder.
Batch leverages the Headless code, so the entities that can be processed through Batch are those entities that you can access individually through the Headless APIs. Although this may seem like a shortcoming, it actually ensures that imported batch data will go through the same layers of business logic that the headless entities go through.
Before we can start using batch mode, we need to understand the supported data formats.
Batch Data Formats
Batch data, whether exported or imported, must be in a specifically supported data format: CSV, XLS/XLSX, or JSON/JSONL. The format for the export/import will be provided as an argument when invoking the Batch Engine.
When invoking the Batch Engine, you get to specify the columns in the export/import, so the contents are not static at all; required columns must be provided, but the rest are optional. The columns also do not need to be in a specific order; you can arrange them however you like.
CSV and XLS/XLSX
For both CSV and XLS/XLSX formats, you're basically getting a two-dimensional table with a row for each record and a column for each field.
Neither of these formats supports a deep hierarchy of data. This means they cannot handle extracting values from child objects; for example, a StructuredContentFolder has a Creator object as a member field, but the Creator and its own member fields cannot be exported as part of that batch.
Maps are supported though, and this helps with localized values. These are referenced by the key. The StructuredContent object has a title field which is a map of language keys to values, so you can reference title.en as the column to access the English version of the title.
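For instance, a CSV import header and row might look something like the snippet below; the columns shown are only meant to illustrate the map-key convention, not a complete StructuredContent record:

title.en,title.es,contentStructureId
"Welcome","Bienvenido",12345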
JSON and JSONL
These two formats are more robust in that they support the hierarchical data for child objects, and they use the familiar JSON format.
JSONL is a specialized version of JSON that is specified by jsonlines.org and has the following constraints:
- Filename extension must be .jsonl
- One valid JSON object per line (no pretty printed JSON allowed)
- File encoding must be UTF-8
For simple JSON, it must be a valid array of JSON objects, so the first and last characters are going to be the square brackets:
[ {...}, {...}, {...} ]
For JSON, the file can be pretty printed if you’d like, the “no pretty printing” restriction is only for the JSONL per the specification.
Batch Engine
The core of Batch is based around the Batch Engine (BE). The BE is an asynchronous system that handles two types of tasks, export tasks and import tasks.
For the export tasks, the BatchEngineExportTaskLocalService is used to add a new BatchEngineExportTask. Once added, the BatchEngineExportTaskExecutor is used to process the export task, and the exported data can be extracted from the task when it completes.
For the import tasks, the BatchEngineImportTaskLocalService is used to add a new BatchEngineImportTask. Once added, the BatchEngineImportTaskExecutor is used to process the import task.
Let's see how to import and export some blog posts using the BE...
Importing BlogPostings
In Liferay, the Blogs portlet works with BlogsEntry entities, but the headless version of these is the BlogPosting. Because Batch is based off of the Headless endpoints, we need to work with the headless objects.
When using the BatchEngineImportTaskLocalService to handle the import, we can provide a Map<String, String> of field name mappings. To keep the code simple, we'll use a pretty short map:
Map<String, String> fieldMappings = new HashMap<>();

fieldMappings.put("altHeadline", "alternativeHeadline");
fieldMappings.put("body", "articleBody");
fieldMappings.put("pubDate", "datePublished");
fieldMappings.put("headline", "headline");
fieldMappings.put("site", "siteId");
The key is the column name in the import data and the value is the field name from the BlogPosting that the data should store to.
If your source data uses the same field names as the BlogPosting, you can skip the field mapping and just pass a null to the API.
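To make that concrete, here's a hedged sketch of what JSON import data could look like when paired with the field mappings above (the values and dates are made up for illustration):

[
  {
    "altHeadline": "An alternative headline",
    "body": "<p>The body of the first blog post.</p>",
    "pubDate": "2021-06-01T10:00:00Z",
    "headline": "My First Batch Blog Post",
    "site": 20123
  },
  {
    "altHeadline": "Another alternative headline",
    "body": "<p>The body of the second blog post.</p>",
    "pubDate": "2021-06-02T10:00:00Z",
    "headline": "My Second Batch Blog Post",
    "site": 20123
  }
]

Note that the keys are the source column names (the map keys); the Batch Engine translates them to the BlogPosting field names during the import.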
Submitting the Import Task
To submit the import task, you need an @Reference to the BatchEngineImportTaskLocalService in your class. Then you can use the addBatchEngineImportTask() method to submit the task such as follows:
BatchEngineImportTask importTask;

importTask = _batchEngineImportTaskLocalService.addBatchEngineImportTask(
    companyId, userId, numRecords, null, BlogPosting.class.getName(),
    importDataContent, "JSON", BatchEngineTaskExecuteStatus.INITIAL.name(),
    fieldMappings, BatchEngineTaskOperation.CREATE.name(), null, null);
There are two different constant classes here to review.
The BatchEngineTaskExecuteStatus is the status for the individual task. You should always use INITIAL. As the task executor runs, it will change the status first to STARTED, and it will finish as COMPLETED or FAILED depending upon the outcome of the batch run.
The BatchEngineTaskOperation is the operation the engine is going to complete. The options are CREATE, READ, UPDATE, or DELETE (your basic CRUD options). You won't be using the READ option for the import task, but the others come in handy.
The only things here indicating the type of data being imported are the BlogPosting class reference, the content data itself, and optionally the field mappings. Regardless of what data you want to import using Batch, as long as you have the corresponding RESTBuilder entity class available, you can import it using Batch.
Note that the importDataContent above is a byte[] of the data for the import. It can be just the raw data alone, or you can pass zipped data to the call (the Batch Engine will unzip the data when it is processed).
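For reference, here's a minimal sketch of producing that byte[] either as plain content or as zipped content, using only JDK classes (java.nio.charset and java.util.zip); exception handling is omitted and the jsonString variable is assumed to hold your import data:

// Plain content: just the UTF-8 bytes of the JSON string.
byte[] importDataContent = jsonString.getBytes(StandardCharsets.UTF_8);

// Zipped content: wrap the same bytes in a single zip entry instead.
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

try (ZipOutputStream zipOutputStream = new ZipOutputStream(byteArrayOutputStream)) {
    zipOutputStream.putNextEntry(new ZipEntry("blog-postings.json"));
    zipOutputStream.write(importDataContent);
    zipOutputStream.closeEntry();
}

byte[] zippedImportDataContent = byteArrayOutputStream.toByteArray();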
Executing the Import Task
Once your task has been created, you can queue it up for execution. You'll need to @Reference in a BatchEngineImportTaskExecutor to execute the task as follows:
_batchEngineImportTaskExecutor.execute(importTask);
Because the import is an asynchronous process, this may not execute the import right away and it will not wait for the import to complete before returning back to your code.
Exporting BlogPostings
Exporting entities requires three key elements: the data format to use, an optional list of field names to include, and finally the parameters to use to complete the data query.
For the optional list of field names, you could pass an empty list (to include all fields from the entity) or you can specify the fields that you want to use. We could create our list as:
List<String> fieldNames = new ArrayList<>();

fieldNames.add("id");
fieldNames.add("alternativeHeadline");
fieldNames.add("datePublished");
fieldNames.add("headline");
fieldNames.add("articleBody");
The parameters are those arguments that will be necessary to complete or include in the data query. For the blog postings, we may want to limit the export to blog postings from a single site rather than all sites. Parameters are passed using a Map<String, Serializable> created such as follows:
Map<String, Serializable> params = new HashMap<>();

long siteId = 1234L;

params.put("siteId", siteId);
One of the cool parts is that you can actually pass any of the parameters that you might use on a headless GET request to sort and filter the list. The search, sort, filter, fields, restrictFields, and flatten parameters are all supported when it comes to exporting the data; just add the parameters to the map with the right corresponding values and you're good to go.
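For example, here's a hedged sketch of adding sort and filter entries to the same params map (the values are made up, but they follow the same syntax the headless GET endpoints accept, per the documentation linked below):

// Sort newest first and only include postings whose headline contains "Liferay".
params.put("sort", "datePublished:desc");
params.put("filter", "contains(headline, 'Liferay')");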
Just check out these links to see how to use the parameters:
- https://help.liferay.com/hc/en-us/articles/360036343152-Filter-Sort-and-Search
- https://help.liferay.com/hc/en-us/articles/360039425651-Restrict-Properties
- https://help.liferay.com/hc/en-us/articles/360039026232-Filterable-properties
When specifying fields, note that you not only control the fields that will be included in the export, you also control the order of those fields in the exported content.
Submitting the Export Task
Once we have the three key pieces of information, we can submit the export task. We need to have an @Reference on the BatchEngineExportTaskLocalService to add the new task:
BatchEngineExportTask exportTask;

exportTask = _batchEngineExportTaskLocalService.addBatchEngineExportTask(
    companyId, userId, null, BlogPosting.class.getName(), "JSON",
    BatchEngineTaskExecuteStatus.INITIAL.name(), fieldNames, params, null);
So the only things here that identify the type of data being exported are the BlogPosting class and the fieldNames list (params might also give it away depending upon what has been defined). This same call can be made to export any headless entity the system has, not just BlogPostings.
Executing the Export Task
To execute our new export task, we need an
@Reference
on the BatchEngineExportTaskExecutor
:
_batchEngineExportTaskExecutor.execute(exportTask);
This will queue up the export task, but as it is an asynchronous process it may not start right away and may not complete before the method call returns.
Extracting the Exported Data
We need to take an extra step to get an InputStream to the exported data:
InputStream is = _batchEngineExportTaskLocalService.openContentInputStream(
    exportTask.getBatchEngineExportTaskId());
Once we have the input stream, we can write it to a file, send it out via a network connection, or do whatever we need to with it; just be sure to close the stream when you're done with it.
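For example, a minimal sketch of copying the exported content to a file using java.nio.file (the target path is just a placeholder, and exception handling is omitted) while making sure the stream gets closed:

// Copy the exported content to a file; try-with-resources closes the stream.
try (InputStream inputStream = is) {
    Files.copy(inputStream, Paths.get("/tmp/blog-postings-export.json"),
        StandardCopyOption.REPLACE_EXISTING);
}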
Note that we just jumped into getting the input stream but we didn't check the status of the export to see if it has reached the BatchEngineTaskExecuteStatus.COMPLETED or FAILED states, so the input stream might not be ready yet. You could block and wait for the status to be updated if you are concerned that you are trying to access the exported data before it is ready. The headless batch export will return a stub object on each call until the status is COMPLETED; only then is the exported data returned.
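If you do want to block, a simple (if naive) approach is to poll the task until the engine reports a terminal status. This is just a sketch with exception handling omitted, assuming the task can be re-fetched by its id via getBatchEngineExportTask() and exposes its current status via getExecuteStatus():

// Re-fetch the export task until it reaches COMPLETED or FAILED.
BatchEngineExportTask currentTask = _batchEngineExportTaskLocalService.getBatchEngineExportTask(
    exportTask.getBatchEngineExportTaskId());

while (!BatchEngineTaskExecuteStatus.COMPLETED.name().equals(currentTask.getExecuteStatus()) &&
        !BatchEngineTaskExecuteStatus.FAILED.name().equals(currentTask.getExecuteStatus())) {

    Thread.sleep(1000);

    currentTask = _batchEngineExportTaskLocalService.getBatchEngineExportTask(
        exportTask.getBatchEngineExportTaskId());
}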
Deleting BlogPostings
Batch also supports deletions, but as a special form of a BatchEngineImportTask.
The data formats are all supported, but you are only going to pass a single element, the ID. Here's what our JSONL might look like to delete a bunch of BlogPostings:
{"id":1234} {"id":1235} {"id":1236}
We'd also be invoking the BatchEngineImportTaskLocalService and BatchEngineImportTaskExecutor to do the real work:
BatchEngineImportTask deleteTask;

deleteTask = _batchEngineImportTaskLocalService.addBatchEngineImportTask(
    companyId, userId, 10, null, BlogPosting.class.getName(),
    deleteJsonContent, "JSONL", BatchEngineTaskExecuteStatus.INITIAL.name(),
    null, BatchEngineTaskOperation.DELETE.name(), null, null);

_batchEngineImportTaskExecutor.execute(deleteTask);
Again this is an asynchronous task, so the deletion may not start when this call is issued and it may not complete when the method returns.
Batch REST Services
The Batch Engine can also be accessed via REST service calls.
A PUT, POST, or DELETE request can be sent to the REST endpoints with the "/batch" suffix, such as http://localhost:8080/o/headless-delivery/v1.0/sites/{siteId}/blog-postings/batch.
As defined here, the only URL parameter is the site id. The body of the request is going to be the JSON (the only supported format) for the batch operation (CREATE, UPDATE, or DELETE). The batch export operation would be handled by the regular GET request.
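As a rough sketch of calling the batch endpoint from Java code using the JDK 11+ java.net.http.HttpClient (the host, credentials, site id, and body content are all placeholders, and exception handling is omitted):

// Placeholder values throughout; adjust for your environment.
String json = "[{\"headline\": \"Batch post one\", \"articleBody\": \"<p>Body one</p>\"}," +
    "{\"headline\": \"Batch post two\", \"articleBody\": \"<p>Body two</p>\"}]";

HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:8080/o/headless-delivery/v1.0/sites/20123/blog-postings/batch"))
    .header("Content-Type", "application/json")
    .header("Authorization", "Basic " + Base64.getEncoder().encodeToString(
        "test@liferay.com:test".getBytes(StandardCharsets.UTF_8)))
    .POST(HttpRequest.BodyPublishers.ofString(json))
    .build();

HttpResponse<String> response = HttpClient.newHttpClient().send(
    request, HttpResponse.BodyHandlers.ofString());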
Direct Batch Engine Services
An additional set of Batch REST services are exposed by the Batch Engine itself, so you don't have to go to the individual batch methods themselves.
These entry points are defined here, but they're pretty easy to use. They are direct methods to create ExportTasks and ImportTasks (kind of like what we did in the Java-based APIs above).
To do a batch create of blog posts using the direct Batch REST endpoints, I would be doing a POST to http://localhost:8080/o/headless-batch-engine/v1.0/import-task/com.liferay.headless.delivery.dto.v1_0.BlogPosting, and the body of the POST would be the data for the import (CSV, JSON, etc., just make sure the content type matches the data format).
The last argument on the URL is the fully qualified class name for the headless data type, so it is basically the value for BlogPosting.class.getName(). An easy way to find it is to navigate in SwaggerHub to the schema definition and, on the right hand side, look for the x-class-name attribute default value. For the BlogPosting, I checked here to get the string.
There are corresponding APIs for the remaining UPDATE and DELETE operations, and another REST entry point for the /export-task operations for the batch export.
Auto-Deploy Batch Files
Yes, you can also auto-deploy Batch files too!
To deploy a batch file, you'll be creating a special zip file and dropping that file into the $LIFERAY_HOME/deploy folder.
The zip file will contain two files: a batch-engine.json file (defining the batch job) and another file (with any name) that contains the data. Both files can be in a subdirectory in the zip file, but they have to be in the same directory. I would recommend just putting them at the root of the zip file to keep things as simple as possible.
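For example, a deployable zip could be laid out as simply as this (the file names other than batch-engine.json are arbitrary):

blog-postings-import.zip
    batch-engine.json
    blog-postings.jsonl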
The batch-engine.json file is a JSON file that defines the batch job and basically captures all of the same details that we see in the BatchEngineImportTaskLocalService.addBatchEngineImportTask() method:
{ "callbackURL": null, "className": "com.liferay.headless.delivery.dto.v1_0.BlogPosting", "companyId": 20095, "fieldNameMappingMap": { "altHeadline": "alternativeHeadline", "body": "articleBody", "pubDate": "datePublished", "headline": "headline", "site": "siteId" }, "parameters": null, "userId": 20124, "version": "v1.0" }
So many of the same parameters, and again the only part specifying that I'm doing a BlogPosting is the className value, so I could just as easily swap it out for another Liferay Headless type or my own custom types.
The data file that goes along with this can have any name, but it must have the correct extension identifying the format of the data (.json for JSON, .jsonl for JSONL, .csv for CSV, etc.), and it has to be in the same directory in the zip file as the batch-engine.json file.
In the example I provided above, you can see that the parameters argument is null. This will work to import blog postings as long as the data file includes the "site" column that is defined in the fieldNameMappingMap. If your blog postings data does not have a "site" column with the site id, you'd have to add it to the parameters stanza to support loading the blogs into a specific site. In this case the entry would look like "parameters": {"siteId": "20123"}. Like the fieldNameMappingMap, this is how you would declare necessary parameters for the batch data processing.
Custom Batch Handling
Liferay Batch also supports building your own Batch handling class. Just create a module and create a component that implements the BatchEngineTaskItemDelegate interface. It's a generic interface that defines the CRUD operations for the generic type it handles.
So even though there is no Headless support for the Liferay Role entity, you could register an implementation of BatchEngineTaskItemDelegate<Role> and get the Batch Engine to support roles either via the API or the auto-deploy batch files (you won't get the REST endpoint without full RESTBuilder support).
Conclusion
Well that's all I know about Batch and how to use it in Liferay. I hope you find this useful for your own implementations!