When we announced the December 2013 release, an exciting new feature also saw daylight: the Batch Module. If you haven’t read the post describing the feature’s highlights, you should, but today I’d like to focus on how the <batch:commit> block interacts with Anypoint™ Connectors and, more specifically, how you can leverage your own connectors to take advantage of the feature.
In a nutshell, you can use a Batch Commit block to collect a subset of records for bulk upsert to an external source or service. For example, rather than upserting each individual contact (i.e. record) to Google Contacts, you can configure a Batch Commit to collect, let’s say, 100 records and then upsert all of them to Google Contacts in one chunk. Within a batch step – the only place you can apply it – you can use a Batch Commit to wrap an outbound message processor.
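As a sketch, such a commit block inside a batch step could look like this (the job, step, and connector operation names are illustrative, not taken from a real application):

```xml
<batch:job name="syncContactsBatch">
    <batch:process-records>
        <batch:step name="upsertContactsStep">
            <!-- collect records in groups of 100 and send each group in one bulk call -->
            <batch:commit size="100">
                <!-- hypothetical bulk operation; the actual element depends on the connector -->
                <google-contacts:batch-contacts config-ref="GoogleContacts"/>
            </batch:commit>
        </batch:step>
    </batch:process-records>
</batch:job>
```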
Mixing in Connectors
This is all great, but what do connectors have to do with it? Well, the only reason the example above makes any sense at all is that the Google Connector is capable of doing bulk operations. If the connector only supported updating records one at a time, then there would be no reason for <batch:commit> to exist.
But wait! The batch module was only released two months ago, yet connectors like Google Contacts, Salesforce, NetSuite, etc., have had bulk operations for years! True that. But what we didn’t have until Batch came along was a construct allowing us to do record-level error handling.
Suppose that you’re upserting 200 records into Salesforce. In the past, if 100 of them failed and the other 100 were successful, it was up to you to parse the connector’s response, separate the failed records from the successful ones, and take appropriate action. If you wanted to do the same with Google Contacts, you found yourself doing all of that work again, with the extra complexity that you couldn’t reuse your code, because the Google and Salesforce APIs use completely different representations to report an operation’s result.
Our goal with the batch module is clear: make this stuff simple. We no longer want you struggling to figure out each API’s representation for a bulk result and handling each failed record independently – from now on, you can rely on <batch:commit> to do that for you automatically.
It’s not magic
“A Kind of Magic” is one of my favorite songs by Queen, especially the live performance at Wembley Stadium in ’86. Although the magic described in that song doesn’t apply to the mechanics of batch and connectors, there’s one phrase in it which accurately describes the problem here: “There can be only one”.
If we want the Batch module to understand every representation of a bulk operation’s result, we need to start by defining a canonical way of expressing it. We did so in a class called BulkOperationResult, which defines the following contract:
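As a sketch of that contract, based on the description below (field and method names are illustrative; the real Mule class may differ):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for the per-record detail; its own contract is
// discussed further down in the post.
class BulkItem<T> {
}

// Sketch of the canonical bulk result: immutable, and the master side of a
// master-detail relationship with BulkItem.
class BulkOperationResult<T> {

    private final Object id;
    private final boolean successful;
    private final List<BulkItem<T>> items;

    BulkOperationResult(Object id, boolean successful, List<BulkItem<T>> items) {
        this.id = id;
        this.successful = successful;
        // defensive, unmodifiable copy keeps the instance immutable
        this.items = Collections.unmodifiableList(new ArrayList<BulkItem<T>>(items));
    }

    /** an id for the bulk operation as a whole */
    public Object getId() {
        return id;
    }

    /** true only if every record in the bulk succeeded */
    public boolean isSuccessful() {
        return successful;
    }

    /** one BulkItem per record, in the same order as the original bulk */
    public List<BulkItem<T>> getItems() {
        return items;
    }
}
```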
Basically, the above class is a Master-Detail relationship in which:
- BulkOperationResult represents the operation as a whole, playing the role of the master
- BulkItem represents the result for each individual record, playing the role of the detail
- Both classes are immutable
- There’s an ordering relationship between the master and the detail. The first item in the BulkItem list has to correspond to the first record in the original bulk. The second has to correspond to the second one, and so forth.
In case you’re curious, this is what BulkItem’s contract looks like:
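Again as an illustrative sketch (the real class may carry different member names):

```java
// Sketch of the per-record detail contract: immutable, one instance per
// record, in the same order as the original bulk.
class BulkItem<T> {

    private final Object recordId;
    private final boolean successful;
    private final String message;      // e.g. an error description when the record failed
    private final Exception exception; // null when the record succeeded
    private final T payload;           // the record as the external API returned it

    BulkItem(Object recordId, boolean successful, String message,
             Exception exception, T payload) {
        this.recordId = recordId;
        this.successful = successful;
        this.message = message;
        this.exception = exception;
        this.payload = payload;
    }

    public Object getRecordId() { return recordId; }

    /** whether this particular record was processed successfully */
    public boolean isSuccessful() { return successful; }

    public String getMessage() { return message; }

    public Exception getException() { return exception; }

    public T getPayload() { return payload; }
}
```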
So, that’s it? We just modify all connectors to return a BulkOperationResult object on all bulk operations and we’re done? Not quite. That is the recommended practice for new connectors moving forward, but for existing connectors it would break backwards compatibility with any Mule application written before the release of the Batch module that manually handles the output of bulk operations.
What we did in those cases is have each connector register a Transformer. Since it’s each connector’s responsibility to understand its API’s domain, it also makes sense to ask each connector to translate its own bulk operation representation into a BulkOperationResult object.
Let’s see an example. This is the signature for an operation in the Google Contacts connector which performs a bulk operation:
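Sketched, with hypothetical operation and parameter names (the essential detail is the return type):

```java
// Illustrative only: the actual operation name and parameters in the
// Google Contacts connector may differ.
@Processor
public List<BatchResult> batchContacts(List<NestedProcessor> operations) {
    // implementation omitted
}
```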
Let’s forget about the implementation of the method right now. The takeaway from the above snippet is that the operation returns a List of BatchResult objects. Let’s see how to register a transformer that goes from that to a BulkOperationResult:
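One way a Mule 3 connector makes a transformer discoverable is through an entry in its registry-bootstrap.properties; as a sketch (the key and class name here are illustrative):

```properties
# src/main/resources/META-INF/services/org/mule/config/registry-bootstrap.properties
# key and fully qualified class name are illustrative
batchResultToBulkOperationResult=org.mule.modules.google.contacts.transformers.BatchResultToBulkOperationResult
```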
And for the big finale, the code of the transformer itself:
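As a self-contained sketch of what that transformer does: in the real connector this logic lives in doTransform() of a class extending AbstractDiscoverableTransformer, whose constructor registers the source and result data types; here the GData and Mule types (BatchResult, BulkItem, BulkOperationResult) are replaced by minimal hypothetical stand-ins so the translation logic can be shown in isolation.

```java
import java.util.ArrayList;
import java.util.List;

// --- minimal hypothetical stand-ins so the snippet compiles on its own ---

// the connector's native per-record result (stands in for GData's BatchResult)
class BatchResult {
    final String id;
    final boolean success;
    final String reason;

    BatchResult(String id, boolean success, String reason) {
        this.id = id;
        this.success = success;
        this.reason = reason;
    }
}

// the canonical detail record (simplified)
class BulkItem {
    final String recordId;
    final boolean successful;
    final String message;

    BulkItem(String recordId, boolean successful, String message) {
        this.recordId = recordId;
        this.successful = successful;
        this.message = message;
    }
}

// the canonical master record (simplified; the real class exposes a Builder)
class BulkOperationResult {
    final boolean successful;
    final List<BulkItem> items;

    BulkOperationResult(boolean successful, List<BulkItem> items) {
        this.successful = successful;
        this.items = items;
    }
}

// In the real connector this would extend AbstractDiscoverableTransformer and
// declare source (List<BatchResult>) and result (BulkOperationResult) data
// types in its constructor, so the batch module can find it at runtime.
class BatchResultToBulkOperationResultTransformer {

    static BulkOperationResult doTransform(List<BatchResult> results) {
        boolean allOk = true;
        List<BulkItem> items = new ArrayList<BulkItem>();
        // keep the ordering contract: item i corresponds to record i
        for (BatchResult r : results) {
            items.add(new BulkItem(r.id, r.success, r.reason));
            if (!r.success) {
                allOk = false;
            }
        }
        return new BulkOperationResult(allOk, items);
    }
}
```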
Important things to notice about the above transformer:
- It extends AbstractDiscoverableTransformer. This is so that the batch module can dynamically find it at runtime.
- It defines the source and target data types in its constructor.
- The doTransform() method does “the magic”.
- Notice how the BulkOperationResult and BulkItem classes provide convenient Builder objects to decouple their inner representations from your connector’s code.
And that’s pretty much it! One last consideration: what happens if I use a bulk operation in a <batch:commit> with a connector that doesn’t support reporting a BulkOperationResult? Well, in that case you have two options:
- Write the transformer and register it yourself at an application level
- Just let it be; in case of an exception, batch will fail all the records alike
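For the first option, a Mule 3 application can register such a transformer in its own configuration; as a sketch (the transformer class name here is hypothetical):

```xml
<!-- registers your own discoverable transformer at the application level -->
<custom-transformer name="myBulkResultTransformer"
                    class="com.example.transformers.BulkResultToBulkOperationResult"/>
```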
Wrapping it up
In this article we discussed why it’s important for connectors to support bulk operations whenever possible (some APIs just can’t do it; that’s not your fault). For new connectors, we advise always returning instances of the canonical BulkOperationResult class. And for adding batch support to an existing connector without breaking backwards compatibility, we covered how to register discoverable transformers to do the trick.
As always, I hope you enjoyed the read, and since we spoke about magic so much, this time I’ll say farewell with some music. Enjoy!