Data Synchronizing made easy with Watermarks in Mule ESB

Facebooktwittergoogle_plusredditpinterestlinkedinmail

We’ve all been there. Sooner or later, someone asks you to periodically synchronize information from one system into another. No shame to admit it, it happens in the best families. Such an integration should start with getting the objects that have been modified since the last sync (or all of them in the case of the very first sync). Sure, this first part sounds like the easiest of all the sync process (and in some cases it actually is), but it’s not without its complexity. Step by step you need to:

  • Go into a persistent store and grab the timestamp for the last update
  • If this is the first update, then you need to use a default value
  • Go into the source system and get all the objects that have been updated since that timestamp
  • Update the persistent store with the timestamp that should be used on the next sync. Notice that because the server time on your machine probably differs from the clock on the source system, you can’t simply use a value like “server.dateTime”. Instead, you need to select the greatest timestamp in the dataset you just got
  • Finally, you need to have logic not to update the timestamp if the sync failed

Not so simple after all isn’t it? Let’s take a look at a sample flow that does this in Mule:

This is how it would look like in XML:

At Mulesoft we believe that this use case is generic enough to deserve a simpler solution. The flow above is way too complex and we haven’t even done the sync logic yet! So, we came out with the concept of Watermark.

Watermarking

The concept of Watermark refers to a flood after-match in which you look at the water stains in a wall to figure how high the water got, which is pretty much what we want to do in this use case: figure out which was the last item we updated and move from there on. What does watermark do for you?

  • It will automatically handle the ObjectStore, you don’t need to worry about that anymore
  • It will take care of checking if a watermark value already exists
  • It will help you get the next value
  • It will update the value when the flow is completed and will leave it untouched if it fails.
  • It will work with any type of List, Collection, Iterable or Iterator (including auto-paging ones)

So, let’s take a look at how the same flow looks using watermark:Way simpler isn’t it? What where’s the magic? Where did the behaviour go? The answer is the poll element. Let’s take a look:


As you can see the poll element now has a watermark element that implements the same behaviour we saw in the first example but in a declarative way:

  • First we provided the name of the flowVar that will hold the value
  • Then we provide the default value expression, in case the component cannot find a value for the watermark. This is the first big gain: we don’t have to worry about the ObjectStore’s state
  • Then we choose a selector, which is the criteria we want to use to pick the next value. There’re four available selectors: MIN, MAX, FIRST and LAST. In our case, we want the greatest LastModifiedDate so we’ll choose MAX
  • Finally, we entered a selector expression. This expression works in tandem with the selector by being executed on each object the salesforce query returned. The selector collects the return values and selects the right value
  • Optionally you can specify which object store you want to use but you don’t have to. Mule will select the user object store automatically.

If the flow ends without errors, watermark will automatically update the object store with the new value. Yeah! Go away complexity! Look how much more compact the XML looks like:

Advanced use cases

In our experience, the four available selectors pretty much fit most of the use cases. If your use case requires custom logic to determine the new value, you can also provide your own update expression which will be evaluated once at the end of the flow.

Summary

Watermark is a tool to simplify querying for updated objects, which is a very common use case when synchronizing data. This feature is available in the latest Studio. What do you think of it? Thanks for reading!

Facebooktwittergoogle_plusredditpinterestlinkedinmail

Mariano Gonzalez

I first started working on the IT industry in 2001 and I've been working as software architect and team leader since 2006, mainly on BPM/ERP applications for industries in the agriculture, energy, government, IT, telecom and content management industries, serving roles of Team Leader and software architect. Since 2011, I also started to specialize in SaaS applications and integration on the Cloud. I have a strong orientation to both, software architecturing and design as for team management, I'm constantly looking for challenges that allows me to develop and increase my social and technical skills

You may also like...

2 Responses

  1. mule user says:

    Hi

    Do you know when default-expression=”#[server.dateTime.format("yyyy-MM-dd’T’HH:mm:ss.SSS’Z’")]” is calcuated. If it is calculated after the query is done isn’t there a chance that the next query might miss things that were updated from the time query was fired to the time query is completed

    • Watermark values are always determined before executing the message processors in the scope.

      If the poll is running for the first time and there’s no watermark value in the ObjectStore, then the default-expression is calculated. Otherwise, the stored value is used.

      In either case, you’ll always have a valid watermark value before running the query.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>