React on Azure Data Lake Store Gen2 events!

Azure Data Lake Storage Gen2 has now support for Event Grid events, similar to Azure Blob Storage.  This is ideal to start processes automatically, based on certain events that happen within the Data Lake.  In my scenario, I want to process a file, when it is added to the Data Lake.  Bear in mind that at the time of writing, this feature is still in early preview, limited to certain regions.

Investigate Event Grid events

The list of available events for Azure Data Lake Storage is documented over here.  The event that catches my interest is the Microsoft.Storage.BlobCreated event.  At first sight, this looks identical to the events that are available for Azure Blob Storage, but the devil is in the (documented) detail.  By default, there are two Microsoft.Storage.BlobCreated events: one for the CreateFile operation and one for the FlushWithClose operation.  As the documentation states: it’s advised to take the latter.

adlevents1

Create Event Grid subscription

  • Create a Logic App that starts with the Request trigger.  Copy the Request URL.
  • Go to the Data Lake Storage Account and click on the Events tab.

adlevents2

  • Click on the Add Event Subscription button
  • Provide the following properties:
    • A meaningful name
    • Select the Blob Created event type
    • Select the Webhook endpoint type
    • Provide the Logic Apps URL as the endpoint

adlevents3

  • Create an advanced filter to ensure we only get a single event per created file

adlevents4

  • Click on Create to finalize the setup of the event subscription.

Configure the Logic App

  • Event Grid can send messages in batch, so first we have to configure the SplitOn trigger.

adlevents5

adlevents6

  • Provide the following (complex) expressions to extract the needed information:
    • Account name:
      split(split(triggerBody().data.url, '/')[2], '.')[0]
    • File system name:
      split(triggerBody().data.url, '/')[3]
    • Folder path:
      substring(split(triggerBody().data.url, '.net/')[1], indexOf(split(triggerBody().data.url, '.net/')[1], '/'), sub(lastIndexOf(split(triggerBody().data.url, '.net/')[1], '/'), indexOf(split(triggerBody().data.url, '.net/')[1], '/')))
    • File name:
      split(triggerBody().data.url, '/')[sub(length(split(triggerBody().data.url, '/')), 1)]

adlevents7

  • In this way, you can get the file content and perform the rest of the needed processing:

adlevents8

Conclusion

It was a bit of a hassle to get the expressions working, but once they are configured, it works as a charm!

Happy eventing!
Toon

About me

Hi! I’m Toon Vanhoutte, a hands-on Azure architect – based in Belgium – with a big passion for teaching and helping people out. I’m happy to assist you during your Azure journey with high-quality advisory and I would love to teach you Azure’s possibilities via my tailored training courses.

Subscribe to the blog