React on Azure Data Lake Store Gen2 events!

Azure Data Lake Storage Gen2 has now support for Event Grid events, similar to Azure Blob Storage.  This is ideal to start processes automatically, based on certain events that happen within the Data Lake.  In my scenario, I want to process a file, when it is added to the Data Lake.  Bear in mind that at the time of writing, this feature is still in early preview, limited to certain regions.

Investigate Event Grid events

The list of available events for Azure Data Lake Storage is documented over here.  The event that catches my interest is the Microsoft.Storage.BlobCreated event.  At first sight, this looks identical to the events that are available for Azure Blob Storage, but the devil is in the (documented) detail.  By default, there are two Microsoft.Storage.BlobCreated events: one for the CreateFile operation and one for the FlushWithClose operation.  As the documentation states: it’s advised to take the latter.

adlevents1

Create Event Grid subscription

  • Create a Logic App that starts with the Request trigger.  Copy the Request URL.
  • Go to the Data Lake Storage Account and click on the Events tab.

adlevents2

  • Click on the Add Event Subscription button
  • Provide the following properties:
    • A meaningful name
    • Select the Blob Created event type
    • Select the Webhook endpoint type
    • Provide the Logic Apps URL as the endpoint

adlevents3

  • Create an advanced filter to ensure we only get a single event per created file

adlevents4

  • Click on Create to finalize the setup of the event subscription.

Configure the Logic App

  • Event Grid can send messages in batch, so first we have to configure the SplitOn trigger.

adlevents5

adlevents6

  • Provide the following (complex) expressions to extract the needed information:
    • Account name:
      split(split(triggerBody().data.url, '/')[2], '.')[0]
    • File system name:
      split(triggerBody().data.url, '/')[3]
    • Folder path:
      substring(split(triggerBody().data.url, '.net/')[1], indexOf(split(triggerBody().data.url, '.net/')[1], '/'), sub(lastIndexOf(split(triggerBody().data.url, '.net/')[1], '/'), indexOf(split(triggerBody().data.url, '.net/')[1], '/')))
    • File name:
      split(triggerBody().data.url, '/')[sub(length(split(triggerBody().data.url, '/')), 1)]

adlevents7

  • In this way, you can get the file content and perform the rest of the needed processing:

adlevents8

Conclusion

It was a bit of a hassle to get the expressions working, but once they are configured, it works as a charm!

Happy eventing!
Toon

ABOUT

MEET THE YOUR AZURE COACH TEAM

Your Azure Coach is specialized in organizing Azure trainings that are infused with real-life experience. All our coaches are active consultants, who are very passionate and who love to share their Azure expertise with you.