Azure Data Lake Storage Gen2 now supports Event Grid events, similar to Azure Blob Storage. This is ideal for starting processes automatically, based on events that happen within the Data Lake. In my scenario, I want to process a file as soon as it is added to the Data Lake. Bear in mind that, at the time of writing, this feature is still in early preview and limited to certain regions.
Investigate Event Grid events
The list of available events for Azure Data Lake Storage is documented over here. The event that catches my interest is the Microsoft.Storage.BlobCreated event. At first sight, this looks identical to the event that is available for Azure Blob Storage, but the devil is in the (documented) detail. By default, two Microsoft.Storage.BlobCreated events are raised for every new file: one for the CreateFile operation and one for the FlushWithClose operation. As the documentation states, it's advised to act on the latter, because it is only raised once the file has been completely written.
Create Event Grid subscription
- Create a Logic App that starts with the Request trigger. Copy the Request URL.
- Go to the Data Lake Storage Account and click on the Events tab.
- Click on the Add Event Subscription button.
- Provide the following properties:
- A meaningful name
- Select the Blob Created event type
- Select the Webhook endpoint type
- Provide the Logic Apps URL as the endpoint
- Create an advanced filter that only lets through events whose data.api value is FlushWithClose, to ensure we only get a single event per created file
- Click on Create to finalize the setup of the event subscription.
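To make the effect of the advanced filter more tangible, here is a minimal Python sketch of the same check. It is not how Event Grid implements filtering; it only mirrors the rule "keep BlobCreated events whose data.api is FlushWithClose". The sample events are trimmed and hypothetical (account, file system and file names are made up), and the function name is my own.

```python
from typing import Any


def is_final_blob_created(event: dict[str, Any]) -> bool:
    """Return True only for the BlobCreated event raised by FlushWithClose."""
    return (
        event.get("eventType") == "Microsoft.Storage.BlobCreated"
        and event.get("data", {}).get("api") == "FlushWithClose"
    )


# Hypothetical, trimmed events for a single uploaded file.
# Real events carry more properties (id, subject, eventTime, ...).
sample_url = "https://myaccount.dfs.core.windows.net/myfilesystem/in/data.csv"
sample_events = [
    {
        "eventType": "Microsoft.Storage.BlobCreated",
        "data": {"api": "CreateFile", "url": sample_url},
    },
    {
        "eventType": "Microsoft.Storage.BlobCreated",
        "data": {"api": "FlushWithClose", "url": sample_url},
    },
]

# Only the FlushWithClose event survives, so the file is processed once.
accepted = [e for e in sample_events if is_final_blob_created(e)]
print(len(accepted), accepted[0]["data"]["api"])
```

Without this filter, the Logic App would be triggered twice per file, and the first run (CreateFile) could start before the content has been written.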
Configure the Logic App
- Event Grid can deliver events in batches, so first we have to configure SplitOn on the Request trigger, so that every event starts its own Logic App run.
- Use the Get File action of the Azure Data Lake Gen2 connector.
- Provide the following (complex) expressions to extract the needed information:
- Account name:
split(split(triggerBody().data.url, '/')[2], '.')[0]
- File system name:
split(triggerBody().data.url, '/')[3]
- Folder path:
substring(split(triggerBody().data.url, '.net/')[1], indexOf(split(triggerBody().data.url, '.net/')[1], '/'), sub(lastIndexOf(split(triggerBody().data.url, '.net/')[1], '/'), indexOf(split(triggerBody().data.url, '.net/')[1], '/')))
- File name:
split(triggerBody().data.url, '/')[sub(length(split(triggerBody().data.url, '/')), 1)]
- In this way, you can get the file content and perform the rest of the needed processing.
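Because the expressions above are hard to read, here is a rough Python equivalent that can be used to check what each one returns for a given URL. It is only a sketch: the function name and the sample URL are hypothetical, and it assumes the URL has the shape https://<account>.dfs.core.windows.net/<file system>/<folders>/<file>.

```python
def parse_data_lake_url(url: str) -> dict[str, str]:
    """Mirror the four Logic App expressions on an event's data.url value."""
    parts = url.split("/")

    # split(split(url, '/')[2], '.')[0]
    account_name = parts[2].split(".")[0]

    # split(url, '/')[3]
    file_system = parts[3]

    # Everything after '.net/': "<file system>/<folders>/<file>"
    rest = url.split(".net/")[1]
    first_slash = rest.find("/")   # indexOf(rest, '/')
    last_slash = rest.rfind("/")   # lastIndexOf(rest, '/')

    # substring(rest, first_slash, last_slash - first_slash)
    folder_path = rest[first_slash:last_slash]

    # Last element of split(url, '/')
    file_name = parts[-1]

    return {
        "account_name": account_name,
        "file_system": file_system,
        "folder_path": folder_path,
        "file_name": file_name,
    }


# Hypothetical URL, as it could appear in triggerBody().data.url
result = parse_data_lake_url(
    "https://myaccount.dfs.core.windows.net/myfilesystem/in/2020/data.csv"
)
print(result)
```

Note that the folder path keeps its leading slash (/in/2020 for the sample URL) and is empty for a file that sits directly in the root of the file system.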
Conclusion
It was a bit of a hassle to get the expressions working, but once they are configured, it works like a charm!
Happy eventing!
Toon