Recently, I used Azure Data Factory again to build a big data ingestion pipeline. The input data came from different sources, but eventually needed to end up in Azure Blob Storage. Everything went pretty smoothly and I had my data in the desired format, inside the expected blob container. One final task remained: I needed to clean up some temporary blobs within my container. It was an unpleasant surprise to see that there’s no out-of-the-box support for deleting files in blob storage. It’s apparently a highly requested feature.
On the web, I found several alternatives, of which using Logic Apps seemed the most convenient workaround. However, I was not satisfied, as this solution introduces additional complexity and an extra dependency. Why not use the Azure Blob Storage REST API? Authenticating against this API is typically hard, unless you can leverage Managed Service Identity (MSI). As Data Factory supports MSI, I was curious whether it could work… Yes it did, otherwise I wouldn’t be writing a blog post about it 🙂
- Let’s add the Web activity and give it a meaningful name:
- Provide the URL of the blob, as the documentation states. Select the DELETE method and also include the mandatory x-ms-date and x-ms-version headers:
- As the authentication method, select MSI and provide https://storage.azure.com/ as the resource:
- Grant the Data Factory’s Managed Service Identity the Storage Blob Data Contributor role on the storage account. Note that being an owner is not sufficient, contrary to what the documentation states. This is probably because it’s still a preview feature. Thanks Joonas Westlin, for your help on this one!
- If all of this is configured correctly, you can easily delete the blob from its storage account:
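To make the steps above a bit more tangible, here is a minimal Python sketch of the HTTP request that the Web activity ends up sending. It is not what Data Factory runs internally, just an illustration of the same Delete Blob call. The account, container and blob names are made up, the bearer token is a placeholder (in Data Factory, the MSI authentication setting obtains it for the `https://storage.azure.com/` resource), and the `2017-11-09` version is my assumption of the oldest API version that accepts Azure AD tokens. The request is only built, not sent.

```python
from email.utils import formatdate
from urllib.parse import quote
from urllib.request import Request


def build_delete_blob_headers(bearer_token, api_version="2017-11-09"):
    """Return the headers for a Delete Blob call authenticated with an Azure AD token.

    api_version is an assumption: Azure AD (OAuth) authentication needs a
    sufficiently recent x-ms-version; adjust it to what your account supports.
    """
    return {
        # RFC 1123 date in GMT, e.g. "Tue, 01 Jan 2019 12:00:00 GMT"
        "x-ms-date": formatdate(usegmt=True),
        "x-ms-version": api_version,
        # Placeholder token; Data Factory fetches the real one via MSI.
        "Authorization": f"Bearer {bearer_token}",
    }


def build_delete_blob_request(account, container, blob, bearer_token):
    """Build (but do not send) the DELETE request for a single blob."""
    # quote() keeps "/" so virtual folders inside the blob name survive.
    url = f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"
    return Request(url, method="DELETE", headers=build_delete_blob_headers(bearer_token))


# Hypothetical names, for illustration only.
request = build_delete_blob_request(
    "mystorageaccount", "staging", "tmp/part-0001.csv", "<token>"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` with a real token would perform the deletion; as far as I know, the service answers a successful delete with status 202 (Accepted).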
Another case of password-less authentication, which we can only encourage. Please note that, at the time of writing, this AD integration is still a preview feature for Azure Storage. It’s a real pity that the Azure Data Factory documentation has not been updated with this great MSI functionality!