A while ago, I built a web-based self-service portal that let multiple teams in the organisation set up access control lists (ACLs) for their data lake folders.
The portal application targeted Azure Data Lake Gen 1. Recently I wanted to achieve the same on Azure Data Lake Gen 2. At the time of writing this post, there’s no official NuGet package for ACL management targeting Data Lake Gen 2, so one must rely on the REST API alone.
Read about known issues and limitations of Azure Data Lake Storage Gen 2
Furthermore, the REST API documentation does not provide example snippets the way the documentation for many other Azure resources does, so it takes time to work out which REST calls manipulate ACLs. The good news is, I have done that for you and will share a straightforward C# class that wraps the details and issues the correct REST API calls to a Data Lake Store Gen 2.
About Azure Data Lake Store Gen 2
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics. It is significantly different from its earlier version, Azure Data Lake Storage Gen1: Gen2 is built entirely on Azure Blob storage.
Data Lake Storage Gen2 is the result of converging the capabilities of two existing Azure storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Gen1 features such as file system semantics, directory- and file-level security, and scale are combined with the low-cost, tiered storage and high availability/disaster recovery capabilities of Azure Blob storage.
Let’s get started!
Create a Service Principal
First, we need a service principal. We will use this principal to authenticate to Azure Active Directory (using the OAuth 2.0 protocol) in order to authorize our REST calls. We will use the Azure CLI to create it.
az ad sp create-for-rbac --name ServicePrincipalName
Add required permissions
Now you need to grant permission for your application to access Azure Storage.
- Click on the application Settings
- Click on Required permissions
- Click on Add
- Click Select API
- Filter on Azure Storage
- Click on Azure Storage
- Click Select
- Click the checkbox next to Access Azure Storage
- Click Select
- Click Done
Now we have the Client ID, Client Secret and Tenant ID (take the Tenant ID from the Properties tab of Azure Active Directory, where it is listed as Directory ID).
Access Token from Azure Active Directory
Let’s write some C# code to get an Access Token from Azure Active Directory:
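As a minimal sketch of that step, the snippet below requests a token with the OAuth 2.0 client credentials flow against the Azure AD v2.0 token endpoint. The class and method names (`AadTokenProvider`, `BuildTokenRequest`, `GetAccessTokenAsync`) are made up for illustration and are not necessarily what the original class uses; it also assumes .NET Core 3.0 or later for `System.Text.Json`.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: obtains an access token for Azure Storage
// using the client credentials flow of the service principal.
public static class AadTokenProvider
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the token request without sending it, so it can be inspected.
    public static HttpRequestMessage BuildTokenRequest(
        string tenantId, string clientId, string clientSecret)
    {
        var form = new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = clientId,
            ["client_secret"] = clientSecret,
            // Scope for Azure Storage when using the v2.0 endpoint.
            ["scope"] = "https://storage.azure.com/.default"
        };

        var url = $"https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token";
        return new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new FormUrlEncodedContent(form)
        };
    }

    // Sends the request and extracts "access_token" from the JSON response.
    public static async Task<string> GetAccessTokenAsync(
        string tenantId, string clientId, string clientSecret)
    {
        using var request = BuildTokenRequest(tenantId, clientId, clientSecret);
        using var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        var json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("access_token").GetString()
               ?? throw new InvalidOperationException("Token response had no access_token.");
    }
}
```

The returned string is what goes into the `Authorization: Bearer ...` header of every Data Lake REST call.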
Creating ADLS Gen 2 REST client
Once we have the token provider, we can jump into implementing the REST client for Azure Data Lake.
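A rough skeleton of such a client could look like the following. It only shows the two things every call shares: the `dfs.core.windows.net` base address and the common headers. The class name `AdlsGen2RequestFactory` is hypothetical, and the `x-ms-version` value is an assumption; use whichever version the current REST documentation lists.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical building block for the REST client: creates requests that
// already carry the bearer token and the storage API version header.
public sealed class AdlsGen2RequestFactory
{
    // Assumption: a REST API version that supports the Data Lake Gen 2 endpoints.
    private const string ApiVersion = "2018-11-09";

    private readonly string _baseUrl;

    public AdlsGen2RequestFactory(string storageAccountName)
    {
        // Data Lake Gen 2 calls go to the "dfs" endpoint, not the "blob" endpoint.
        _baseUrl = $"https://{storageAccountName}.dfs.core.windows.net";
    }

    public HttpRequestMessage Create(HttpMethod method, string pathAndQuery, string accessToken)
    {
        var url = $"{_baseUrl}/{pathAndQuery.TrimStart('/')}";
        var request = new HttpRequestMessage(method, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", ApiVersion);
        return request;
    }
}
```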
Data Lake ACLs and POSIX permissions
The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark. In this post, we will configure them via the REST API.
There are two kinds of access control lists (ACLs), Access ACLs and Default ACLs.
- Access ACLs: These control access to an object. Files and folders both have Access ACLs.
- Default ACLs: A “template” of ACLs associated with a folder that determines the Access ACLs for any child items created under that folder. Files do not have Default ACLs.
Here’s the table of allowed grant types:
When we define ACLs, we need to use a short form of these grant types. The Microsoft documentation explains these short forms in the table below:
However, in our code we will simplify the POSIX ACL notation with some supporting classes, shown below. That way, consumers of the REST client do not need to spend time building the short form of the grants they want.
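One way such supporting classes might look is sketched here. All type names (`Grant`, `AclScope`, `AclEntry`) are illustrative assumptions rather than the exact classes from the original code. Each entry renders itself in the `[default:]scope:objectId:rwx` short form that the REST API expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types: POSIX-style permissions as combinable flags.
[Flags]
public enum Grant { None = 0, Execute = 1, Write = 2, Read = 4 }

public enum AclScope { User, Group, Other, Mask }

public sealed class AclEntry
{
    public AclScope Scope { get; }
    public string ObjectId { get; }      // empty for owning user/group, other and mask
    public Grant Permissions { get; }
    public bool IsDefault { get; }       // true for a Default ACL entry (folders only)

    public AclEntry(AclScope scope, string objectId, Grant permissions, bool isDefault = false)
    {
        Scope = scope;
        ObjectId = objectId ?? string.Empty;
        Permissions = permissions;
        IsDefault = isDefault;
    }

    // Converts flags to the three-character short form, e.g. Read|Execute -> "r-x".
    public static string ToShortForm(Grant grant)
    {
        var r = grant.HasFlag(Grant.Read) ? "r" : "-";
        var w = grant.HasFlag(Grant.Write) ? "w" : "-";
        var x = grant.HasFlag(Grant.Execute) ? "x" : "-";
        return r + w + x;
    }

    public override string ToString()
    {
        var prefix = IsDefault ? "default:" : string.Empty;
        var scope = Scope.ToString().ToLowerInvariant();
        return prefix + scope + ":" + ObjectId + ":" + ToShortForm(Permissions);
    }

    // Joins several entries into the comma-separated list used in one ACL value.
    public static string Join(IEnumerable<AclEntry> entries) =>
        string.Join(",", entries.Select(e => e.ToString()));
}
```

For example, `new AclEntry(AclScope.User, "abc", Grant.Read | Grant.Execute).ToString()` yields `user:abc:r-x`, where `abc` stands in for an Azure AD object ID.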
Now we can create methods to perform the different REST calls. Let’s start by creating a file system.
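A minimal sketch of that call follows: a `PUT` on the file system name with `resource=filesystem`. The class and method names are hypothetical, the access token is passed in as a parameter for brevity, and the `x-ms-version` value is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for the "create file system" call.
public static class FileSystemRequests
{
    public static HttpRequestMessage BuildCreateFileSystem(
        string account, string fileSystem, string accessToken)
    {
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}?resource=filesystem";
        var request = new HttpRequestMessage(HttpMethod.Put, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", "2018-11-09"); // assumed API version
        return request;
    }

    // Returns true when the service reports success (201 Created is expected).
    public static async Task<bool> CreateFileSystemAsync(
        HttpClient http, string account, string fileSystem, string accessToken)
    {
        using var request = BuildCreateFileSystem(account, fileSystem, accessToken);
        using var response = await http.SendAsync(request);
        return response.IsSuccessStatusCode;
    }
}
```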
Here we retrieve an access token and then issue a REST call to the Azure Data Lake Storage Gen 2 API to create a new file system. Next, we will create a folder with a file in it, and then set some access control on them.
Let’s create the folder:
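As a sketch, creating a folder is a `PUT` on the folder path with `resource=directory`. The names below are hypothetical, the path is assumed to be URL-safe already, and the `x-ms-version` value is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for the "create directory" call.
public static class DirectoryRequests
{
    public static HttpRequestMessage BuildCreateDirectory(
        string account, string fileSystem, string directoryPath, string accessToken)
    {
        // Assumes directoryPath needs no URL encoding, e.g. "raw/sales".
        var path = directoryPath.Trim('/');
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}/{path}?resource=directory";
        var request = new HttpRequestMessage(HttpMethod.Put, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", "2018-11-09"); // assumed API version
        return request;
    }

    public static async Task<bool> CreateDirectoryAsync(
        HttpClient http, string account, string fileSystem, string directoryPath, string accessToken)
    {
        using var request = BuildCreateDirectory(account, fileSystem, directoryPath, accessToken);
        using var response = await http.SendAsync(request);
        return response.IsSuccessStatusCode;
    }
}
```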
Next, we create a file in it. File creation (ingestion into the Data Lake) is not that straightforward; at least, it can’t be done in a single call. We first have to create an empty file, and then we can write some content to it. We can also append content to an existing file. Finally, we need to flush the buffer so that the new content gets persisted.
Let’s do that. First, we will see how to create an empty file:
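A minimal sketch of the empty-file step is below: a `PUT` on the file path with `resource=file`. The optional `If-None-Match: *` header, which makes the call fail instead of overwriting an existing file, is an addition of this sketch and may not be part of the original class. Names and the `x-ms-version` value are assumptions.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for the "create empty file" call.
public static class FileRequests
{
    public static HttpRequestMessage BuildCreateEmptyFile(
        string account, string fileSystem, string filePath, string accessToken,
        bool failIfExists = true)
    {
        var path = filePath.TrimStart('/');
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}/{path}?resource=file";
        var request = new HttpRequestMessage(HttpMethod.Put, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", "2018-11-09"); // assumed API version

        if (failIfExists)
        {
            // "If-None-Match: *" turns the create into a conditional request,
            // so an existing file is not silently overwritten.
            request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Any);
        }
        return request;
    }

    public static async Task<bool> CreateEmptyFileAsync(
        HttpClient http, string account, string fileSystem, string filePath, string accessToken)
    {
        using var request = BuildCreateEmptyFile(account, fileSystem, filePath, accessToken);
        using var response = await http.SendAsync(request);
        return response.IsSuccessStatusCode;
    }
}
```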
The above snippet creates an empty file. Next, we will read all the content from a local file (on the PC) and write it into the empty file we just created in Azure Data Lake.
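The append-then-flush sequence can be sketched roughly as follows. Both calls are `PATCH` requests on the file path: `action=append` carries the bytes and the offset they start at, and `action=flush` passes the total length to commit. Class and method names are hypothetical, the whole file is sent in one append for simplicity, and the `x-ms-version` value is an assumption.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for writing content into an existing (empty) file.
public static class FileContentRequests
{
    private static HttpRequestMessage Build(string url, string accessToken, byte[] body)
    {
        // HttpMethod.Patch is not available on every target framework.
        var request = new HttpRequestMessage(new HttpMethod("PATCH"), url)
        {
            Content = new ByteArrayContent(body)
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", "2018-11-09"); // assumed API version
        return request;
    }

    // Uploads 'content' so that it starts at byte offset 'position' of the file.
    public static HttpRequestMessage BuildAppend(string account, string fileSystem,
        string filePath, byte[] content, long position, string accessToken)
    {
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}/{filePath}" +
                  $"?action=append&position={position}";
        return Build(url, accessToken, content);
    }

    // Commits previously appended data; 'totalLength' is the file length after the flush.
    public static HttpRequestMessage BuildFlush(string account, string fileSystem,
        string filePath, long totalLength, string accessToken)
    {
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}/{filePath}" +
                  $"?action=flush&position={totalLength}";
        return Build(url, accessToken, Array.Empty<byte>());
    }

    // Reads a local file and writes it into the (already created) remote file.
    public static async Task UploadAsync(HttpClient http, string account, string fileSystem,
        string filePath, string localFile, string accessToken)
    {
        byte[] bytes = await File.ReadAllBytesAsync(localFile);

        using (var append = BuildAppend(account, fileSystem, filePath, bytes, 0, accessToken))
        using (var response = await http.SendAsync(append))
            response.EnsureSuccessStatusCode();

        using (var flush = BuildFlush(account, fileSystem, filePath, bytes.Length, accessToken))
        using (var response = await http.SendAsync(flush))
            response.EnsureSuccessStatusCode();
    }
}
```

For large files, the same two builders could be used to append in several chunks at increasing positions and flush once at the end.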
Right! Now it’s time to set access control on the directory or on files inside a directory. Here’s the method that we will use to do that.
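A hedged sketch of that call: a `PATCH` on the path with `action=setAccessControl`, where the ACL travels in the `x-ms-acl` header as a comma-separated list of short-form entries. The class and method names are hypothetical, the ACL is taken as a ready-made string, and the `x-ms-version` value is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for the "set access control" call.
public static class AclRequests
{
    // 'acl' is a comma-separated list of short-form entries, for example
    // "user::rwx,group::r-x,other::---,user:<object-id>:r-x".
    public static HttpRequestMessage BuildSetAcl(string account, string fileSystem,
        string path, string acl, string accessToken)
    {
        var url = $"https://{account}.dfs.core.windows.net/{fileSystem}/{path.TrimStart('/')}" +
                  "?action=setAccessControl";
        var request = new HttpRequestMessage(new HttpMethod("PATCH"), url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Add("x-ms-version", "2018-11-09"); // assumed API version
        // Added without validation so the value is passed through unchanged.
        request.Headers.TryAddWithoutValidation("x-ms-acl", acl);
        return request;
    }

    public static async Task<bool> SetAclAsync(HttpClient http, string account,
        string fileSystem, string path, string acl, string accessToken)
    {
        using var request = BuildSetAcl(account, fileSystem, path, acl, accessToken);
        using var response = await http.SendAsync(request);
        return response.IsSuccessStatusCode;
    }
}
```

Note that the header value replaces the existing ACL on the path, so it should include the owning user, owning group and other entries alongside any named users or groups.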
The entire file system REST API class can be found here. Here’s an example of how we can use these methods from a console application.
Until an official client package is released, if you’re working with Azure Data Lake Store Gen 2 and wondering how to accomplish these REST calls, I hope this post helps you move forward!
Thanks for reading.