Access Control management via REST API – Azure Data Lake Gen 2

Background

A while ago, I have built an web-based self-service portal that facilitated multiple teams in the organisation, setting up their Access Control (ACLs) for corresponding data lake folders.

The portal application was targeting Azure Data Lake Gen 1. Recently I wanted to achieve the same but on Azure Data Lake Gen 2. At the time of writing this post, there’s no official NuGet package for ACL management targeting Data Lake Gen 2. One must rely on REST API only.

Read about known issues and limitations of Azure Data Lake Storage Gen 2

Further more, the REST API documentations do not provide example snippets like many other Azure resources. Therefore, it takes time to demystify the REST APIs to manipulate ACLs. Good new is, I have done that for you and will share a straight-forward C# class that wraps the details and issues correct REST API calls to a Data Lake Store Gen 2.

About Azure Data Lake Store Gen 2

Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics. Data Lake Storage Gen2 is significantly different from it’s earlier version known as Azure Data Lake Storage Gen1, Gen2 is entirely built on Azure Blob storage.

Data Lake Storage Gen2 is the result of converging the capabilities of two existing Azure storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Gen1 Features such as file system semantics, directory, and file level security and scale are combined with low-cost, tiered storage, high availability/disaster recovery capabilities from Azure Blob storage.

Let’s get started!

Create a Service Principal

First we would need a service principal. We will use this principal to authenticate to Azure Active Directory (using OAuth 2.0 protocol) in order to authorize our REST calls. We will use Azure CLI to do that.

az ad sp create-for-rbac --name ServicePrincipalName
Add required permissions

Now you need to grant permission for your application to access Azure Storage.

  • Click on the application Settings
  • Click on Required permissions
  • Click on Add
  • Click Select API
  • Filter on Azure Storage
  • Click on Azure Storage
  • Click Select
  • Click the checkbox next to Access Azure Storage
  • Click Select
  • Click Done

App

Now we have Client ID, Client Secret and Tenant ID (take it from the Properties tab of Azure Active Directory – listed as Directory ID).

Access Token from Azure Active Directory

Let’s write some C# code to get an Access Token from Azure Active Directory:

Creating ADLS Gen 2 REST client

Once we have the token provider, we can jump in implementing the REST client for Azure Data Lake.

Data Lake  ACLs and POSIX permissions

The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark. We will do that via REST API in this post.

There are two kinds of access control lists (ACLs), Access ACLs and Default ACLs.

  • Access ACLs: These control access to an object. Files and folders both have Access ACLs.
  • Default ACLs: A “template” of ACLs associated with a folder that determine the Access ACLs for any child items that are created under that folder. Files do not have Default ACLs.

Here’s the table of allowed grant types:

acl1

While we define ACLs we need to use a short form of these grant types. Microsoft Document explained these short form in below table:

posix

However, in our code we would also simplify the POSIX ACL notations by using some supporting classes as below. That way REST client consumers do not need to spend time building the short form of their aimed grant criteria’s.

Now we can create methods to perform different REST calls, let’s start by creating a file system.

Here we are retrieving a Access Token and then issuing a REST call to Azure Data Lake Storage Gen 2 API to create a new file system. Next, we will create a folder and file in it and then set some Access Control to them.

Let’s create the folder:

And creating file in it. Now, file creation (ingestion in Data Lake) is not that straight forward, at least, one can’t do that by a single call. We would have to first create an empty file, then we can write some content in it. We can also append content to an existing file. Finally, we would require to flush the buffer so the new content gets persisted.

Let’s do that, first we will see how to create an empty file:

The above snippet will create an empty file, now we will read all content from a local file (from PC) and write them into the empty file in Azure Data Lake that we just created.

Right! Now time to set Access control to the directory or files inside a directory. Here’s the method that we will use to do that.

The entire File system REST API class can be found here. Here’s an example how we can use this methods from a console application.

Conclusion

Until, there’s an Official Client Package released, if you’re into Azure Data Lake Store Gen 2 and wondering how to accomplish these REST calls – I hope this post helped you to move further!

Thanks for reading.

 

OpenSSL as Service

OpenSSL is awesome! Though, requires little manual work to remember all the commands, executing them in a machine that has OpenSSL installed. In this post, I’m about to build an HTTP API over OpenSSL, with the most commonly used commands (and the possibility to extend it further – as required). This will help folks who wants to run OpenSSL in a private network but wants to orchestrate it in their automation workflows.

Background

Ever wanted to automate the TLS (also known as SSL) configuration process for your web application? You know, the sites that served via HTTPS and Chrome shows a green “secure” mark in address bar. Serving site over HTTP is insecure (even for static contents) and major browsers will mark those sites as not secure, Chrome already does that today.

Serving contents via HTTPS involves buying a digital certificate (aka SSL/TLS certificate) from certificate authorities (CA). The process seemed complicated (sometimes expensive too) by many average site owners or developers. Let’s encrypt addressed this hardship and made it painless. It’s an open certificate authority that provides free TLS certificates in an automated and elegant way.

However, free certificates might not be ideal for enterprise scenarios. Enterprise might have a requirement to buy certificate from a specific CA. In many cases, that process is manual and often complicated and slow. Typically, the workflow starts by generating a Certificate Signing request (also known as CSR) which requires generating asymmetric key pair (a public and private key pair). Which is then sent to CA to get a Digital Identity certificate. This doesn’t stop here. Once the certificate is provided by the CA, sometimes (Specially if you are in IIS, .net or Azure world) it’s needed to be converted to a PFX (Personal Information Exchange) file to deploy the certificate to the web server.

PFX (aka PKCS #12) is a file format defines an archive file format for storing many cryptography objects as a single file. It’s used to bundle a private key with it’s X.509 certificate or bundling all the members of a chain of trust. This file may be encrypted and signed. The internal storage containers (aka SafeBags), may also be encrypted and signed.

Generating CSR, converting a Digital Identity certificate to PFX format are often done manually. There are some online services that allows you generating CSRs – via an API or an UI. These are very useful and handy, but not the best fit for an enterprise. Because the private keys need to be shared with the online provider – to generate the CSR. Which leads people to use the vastly popular utility – OpenSSL in their local workstation – generating CSRs. In this article, this is exactly what I am trying to avoid. I wanted to have an API over OpenSSL – so that I can invoke it from my other automation workflow running in the Cloud.

Next, we will see how we can expose the OpenSSL over HTTP API in a Docker container, so we can run the container in our private enterprise network and orchestrate this in our certificate automation workflows.

The Solution Design

We will write a .net core web app, exposing the OpenSSL command via web API. Web API requests will fork OpenSSL process with the command and will return the outcome as web API response.

OpenSSL behind .net core Web API

We are using System.Diagnostics.Process to lunch OpenSSL in our code. This is assuming we will have OpenSSL executable present in our path. Which we will ensure soon with Docker.

        private static StringBuilder ExecuteOpenSsl(string command)
        {
            var logs = new StringBuilder();
            var executableName = "openssl";
            var processInfo = new ProcessStartInfo(executableName)
            {
                Arguments = command,
                UseShellExecute = false,
                RedirectStandardError = true,
                RedirectStandardOutput = true,
                CreateNoWindow = true
            };

            var process = Process.Start(processInfo);
            while (!process.StandardOutput.EndOfStream)
            {
                logs.AppendLine(process.StandardOutput.ReadLine());
            }
            logs.AppendLine(process.StandardError.ReadToEnd());
            return logs;
        }

This is simply kicking off OpenSSL executable with a command and capturing the output (or errors). We can now use this in our Web API controller.

    /// <summary>
    /// The Open SSL API
    /// </summary>
    [Produces("application/json")]
    [Route("api/OpenSsl")]
    public class OpenSslController : Controller
    {
        /// <summary>
        /// Creates a new CSR
        /// </summary>
        /// Payload info
        /// The CSR with private key
        [HttpPost("CSR")]
        public async Task Csr([FromBody] CsrRequestPayload payload)
        {
            var response = await CertificateManager.GenerateCSRAsync(payload);
            return new JsonResult(response);
        }

This snippet only shows one example, where we are receiving a CSR generation request and using the OpenSSL to generate, returning the CSR details (in a base64 encoded string format) as API response.

Other commands are following the same model, so skipping them here.

Building Docker Image

Above snippet assumes that we have OpenSSL installed in the machine and the executable’s path is registered in our system’s path. We will turn that assumption to a fact by installing OpenSSL in our Docker image.

FROM microsoft/aspnetcore:2.0 AS base

RUN apt-get update -y
RUN apt-get install openssl

Here we are using aspnetcore:2.0 as our base image (which is a Linux distribution) and installing OpenSSL right after.

Let’s Run it!

I have built the docker image and published it to Docker Hub. All we need is to run it:

Untitled-1

The default port of the web API is 80, though in this example we will run it on 8080. Let’s open a browser pointing to:

http:localhost:8080/ 

Voila! We have our API’s. Here’s the Swagger UI for the web API.

swagger

And we can test our CSR generation API via Postman:

Postman

The complete code for this web app with Docker file can be found in this GitHub Repository. The Docker image is in Docker Hub.

Thanks for reading.