Key Vault as backing store of Azure Functions

If you have used Azure function, you probably are aware that Azure Functions leverages a Storage Account underneath to support the file storage (where the function app code resides as Azure File share) and also as a backing store to keep Functions Keys (the secrets that are used in Function invocations).

Containers

Figure: Storage Account containers – “azure-webjobs-secrets”

If you look inside the container there are files with following contents:

secrets-in-storages

Figure: These JSON files has the function keys

host-json

Figure: Encrypted master keys  and other function keys

I have been in a conversation where; it was not appreciated to see the keys stored in the storage account. The security and governance team was seeking for a better place to keep these keys. Where secrets can be further restricted from developer access.

Of course, we can create a VNET around the storage accountand use private link but that has some other consequence as the content (functions implementations artifacts) stored also into the storage account. Configuring two separate storage account can address this better, however, this can make the setup complicated than it has to be.
A better option could be to store this keys into a Key Vault as backing store – which is a great feature of Azure functions, but I’ve found few people are aware of this due to lack of documentations. In this article I will show you how to move these secrets to a Key Vault.

To do so, we need to configure few Application Settings into the Function App. They are given below:

App Settings name Value
AzureWebJobsSecretStorageType keyvault
AzureWebJobsSecretStorageKeyVaultName <Key Vault Name>
AzureWebJobsSecretStorageKeyVaultConnectionString <Connection String or Leave it empty with Managed Identity configured on Azure Functions>

Once you have configured the above settings, you need to enable Managed Identity on your Azure Function. You will have to accomplish that in Identity section under platform features tab. That is a much better option in my opinion as we don’t need to maintain any more secrets to talk to Key vault securely. Go ahead and turn the system identity toggle on. This will create a service principal with the same name as Azure Function application you have.

managedidentity

Figure: Enabling system assigned managed identity on Function app
Next step is to add a rule to the key vault’s access policies for the service principal created in earlier step.

access policyu

Figure: Key vault Access policy
That’s it, hit your function app now and you will see the keys are stored inside the Key vault. You can safely delete the container from the storage account now.

secretsinkeyvault

Figure: Secrets are stored in Key Vault

Hope this will save time when you are concerned to keep the keys in storage account.
The Azure Function is open sourced and is in GitHub. You can have a look into the sources and see other interesting ideas that you may play with.

Access Control management via REST API – Azure Data Lake Gen 2

Background

A while ago, I have built an web-based self-service portal that facilitated multiple teams in the organisation, setting up their Access Control (ACLs) for corresponding data lake folders.

The portal application was targeting Azure Data Lake Gen 1. Recently I wanted to achieve the same but on Azure Data Lake Gen 2. At the time of writing this post, there’s no official NuGet package for ACL management targeting Data Lake Gen 2. One must rely on REST API only.

Read about known issues and limitations of Azure Data Lake Storage Gen 2

Further more, the REST API documentations do not provide example snippets like many other Azure resources. Therefore, it takes time to demystify the REST APIs to manipulate ACLs. Good new is, I have done that for you and will share a straight-forward C# class that wraps the details and issues correct REST API calls to a Data Lake Store Gen 2.

About Azure Data Lake Store Gen 2

Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics. Data Lake Storage Gen2 is significantly different from it’s earlier version known as Azure Data Lake Storage Gen1, Gen2 is entirely built on Azure Blob storage.

Data Lake Storage Gen2 is the result of converging the capabilities of two existing Azure storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Gen1 Features such as file system semantics, directory, and file level security and scale are combined with low-cost, tiered storage, high availability/disaster recovery capabilities from Azure Blob storage.

Let’s get started!

Create a Service Principal

First we would need a service principal. We will use this principal to authenticate to Azure Active Directory (using OAuth 2.0 protocol) in order to authorize our REST calls. We will use Azure CLI to do that.

az ad sp create-for-rbac --name ServicePrincipalName
Add required permissions

Now you need to grant permission for your application to access Azure Storage.

  • Click on the application Settings
  • Click on Required permissions
  • Click on Add
  • Click Select API
  • Filter on Azure Storage
  • Click on Azure Storage
  • Click Select
  • Click the checkbox next to Access Azure Storage
  • Click Select
  • Click Done

App

Now we have Client ID, Client Secret and Tenant ID (take it from the Properties tab of Azure Active Directory – listed as Directory ID).

Access Token from Azure Active Directory

Let’s write some C# code to get an Access Token from Azure Active Directory:

public class TokenProvider
{
private readonly string tenantId;
private readonly string clientId;
private readonly string secret;
private readonly string scopeUri;
private const string IdentityEndpoint = "https://login.microsoftonline.com";
private const string DEFAULT_SCOPE = "https://management.azure.com/";
private const string MEDIATYPE = "application/x-www-form-urlencoded";
public OAuthTokenProvider(string tenantId, string clientId, string secret, string scopeUri = DEFAULT_SCOPE)
{
this.tenantId = tenantId;
this.clientId = WebUtility.UrlEncode(clientId);
this.secret = WebUtility.UrlEncode(secret);
this.scopeUri = WebUtility.UrlEncode(scopeUri);
}
public async Task<Token> GetAccessTokenV2EndpointAsync()
{
var url = $"{IdentityEndpoint}/{this.tenantId}/oauth2/v2.0/token";
var Http = Statics.Http;
Http.DefaultRequestHeaders.Accept.Clear();
Http.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(MEDIATYPE));
var body = $"grant_type=client_credentials&client_id={clientId}&client_secret={secret}&scope={scopeUri}";
var response = await Http.PostAsync(url, new StringContent(body, Encoding.UTF8, MEDIATYPE));
if (response.IsSuccessStatusCode)
{
var tokenResponse = await response.Content.ReadAsStringAsync();
return JsonConvert.DeserializeObject<Token>(tokenResponse);
}
return default(Token);
}
public class Token
{
public string access_token { get; set; }
public string token_type { get; set; }
public int expires_in { get; set; }
public int ext_expires_in { get; set; }
}
}

view raw
token-provider.cs
hosted with ❤ by GitHub

Creating ADLS Gen 2 REST client

Once we have the token provider, we can jump in implementing the REST client for Azure Data Lake.

public class FileSystemApi
{
private readonly string storageAccountName;
private readonly OAuthTokenProvider tokenProvider;
private readonly Uri baseUri;
private const string ACK_HEADER_NAME = "x-ms-acl";
private const string API_VERSION_HEADER_NAME = "x-ms-version";
private const string API_VERSION_HEADER_VALUE = "2018-11-09";
private int Timeout = 100;
public FileSystemApi(string storageAccountName, OAuthTokenProvider tokenProvider)
{
this.storageAccountName = storageAccountName;
this.tokenProvider = tokenProvider;
this.baseUri = new Uri($"https://{this.storageAccountName}.dfs.core.windows.net");
}

view raw
file-system.cs
hosted with ❤ by GitHub

Data Lake  ACLs and POSIX permissions

The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark. We will do that via REST API in this post.

There are two kinds of access control lists (ACLs), Access ACLs and Default ACLs.

  • Access ACLs: These control access to an object. Files and folders both have Access ACLs.
  • Default ACLs: A “template” of ACLs associated with a folder that determine the Access ACLs for any child items that are created under that folder. Files do not have Default ACLs.

Here’s the table of allowed grant types:

acl1

While we define ACLs we need to use a short form of these grant types. Microsoft Document explained these short form in below table:

posix

However, in our code we would also simplify the POSIX ACL notations by using some supporting classes as below. That way REST client consumers do not need to spend time building the short form of their aimed grant criteria’s.

public enum AclType
{
User,
Group,
Other,
Mask
}
public enum AclScope
{
Access,
Default
}
[FlagsAttribute]
public enum GrantType : short
{
None = 0,
Read = 1,
Write = 2,
Execute = 4
};
public class AclEntry
{
public AclEntry(AclScope scope, AclType type, string upnOrObjectId, GrantType grant)
{
Scope = scope;
AclType = type;
UpnOrObjectId = upnOrObjectId;
Grant = grant;
}
public AclScope Scope { get; private set; }
public AclType AclType { get; private set; }
public string UpnOrObjectId { get; private set; }
public GrantType Grant { get; private set; }
public string GetGrantPosixFormat()
{
return $"{(this.Grant.HasFlag(GrantType.Read) ? 'r' : '-')}{(this.Grant.HasFlag(GrantType.Write) ? 'w' : '-')}{(this.Grant.HasFlag(GrantType.Execute) ? 'x' : '-')}";
}
public override string ToString()
{
return $"{(this.Scope == AclScope.Default ? "default:" : string.Empty)}{this.AclType.ToString().ToLowerInvariant()}:{this.UpnOrObjectId}:{GetGrantPosixFormat()}";
}
}

view raw
acl-supports.cs
hosted with ❤ by GitHub

Now we can create methods to perform different REST calls, let’s start by creating a file system.

public async Task<bool> CreateFileSystemAsync(
string fileSystemName)
{
var tokenInfo = await tokenProvider.GetAccessTokenV2EndpointAsync();
var jsonContent = new StringContent(string.Empty);
var headers = Statics.Http.DefaultRequestHeaders;
headers.Clear();
headers.Add("Authorization", $"Bearer {tokenInfo.access_token}");
headers.Add(API_VERSION_HEADER_NAME, API_VERSION_HEADER_VALUE);
var response = await Statics.Http.PutAsync($"{baseUri}{WebUtility.UrlEncode(fileSystemName)}?resource=filesystem", jsonContent);
return response.IsSuccessStatusCode;
}

Here we are retrieving a Access Token and then issuing a REST call to Azure Data Lake Storage Gen 2 API to create a new file system. Next, we will create a folder and file in it and then set some Access Control to them.

Let’s create the folder:

public async Task<bool> CreateDirectoryAsync(string fileSystemName, string fullPath)
{
var tokenInfo = await tokenProvider.GetAccessTokenV2EndpointAsync();
var jsonContent = new StringContent(string.Empty);
var headers = Statics.Http.DefaultRequestHeaders;
headers.Clear();
headers.Add("Authorization", $"Bearer {tokenInfo.access_token}");
headers.Add(API_VERSION_HEADER_NAME, API_VERSION_HEADER_VALUE);
var response = await Statics.Http.PutAsync($"{baseUri}{WebUtility.UrlEncode(fileSystemName)}{fullPath}?resource=directory", jsonContent);
return response.IsSuccessStatusCode;
}

view raw
CreateDirectory.cs
hosted with ❤ by GitHub

And creating file in it. Now, file creation (ingestion in Data Lake) is not that straight forward, at least, one can’t do that by a single call. We would have to first create an empty file, then we can write some content in it. We can also append content to an existing file. Finally, we would require to flush the buffer so the new content gets persisted.

Let’s do that, first we will see how to create an empty file:

public async Task<bool> CreateEmptyFileAsync(string fileSystemName, string path, string fileName)
{
var tokenInfo = await tokenProvider.GetAccessTokenV2EndpointAsync();
var jsonContent = new StringContent(string.Empty);
var headers = Statics.Http.DefaultRequestHeaders;
headers.Clear();
headers.Add("Authorization", $"Bearer {tokenInfo.access_token}");
headers.Add(API_VERSION_HEADER_NAME, API_VERSION_HEADER_VALUE);
var response = await Statics.Http.PutAsync($"{baseUri}{WebUtility.UrlEncode(fileSystemName)}{path}{fileName}?resource=file", jsonContent);
return response.IsSuccessStatusCode;
}

view raw
CreateEmptyFile.cs
hosted with ❤ by GitHub

The above snippet will create an empty file, now we will read all content from a local file (from PC) and write them into the empty file in Azure Data Lake that we just created.

public async Task<bool> CreateFileAsync(string filesystem, string path,
string fileName, Stream stream)
{
var operationResult = await this.CreateEmptyFileAsync(filesystem, path, fileName);
if (operationResult)
{
var tokenInfo = await tokenProvider.GetAccessTokenV2EndpointAsync();
var headers = Statics.Http.DefaultRequestHeaders;
headers.Clear();
headers.Add("Authorization", $"Bearer {tokenInfo.access_token}");
headers.Add(API_VERSION_HEADER_NAME, API_VERSION_HEADER_VALUE);
using (var streamContent = new StreamContent(stream))
{
var resourceUrl = $"{baseUri}{filesystem}{path}{fileName}?action=append&timeout={this.Timeout}&position=0";
var msg = new HttpRequestMessage(new HttpMethod("PATCH"), resourceUrl);
msg.Content = streamContent;
var response = await Statics.Http.SendAsync(msg);
//flush the buffer to commit the file
var flushUrl = $"{baseUri}{filesystem}{path}{fileName}?action=flush&timeout={this.Timeout}&position={msg.Content.Headers.ContentLength}";
var flushMsg = new HttpRequestMessage(new HttpMethod("PATCH"), flushUrl);
response = await Statics.Http.SendAsync(flushMsg);
return response.IsSuccessStatusCode;
}
}
return false;
}

view raw
CreateFile.cs
hosted with ❤ by GitHub

Right! Now time to set Access control to the directory or files inside a directory. Here’s the method that we will use to do that.

public async Task<bool> SetAccessControlAsync(string fileSystemName, string path, AclEntry[] acls)
{
var targetPath = $"{WebUtility.UrlEncode(fileSystemName)}{path}";
var tokenInfo = await tokenProvider.GetAccessTokenV2EndpointAsync();
var jsonContent = new StringContent(string.Empty);
var headers = Statics.Http.DefaultRequestHeaders;
headers.Clear();
headers.Add("Authorization", $"Bearer {tokenInfo.access_token}");
headers.Add(API_VERSION_HEADER_NAME, API_VERSION_HEADER_VALUE);
headers.Add(ACK_HEADER_NAME, string.Join(',', acls.Select(a => a.ToString()).ToArray()));
var response = await Statics.Http.PatchAsync($"{baseUri}{targetPath}?action=setAccessControl", jsonContent);
return response.IsSuccessStatusCode;
}

view raw
SetAcl.cs
hosted with ❤ by GitHub

The entire File system REST API class can be found here. Here’s an example how we can use this methods from a console application.

var tokenProvider = new OAuthTokenProvider(tenantId, clientId, secret, scope);
var hdfs = new FileSystemApi(storageAccountName, tokenProvider);
var response = hdfs.CreateFileSystemAsync(fileSystemName).Result;
hdfs.CreateDirectoryAsync(fileSystemName, "/demo").Wait();
hdfs.CreateEmptyFileAsync(fileSystemName, "/demo/", "example.txt").Wait();
var stream = new FileStream(@"C:\temp.txt", FileMode.Open, FileAccess.Read);
hdfs.CreateFileAsync(fileSystemName, "/demo/", "mytest.txt", stream).Wait();
var acls = new AclEntry[]
{
new AclEntry(
AclScope.Access,
AclType.Group,
"2dec2374-3c51-4743-b247-ad6f80ce4f0b",
(GrantType.Read | GrantType.Execute)),
new AclEntry(
AclScope.Access,
AclType.Group,
"62049695-0418-428e-a5e4-64600d6d68d8",
(GrantType.Read | GrantType.Write | GrantType.Execute)),
new AclEntry(
AclScope.Default,
AclType.Group,
"62049695-0418-428e-a5e4-64600d6d68d8",
(GrantType.Read | GrantType.Write | GrantType.Execute))
};
hdfs.SetAccessControlAsync(fileSystemName, "/", acls).Wait();

view raw
Console.cs
hosted with ❤ by GitHub

Conclusion

Until, there’s an Official Client Package released, if you’re into Azure Data Lake Store Gen 2 and wondering how to accomplish these REST calls – I hope this post helped you to move further!

Thanks for reading.