Elastic self-hosted pool for Azure DevOps

Introduction

If you are using Azure Pipelines, then you have surely used Microsoft-hosted agents. With Microsoft-hosted agents, maintenance and upgrades are taken care of for you. However, there are times when self-hosted agents are needed (e.g. customized images, network connectivity requirements, etc.). Pipeline agents can be hosted as stand-alone machines, on Azure virtual machine scale sets, or as Docker containers. Container-based agents are amazingly fast to spin up, which has led many to run self-hosted agents on a Kubernetes cluster. I am sure many have done this before, but I didn't find a complete solution in my search. Therefore, I decided to build one.

Demo

Architecture

The architecture can be drawn as follows:

Architecture diagram

There is a controller in a designated namespace that keeps an eye on the agent pool in Azure DevOps, and as soon as it sees new job requests queued, it spins up a container agent. It also listens for Kubernetes events: when a pod finishes executing a pipeline job, the controller cleans up the pod and unregisters the agent from Azure DevOps. Unfortunately, Azure DevOps doesn't offer a service hook event such as "job queued". Therefore, the controller uses the REST API to look for incoming job requests. Because there is latency involved in this process, the controller always keeps N "standby" agents on Kubernetes; the standby count can be configured as needed.
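
To illustrate the polling idea, here is a minimal bash sketch of what the controller conceptually does. This is an assumption-laden illustration only: the real controller is written in .NET, and the job-requests endpoint, pool ID and jq filter shown here may differ from what it actually uses.

#!/usr/bin/env bash
# Hypothetical sketch: poll an Azure DevOps agent pool for queued job requests.
ORG_URI="https://dev.azure.com/<YOUR ORG NAME>"   # same value as AZDO_ORG_URI used later
POOL_ID="12"                                      # assumption: numeric ID of your agent pool

while true; do
  # Count job requests that are queued but not yet assigned to an agent
  QUEUED=$(curl -s -u ":${AZDO_TOKEN}" \
    "${ORG_URI}/_apis/distributedtask/pools/${POOL_ID}/jobrequests" \
    | jq '[.value[] | select(.result == null and .assignTime == null)] | length')

  echo "Job requests waiting for an agent: ${QUEUED}"
  # A real controller would now ensure (QUEUED + STANDBY_AGENT_COUNT) agent pods exist,
  # capped at MAX_AGENT_COUNT, and clean up pods whose jobs have completed.
  sleep 10
done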

How to use

Installing the controller is straightforward. The controller dynamically spins up pods, deletes completed pods, and so on; therefore, it requires a cluster role. It also uses a Custom Resource Definition to isolate the agent pod specification from the controller.

Install controller

The following manifest will install all required CRDs, the Cluster Role, the Service Account and the Cluster Role Binding in a separate namespace called "octolamp-system".

kubectl apply -f https://raw.githubusercontent.com/cloudoven/azdo-k8s-agents/main/src/kubernetes/install.yaml
Configure agent namespace

You need to create a separate namespace where the Azure DevOps agent pods would be created and observed.

kubectl create namespace azdo-agents

Next, we need to create the container specification for the Azure DevOps agents. Microsoft documents how to create such a container image. I have created my own; however, you should build your own image and install the necessary tools according to your CI/CD needs. We will define this specification as a custom resource. Let's create a file named agent-spec.yaml with the following content:

apiVersion: "azdo.octolamp.nl/v1"
kind: AgentSpec
metadata:
  namespace: azdo-agents
  name: cloudovenagent
spec:
  prefix: "container-agent"
  image: moimhossain/azdo-agent-linux-x64:latest

The image field needs to point to the container image that you want to use as the pipeline agent. Then apply this manifest to Kubernetes:

kubectl apply -f agent-spec.yaml

Next, we will deploy the controller with the details of the Azure DevOps organization. Let's create a file controller.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: octolamp-agent-controller
  namespace: octolamp-system
spec:
  replicas: 1
  selector:
    matchLabels:
      run: octolamp-agent-controller
  template:
    metadata:
      labels:
        run: octolamp-agent-controller
    spec:
      serviceAccountName: octolamp-service-account
      containers:
      - env:
        - name: AZDO_ORG_URI
          value: https://dev.azure.com/<YOUR ORG NAME>
        - name: AZDO_TOKEN
          value: "YOUR PAT TOKEN"
        - name: AZDO_POOLNAME
          value: "<pool to run container agents>"
        - name: TARGET_NAMESPACE
          value: "azdo-agents"
        - name: STANDBY_AGENT_COUNT
          value: "2"
        - name: MAX_AGENT_COUNT
          value: "25"                              
        image: moimhossain/octolamp-agent-controller:v1.0.0-beta
        imagePullPolicy: Always
        name: octolamp-agent-controller
        resources:
          limits:
            cpu: 100m
            memory: 100Mi

Update this file with your Azure DevOps organization URL, a personal access token, and the name of a pool that exists in your Azure DevOps organization (create one from Organization Settings > Pipelines > Agent pools > Add pool).

Other properties that you can configure are the standby agent count and the maximum number of agents the controller is allowed to create (an upper threshold).

Now, apply these changes to the Kubernetes cluster.

kubectl apply -f controller.yaml

That's it. At this point, you should see container agents spinning up, and they will show up in the Azure DevOps agent pool UI.
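
You can also verify this from the cluster side; the namespace below is the one created earlier.

kubectl get pods -n azdo-agents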

Elastic scale

Scale-out

I am using Azure Kubernetes Service (AKS) for this example, and AKS supports autoscaling for node pools. That means when the controller spins up more agents than the AKS node pool has capacity for, AKS adds new nodes to the pool, which in turn gives the controller capacity to spin up more agents (assuming a lot of pipelines run within a brief time window).
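
If the cluster autoscaler isn't enabled on the node pool yet, it can be turned on with the Azure CLI; the resource group, cluster and node pool names below are placeholders.

az aks nodepool update \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10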

Scale-down

The controller keeps track of pod-completed events in Kubernetes, and whenever a pod completes a pipeline run, it removes the pod from the cluster and unregisters it from the Azure DevOps agent pool. Therefore, if no pipelines are waiting, the controller scales all the agents back down to the standby count within roughly 2 minutes. That eventually triggers the AKS autoscaler, and nodes are scaled down within roughly 10 minutes.

Windows agents?

This article doesn't demonstrate Windows-based agents. However, as you have seen, the controller allows you to change the agent image and image specification (via the custom resource), so you should be able to make that work without much effort.

Conclusion

The entire code can be found on GitHub. The source code is MIT licensed, provided as-is (without any warranties), and you can use and modify it freely. However, I would appreciate it if you acknowledged the author. You are also more than welcome to contribute directly on GitHub.

Have a great weekend!

Azure App Service with Front-door – how to fix outbound URLs?

This article shows how to rewrite outbound IIS URLs with the URL Rewrite module, to configure legacy ASP.NET web apps hosted on Azure App Service but safeguarded with a WAF (Front Door/Application Gateway). Setting up Azure Front Door or Azure Application Gateway is a fairly straightforward process and well documented in the Microsoft Azure docs (https://docs.microsoft.com/en-us/azure/frontdoor/quickstart-create-front-door), so that is beyond the scope of this article. However, when you deploy a legacy ASP.NET application that uses either OAuth 2.0/OpenID Connect based authentication or OWIN-based authentication middleware, you often run into a broken authentication flow, because the application might not take the Front Door/gateway host headers into account and produces a redirect (HTTP 302) to a path constructed from the App Service URI, as opposed to the Front Door/gateway URI.

One can of course fix this by changing the code, but that sometimes might not be practical or possible. An alternative approach (described below) is to catch one or more specific outbound URIs (the redirect flows) in a web.config file and use the IIS URL Rewrite module to rewrite them accordingly. That requires changes in web.config but not in the source code, which is a cleaner approach because you can make the change even from the App Service SCM (Kudu) site.

Why Outbound rules in URL rewrite?

If we want to make changes to the response headers or content after processing has been completed by the specific handler or execution engine, but before the response is sent to the client, we can use outbound rules.

How can we do this?

Here's an example web.config file that shows some outbound rules to alter HTTP headers in outbound responses.

      <!-- Creating Rewrite rules -->
      <rewrite>
        <outboundRules>          
          <!-- The below rule captures a 302 (redirect) response 
               with 'Location' response header contains 
               an outbound URL (coming from the web app) 
               that has 'signin-oidc' in the path.  
               When there are 'signin-oidc' present into the path, it 
               will match the regular expression and rewrite the Location 
               header with the hostname that comes from 
               your front-door/application gateway URL. The notion {R:2} preserves 
               any following query parameters or sub path that was 
               present in the original URL -->
          <rule name="changeURI" enabled="true">
            <match 
                serverVariable="RESPONSE_Location" 
                pattern="^(.*)/signin-oidc(.+)" 
                ignoreCase="true" />
            <action type="Rewrite" 
                value="https://my-waf.azureafd.net/signin-oidc{R:2}" />
          </rule>          
        </outboundRules>
      </rewrite>   

Explanation

The above rule captures a 302 (redirect) response whose Location response header contains an outbound URL (coming from the web app) that has signin-oidc in the path. When signin-oidc is present in the path, the regular expression matches and the rule rewrites the Location header with the host name that comes from your front-door/application gateway URL (i.e. https://my-waf.azurefd.net). The notation {R:2} preserves any query parameters or sub-path that followed in the original URL.

To understand the {R:2} syntax in depth, please read about back-references in the Microsoft documentation.

The important bit from the document is quoted below:

Usage of back-references is the same regardless of which pattern syntax was used to capture them. Back-references can be used in the following locations within rewrite rules:

  • In condition input strings
  • In rule actions, specifically:
    • url attribute of Rewrite and Redirect action
    • statusLine and responseLine of a CustomResponse action
  • In a key parameter to the rewrite map

Back-references to condition patterns are identified by {C:N} where N is from 0 to 9. Back-references to rule patterns are identified by {R:N} where N is from 0 to 9. Note that for both types of back-references, {R:0} and {C:0}, will contain the matched string.

For example, in this pattern:

^(www\.)(.*)$

For the string: http://www.foo.com the back-references will be indexed as follows:

{C:0} - www.foo.com
{C:1} - www.
{C:2} - foo.com

Within a rule action, you can use the back-references to the rule pattern and to the last matched condition of that rule. Within a condition input string, you can use the back-references to the rule pattern and to the previously matched condition.

The following rule example demonstrates how back-references are created and referenced:

<rule name="Rewrite subdomain">
 <match url="^(.+)" /> <!-- rule back-reference is captured here -->
 <conditions>
  <!-- condition back-reference is captured here -->
  <add input="{HTTP_HOST}" type="Pattern" pattern="^([^.]+)\.mysite\.com$" /> 
 </conditions>
 <!-- rewrite action uses back-references to condition and 
      to rule when rewriting the url -->
 <action type="Rewrite" url="{C:1}/{R:1}" /> 
</rule>

How to create and test these patterns (with RegEx)?

Check out this Microsoft documentation on how to use the Test Pattern tool that comes with the IIS installation.

How to create and test patterns

That’s about it. You can find an example web.config file (with complete configuration) in this GitHub repository.

Bridge to Kubernetes – be confident on shipping software

Bridge to Kubernetes is the successor of Azure Dev Spaces. Distributed software is composed of more than one service (often referred to as microservices), and these services depend on each other (one service invoking the APIs of another) to deliver capabilities to end users. While separating services brings flexibility in delivering features (or bug fixes) faster, it adds complexity to the system and makes the developer workflow (the inner loop) difficult.

Imagine a product with three services: a backend (running a database, for instance), an API service (middleware that talks to the backend service) and a frontend (that serves user interfaces to end users and invokes the API service). Running these services in a Kubernetes cluster means three different deployments, corresponding services and possibly an ingress object. When we think about the developer workflow for the API service, we immediately see the complexity: the developers of the API now need to run a local version of the backend service and the frontend service to issue a request to debug or test their APIs. The API developers might not be fully aware of how to set up the backend service on their machine, as that's built by a separate team. They now either must fake that service (with proxies/stubs) or learn all the details of running the backend service on their development workstation. This is cumbersome and still doesn't guarantee that their API service will behave exactly as expected when running in a test or production environment, which leads to running acceptance checks after deployment and increases lead time. It also makes it complicated to reproduce issues on a local machine for troubleshooting purposes.

This is where Bridge to Kubernetes (previously Azure Dev Spaces) comes to the rescue. Bridge to Kubernetes connects your development workstation to the Kubernetes cluster and eliminates the need to manually source, configure and compile external dependencies on the development workstation. Environment variables, connection strings and volumes from the cluster are inherited and available to the microservice code running locally.

Setting up the environment

Let's create a simple scenario: we will create three .NET Core API applications, namely backend, API, and frontend. These apps do the bare minimum on purpose, to really emphasize the specifics of Bridge to Kubernetes (rather than distracting with a lot of convoluted feature code). The backend looks like the following:

app.UseEndpoints(endpoints =>
{
    endpoints.MapGet("/", async context =>
    {
        await context.Response.WriteAsync("Hello from Backend!");
    });
});

Basically, the backend app exposes a single route, and any request to it is served with a greeting. Next, we will look at the API app (the middleware that consumes the backend service):

app.UseEndpoints(endpoints =>
{
    endpoints.MapGet("/", async context =>
    {
        using var client = new System.Net.Http.HttpClient();
        var request = new System.Net.Http.HttpRequestMessage
        {
            RequestUri = new Uri("http://backend/")
        };
        var header = "kubernetes-route-as";
        if (context.Request.Headers.ContainsKey(header))
        {
            request.Headers.Add(header, context.Request.Headers[header] as IEnumerable<string>);
        }
        var response = await client.SendAsync(request);
        await context.Response.WriteAsync($"API bits {await response.Content.ReadAsStringAsync()}");

    });
});

It is important to notice that we check for a header with the kubernetes-route-as key and, when it is present, propagate it to upstream invocations. This will be required later when we see Bridge to Kubernetes in action.

Lastly, we have our frontend service, invoking the API service (just as the API invokes the backend):

app.UseEndpoints(endpoints =>
{
    endpoints.MapGet("/", async context =>
    {
        using var client = new System.Net.Http.HttpClient();
        var request = new System.Net.Http.HttpRequestMessage
        {
            RequestUri = new Uri("http://api/")
        };
        var header = "kubernetes-route-as";
        if (context.Request.Headers.ContainsKey(header))
        {
            request.Headers.Add(header, context.Request.Headers[header] as IEnumerable<string>);
        }
        var response = await client.SendAsync(request);
        await context.Response.WriteAsync($"Front End Bit --> {await response.Content.ReadAsStringAsync()}");

    });
});

Now we will build all these services and deploy them into the cluster (Azure Kubernetes Service). To keep things simple and easy to follow, we will deploy them using manifest files (as opposed to Helm charts).

The backend manifest looks like the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: b2kapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: private-registry/backend:beta
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: b2kapp
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: backend

The manifest for the API looks almost identical to the one above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: b2kapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: moimhossain/api:beta
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: b2kapp
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: api

Finally, the manifest for frontend service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: b2kapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: moimhossain/frontend:beta1
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: b2kapp
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: frontend
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: frontend-ingress
  namespace: b2kapp
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
  - host: octo-lamp.nl
    http:
      paths:
      - backend:
          serviceName: frontend
          servicePort: 80
        path: /(.*)

Notice that we added an ingress resource for the frontend only, on a specific host name. I have used the NGINX ingress controller for the cluster and mapped the external IP of the ingress controller to an Azure DNS name.

Applying all these manifests will deliver the application at http://octo-lamp.nl.
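
For completeness, the deployment boils down to something like the following. The manifest file names are placeholders for however you saved the YAML above, and the ingress controller service name and namespace depend on how NGINX ingress was installed.

# Create the namespace used by all the manifests above
kubectl create namespace b2kapp

# Apply the three manifests (file names are placeholders)
kubectl apply -f backend.yaml -f api.yaml -f frontend.yaml

# Find the public IP of the ingress controller to map it to the DNS name (octo-lamp.nl)
kubectl get service -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'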

Debugging scenario

Now that we have the application running, let's say we want to debug the middleware service: the API. Bridge to Kubernetes is a hybrid solution that requires installation on the Kubernetes cluster and on our local machine. In the cluster, we install only the routing manager component of Bridge to Kubernetes, using the following command:

kubectl apply -n b2kapp -f https://raw.githubusercontent.com/microsoft/mindaro/master/routingmanager.yml

At this point, if we list the pods in the namespace b2kapp, we should see the following:

K9s view of the AKS cluster

To debug the API service locally, we need to install the Bridge to Kubernetes extension for Visual Studio or VS Code (whichever you prefer to use); I will be using Visual Studio in this case. Open the API project in Visual Studio and you will notice a new launch profile: Bridge to Kubernetes. Select that profile and hit F5. You will be asked to configure Bridge to Kubernetes:

We select the correct namespace and service (in this case, the API) to debug. One important option here is routing isolation mode. If checked, B2K offers a dynamic sub-route (with its own URL) so that only traffic coming via that sub-route-specific URL is routed to us; this leaves the regular traffic uninterrupted while we are debugging. Once you press OK, B2K sets up the cluster with a few Envoy proxies to route traffic to our local machine and hit any breakpoints we have set.

The routing magic is done by two processes running in the background on the local machine.

DSC.exe is the process that dynamically allocates ports on the local machine and uses Kubernetes port forwarding to bind those ports to an agent running in Kubernetes; that is how traffic is forwarded from the cloud to our local machine.

One thing to point out: we are not building any Docker images or running any Docker containers during debugging; it all happens on the bare-metal local machine (the very typical way of debugging .NET or Node apps). This makes for a fast setup and a lightweight way to debug a service.

The other process is EndpointManager.exe. This is the process that requires elevated permissions, because it modifies the hosts file on the local machine. That, in turn, allows the API app to resolve an otherwise non-existent backend URI (http://backend) locally and route that traffic back to the cluster where the service is running. If you open the C:\Windows\System32\drivers\etc\hosts file while running the debugger, you will see these changes:

# Added by Bridge To Kubernetes
127.1.1.8 frontend frontend.b2kapp frontend.b2kapp.svc frontend.b2kapp.svc.cluster.local
127.1.1.7 backend backend.b2kapp backend.b2kapp.svc backend.b2kapp.svc.cluster.local
127.1.1.6 api api.b2kapp api.b2kapp.svc api.b2kapp.svc.cluster.local
# End of section

Running pull request workflow

One can also run a pull request workflow using the capabilities of Bridge to Kubernetes. It allows a team to take a feature that is still in a feature branch (not yet merged to the release/master/main branch) and deploy it to Kubernetes using isolation mode. That way, you can test a single service with new features (or bug fixes) by visiting it through the sub-domain URI and see how the feature behaves in the cluster. Of course, all the dependent services are real instances of the services running in the cluster. This can really boost the confidence of releasing a feature or a bug fix for any DevOps team.

The way you do that is to deploy a clone of the service (the API service in this example) and its pods with some specific labels and annotations. Let's say I have a manifest for the API service written specifically for the PR flow; it would look like the one below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-PRBRANCENAME
  namespace:  b2kapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-PRBRANCENAME
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5 
  template:
    metadata:
      annotations:
        routing.visualstudio.io/route-on-header: kubernetes-route-as=PRBRANCENAME
      labels:
        app: api-PRBRANCENAME
        routing.visualstudio.io/route-from: api
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: api
        image: DOCKER_REGISTRY_NAME/b2k8s-api:DOCKER_IMAGE_VERSION
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: api-PRBRANCENAME
  namespace: b2kapp
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: api-PRBRANCENAME

All I need to do in the pipeline that builds the PR is take the branch name (tools like Jenkins or Azure DevOps typically provide it in environment variables), replace the word PRBRANCENAME with that branch name, and then simply apply the manifest to the same namespace; a minimal sketch of this step is shown below. Once you do that, the routing manager does the following:

  • Duplicates all ingresses (including load balancer ingresses) found in the namespace using the PRBRANCENAME for the subdomain.
  • Creates an envoy pod for each service associated with duplicated ingresses with the PRBRANCENAME subdomain.
  • Creates an additional envoy pod for the service you are working on in isolation. This allows requests with the subdomain to be routed to your development computer.
  • Configures routing rules for each envoy pod to handle routing for services with the subdomain.

Therefore, if we now visit PRBRANCENAME.octo-lamp.nl, we will see that requests are routed through the newly deployed API service (where the feature is built), while the rest of the traffic remains unchanged. A great way to build release confidence.
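
As a sketch of the placeholder replacement mentioned above, the pipeline step could be as simple as the following. The manifest file names are hypothetical, and the branch name is taken from the Build.SourceBranchName variable that Azure DevOps exposes to scripts.

# Derive a DNS-safe branch name and substitute the PRBRANCENAME placeholder
BRANCH=$(echo "$BUILD_SOURCEBRANCHNAME" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-')
sed "s/PRBRANCENAME/${BRANCH}/g" api-pr-template.yaml > api-pr.yaml

# Deploy the PR clone into the same namespace as the rest of the application
kubectl apply -n b2kapp -f api-pr.yaml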

Conclusion

That's all for today. I seriously think this is a neat approach to build confidence in any DevOps team that runs services on Kubernetes.

Thanks for reading!

How to use ADFS/SAML2.0 as Identity provider with Azure AD B2C

Azure Active Directory B2C (Azure AD B2C) provides support for SAML 2.0 identity providers. With this capability, you can create a technical profile in Azure AD B2C to federate with a SAML-based identity provider, such as ADFS, and thus allow users to sign in with their existing enterprise identities. Microsoft has good docs on this topic; however, there are a few steps that are currently not adequately documented and might lead to errors unless you're an identity pro.

In this blog post, I will write down the steps you need to take to set up ADFS federation with Azure AD B2C, and the steps you should watch out for.

Setting up ADFS in Azure VM

We need an ADFS server for this exercise, so we will spin up a virtual machine in Azure with ADFS installed. This article explains the steps to stand up a virtual machine with ADFS in it. However, that article uses a self-signed certificate in the ADFS configuration, and Azure AD B2C won't appreciate that: Azure AD B2C only accepts certificates from well-known certificate authorities (CAs).

Creating certificate

We will use a free certificate from Let's Encrypt. I have used win-acme for that. Download win-acme from its site and launch it on a Windows machine. The crucial step is when you provide the domain name for the certificate: make sure you provide the DNS name of your virtual machine here.

One problem, though: you won't be able to export the certificate with its private key, which is required by ADFS. You need additional steps to export the certificate with the private key, and here's an article that shows how to do that. With the certificate (including the private key) you can complete the ADFS setup as described in this article.

Once created, you might want to check the federation setup with a simple ASP.NET MVC app, just to make sure users defined in AD DS are able to sign in with your ADFS. I have a simple application in GitHub that allows you to do that. Just change the web.config file with the domain name of your ADFS.

<appSettings>
	<add key="ida:ADFSMetadata" value="https://woodbine.westeurope.cloudapp.azure.com/FederationMetadata/2007-06/FederationMetadata.xml" />
	<add key="ida:Wtrealm" value="https://localhost:44380/" />
</appSettings>

You would also need to create a relying party trust in ADFS for this app. Make sure you have the following LDAP attributes mapped as outgoing claims in Claim Rules for your relying party trust configuration.

Creating Azure AD B2C Tenant

Create a tenant for Azure AD B2C from the Azure portal as described here.

Creating certificate

You would need one more certificate that Azure AD B2C would use to sign the SAML requests sent to ADFS (SAML identity provider). For production scenarios, you should use a proper CA certificate, but for non-production scenarios, a self-signed certificate will suffice. You can generate one in PowerShell like below:

$tenantName = "<YOUR TENANT NAME>.onmicrosoft.com"
$pwdText = "<YOUR PASSWORD HERE>"

$Cert = New-SelfSignedCertificate -CertStoreLocation Cert:\CurrentUser\My -DnsName "$tenantName" -Subject "B2C SAML Signing Cert" -HashAlgorithm SHA256 -KeySpec Signature -KeyLength 2048
$pwd = ConvertTo-SecureString -String $pwdText -Force -AsPlainText
Export-PfxCertificate -Cert $Cert -FilePath .\B2CSigningCert.pfx -Password $pwd

Next, you need to upload the certificate to Azure AD B2C. In the Azure portal, go to Azure AD B2C, select the Identity Experience Framework tab on the left, then Policy keys (also in the left menu) and add a new policy key. Choose the upload option and provide the PFX file generated earlier. Name the policy key 'ADFSSamlCert' (Azure AD B2C will add a prefix on save, so it will look like B2C_1A_ADFSSamlCert).

Add signing and encryption keys

Follow Microsoft Documents to accomplish the following:

  • Create signing key.
  • Create encryption key.
  • Register the IdentityExperienceFramework application.
  • Register the ProxyIdentityExperienceFramework application.

Creating custom policy for ADFS

Microsoft has a custom policy starter pack that is super handy. However, you can also find custom policy files in my GitHub repository that are already customized with the technical profile for ADFS and the corresponding user journeys. Download the files from GitHub, search for the word woodbineb2c (the Azure AD B2C tenant that I have used) and replace it with your Azure AD B2C tenant name. Next, search for the word woodbine.westeurope.cloudapp.azure.com (the Azure VM where I have installed ADFS) and replace it with your ADFS FQDN.
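
If you prefer the command line, that replacement can be done in one go; the tenant name and ADFS host below are placeholders.

# Replace the sample tenant and ADFS host name in all the policy files (values are placeholders)
sed -i 's/woodbineb2c/yourtenant/g; s/woodbine\.westeurope\.cloudapp\.azure\.com/adfs.yourdomain.com/g' *.xml
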
Now, upload these XML files (six files in total) in the Custom policies blade of Azure AD B2C's Identity Experience Framework. Upload them in the following order:

1. TrustFrameworkBase.xml
2. TrustFrameworkExtensions.xml
3. SignUpOrSignInADFS.xml
4. SignUpOrSignin.xml
5. ProfileEdit.xml
6. PasswordReset.xml

Configure an AD FS relying party trust

To use AD FS as an identity provider in Azure AD B2C, you need to create an AD FS relying party trust with the Azure AD B2C SAML metadata. The SAML metadata URL of the Azure AD B2C technical profile typically looks like below:

https://your-tenant-name.b2clogin.com/your-tenant-name.onmicrosoft.com/your-policy/samlp/metadata?idptp=your-technical-profile

Replace the following values:

  • your-tenant-name with your tenant name, such as contoso.onmicrosoft.com.
  • your-policy with your policy name. For example, B2C_1A_signup_signin_adfs.
  • your-technical-profile with the name of your SAML identity provider technical profile. For example, Contoso-SAML2.

Follow this Microsoft document to configure the relying party trust. However, when defining the claim rules, map the DisplayName attribute to display_name or name (lower-case) in the outgoing claim.

Request Signing algorithm

You need to make sure that the SAML requests sent by Azure AD B2C are signed with the signature algorithm configured in AD FS. By default, Azure AD B2C appears to sign requests with rsa-sha1; therefore, make sure the same algorithm is selected in the AD FS relying party trust properties.

Enforce signatures for SAML assertions in ADFS

Lastly, ADFS by default won't sign the assertions it sends to Azure AD B2C, and Azure AD B2C will throw errors when that's the case. You can resolve this by forcing ADFS to sign both the message and the assertions, by running the following PowerShell command on the ADFS server:

Set-AdfsRelyingPartyTrust -TargetName <RP Name> -SamlResponseSignature MessageAndAssertion

Create Application to test

I have a Node.js application that uses MSAL 2.0 to test this setup; I have customized it from a Microsoft quickstart sample. You can find the source code here. You need to modify the B2C endpoints before you launch it. Here are the instructions on how to modify these endpoints.

Conclusion

There are quite a few steps to set things up to get the correct behavior. I hope this helps if you are trying to set up AD FS as an identity provider for your Azure AD B2C tenant.

Thank you!

Azure Resource Governance with Template Specs & Biceps

All the example code is available on GitHub.

Background

Governance of cloud estates is challenging for businesses. It's crucial to enforce security policies, workload redundancy, uniformity (such as naming conventions) and Azure role-based access control (Azure RBAC) across the enterprise, and to simplify deployments with packaged artifacts (e.g. ARM templates).

Generally, the idea is that a centralized team (sometimes referred to as the platform team) builds and publishes infrastructure-as-code artifacts, and a number of product development teams consume them, providing only their own parameters.

Azure offers native capabilities like Azure Policy, Blueprints and management groups to address this problem, but there is a wide range of external solutions (Terraform, Pulumi, etc.) available too.

One attribute of Terraform that strikes me is the ability to store a versioned module in a registry and consume it from that registry. It's the same principle that is familiar to engineers and widely used in programming languages, such as NuGet for .NET, Maven for Java and npm for Node.js.

With ARM templates it’s rather unpleasant. If you currently have your templates in an Azure repo, GitHub repo or storage account, you run into several challenges when trying to share and use the templates. For a user to deploy it, the template must either be local or the URL for the template must be publicly accessible. To get around this limitation, you might share copies of the template with users who need to deploy it, or open access to the repo or storage account. When users own local copies of a template, these copies can eventually diverge from the original template. When you make a repo or storage account publicly accessible, you may allow unintended users to access the template.

Azure Resource Manager – Template Spec

Microsoft delivered some cool new features for Resource Manager templates recently. One of these features is named Template Specs. A template spec is a first-class Azure resource type, but it really is just a regular ARM template. The best part is that you can version it, persist it in Azure (just like a Terraform registry), share it across the organization with RBAC, and consume it from that repository.

Template Specs is currently in preview. To use it, you must install the latest version of PowerShell or Azure CLI. For Azure PowerShell, use version 5.0.0 or later. For Azure CLI, use version 2.14.2 or later.

The benefit of using template specs is that you can create canonical templates and share them with teams in your organization. The template specs are secure because they’re available to Azure Resource Manager for deployment, but not accessible to users without Azure RBAC permission. Users only need read access to the template spec to deploy its template, so you can share the template without allowing others to modify it.

The templates you include in a template spec should be verified by the platform team (or administrators) in your organization to follow the organization’s requirements and guidance.

How do template specs work?

If you are familiar with ARM templates, template specs are not new to you. They are just typical ARM templates, stored in an Azure resource group as a "template spec" with a version number. That means you can take any ARM template (the template JSON file only, without any parameter files) and publish it as a template spec using PowerShell, the Azure CLI or the REST API.

Publishing Template Spec

Let’s say I have a template that defines just an Application Insight component.

{
    "contentVersion": "1.0.0.0",
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "parameters": {
        "appInsights": { "type": "string" },
        "location": { "type": "string" }
    },
    "resources": [{
            "type": "microsoft.insights/components",
            "apiVersion": "2020-02-02-preview",
            "name": "[parameters('appInsights')]",
            "location": "[parameters('location')]",
            "properties": {
                "ApplicationId": "[parameters('appInsights')]",
                "Application_Type": "web"
            }
        }
    ],
    "outputs": {
        "instrumentationKey": {
            "type": "string",
            "value": "[reference(parameters('appInsights')).InstrumentationKey]"
        }
    }
}

We can now publish this as a template spec using the Azure CLI:

az ts create \
    --name "cloudoven-appInsights" \
    --version $VERSION \
    --resource-group $RESOURCE_GROUP \
    --location $LOCATION \
    --template-file "component.json" \
    --yes --query 'id' -o json

Once published, you can see it in the Azure portal, where it appears as a new resource type, Microsoft.Resources/templateSpecs.

Consuming Template Spec

Every published template spec has a unique ID. To consume a template spec, all you need is that ID. You can retrieve the ID with the Azure CLI:

APPINS_TSID=$(az ts show --resource-group $TSRGP --name $TSNAME --version $VERSION --query 'id' -o json)
echo Template Spec ID: $APPINS_TSID

You can now deploy Azure resources with the template spec ID and, optionally, your own parameters, like the following:

az deployment group create \
  --resource-group $RESOURCEGROUP \
  --template-spec $APPINS_TSID \
  --parameters "parameters.json"

Linked Templates & Modularizations

Over time, infrastructure-as-code tends to become a big monolithic file containing numerous resources. ARM templates (thanks to all their verbosity) are especially known for growing big fast and becoming difficult to comprehend by reading a large JSON file. You could previously address this issue by using linked templates, with the caveat that linked templates needed to be accessible via a URL on the public internet, which is far from ideal.

The good news is that Template Specs have this covered. If the main template for your template spec references linked templates, the PowerShell and CLI commands can automatically find and package the linked templates from your local drive.

Example

Here I have an example ARM template that defines multiple resources (Application Insights, a server farm and a web app) in small files, with a main template that brings everything together. One can then publish the main template as a template spec, so any consumer can provision their web app just by pointing to the ID of the template spec. Here's the interesting bit of the main template:

"resources": [
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2020-06-01",
            "name": "DeployAppInsights",
            "properties": {
                "mode": "Incremental",
                "parameters": {
                    "appInsights": { "value": "[parameters('appInsights')]"},
                    "location": {"value": "[parameters('location')]"}
                },
                "templateLink": {                    
                    "relativePath": "../appInsights/component.json"
                }
            }
        },
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2020-06-01",
            "name": "DeployHostingplan",
            "properties": {
                "mode": "Incremental",      
                "templateLink": {
                    "relativePath": "../server-farm/component.json"
                }
            }
        },
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2020-06-01",
            "name": "DeployWebApp",
            "dependsOn": [ "DeployHostingplan" ],
            "properties": {
                "mode": "Incremental",             
                "templateLink": {
                    "relativePath": "../web-app/component.json"
                }
            }
        }
    ]

You see, Template Specs natively offer modularity and a centralized registry; however, they are still ARM JSON files. One common criticism of ARM templates is that they are too verbose, and JSON is not particularly famous for readability.

Microsoft is aiming to address these concerns with a new domain-specific language (DSL) named Azure Bicep.

What is Bicep?

Bicep aims to drastically simplify the authoring experience with a cleaner syntax and better support for modularity and code re-use. Bicep is a transparent abstraction over ARM and ARM templates, which means anything that can be done in an ARM Template can be done in bicep (outside of temporary known limitations).

If we take the same Application Insights component (above) and rewrite it in Bicep, it looks like the following:

param appInsights string
param location string = resourceGroup().location
resource appIns 'Microsoft.Insights/components@2020-02-02-preview' = {
  name: appInsights
  location: location
  kind: appInsights
  properties: {
    Application_Type: 'web'
  }
}
output InstrumentationKey string = appIns.properties.InstrumentationKey

Very clean and concise compared to the ARM JSON version. If you are coming from Terraform, you might already find yourself at home, because Bicep took a lot of inspiration from Terraform's HCL (HashiCorp Configuration Language). You save Bicep scripts with the .bicep file extension.

An important thing to understand is that Bicep is a client-side language layer that sits on top of ARM JSON. The idea is that you write your templates in Bicep and then compile them using the Bicep compiler (or transpiler) to produce ARM JSON as the compiled artifact; you still deploy ARM templates (JSON) to Azure. Here's how you compile a Bicep file to produce the ARM JSON:

bicep build ./main.bicep

Bicep is currently in an experimental state and is not recommended for production use.

Creating Template Spec in Bicep

In the example above, you've seen how you could create a template spec that is modularized into linked templates. Let's rewrite that in Bicep and see how clean and simple it looks:

param webAppName string = ''
param appInsights string = ''
param location string = resourceGroup().location
param hostingPlanName string = ''
param containerSpec string =''
param costCenter string
param environment string

module appInsightsDeployment '../appinsights/component.bicep' = {
  name: 'appInsightsDeployment'
  params:{
    appInsights: '${appInsights}'
    location: '${location}'
    costCenter: costCenter
    environment: environment
  }
}

module deployHostingplan '../server-farm/component.bicep' = {
  name: 'deployHostingplan'
  params:{
    hostingPlanName:  '${hostingPlanName}'
    location: '${location}'
    costCenter: costCenter
    environment: environment    
  }   
}

module deployWebApp '../web-app/component.bicep' = {
  name: 'deployWebApp'
  params:{
    location: '${location}'
    webAppName: '${webAppName}'
    instrumentationKey: appInsightsDeployment.outputs.InstrumentationKey
    serverFarmId: deployHostingplan.outputs.hostingPlanId
    containerSpec: '${containerSpec}'
    costCenter: costCenter
    environment: environment    
  }   
}

Notice that Bicep introduces a nice keyword, module, to address the linked-template scenario. A Bicep module is an opaque set of one or more resources to be deployed together. It only exposes parameters and outputs as its contract to other Bicep files, hiding the details of how the internal resources are defined. This allows you to abstract away the complex details of the raw resource declarations from the end user, who now only needs to be concerned with the module contract. Parameters and outputs are optional.

CI/CD – GitHub Action

Let's create a delivery pipeline in GitHub Actions to compile our Bicep files and publish them as template specs in Azure. The following GitHub workflow installs the Bicep tools, compiles the scripts, and finally publishes them as template specs in Azure. You can see the complete repository on GitHub.

jobs:  
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install Bicep CLI
        working-directory: ./src/template-spec-bicep/paas-components
        run: |
          chmod +x ./install-bicep.sh
          ./install-bicep.sh
      - name: Compile Bicep Scripts
        working-directory: ./src/template-spec-bicep/paas-components
        run: |
          chmod +x ./build-templates.sh
          ./build-templates.sh
      - name: Azure Login
        uses: Azure/login@v1.1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy Template Specs to Azure
        working-directory: ./src/template-spec-bicep/paas-components        
        run: |          
          chmod +x ./deploy-templates.sh
          ./deploy-templates.sh

Consuming the template spec doesn't change regardless of whether you author it in Bicep or as an ARM template. Consumers just create a parameter file and deploy resources by specifying only the template spec ID. You can see an example consumer workflow (also a GitHub Actions workflow) here.

Conclusion

I am excited about how the Azure team is offering new tools like Bicep and Template Specs to simplify cloud governance and self-service. It's important to understand that the Bicep team is not competing with the available tools in this space (like Terraform, etc.), but rather offering more options to folks on their cloud journey.

Bicep is in an experimental phase and Template Specs are still in preview; therefore, don't use them in production just yet.

Azure DevOps Security & Permissions REST API

Every few months I notice the following saga repeat itself. I face a challenge where I need to programmatically manage the security aspects of Azure DevOps resources (such as repositories, pipelines, environments, etc.). I look up the Azure DevOps REST API documentation and realize that the permissions and security APIs are notoriously complicated and inadequately documented. So I begin with F12 to open the browser's development tools and intercept HTTP requests, trying to figure out what payloads are exchanged and to come up with the appropriate HTTP requests myself. However strange it might sound, this method usually works for me (it has actually worked almost every time), but it's a painful and time-consuming process. Recently I had to go through this process one more time, and I promised myself that once I was done, I would write a blog post about it and put the code in a GitHub repository, so that next time I will save myself some time and pain. That's exactly what this post is all about.

Security & Permission REST API

As I have said, the security REST API is complicated and inadequately documented. Typically, each family of resources (work items, Git repositories, etc.) is secured using a different namespace. The first challenge is to find out the namespace IDs.
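
The namespace IDs can be listed with a plain REST call to the security namespaces endpoint; the organization name below is a placeholder and the PAT is passed as a basic-auth password.

curl -s -u ":${AZDO_PAT}" \
  "https://dev.azure.com/<YOUR-ORG>/_apis/securitynamespaces?api-version=6.0" \
  | jq '.value[] | {name: .name, namespaceId: .namespaceId}'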

Then each security namespace contains zero or more access control lists (aka. ACLs). Each access control list contains a token, an inherit flag and a set of zero or more access control entries. Each access control entry contains an identity descriptor, an allowed permissions bitmask, and a denied permissions bitmask.

Tokens are arbitrary strings representing resources in Azure DevOps. The token format differs per resource type; however, the hierarchy and separator characters are common between all tokens. Now, where do you find these token formats? Well, I mostly find them by intercepting the browser's HTTP payloads. To save myself future effort, I have created a .NET object model around the security namespace IDs, permissions and tokens, so that when I consume those libraries I can ignore these lower-level elements and work with higher-order APIs to manage permissions. You can investigate the GitHub repository to learn about it. However, just to make it more fun to use, I have spent a bit of time creating a manifest file format (yes, stolen from the Kubernetes world) so I can get future jobs done just by writing YAML files, as opposed to .NET/C# code.

Instructions to use

The repository contains two projects (one is a library that produces a DLL, and the other is a console application); the console executable is named azdoctl.exe.

The idea is to create a manifest file (in YAML format) and apply the changes via azdoctl.exe:

> azdoctl apply -f manifest.yaml
Manifest file

You need to create a manifest file to describe your Azure DevOps project and permissions. The format of the manifest file is YAML (the idea is borrowed from Kubernetes manifest files).

Schema

Here’s the schema of the manifest file:

apiVersion: apps/v1
kind: Project
metadata:
  name: Bi-Team-Project
  description: Project for BI Engineering team
template:
  name: Agile
  sourceControlType: Git

The manifest file starts with the team project name and description. Each manifest file can contain only one team project definition.

Teams

Next, we can define teams for the project with the following YAML block:

teams:
  - name: Bi-Core-Team
    description: The core team that run BI projects
    admins:
      - name: Timothy Green
        id: 4ae3c851-6ef3-4748-bef9-4f809736d538
      - name: Linda
        id: 9c5918c7-ef03-4059-a49e-aa6e6d761423
    membership:
      groups:
        - name: 'UX Specialists'
          id: a2931c86-e975-4220-aa89-dc3f952290f4
      users:
        - name: Timothy Green
          id: 4ae3c851-6ef3-4748-bef9-4f809736d538
        - name: Linda
          id: 9c5918c7-ef03-4059-a49e-aa6e6d761423

Here we can create teams and assign admins and members to them. All the references (names and IDs) must be valid in Azure Active Directory; the IDs are the object IDs of the groups or users in Azure Active Directory.
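
One way to look up those object IDs is with the Azure CLI; the group and user below are just the examples from the manifest, and depending on your CLI version the property may be exposed as id or objectId.

# Object ID of an Azure AD group referenced in the manifest
az ad group show --group "UX Specialists" --query id -o tsv

# Object ID of an Azure AD user referenced in the manifest (the UPN is a placeholder)
az ad user show --id "timothy.green@yourtenant.com" --query id -o tsv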

Repository

Next, we can define the repositories that must be created and the permissions that must be assigned to them.

repositories:
  - name: Sample-Git-Repository
    permissions:
      - group: 'Data-Scientists'
        origin: aad
        allowed:
          - GenericRead
          - GenericContribute
          - CreateBranch
          - PullRequestContribute
      - group: 'BI-Scrum-masters'
        origin: aad
        allowed:
          - GenericRead
          - GenericContribute
          - CreateBranch
          - PullRequestContribute
          - PolicyExempt

Again, you can apply an Azure AD group with very fine-grained permissions to each repository that you want to create.

List of all the allowed permissions:

        Administer
        GenericRead
        GenericContribute
        ForcePush
        CreateBranch
        CreateTag
        ManageNote
        PolicyExempt
        CreateRepository
        DeleteRepository
        RenameRepository
        EditPolicies
        RemoveOthersLocks
        ManagePermissions
        PullRequestContribute
        PullRequestBypassPolicy

Environment

You can create environments and assign permissions to them with the following YAML block:

environments:
  - name: Development-Environment
    description: 'Deployment environment for Developers'
    permissions:
      - group: 'Bi-Developers'
        origin: aad
        roles: 
          - Administrator
  - name: Production-Environment
    description: 'Deployment environment for Production'
    permissions:
      - group: 'Bi-Developers'
        origin: aad
        roles: 
          - User        

Build and Release (pipeline) folders

You can also create folders for build and release pipelines and apply specific permissions during bootstrap. That way, teams can have fine-grained permissions on these folders.

Build Pipeline Folders

Here’s the snippet for creating build folders.

buildFolders:
  - path: '/Bi-Application-Builds'
    permissions:
      - group: 'Bi-Developers'
        origin: aad
        allowed:
          - ViewBuilds
          - QueueBuilds
          - StopBuilds
          - ViewBuildDefinition
          - EditBuildDefinition
          - DeleteBuilds

And, for the release pipelines:

releaseFolders:
  - path: '/Bi-Application-Relases'
    permissions:
      - group: 'Bi-Developers'
        origin: aad
        allowed:
          - ViewReleaseDefinition
          - EditReleaseDefinition
          - ViewReleases
          - CreateReleases
          - EditReleaseEnvironment
          - DeleteReleaseEnvironment
          - ManageDeployments

Once you have the yaml file defined, you can apply it as described above.

Conclusion

That's it for today. By the way, the code is provided as-is, under the MIT license. You can use it, replicate it and modify it as much as you wish. I would appreciate it if you acknowledged its usefulness, but that's not enforced; you are free to use it any way you want.

That also means the author takes no responsibility for providing any guarantees.

Thanks!

Manage Kubernetes running anywhere via Azure Arc

Azure Arc (currently in preview) allows you to attach and configure Kubernetes clusters running anywhere (inside or outside of Azure). Once connected, the clusters show up in the Azure portal, and you can apply tags and policies to them like any other resource. This brings simplicity and uniformity to managing both cloud and on-premises resources in a single management pane (the Azure portal).

Azure Arc enabled Kubernetes is in preview. It’s NOT recommended for production workloads.

Following are the key scenarios where Azure Arc adds value:

  • Connect Kubernetes running outside of Azure for inventory, grouping, and tagging.
  • Apply policies by using Azure Policy for Kubernetes.
  • Deploy applications and apply configuration by using GitOps-based configuration management.
  • Use Azure Monitor for containers to view and monitor your clusters.

Connect an on-premises (or other cloud) cluster to Azure Arc

I have used the local Kubernetes cluster (Docker Desktop) for this; however, the steps are identical for any other Kubernetes cluster. All you need to do is run the following Azure CLI command from a machine that can reach both the on-premises Kubernetes cluster and Azure.

az connectedk8s connect --name <ClusterName> --resource-group <ResourceGroup>

It will take a moment, and then the cluster is connected to Azure. We can see that in the Azure portal:

Once we have the cluster connected to Azure, we can create and edit tags just like for any other Azure resource, which is awesome.
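
For example, tags can also be applied from the CLI; the resource group and cluster names below are placeholders.

az resource tag \
  --resource-group <ResourceGroup> \
  --name <ClusterName> \
  --resource-type "Microsoft.Kubernetes/connectedClusters" \
  --tags environment=on-prem owner=platform-team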

The same holds true for Azure Policy: I can apply compliance constraints to the cluster and monitor their compliance status in Azure Security Center.

GitOps on Arc enabled Kubernetes cluster

The next feature is interesting and can be very useful in many scenarios. It is much like infrastructure-as-code for your Kubernetes configuration (namespaces, deployments, etc.). The idea is that we define one or more Git repositories that keep the desired state of the cluster (i.e. namespaces, deployments, etc.) in YAML files, and Azure Resource Manager performs the necessary actions to apply that desired state to the connected cluster. The Microsoft documentation describes how this works:

The connection between your cluster and one or more Git repositories is tracked in Azure Resource Manager as a sourceControlConfiguration extension resource. The sourceControlConfiguration resource properties represent where and how Kubernetes resources should flow from Git to your cluster. The sourceControlConfiguration data is stored encrypted at rest in an Azure Cosmos DB database to ensure data confidentiality.

The config-agent running in your cluster is responsible for watching for new or updated sourceControlConfiguration extension resources on the Azure Arc enabled Kubernetes resource, deploying a flux operator to watch the Git repository, and propagating any updates made to the sourceControlConfiguration. It is even possible to create multiple sourceControlConfiguration resources with namespace scope on the same Azure Arc enabled Kubernetes cluster to achieve multi-tenancy. In such a case, each operator can only deploy configurations to its respective namespace.

An example Git repository can be found here: https://github.com/Azure/arc-k8s-demo. We can create the configuration from the portal or via the Azure CLI:

az k8sconfiguration create \
    --name cluster-config \
    --cluster-name AzureArcTest1 --resource-group AzureArcTest \
    --operator-instance-name cluster-config --operator-namespace cluster-config \
    --repository-url https://github.com/Azure/arc-k8s-demo \
    --scope cluster --cluster-type connectedClusters

That's it. We can see it in the Azure portal:

With that set up, committing changes to the Git repository will now be reflected in the connected cluster.

Monitoring

Connected clusters can also be monitored with Azure Monitor for containers. It's as simple as creating a Log Analytics workspace and configuring the cluster to push metrics to it. This document describes the steps to enable monitoring.

I have seen scenarios where people running on-premises (or other cloud) clusters heavily use Prometheus and Grafana for monitoring. The good news is that we can get the same on Azure Arc enabled clusters. Once the metrics are available in Azure Log Analytics, we can point Grafana to the workspace; it takes less than a minute and a few button clicks (no code required).

Isn't it awesome? Go check out Azure Arc for Kubernetes today.

Restricting Unverified Kubernetes Content with Docker Content Trust

Docker Content Trust (DCT) provides the ability to use digital signatures for data sent to and received from remote Docker registries. These signatures allow client-side or runtime verification of the integrity and publisher of specific image tags.

Signed tags
Image source: Docker Content Trust


Through DCT, image publishers can sign their images and image consumers can ensure that the images they pull are signed. Publishers could be individuals or organizations manually signing their content or automated software supply chains signing content as part of their release process.


Azure Container Registry implements Docker’s content trust model, enabling pushing and pulling of signed images. Once content trust is enabled in Azure Container Registry, signing an image is straightforward, as shown below:

Signing an image from console
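
For reference, this is roughly what the console interaction boils down to – a minimal sketch with an illustrative registry and repository name (the first signed push prompts for root and repository key passphrases):

# Enable Docker Content Trust for this shell session
export DOCKER_CONTENT_TRUST=1

# Pushing now signs the tag with your Notary keys
docker push myregistry.azurecr.io/aspnetcore-webapp:v1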

Problem statement

Now that we can have DCT enabled in Azure Container Registry (i.e. it allows pushing signed images into the repository), we want to make sure the Kubernetes cluster denies running any images that are not signed.

However, Azure Kubernetes Service doesn’t have a feature (as of today, while writing this article) to restrict the cluster to running only signed images. This doesn’t mean we’re stuck until AKS releases such a feature. We can implement the content trust restriction using a custom admission controller. In this article I want to share how one can create their own custom admission controller to achieve DCT in an AKS cluster.

What are Admission Controllers?

In a nutshell, Kubernetes admission controllers are plugins that govern and enforce how the cluster is used. They can be thought of as gatekeepers that intercept (authenticated) API requests and may change the request object or deny the request altogether.

Admission Controller Phases
Image source: Kubernetes Documentations

The admission control process has two phases: the mutating phase is executed first, followed by the validating phase. Consequently, admission controllers can act as mutating or validating controllers or as a combination of both.

In this article we will create a validating controller that checks the Docker image signature and disallows Kubernetes from running any unsigned image from a selected Azure Container Registry. We will build the admission controller in .NET Core.

Writing a custom Admission Controller

Writing an admission controller is fairly easy and straightforward – it is just a web API (webhook) that the API server invokes when a resource is about to be created, updated or deleted. The webhook responds to the request with an allowed or disallowed verdict.
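
For context, the verdict goes back to the API server as an AdmissionReview object; a minimal sketch of a rejection looks roughly like this (the uid must echo the one from the incoming request):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<uid from the incoming request>",
    "allowed": false,
    "status": { "message": "image tag is not signed" }
  }
}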

Kubernetes uses mutual TLS for all communication with admission controllers; hence, we need to create a self-signed certificate for our admission controller service.

Creating Self Signed Certificates

The following bash script generates a custom CA (certificate authority) and a certificate/key pair for our webhook API.

#!/usr/bin/env bash

# Generate the CA cert and private key
openssl req -nodes -new -x509 -keyout ca.key -out ca.crt -subj "/CN=TailSpin CA"
# Generate the private key for the tailspin server
openssl genrsa -out tailspin-server-tls.key 2048
# Generate a Certificate Signing Request (CSR) for the private key, and sign it with the private key of the CA.

## NOTE
## The CN should be in this format: <service name>.<namespace>.svc
openssl req -new -key tailspin-server-tls.key -subj "/CN=tailspin-admission.tailspin.svc" \
    | openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out tailspin-server-tls.crt

We will create a simple ASP.NET Core web API project – it’s as simple as a File -> New ASP.NET web API project. The only difference is that we configure Kestrel to use the TLS certificates (created above) while listening, by modifying Program.cs as below.

public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureWebHostDefaults(webBuilder =>
        {
            webBuilder
                .UseStartup<Startup>()
                .UseKestrel(options =>
                {
                    options.ConfigureHttpsDefaults(connectionOptions =>
                    {
                        connectionOptions.AllowAnyClientCertificate();
                        connectionOptions.OnAuthenticate = (context, authOptions) =>
                        {
                            Console.WriteLine($"OnAuthenticate:: TLS Connection ID: {context.ConnectionId}");
                        };
                    });
                    options.ListenAnyIP(443, async listenOptions =>
                    {
                        // Load the TLS certificate and private key embedded in the assembly
                        var certificate = await ResourceReader.GetEmbeddedStreamAsync(ResourceReader.Certificates.TLS_CERT);
                        var privateKey = await ResourceReader.GetEmbeddedStreamAsync(ResourceReader.Certificates.TLS_KEY);

                        Console.WriteLine("Certificate and Key received...creating PFX..");
                        var pfxCertificate = CertificateHelper.GetPfxCertificate(certificate, privateKey);
                        listenOptions.UseHttps(pfxCertificate);
                        Console.WriteLine("Service is listening to TLS port");
                    });
                });
        });

Next, we will create a middleware/handler in Startup.cs to handle the admission validation requests.

// This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.Use(GetAdmissionMiddleware());

The implementation of the middleware looks as following:

public Func<HttpContext, Func<Task>, Task> GetAdmissionMiddleware()
{
    // Full resource ID of the Azure Container Registry, e.g.
    // /subscriptions/XX/resourceGroups/YY/providers/Microsoft.ContainerRegistry/registries/ZZ
    var acrName = Environment.GetEnvironmentVariable("RegistryFullName");
    return async (context, next) =>
    {
        if (context.Request.Path.HasValue && context.Request.Path.Value.Equals("/admission"))
        {
            using var streamReader = new StreamReader(context.Request.Body);
            var body = await streamReader.ReadToEndAsync();
            var payload = JsonConvert.DeserializeObject<AdmissionRequest>(body);

            var allowed = true;
            if (payload.Request.Operation.Equals("CREATE") || payload.Request.Operation.Equals("UPDATE"))
            {
                // Every container image in the pod spec must be signed
                foreach (var container in payload.Request.Object.Spec.Containers)
                {
                    allowed = (await RegistryHelper.VerifyAsync(acrName, container.Image)) && allowed;
                }
            }
            await GenerateResponseAsync(context, payload, allowed);
        }
        await next();
    };
}

What we are doing here is: once we receive a request from the API server about a Pod to be created or updated, we intercept the request, validate whether each image has a digital signature in place (DCT), and reply to the API server accordingly. Here’s the response to the API server:

private static async Task GenerateResponseAsync(HttpContext context, AdmissionRequest payload, bool allowed)
        {
            context.Response.Headers.Add("content-type", "application/json");
            await context
                .Response
                .WriteAsync(JsonConvert.SerializeObject(new AdmissionReviewResponse
                {
                    ApiVersion = payload.ApiVersion,
                    Kind = payload.Kind,
                    Response = new ResponsePayload
                    {
                        Uid = payload.Request.Uid,
                        Allowed = allowed
                    }
                }));
        }

Now let’s see how we can verify that the image has a signed tag in Azure Container Registry.

public class RegistryHelper
    {
        private static HttpClient http = new HttpClient();

        public async static Task<bool> VerifyAsync(string acrName, string imageWithTag)
        {
            var imageNameWithTag = imageWithTag.Split(":".ToCharArray());
            var credentials = SdkContext
                .AzureCredentialsFactory
                .FromServicePrincipal(Environment.GetEnvironmentVariable("ADClientID"),
                Environment.GetEnvironmentVariable("ADClientSecret"),
                Environment.GetEnvironmentVariable("ADTenantID"),
                AzureEnvironment.AzureGlobalCloud);

            var azure = Azure
                .Configure()
                .WithLogLevel(HttpLoggingDelegatingHandler.Level.Basic)
                .Authenticate(credentials)
                .WithSubscription(Environment.GetEnvironmentVariable("ADSubscriptionID"));
           
            var azureRegistry = await azure.ContainerRegistries.GetByIdAsync(acrName);
            var creds = await azureRegistry.GetCredentialsAsync();           

            var authenticationString = $"{creds.Username}:{creds.AccessKeys[AccessKeyType.Primary]}";
            var base64EncodedAuthenticationString = 
                Convert.ToBase64String(ASCIIEncoding.UTF8.GetBytes(authenticationString));
            http.DefaultRequestHeaders.Add("Authorization", "Basic " + base64EncodedAuthenticationString);

            var response = await http.GetAsync($"https://{azureRegistry.Name}.azurecr.io/acr/v1/{imageNameWithTag[0]}/_tags/{imageNameWithTag[1]}");
            var repository = JsonConvert.DeserializeObject<RepositoryTag>(await response.Content.ReadAsStringAsync());

            return repository.Tag.Signed;
        }
    }

This class uses the Azure Container Registry REST API to check whether the image tag was digitally signed. That’s all. We build a Docker image (tailspin-admission:latest) for this application and deploy it to Kubernetes with the following manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tailspin-admission
  namespace: tailspin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tailspin-admission
  template:
    metadata:
      labels:
        app: tailspin-admission
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: tailspin-admission
        image: "acr.io/tailspin-admission:latest"
        ports:
        - containerPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: tailspin-admission
  namespace: tailspin
spec:
  type: LoadBalancer
  selector:
    app: tailspin-admission
  ports:
    - port: 443
      targetPort: 443
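
Before applying the manifest above, the image itself needs to be built and pushed to a registry the cluster can pull from – a sketch with an illustrative registry name that should match the image reference in the deployment:

docker build -t <your-acr>.azurecr.io/tailspin-admission:latest .
docker push <your-acr>.azurecr.io/tailspin-admission:latest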

Now that our admission controller is running, we need to register it as a validating webhook with the following manifest:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: tailspin-admission-controller
webhooks:
  - name: tailspin-admission.tailspin.svc
    admissionReviewVersions: ["v1", "v1beta1"]
    failurePolicy: Fail
    clientConfig:
      service:
        name: tailspin-admission
        namespace: tailspin
        path: "/admission"
      caBundle: ${CA_PEM_B64}
    rules:
      - operations: [ "*" ]
        apiGroups: [""]
        apiVersions: ["*"]
        resources: ["*"]

Here the ${CA_PEM_B64} placeholder needs to be filled with the base64-encoded CA certificate that we generated above. The following bash script does that:

# Read the PEM-encoded CA certificate, base64 encode it, and replace the `${CA_PEM_B64}` placeholder in the YAML
# template with it. Then, create the Kubernetes resources.
ca_pem_b64="$(openssl base64 -A <"../certs/ca.crt")"
(sed -e 's@${CA_PEM_B64}@'"$ca_pem_b64"'@g' <"./manifests/webhook-deployment.template.yaml") > "./manifests/webhook-deployment.yaml"

We can now apply this manifest with kubectl to our AKS cluster, and we are done! The cluster will now prevent any unsigned images from our Azure Container Registry from running.
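
For completeness, using the file generated by the script above:

kubectl apply -f ./manifests/webhook-deployment.yaml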

Summary

A complete DCT story on AKS needs a solution that can satisfy requirements such as content moving between repositories or across registries, which may extend beyond the current scope of the Azure Container Registry content trust feature as it stands today. Enabling DCT on each AKS node is also not feasible, since many Kubernetes system images are not signed.

A custom admission controller lets you work around these limitations today while still enforcing content trust specifically for your own ACR and images. This article showed a quick and simple way to create a custom admission controller in .NET Core, and how easy it is to implement your own security policy for your AKS cluster.

Hope you find it useful and interesting.

Azure DevOps Multi-Stage pipelines for Enterprise AKS scenarios

Background

Multi-stage Azure Pipelines enable writing the build (continuous integration) and deploy (continuous delivery) steps as pipeline-as-code (YAML) that is stored in version control (a Git repository). However, deploying to multiple environments (test, acceptance, production etc.) needs approvals/control gates. Often different stakeholders (product owners, operations folks) are involved in that approval process. In addition, restricting secrets/credentials for higher-order stages (i.e. production) from developers is not uncommon.

The good news is that Azure DevOps allows all of that, with the notions of environments and resources. The idea is that an environment (e.g. Production) is configured with resources (e.g. Kubernetes, virtual machines etc.), and then an approval policy is configured for the environment. When a pipeline targets the environment in a deployment stage, it pauses with a pending approval from the responsible authorities (i.e. groups or users). Azure DevOps offers an awesome UI for creating environments and setting up approval policies.

The problem begins when we want to automate environment creation to scale the process.

Problem statement

As of today (while writing this article), provisioning environments and setting up approval policies via the REST API is not documented and not publicly available – there is a feature request pending.
In this article, I will share some code that can be used to automate provisioning environments and managing approval policies.

Scenario

It’s fairly common (in fact, a best practice) to logically isolate separate teams and projects within AKS clusters, in order to minimize the number of physical AKS clusters we have to deploy.

With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.

When we set up such isolation for multiple teams, it’s crucial to automate the bootstrap of team projects in Azure DevOps – setting up scoped environments and service accounts so teams can’t deploy to the namespaces of other teams, etc. The need for automation is right there – and that’s what this article is about.

The process I am trying to establish is as follows:

  1. Cluster administrators will provision a namespace for a team (GitOps)
  2. Automatically create an Environment for the team’s namespace and configure approvals
  3. Team 1 will use the environment in their multi-stage pipeline

Let’s do this!

Provision namespace for teams

It all begins with a demand from a team – they need a namespace for development/deployment. The cluster administrators keep a Git repository that contains the Kubernetes manifest files describing these namespaces, and a pipeline applies them to the cluster each time a file is added or modified. This repository is restricted to the cluster administrators (operations folks) only. Developers can issue a pull request, but PR approvals and commits to master should only be accepted by a cluster administrator or someone with similar responsibility.

After that, we create a service account for each of the namespaces. These are the accounts that will be used later when we define the Azure DevOps environment for each team.
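
A sketch of what such a manifest could look like – the names are illustrative, but the labels match what the automation code later in this article queries (purpose=ado-automation, plus a project label carrying the Azure DevOps team project name):

apiVersion: v1
kind: Namespace
metadata:
  name: team-1
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: team-1-deployer
  namespace: team-1
  labels:
    purpose: ado-automation   # used by the console app to discover these accounts
    project: Team1-Project    # maps the account to an Azure DevOps team project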

Now the pipeline for this repository essentially applies all the manifests (both for namespaces and service accounts) to the cluster.

trigger:
- master
stages:
- stage: Build
  displayName: Provision namespace and service accounts
  jobs:  
  - job: Build
    displayName: Update namespace and service accounts
    steps:
      <… omitted irrelevant codes …>
      - bash: |
          kubectl apply -f ./namespaces 
        displayName: 'Update namespaces'
      - bash: |
          kubectl apply -f ./ServiceAccounts 
        displayName: 'Update service accounts'   
      - bash: |
          dotnet ado-env-gen.dll
        displayName: 'Provision Azure DevOps Environments'       

At this point, we have a service account configured for each namespace, which we will use to create the environment, endpoints etc. You might notice that I have added labels to each service account (i.e. purpose=ado-automation); this is how the Azure DevOps project name is tagged onto a service account. It will come in handy when we provision environments.

The last task runs a .NET Core console app (i.e. ado-env-gen.dll), which I will describe in detail later in this article.

Provisioning Environment in Azure DevOps

NOTE: Provisioning environments via the REST API is currently undocumented and might change in the future – beware of that.

It takes multiple steps to create an environment in Azure DevOps:

  1. Create a Service endpoint with Kubernetes Service Account
  2. Create an empty environment (with no resources yet)
  3. Connect the service endpoint to the environment as Resource

I’ve used .NET (C#) for this, but any REST client technology could do it.

Creating Service Endpoint

The following method creates a service endpoint in Azure DevOps that uses a service account scoped to a given namespace.

        public async Task<Endpoint> CreateKubernetesEndpointAsync(
            Guid projectId, string projectName,
            string endpointName, string endpointDescription,
            string clusterApiUri,
            string serviceAccountCertificate, string apiToken)
        {
            return await GetAzureDevOpsDefaultUri()
                .PostRestAsync<Endpoint>(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=6.0-preview.4",
                new
                {
                    authorization = new
                    {
                        parameters = new
                        {
                            serviceAccountCertificate,
                            isCreatedFromSecretYaml = true,
                            apitoken = apiToken
                        },
                        scheme = "Token"
                    },
                    data = new
                    {
                        authorizationType = "ServiceAccount"
                    },
                    name = endpointName,
                    owner = "library",
                    type = "kubernetes",
                    url = clusterApiUri,
                    description = endpointDescription,
                    serviceEndpointProjectReferences = new List<Object>
                    {
                        new
                        {
                            description = endpointDescription,
                            name =  endpointName,
                            projectReference = new
                            {
                                id =  projectId,
                                name =  projectName
                            }
                        }
                    }
                }, await GetBearerTokenAsync());
        }

We will see how to invoke this method in a moment. Before that – step 2 – let’s create the empty environment.

Creating Environment in Azure DevOps

        public async Task<PipelineEnvironment> CreateEnvironmentAsync(
            string project, string envName, string envDesc)
        {
            var env = await GetAzureDevOpsDefaultUri()
                .PostRestAsync<PipelineEnvironment>(
                $"{project}/_apis/distributedtask/environments?api-version=5.1-preview.1",
                new
                {
                    name = envName,
                    description = envDesc
                },
                await GetBearerTokenAsync());

            return env;
        }

Now we have an environment, but it is still empty. We need to add a resource to it – the service endpoint – so the environment comes to life.

        public async Task<string> CreateKubernetesResourceAsync(
            string projectName, long environmentId, Guid endpointId,
            string kubernetesNamespace, string kubernetesClusterName)
        {
            var link = await GetAzureDevOpsDefaultUri()
                            .PostRestAsync(
                            $"{projectName}/_apis/distributedtask/environments/{environmentId}/providers/kubernetes?api-version=5.0-preview.1",
                            new
                            {
                                name = kubernetesNamespace,
                                @namespace = kubernetesNamespace,
                                clusterName = kubernetesClusterName,
                                serviceEndpointId = endpointId
                            },
                            await GetBearerTokenAsync());
            return link;
        }

Of course, the environment needs approval policies configured. The following method configures an Azure DevOps group as an approver for the environment. Any pipeline that references this environment will then pause and wait for approval from one of the members of that group.

        public async Task<string> CreateApprovalPolicyAsync(
            string projectName, Guid groupId, long envId, 
            string instruction = "Please approve the Deployment")
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/pipelines/checks/configurations?api-version=5.2-preview.1",
                new
                {
                    timeout = 43200,
                    type = new
                    {                                   
                        name = "Approval"
                    },
                    settings = new
                    {
                        executionOrder = 1,
                        instructions = instruction,
                        blockedApprovers = new List<object> { },
                        minRequiredApprovers = 0,
                        requesterCannotBeApprover = false,
                        approvers = new List<object> { new { id = groupId } }
                    },
                    resource = new
                    {
                        type = "environment",
                        id = envId.ToString()
                    }
                }, await GetBearerTokenAsync());
            return response;
        }

So far so good, but we need to stitch all of this together. Before we do so, one last item needs attention: we want to create a service connection to the Azure Container Registry so the teams can push/pull images to it. We will do that using service principals designated to the teams – instead of the ACR admin keys.
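
Such a team-scoped service principal can be created up front; a sketch with the Azure CLI (names and role are illustrative – pick the role that matches what the team should be able to do):

ACR_ID=$(az acr show --name <acrName> --query id --output tsv)
az ad sp create-for-rbac --name "team1-acr-push" --role AcrPush --scopes $ACR_ID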

Creating Container Registry connection

The following snippet provisions a service connection to Azure Container Registry using a service principal, which can have fine-grained RBAC roles (i.e. AcrPush or AcrPull) that make sense for the team.

        public async Task<string> CreateAcrConnectionAsync(
            string projectName, string acrName, string name, string description,
            string subscriptionId, string subscriptionName, string resourceGroup,
            string clientId, string secret, string tenantId)
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=5.1-preview.2",
                new
                {
                    name,
                    description,
                    type = "dockerregistry",
                    url = $"https://{acrName}.azurecr.io",
                    isShared = false,
                    owner = "library",
                    data = new
                    {
                        registryId = $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.ContainerRegistry/registries/{acrName}",
                        registrytype = "ACR",
                        subscriptionId,
                        subscriptionName
                    },
                    authorization = new
                    {
                        scheme = "ServicePrincipal",
                        parameters = new
                        {
                            loginServer = $"{acrName}.azurecr.io",
                            servicePrincipalId = clientId,
                            tenantId,
                            serviceprincipalkey = secret
                        }
                    }
                },
                await GetBearerTokenAsync());
            return response;
        }

We’re getting close to wrapping up. We’ll stitch all the methods above together in a simple console application that orchestrates everything. Here are the pseudo-steps:

  1. Find all Service Account created for this purpose
  2. For each Service Account: determine the correct Team Project and then
    • Create Service Endpoint with the Account
    • Create Environment
    • Connect Service Endpoint to Environment (adding resource)
    • Configure Approval policies
    • Create Azure Container Registry connection

The first step obviously needs to communicate with the cluster. I have used the official .NET client for Kubernetes for that.

Bringing all together

All the above methods are invoked from a simple C# console application. Below is the relevant part of the main method that brings all the above together:

        private static async Task Main(string [] args)
        {
            var clusterApiUrl = Environment.GetEnvironmentVariable("AKS_URI");
            var adoUrl = Environment.GetEnvironmentVariable("AZDO_ORG_SERVICE_URL");
            var pat = Environment.GetEnvironmentVariable("AZDO_PERSONAL_ACCESS_TOKEN");
            var adoClient = new AdoClient(adoUrl, pat);
            var groups = await adoClient.ListGroupsAsync();

            var config = KubernetesClientConfiguration.BuildConfigFromConfigFile();
            var client = new Kubernetes(config);

We start by collecting some secrets and configuration data – all from environment variables – so we can run this console app as part of a pipeline task and use pipeline variables with ease.

        var accounts = await client
            .ListServiceAccountForAllNamespacesAsync(labelSelector: "purpose=ado-automation");

This gets us the list of all the service accounts we have provisioned specifically for this purpose (filtered using the labels).

            foreach (var account in accounts.Items)
            {
                var project = await GetProjectAsync(account.Metadata.Labels["project"], adoClient);
                var secretName = account.Secrets[0].Name;
                var secret = await client
                    .ReadNamespacedSecretAsync(secretName, account.Metadata.NamespaceProperty);

We iterate over all the accounts and retrieve their secrets from the cluster. Next, we create the environment with these secrets.

                var endpoint = await adoClient.CreateKubernetesEndpointAsync(
                    project.Id,
                    project.Name,
                    $"Kubernetes-Cluster-Endpoint-{account.Metadata.NamespaceProperty}",
                    $"Service endpoint to the namespace {account.Metadata.NamespaceProperty}",
                    clusterApiUrl,
                    Convert.ToBase64String(secret.Data["ca.crt"]),
                    Convert.ToBase64String(secret.Data["token"]));

                var environment = await adoClient.CreateEnvironmentAsync(project.Name,
                    $"Kubernetes-Environment-{account.Metadata.NamespaceProperty}",
                    $"Environment scoped to the namespace {account.Metadata.NamespaceProperty}");

                await adoClient.CreateKubernetesResourceAsync(project.Name, 
                    environment.Id, endpoint.Id,
                    account.Metadata.NamespaceProperty,
                    account.Metadata.ClusterName);

That will give us the environment – correctly configured with the appropriate Service Accounts. Let’s set up the approval policy now:

                var group = groups.FirstOrDefault(g => g.DisplayName
                    .Equals($"[{project.Name}]\\Release Administrators", StringComparison.OrdinalIgnoreCase));
                await adoClient.CreateApprovalPolicyAsync(project.Name, group.OriginId, environment.Id);

We take a designated project group, “Release Administrators”, and set it as the approver.

            await adoClient.CreateAcrConnectionAsync(project.Name, 
                Environment.GetEnvironmentVariable("ACRName"), 
                $"ACR-Connection", "The connection to the ACR",
                Environment.GetEnvironmentVariable("SubId"),
                Environment.GetEnvironmentVariable("SubName"),
                Environment.GetEnvironmentVariable("ResourceGroup"),
                Environment.GetEnvironmentVariable("ClientId"), 
                Environment.GetEnvironmentVariable("Secret"),
                Environment.GetEnvironmentVariable("TenantId"));

Lastly, we create the ACR connection as well.

The entire project is on GitHub – in case you want to have a read!

Verify everything

Our orchestration is complete. Every time we add a new team, we create one manifest for their namespace and service account and open a PR against the repository described above. A cluster admin approves the PR and a pipeline kicks off.

The pipeline ensures:

  1. All the namespaces and service accounts are created
  2. An environment with the appropriate service account is created in the correct team project.

Now a team can create their own pipeline in their repository, referring to the environment. Voila, everything starts working nicely. All they need to do is refer to the name of the environment provisioned for their team (for instance “team-1”), as in the following example:

- stage: Deploy
  displayName: Deploy stage
  dependsOn: Build
  jobs:
  - deployment: Deploy
    condition: and(succeeded(), not(startsWith(variables['Build.SourceBranch'], 'refs/pull/')))
    displayName: Deploy
    pool:
      vmImage: $(vmImageName)
    environment: 'Kubernetes-Cluster-Environment.team-1'
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: kube-manifests
          - task: KubernetesManifest@0
            displayName: Deploy to Kubernetes cluster
            inputs:
              action: deploy
              manifests: |
                $(Pipeline.Workspace)/kube-manifests/all-template.yaml

Now the multi-stage pipeline knows how to talk to the correct namespace in AKS, with an approval gate in place.

Conclusion

This might appear to be overkill for small-scale projects, as it involves quite some development and maintenance overhead. However, on multiple occasions (especially within large enterprises), I have experienced the need for orchestration via the REST API to onboard teams in Azure DevOps, bootstrap configurations across multiple team projects etc. If you’re in the same boat, this article might be an interesting read for you!

Thanks for reading!

Azure AD Pod Identity – password-less app-containers in AKS

Background

I have liked Azure Managed Identity since its advent. The concept behind Managed Identity is clever, and it adds observable value to any DevOps team. All concerns with password configurations in multiple places, life cycle management of secrets, certificates and rotation policies suddenly become irrelevant (OK, in most cases).
Leveraging managed identity for applications hosted in Azure Virtual Machines, Azure Web Apps, Function Apps etc. was straightforward. Managed Identity sits on top of the Azure Instance Metadata Service. Azure’s Instance Metadata Service is a REST endpoint accessible to all IaaS VMs created via Azure Resource Manager. The endpoint is available at a well-known non-routable IP address (169.254.169.254) that can be accessed only from within the VM. Under the hood, Azure VMs, VMSS and Azure PaaS resources (i.e. Web Apps, Function Apps etc.) leverage the metadata service to retrieve Azure AD tokens. Thus the VM or Web App essentially establishes its own “application identity” (which is what Managed Identity essentially is) that Azure AD authenticates.
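
As an illustration, any process inside such a VM can obtain a token from the metadata endpoint without any credentials – a sketch, where the resource parameter selects the target service (here, Azure Resource Manager):

curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F"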

Managed Identity in Azure Kubernetes Service

Managed Identity in Kubernetes, however, is a different ballgame. Typically, multiple applications (often developed by different teams in an organization) run in a single cluster, and pods launch and exit frequently on different nodes. Hence, a Managed Identity associated with the VM/VMSS is not sufficient; we need a way to assign an identity to every pod of an application. If pods move to different nodes, the identity must somehow move with them to the new node (VM).

Azure Pod Identity

The good news is that Azure Pod Identity offers exactly that capability. Azure Pod Identity is an open source project on GitHub.

Note: Managed pod identities is an open source project and is not supported by Azure technical support.

An application can use Azure Pod Identity to access Azure resources (i.e. Key Vault, Storage, Azure SQL Database etc.) via Managed Identity; hence, there is no secret/password involved anywhere in the process. Pods can fetch access tokens scoped to resources directly from Azure Active Directory.

Concept

The following two components are installed in the cluster to achieve pod identity.

1. The Node Management Identity (NMI)

The AKS cluster runs this DaemonSet on every node. It intercepts outbound calls from pods requesting access tokens and proxies those calls using a predefined Managed Identity.

2. The Managed Identity Controller (MIC)

MIC is a central pod with permissions to query the Kubernetes API server; it checks for an Azure identity mapping that corresponds to a pod.

Source: GitHub Project – Azure Pod Identity

When pods request access to an Azure service, network rules redirect the traffic to the Node Management Identity (NMI) server. The NMI server identifies pods that request access to Azure services based on their remote address and queries the Managed Identity Controller (MIC). The MIC checks for Azure identity mappings in the AKS cluster, and the NMI server then requests an access token from Azure Active Directory (AD) based on the pod’s identity mapping. Azure AD provides access to the NMI server, which is returned to the pod. This access token can be used by the pod to then request access to services in Azure.

Microsoft

Azure Pipeline to Bootstrap pod Identity

I started with an existing RBAC-enabled Kubernetes cluster that I had created before. Azure AD Pod Identity sets up a service account, custom resource definitions (CRDs), cluster roles and bindings, the DaemonSet for NMI etc. I wanted to do it via a pipeline so I can repeat the process on demand. Here is the interesting part of azure-pod-identity-setup-pipeline.yaml:

trigger:
- master
variables:
  tag: '$(Build.BuildId)'
  containerRegistry: $(acr-name).azurecr.io
  vmImageName: 'ubuntu-latest'
stages:
- stage: Build
  displayName: Aad-Pod-Identity-Setup
  jobs:  
  - job: Build
    displayName: Setup Aad-Pod-Identity.
    pool:
      vmImage: $(vmImageName)
    environment: 'Kubernetes-Cluster-Environment.default'
    steps:
      - bash: |
          kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml
        displayName: 'Setup Service Account, CRD, DaemonSet etc'


I will be using a user-assigned identity for my sample application. I have written a basic .NET app with a SQL back-end for this purpose. My end goal is to allow the .NET app to talk to SQL Server using pod identity.
Following the aad-pod-identity project instructions, I created the user-assigned identity.

      - task: AzureCLI@2
        inputs:
          scriptType: 'bash'
          scriptLocation: 'inlineScript'
          inlineScript: 'az identity create -g $(rgp) -n $(uaiName) -o json'

In my repository, I created the Azure Identity definition in a file named aad-pod-identity.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: <a-idname>
spec:
  type: 0
  ResourceID: /subscriptions/<sub>/resourcegroups/<resourcegroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
  ClientID: <clientId>
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: managed-identity


And I added a task to deploy that too.

      - bash: |
          kubectl apply -f manifests/aad-pod-identity.yaml
        displayName: 'Setup Azure Identity'

I triggered the pipeline, and Azure Pod Identity was ready to roll.

Deploying application

I have a .NET Core web app (a Razor application) which I will configure to run with pod identity and connect to Azure SQL (the back-end) using Azure Active Directory authentication – with no password configured at the application level.
Here’s the manifest file (front-end.yaml) for the application; the crucial part is the label (aadpodidbinding: managed-identity) that matches the selector of the pod identity binding we defined before:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dysnomia-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: dysnomia-frontend
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: dysnomia-frontend
        aadpodidbinding: managed-identity
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: dysnomia-frontend
        image: #{containerRegistry}#/dysnomia-frontend:#{Build.BuildId}#
        imagePullPolicy: "Always"

That’s pretty much it. Once my application pipeline has deployed the above manifest, the .NET application can connect to the Azure SQL database with the assigned pod identity.

      - bash: |
          kubectl apply -f manifests/front-end.yaml
        displayName: 'Deploy Front-end'


Of course, I needed to create a role assignment for the user-assigned identity and enable Azure AD authentication on my SQL server, but I’m not describing those steps here; I have written about that before.

What about non-Azure resources?

The above holds true for all Azure resources that support Managed Identity. That means our application can connect to Cosmos DB, Storage Accounts, Service Bus, Key Vault and many other Azure resources without configuring any password or secret anywhere in Kubernetes.
However, there are scenarios where we might want to run a Redis or SQL Server container inside our Kubernetes cluster. Cost-wise, for many use cases it can make sense to run these in Kubernetes (as you already have a cluster) instead of using Azure PaaS (i.e. Azure SQL or Azure Cache for Redis). In those cases, we must create the SQL password and configure it in our .NET app.
Kubernetes Secrets are the first-class option to handle such scenarios. Still, I was wondering if I could store my SQL password in an Azure Key Vault and let both my SQL container and my .NET app collect the password from Key Vault during launch, using Azure Pod Identity.
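
For comparison, the conventional Secrets route would look roughly like this (a sketch; the secret would then be referenced from the pod spec via secretKeyRef):

kubectl create secret generic mssql-sa-password --from-literal=SA_PASSWORD='P@ssw0rD'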

However, today I am playing with Azure AD pod Identity – therefore, I really wanted to use pod identity – for fun ;-). Here’s how I managed to make it work.

SQL container, pod identity and Azure Key vault

I created a Key Vault and stored the SQL Server password as a secret in it. I configured my .NET app to use pod identity to talk to Key Vault, and set up the Key Vault access policy so the user-assigned identity created above can read the SQL password. So far so good.

Now I wanted to run a SQL Server instance in my cluster that also collects the password from Key Vault – the same way the .NET app does. It turned out that the SQL Server 2019 image (mcr.microsoft.com/mssql/server:2019-latest) expects the password as an environment variable at container launch.

docker run -d -p 1433:1433 `
           -e "ACCEPT_EULA=Y" `
           -e "SA_PASSWORD=P@ssw0rD" `
           mcr.microsoft.com/mssql/server:2019-latest

Initially, I thought it would be easy to use an init container to grab the password from Azure Key Vault and pass it on to the application container (SQL) as an environment variable. After some failed attempts I realized that isn’t trivial. I could of course use a volume mount (e.g. emptyDir) to convey the password from the init container to the application container – but that is rather dirty, isn’t it?
Next I thought of creating my own Docker image based on the SQL Server container, so I could run a small script that grabs the password from Key Vault and sets it as an environment variable. A simple script with a few curl commands would do the trick, you might think. Well, there are a few small issues: the SQL Server image is a stripped-down Ubuntu core that has neither apt nor curl, and it doesn’t run as root either – for good reasons.


So I wrote a small program in Go and compiled it to a binary.

package main
import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
)
type TokenResponse struct {
    Token_type   string `json:"token_type"`
    Access_token string `json:"access_token"`
}
type SecretResponse struct {
    Value string `json:"value"`
    Id    string `json:"id"`
}
func main() {
    var p TokenResponse
    var s SecretResponse
    imds := "http://169.254.169.254/metadata/identity/oauth2/token" +
            "?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net"
    kvUrl := "https://" + os.Args[1] + "/secrets/" + os.Args[2] + 
             "?api-version=2016-10-01"

    client := &http.Client{}
    req, _ := http.NewRequest("GET", imds , nil)
    req.Header.Set("Metadata", "True")
    res, _ := client.Do(req)
    b, _ := ioutil.ReadAll(res.Body)
    json.Unmarshal(b, &p)

    req, _ = http.NewRequest("GET", kvUrl , nil)
    req.Header.Set("Authorization", p.Token_type+" "+p.Access_token)
    res, _ = client.Do(req)
    b, _ = ioutil.ReadAll(res.Body)
    json.Unmarshal(b, &s)
    fmt.Println(s.Value)
}

This program simply grabs the secret from Azure Key Vault using Managed Identity (via the same metadata endpoint described earlier). I created a binary out of it:

go build -o aadtoken


Next, I created my SQL Server container image with the following Dockerfile:

FROM mcr.microsoft.com/mssql/server:2019-latest

ENV ACCEPT_EULA=Y
ENV MSSQL_PID=Developer
ENV MSSQL_TCP_PORT=1433 
COPY ./aadtoken /
COPY ./startup.sh /
CMD [ "/bin/bash", "./startup.sh" ] 

As you can see, I am relying on a startup.sh bash script. Here it is:

echo "Retrieving AAD Token with Managed Identity..."
export SA_PASSWORD=$(./aadtoken $KeyVault $SecretName)
echo "SQL password received and cofigured successfully"
/opt/mssql/bin/sqlservr --accept-eula

I built the image, and here’s my SQL manifest to deploy to Kubernetes:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
        aadpodidbinding: managed-identity
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: #{ACR.Name}#/sql-server:2019-latest
        ports:
        - containerPort: 1433
        env:
        - name: KeyVault
          value: "#{KeyVault.Name}#"
        - name: SecretName
          value: "SQL_PASSWORD"
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data

I deployed the manifest and voila! It all works. Both the SQL pod and the .NET app pods use Managed Identity to connect to Key Vault and retrieve the secret, which is stored centrally in one place – keeping password management nice and tidy.

Conclusion

I find Azure Pod Identity a neat feature and a security best practice, especially when you are using Azure Kubernetes Service together with Azure resources (i.e. Cosmos DB, Azure SQL, Key Vault, Storage Accounts, Service Bus etc.). If you didn’t know about it, I hope this makes you enthusiastic enough to investigate further.

Disclaimer 1: It’s an open source project – so Azure technical support doesn’t apply.

Disclaimer 2: The SQL container part with the Go program is purely a fun learning attempt – don’t take it too seriously!