Azure DevOps Multi-Stage pipelines for Enterprise AKS scenarios

Background

Multi-stage Azure Pipelines let us write the build (continuous integration) and deploy (continuous delivery) steps as Pipeline-as-Code (YAML) stored in version control (a Git repository). However, deploying to multiple environments (test, acceptance, production etc.) needs approvals and control gates. Often different stakeholders (product owners, operations folks) are involved in that approval process. In addition, it is not uncommon to restrict secrets and credentials for higher-order stages (i.e. production) from developers.

The good news is that Azure DevOps allows all of that, with notions called Environments and resources. The idea is that an environment (e.g. Production) is configured with resources (e.g. Kubernetes, virtual machines etc.), and then an “approval policy” is configured for the environment. When a pipeline targets the environment in a deployment stage, it pauses with a pending approval from the responsible authorities (i.e. groups or users). Azure DevOps offers an awesome UI for creating environments and setting up approval policies.

The problem begins when we want to automate environment creation to scale the process.

Problem statement

As of today (while writing this article), provisioning environments and setting up approval policies for them via the REST API is not documented and not publicly available; there is a feature request awaiting.
In this article, I will share some code that can be used to automate environment provisioning and approval policy management.

Scenario

It’s fairly common (in fact best practice) to logically isolate AKS clusters for separate teams and projects. This minimizes the number of physical AKS clusters we deploy to isolate teams or applications.

With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.

When we set up such isolation for multiple teams, it’s crucial to automate the bootstrap of team projects in Azure DevOps: setting up scoped environments and service accounts so teams can’t deploy to namespaces of other teams etc. The need for automation is right there, and that’s what this article is all about.

The process I am trying to establish is as follows:

  1. Cluster Administrators will provision a namespace for a team (GitOps)
  2. Automatically create an Environment for the team’s namespace and configure approvals
  3. Team 1 will use the environment in their multi-stage pipeline

Let’s do this!

Provision namespace for teams

It all begins with a demand from a team – they need a namespace for development/deployment. The cluster administrators would keep a Git repository that contains the Kubernetes manifest files describing these namespaces. And there is a pipeline that applies them to the cluster each time a new file is added/modified. This repository will be restricted to the Cluster administrators (operation folks) only. Developers could issue a pull request but the PR approvals and commits to master should only be accepted by a cluster administrator or people with similar responsibility.

After that, we will create a service account for each of the namespaces. These are the accounts that will be used later when we define the Azure DevOps environment for each team.

Now the pipeline for this repository essentially applies all the manifests (both for namespaces and service accounts) to the cluster.

trigger:
- master
stages:
- stage: Build
  displayName: Provision namespace and service accounts
  jobs:  
  - job: Build
    displayName: Update namespace and service accounts
    steps:
      # … omitted irrelevant steps …
      - bash: |
          kubectl apply -f ./namespaces 
        displayName: 'Update namespaces'
      - bash: |
          kubectl apply -f ./ServiceAccounts 
        displayName: 'Update service accounts'   
      - bash: |
          dotnet ado-env-gen.dll
        displayName: 'Provision Azure DevOps Environments'       

At this point, we have a service account configured for each namespace, which we will use to create the environment, endpoints etc. You might notice that I have created some labels for each service account (i.e. purpose=ado-automation); this is to tag the Azure DevOps project name along with a service account. This will come in handy when we provision environments.

The last task runs a .NET Core console app (i.e. ado-env-gen.dll), which I will describe in detail later in this article.

Provisioning Environment in Azure DevOps

NOTE: Provisioning environments via the REST API is currently undocumented and might change in the future. Beware of that.

It takes multiple steps to create an Environment in Azure DevOps. The steps are below:

  1. Create a Service endpoint with Kubernetes Service Account
  2. Create an empty environment (with no resources yet)
  3. Connect the service endpoint to the environment as Resource

I’ve used .net (C#) for this, but any REST client technology could do that.

Creating Service Endpoint

The following method creates a service endpoint in Azure DevOps that uses a Service Account scoped to a given namespace.

        public async Task<Endpoint> CreateKubernetesEndpointAsync(
            Guid projectId, string projectName,
            string endpointName, string endpointDescription,
            string clusterApiUri,
            string serviceAccountCertificate, string apiToken)
        {
            return await GetAzureDevOpsDefaultUri()
                .PostRestAsync<Endpoint>(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=6.0-preview.4",
                new
                {
                    authorization = new
                    {
                        parameters = new
                        {
                            serviceAccountCertificate,
                            isCreatedFromSecretYaml = true,
                            apitoken = apiToken
                        },
                        scheme = "Token"
                    },
                    data = new
                    {
                        authorizationType = "ServiceAccount"
                    },
                    name = endpointName,
                    owner = "library",
                    type = "kubernetes",
                    url = clusterApiUri,
                    description = endpointDescription,
                    serviceEndpointProjectReferences = new List<Object>
                    {
                        new
                        {
                            description = endpointDescription,
                            name =  endpointName,
                            projectReference = new
                            {
                                id =  projectId,
                                name =  projectName
                            }
                        }
                    }
                }, await GetBearerTokenAsync());
        }

We will find out how to invoke this method in a moment. Before that, on to step 2: let’s create the empty environment.

Creating Environment in Azure DevOps

        public async Task<PipelineEnvironment> CreateEnvironmentAsync(
            string project, string envName, string envDesc)
        {
            var env = await GetAzureDevOpsDefaultUri()
                .PostRestAsync<PipelineEnvironment>(
                $"{project}/_apis/distributedtask/environments?api-version=5.1-preview.1",
                new
                {
                    name = envName,
                    description = envDesc
                },
                await GetBearerTokenAsync());

            return env;
        }

Now we have an environment, but it is still empty. We need to add a resource to it, and that would be the Service Endpoint, so the environment comes to life.

        public async Task<string> CreateKubernetesResourceAsync(
            string projectName, long environmentId, Guid endpointId,
            string kubernetesNamespace, string kubernetesClusterName)
        {
            var link = await GetAzureDevOpsDefaultUri()
                            .PostRestAsync(
                            $"{projectName}/_apis/distributedtask/environments/{environmentId}/providers/kubernetes?api-version=5.0-preview.1",
                            new
                            {
                                name = kubernetesNamespace,
                                @namespace = kubernetesNamespace,
                                clusterName = kubernetesClusterName,
                                serviceEndpointId = endpointId
                            },
                            await GetBearerTokenAsync());
            return link;
        }

Of course, the environment needs to have approval policies configured. The following method configures an Azure DevOps group as approver for the environment. Hence any pipeline that references this environment will pause and wait for approval from one of the members of the group.

        public async Task<string> CreateApprovalPolicyAsync(
            string projectName, Guid groupId, long envId, 
            string instruction = "Please approve the Deployment")
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/pipelines/checks/configurations?api-version=5.2-preview.1",
                new
                {
                    timeout = 43200,
                    type = new
                    {                                   
                        name = "Approval"
                    },
                    settings = new
                    {
                        executionOrder = 1,
                        instructions = instruction,
                        blockedApprovers = new List<object> { },
                        minRequiredApprovers = 0,
                        requesterCannotBeApprover = false,
                        approvers = new List<object> { new { id = groupId } }
                    },
                    resource = new
                    {
                        type = "environment",
                        id = envId.ToString()
                    }
                }, await GetBearerTokenAsync());
            return response;
        }

So far so good. But we need to stitch all these together. Before we do so, one last item needs attention. We want to create a Service connection to the Azure Container Registry so the teams can push/pull images. And we will do that using Service Principals designated to the teams instead of the admin keys of ACR.

Creating Container Registry connection

The following snippet lets us provision a Service Connection to Azure Container Registry with a Service Principal, which can have fine-grained RBAC roles (i.e. AcrPush or AcrPull etc.) that make sense for the team.

        public async Task<string> CreateAcrConnectionAsync(
            string projectName, string acrName, string name, string description,
            string subscriptionId, string subscriptionName, string resourceGroup,
            string clientId, string secret, string tenantId)
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=5.1-preview.2",
                new
                {
                    name,
                    description,
                    type = "dockerregistry",
                    url = $"https://{acrName}.azurecr.io",
                    isShared = false,
                    owner = "library",
                    data = new
                    {
                        registryId = $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.ContainerRegistry/registries/{acrName}",
                        registrytype = "ACR",
                        subscriptionId,
                        subscriptionName
                    },
                    authorization = new
                    {
                        scheme = "ServicePrincipal",
                        parameters = new
                        {
                            loginServer = $"{acrName}.azurecr.io",
                            servicePrincipalId = clientId,
                            tenantId,
                            serviceprincipalkey = secret
                        }
                    }
                },
                await GetBearerTokenAsync());
            return response;
        }

We are pretty close to a wrap. We’ll stitch all the methods above together. The plan is to create a simple console application that will fix everything (using the above methods). Here are the pseudo steps:

  1. Find all Service Accounts created for this purpose
  2. For each Service Account: determine the correct Team Project and
    • Create Service Endpoint with the Account
    • Create Environment
    • Connect Service Endpoint to Environment (adding resource)
    • Configure Approval policies
    • Create Azure Container Registry connection

The first step needs to communicate to the cluster – obviously. I have used the official .net client for Kubernetes for that.

Bringing all together

All the above methods are invoked from a simple C# console application. Below is the relevant part of the main method that brings all the above together:

        private static async Task Main(string [] args)
        {
            var clusterApiUrl = Environment.GetEnvironmentVariable("AKS_URI");
            var adoUrl = Environment.GetEnvironmentVariable("AZDO_ORG_SERVICE_URL");
            var pat = Environment.GetEnvironmentVariable("AZDO_PERSONAL_ACCESS_TOKEN");
            var adoClient = new AdoClient(adoUrl, pat);
            var groups = await adoClient.ListGroupsAsync();

            var config = KubernetesClientConfiguration.BuildConfigFromConfigFile();
            var client = new Kubernetes(config);

We started by collecting some secret and configuration data – all from environment variables – so we can run this console as part of the pipeline task and use pipeline variables at ease.

        var accounts = await client
            .ListServiceAccountForAllNamespacesAsync(labelSelector: "purpose=ado-automation");

This gets us the list of all the service accounts we have provisioned specially for this purpose (filtered using the labels).

            foreach (var account in accounts.Items)
            {
                var project = await GetProjectAsync(account.Metadata.Labels["project"], adoClient);
                var secretName = account.Secrets[0].Name;
                var secret = await client
                    .ReadNamespacedSecretAsync(secretName, account.Metadata.NamespaceProperty);

We iterate over all the accounts and retrieve their secrets from the cluster. The next step is creating the service endpoint and the environment with these secrets.

                var endpoint = await adoClient.CreateKubernetesEndpointAsync(
                    project.Id,
                    project.Name,
                    $"Kubernetes-Cluster-Endpoint-{account.Metadata.NamespaceProperty}",
                    $"Service endpoint to the namespace {account.Metadata.NamespaceProperty}",
                    clusterApiUrl,
                    Convert.ToBase64String(secret.Data["ca.crt"]),
                    Convert.ToBase64String(secret.Data["token"]));

                var environment = await adoClient.CreateEnvironmentAsync(project.Name,
                    $"Kubernetes-Environment-{account.Metadata.NamespaceProperty}",
                    $"Environment scoped to the namespace {account.Metadata.NamespaceProperty}");

                await adoClient.CreateKubernetesResourceAsync(project.Name, 
                    environment.Id, endpoint.Id,
                    account.Metadata.NamespaceProperty,
                    account.Metadata.ClusterName);

That will give us the environment – correctly configured with the appropriate Service Accounts. Let’s set up the approval policy now:

                var group = groups.FirstOrDefault(g => g.DisplayName
                    .Equals($"[{project.Name}]\\Release Administrators", StringComparison.OrdinalIgnoreCase));
                await adoClient.CreateApprovalPolicyAsync(project.Name, group.OriginId, environment.Id);

We take a designated project group, “Release Administrators”, and set its members as approvers.
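One thing to watch for: `FirstOrDefault` yields no group when a project lacks a matching “Release Administrators” group, and the code above would then fail when reading the group's ID. The Go sketch below shows the same case-insensitive lookup with an explicit "not found" result. The `group` type and the function name are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// group holds the two fields of an Azure DevOps group that the lookup needs.
type group struct {
	DisplayName string
	OriginID    string
}

// findReleaseAdmins returns the "[<project>]\Release Administrators" group,
// compared case-insensitively, and reports whether it was found.
func findReleaseAdmins(groups []group, project string) (group, bool) {
	want := "[" + project + `]\Release Administrators`
	for _, g := range groups {
		if strings.EqualFold(g.DisplayName, want) {
			return g, true
		}
	}
	return group{}, false
}

func main() {
	groups := []group{
		{DisplayName: `[Team1Project]\Release Administrators`, OriginID: "id-1"}, // sample data
	}
	if g, ok := findReleaseAdmins(groups, "Team1Project"); ok {
		fmt.Println("approver group:", g.OriginID)
	}
	if _, ok := findReleaseAdmins(groups, "OtherProject"); !ok {
		fmt.Println("no approver group, skip the approval policy or fail loudly")
	}
}
```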

            await adoClient.CreateAcrConnectionAsync(project.Name, 
                Environment.GetEnvironmentVariable("ACRName"), 
                $"ACR-Connection", "The connection to the ACR",
                Environment.GetEnvironmentVariable("SubId"),
                Environment.GetEnvironmentVariable("SubName"),
                Environment.GetEnvironmentVariable("ResourceGroup"),
                Environment.GetEnvironmentVariable("ClientId"), 
                Environment.GetEnvironmentVariable("Secret"),
                Environment.GetEnvironmentVariable("TenantId"));

Lastly, we create the ACR connection as well.

The entire project is in GitHub – in case you want to have a read!

Verify everything

We have got our orchestration completed. Every time we add a new team, we create one manifest for their namespace and Service account and create a PR to the repository described above. A cluster admin approves the PR and a pipeline gets kicked off.

The pipeline ensures:

  1. All the namespaces and service accounts are created
  2. An environment with the appropriate service account is created in the correct team project.

Now a team can create their own pipeline in their repository, referring to the environment. Voila, it all starts working nicely. All they need is to refer to the name of the environment that’s provisioned for their team (for instance “team-1”), as in the following example:

- stage: Deploy
  displayName: Deploy stage
  dependsOn: Build
  jobs:
  - deployment: Deploy
    condition: and(succeeded(), not(startsWith(variables['Build.SourceBranch'], 'refs/pull/')))
    displayName: Deploy
    pool:
      vmImage: $(vmImageName)
    environment: 'Kubernetes-Cluster-Environment.team-1'
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: kube-manifests
          - task: KubernetesManifest@0
            displayName: Deploy to Kubernetes cluster
            inputs:
              action: deploy
              manifests: |
                $(Pipeline.Workspace)/kube-manifests/all-template.yaml

Now the multi-stage pipeline knows how to talk to the correct namespace in AKS, with an approval gate in front of it.

Conclusion

This might appear to be overkill for small-scale projects, as it involves quite some development and maintenance overhead. However, on multiple occasions (especially within large enterprises), I have experienced the need for orchestration via REST API to onboard teams in Azure DevOps, bootstrap configurations across multiple teams’ projects etc. If you’re in the same boat, this article might be an interesting read for you!

Thanks for reading!

Azure AD Pod Identity – password-less app-containers in AKS

Background

I have liked Azure Managed Identity since its advent. The concept behind Managed Identity is clever, and it adds observable value to any DevOps team. All concerns about password configuration in multiple places, life cycle management of secrets, certificates, and rotation policies suddenly become irrelevant (OK, in most cases).
Leveraging managed identity for applications hosted in Azure Virtual Machines, Azure Web Apps, Function Apps etc. is straightforward. Managed Identity sits on top of the Azure Instance Metadata Service. Azure’s Instance Metadata Service is a REST endpoint accessible to all IaaS VMs created via the Azure Resource Manager. The endpoint is available at a well-known non-routable IP address (169.254.169.254) that can be accessed only from within the VM. Under the hood, Azure VMs, VMSS and Azure PaaS resources (i.e. Web Apps, Function Apps etc.) leverage the metadata service to retrieve an Azure AD token. Thus a VM or Web App establishes its own “Application Identity” (which is what Managed Identity essentially is) that Azure AD authenticates.
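As a rough illustration of what such a token request looks like, the Go sketch below only builds the request for the metadata endpoint and does not send it. The endpoint path and `api-version` are the ones used by the Go program later in this article. The helper name `newTokenRequest` is made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newTokenRequest builds the metadata-service request for an access token
// scoped to the given resource. Sending it only works from inside an Azure
// VM (or a pod with pod identity), so this sketch stops at construction.
func newTokenRequest(resource string) (*http.Request, error) {
	q := url.Values{}
	q.Set("api-version", "2018-02-01")
	q.Set("resource", resource)
	u := "http://169.254.169.254/metadata/identity/oauth2/token?" + q.Encode()

	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	// The metadata service rejects requests that lack this header.
	req.Header.Set("Metadata", "true")
	return req, nil
}

func main() {
	req, err := newTokenRequest("https://vault.azure.net")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
```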

Managed Identity in Azure Kubernetes Service

Managed Identity in Kubernetes, however, is a different ballgame. Typically, multiple applications (often developed by different teams in an organization) run in a single cluster, and pods are launched and terminated frequently on different nodes. Hence, a Managed Identity associated with the VM/VMSS is not sufficient; we need a way to assign an identity to every pod of an application. If pods move to different nodes, the identity must somehow move with them to the new node (VM).

Azure Pod Identity

The good news is that Azure Pod Identity offers that capability. Azure Pod Identity is an open source project on GitHub.

Note: Managed pod identities is an open source project and is not supported by Azure technical support.

An application can use Azure Pod Identity to access Azure resources (i.e. Key Vault, Storage, Azure SQL database etc.) via Managed Identity; hence, there’s no secret/password involved anywhere in the process. Pods can fetch access tokens scoped to resources directly from Azure Active Directory.

Concept

The following two components are installed in the cluster to achieve pod identity.

1. The Node Management Identity (NMI)

The AKS cluster runs this DaemonSet on every node. It intercepts outbound calls from pods requesting access tokens and proxies those calls with the predefined Managed Identity.

2. The Managed Identity Controller (MIC)

MIC is a central pod with permissions to query the Kubernetes API server; it checks for an Azure identity mapping that corresponds to a pod.

Source: GitHub Project – Azure Pod Identity

When pods request access to an Azure service, network rules redirect the traffic to the Node Management Identity (NMI) server. The NMI server identifies pods that request access to Azure services based on their remote address and queries the Managed Identity Controller (MIC). The MIC checks for Azure identity mappings in the AKS cluster, and the NMI server then requests an access token from Azure Active Directory (AD) based on the pod’s identity mapping. Azure AD provides access to the NMI server, which is returned to the pod. This access token can be used by the pod to then request access to services in Azure.

Microsoft

Azure Pipeline to Bootstrap pod Identity

I started with an existing RBAC-enabled Kubernetes cluster that I had created before. Azure AD pod identity sets up a “Service Account”, “custom resource definitions (CRD)”, Cluster Roles and bindings, the DaemonSet for NMI etc. I wanted to do it via a pipeline, so I can repeat the process on demand. Here are the interesting parts of the azure-pod-identity-setup-pipeline.yaml:

trigger:
- master
variables:
  tag: '$(Build.BuildId)'
  containerRegistry: $(acr-name).azurecr.io
  vmImageName: 'ubuntu-latest'
stages:
- stage: Build
  displayName: Aad-Pod-Identity-Setup
  jobs:  
  - job: Build
    displayName: Setup Aad-Pod-Identity.
    pool:
      vmImage: $(vmImageName)
    environment: 'Kubernetes-Cluster-Environment.default'
    steps:
      - bash: |
          kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml
        displayName: 'Setup Service Account, CRD, DaemonSet etc'


I will be using a “User assigned identity” for my sample application. I have written a basic .NET app with a SQL back-end for this purpose. My end goal is to allow the .NET app to talk to SQL Server with pod identity.
Following the aad-pod-identity project instructions, I have created the user assigned identity.

      - task: AzureCLI@2
        inputs:
          scriptType: 'bash'
          scriptLocation: 'inlineScript'
          inlineScript: 'az identity create -g $(rgp) -n $(uaiName) -o json'

In my repository, I created the Azure Identity definition in a file named aad-pod-identity.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: <a-idname>
spec:
  type: 0
  ResourceID: /subscriptions/<sub>/resourcegroups/<resourcegroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
  ClientID: <clientId>
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: managed-identity


And added a task to deploy that too.

      - bash: |
          kubectl apply -f manifests/aad-pod-identity.yaml
        displayName: 'Setup Azure Identity'

Triggered the pipeline and Azure pod Identity was ready to roll.

Deploying application

I have a .NET Core web app (Razor application) which I will configure to run with pod identity to connect to Azure SQL (the back-end) with Azure Active Directory authentication, with no password configured at the application level.
Here’s the manifest file (front-end.yaml) for the application. The crucial part is the label (aadpodidbinding: managed-identity), which must match the selector of the identity binding we defined before:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dysnomia-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: dysnomia-frontend
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: dysnomia-frontend
        aadpodidbinding: managed-identity
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: dysnomia-frontend
        image: #{containerRegistry}#/dysnomia-frontend:#{Build.BuildId}#
        imagePullPolicy: "Always"
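To make the matching rule concrete, here is a simplified Go model of it: a pod gets an identity when its `aadpodidbinding` label equals the `Selector` of an AzureIdentityBinding. This is only a sketch of the rule as described in this article, not the project's actual implementation, and the type and function names are made up.

```go
package main

import "fmt"

// identityBinding models the two fields of an AzureIdentityBinding that
// matter for matching.
type identityBinding struct {
	AzureIdentity string
	Selector      string
}

// identitiesForPod returns the AzureIdentity names whose binding selector
// equals the pod's "aadpodidbinding" label. Pods without the label get none.
func identitiesForPod(podLabels map[string]string, bindings []identityBinding) []string {
	var ids []string
	sel, ok := podLabels["aadpodidbinding"]
	if !ok {
		return ids
	}
	for _, b := range bindings {
		if b.Selector == sel {
			ids = append(ids, b.AzureIdentity)
		}
	}
	return ids
}

func main() {
	bindings := []identityBinding{
		{AzureIdentity: "my-identity", Selector: "managed-identity"}, // sample names
	}
	frontend := map[string]string{
		"app":             "dysnomia-frontend",
		"aadpodidbinding": "managed-identity",
	}
	other := map[string]string{"app": "something-else"}

	fmt.Println(identitiesForPod(frontend, bindings))
	fmt.Println(identitiesForPod(other, bindings))
}
```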

That’s pretty much it. Once my application pipeline has deployed the above manifest, the .NET application can connect to the Azure SQL database with the assigned pod identity.

      - bash: |
          kubectl apply -f manifests/front-end.yaml
        displayName: 'Deploy Front-end'


Of course, I needed to do the role assignment for the User Assigned Identity and enable Azure AD authentication in my SQL server, but I am not describing those steps here; I have written about that before.

What about non-Azure resources?

The above holds true for all Azure resources that support Managed Identity. That means our application can connect to Cosmos DB, Storage Account, Service Bus, Key Vault, and many other Azure resources without configuring any passwords or secrets anywhere in Kubernetes.
However, there are scenarios where we might want to run a Redis container or a SQL Server container in our Kubernetes cluster. Cost-wise, it might make sense to run it in Kubernetes (as you already have a cluster) instead of Azure PaaS (i.e. Azure SQL or Azure Redis) for many use cases. In those cases, we must create the SQL password and configure it in our .NET app (using Kubernetes Secrets).
I was wondering if I could store my SQL password in an Azure Key Vault and let my SQL container and .NET app both collect the password from Key Vault during launch using Azure pod identity. Kubernetes already has a first-class option to handle such scenarios: Kubernetes Secrets.

However, today I am playing with Azure AD pod Identity – therefore, I really wanted to use pod identity – for fun ;-). Here’s how I managed to make it work.

SQL container, pod identity and Azure Key vault

I’ve created a Key Vault and defined the SQL Server password as a secret there. I configured my .NET app to use pod identity to talk to Key Vault and configured a Key Vault access policy so the user-assigned identity created above can grab the SQL password. So far so good.

Now, I wanted to run a SQL Server instance in my cluster which should also collect the password from Key Vault, the same way the .NET app does. It turned out that the SQL 2019 image (mcr.microsoft.com/mssql/server:2019-latest) expects the password as an environment variable during container launch.

docker run -d -p 1433:1433 `
           -e "ACCEPT_EULA=Y" `
           -e "SA_PASSWORD=P@ssw0rD" `
           mcr.microsoft.com/mssql/server:2019-latest

Initially, I thought it would be easy to use an init-container to grab the password from Azure Key Vault and then pass it to the application container (SQL) as an environment variable. After some failed attempts, I realized that isn’t trivial. I can of course create volume mounts (e.g. emptyDir) to convey the password from the init-container to the application container, but that is rather dirty, isn’t it?
Secondly, I thought of creating my own Docker image based on the SQL container; then I could run a piece of script that grabs the password from Key Vault and sets it as an environment variable. A simple script with a few curl commands would do the trick, you might think. Well, there are a few small issues. SQL container images are stripped-down Ubuntu core, which does not have apt, curl etc., and they do not run as root either, for all good reasons.


So, I have written a small program in Go and compiled it to a binary.

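package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

// TokenResponse is the part of the metadata-service token response we need.
type TokenResponse struct {
    TokenType   string `json:"token_type"`
    AccessToken string `json:"access_token"`
}

// SecretResponse is the part of the Key Vault secret response we need.
type SecretResponse struct {
    Value string `json:"value"`
    ID    string `json:"id"`
}

// getJSON sends a GET request with one header set and decodes the JSON body into out.
func getJSON(client *http.Client, url, header, value string, out interface{}) error {
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return err
    }
    req.Header.Set(header, value)
    res, err := client.Do(req)
    if err != nil {
        return err
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        return fmt.Errorf("GET %s: %s", url, res.Status)
    }
    return json.NewDecoder(res.Body).Decode(out)
}

func main() {
    if len(os.Args) < 3 {
        fmt.Fprintln(os.Stderr, "usage: aadtoken <vault-host> <secret-name>")
        os.Exit(1)
    }
    var p TokenResponse
    var s SecretResponse
    imds := "http://169.254.169.254/metadata/identity/oauth2/token" +
        "?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net"
    kvUrl := "https://" + os.Args[1] + "/secrets/" + os.Args[2] +
        "?api-version=2016-10-01"

    client := &http.Client{}
    // Step 1: get an access token for Key Vault from the metadata endpoint.
    if err := getJSON(client, imds, "Metadata", "True", &p); err != nil {
        fmt.Fprintln(os.Stderr, "token request failed:", err)
        os.Exit(1)
    }
    // Step 2: read the secret with that token and print only its value.
    if err := getJSON(client, kvUrl, "Authorization", p.TokenType+" "+p.AccessToken, &s); err != nil {
        fmt.Fprintln(os.Stderr, "secret request failed:", err)
        os.Exit(1)
    }
    fmt.Println(s.Value)
}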

This program simply grabs the secret from Azure Key Vault using Managed Identity. I created a binary out of it:

go build -o aadtoken


Next, I created my SQL container image with the following Dockerfile:

FROM mcr.microsoft.com/mssql/server:2019-latest

ENV ACCEPT_EULA=Y
ENV MSSQL_PID=Developer
ENV MSSQL_TCP_PORT=1433 
COPY ./aadtoken /
COPY ./startup.sh /
CMD [ "/bin/bash", "./startup.sh" ] 

As you can see, I am relying on the “startup.sh” bash script. Here it is:

echo "Retrieving AAD Token with Managed Identity..."
export SA_PASSWORD=$(./aadtoken $KeyVault $SecretName)
echo "SQL password received and configured successfully"
/opt/mssql/bin/sqlservr --accept-eula

I built the image, and here’s my SQL manifest to deploy in Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
        aadpodidbinding: managed-identity
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: #{ACR.Name}#/sql-server:2019-latest
        ports:
        - containerPort: 1433
        env:
        - name: KeyVault
          value: "#{KeyVault.Name}#"
        - name: SecretName
          value: "SQL_PASSWORD"
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data

I deployed the manifest and voila! All works. The SQL pods and .NET app pods all use Managed Identity to connect to Key Vault and retrieve the secret, which is stored centrally in one place, keeping management of the password nice and tidy.

Conclusion

I find Azure pod identity a neat feature and a security best practice, especially when you are using Azure Kubernetes Service and some Azure resources (i.e. Cosmos DB, Azure SQL, Key Vault, Storage Account, Service Bus etc.). If you didn’t know about it, I hope this makes you more enthusiastic to investigate further.

Disclaimer 1: It’s an open source project – so the Azure technical support doesn’t apply.

Disclaimer 2: The SQL container part with Go script is totally a fun learning attempt, don’t take it too seriously!

Linkerd in Azure Kubernetes Service cluster

In this article I document my journey of setting up the Linkerd service mesh on Azure Kubernetes Service.

Background

I have a tiny Kubernetes cluster. I run some workloads there; some are useful, others are just try-outs and fun stuff. I have a few services that need to talk to each other. I do not have a lot of traffic, to be honest, but I sometimes run Apache ab out of curiosity to simulate load and see how my services perform under stress. Until very recently I was using a messaging (basically pub-sub) pattern for reactive service-to-service communication. That works great, but it often adds latency. I can only imagine that if I were to run this kind of communication in a mission-critical, high-traffic, performance-driven scenario (an online game, for instance), this model wouldn’t fly. Hence the need for a direct service-to-service communication pattern inside the cluster.

What’s the big deal? We can make REST calls between services, or even implement gRPC for that matter. The issue is that things behave differently at scale. When many services talk to many others, nodes fail in between, pod network addresses change, new pods show up and others go down, figuring out where a service sits becomes quite a challenging task.

Then Kubernetes comes to the rescue: it provides the “Service” resource, which gives us service discovery out of the box. Which is awesome. Not all issues disappear, though. Services in a cluster still need fault tolerance, traceability and, most importantly, “observability”. Implementing circuit breakers, retry logic and the like in each service is again a challenge. This is exactly what a service mesh addresses.
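
To give a feel for the kind of plumbing every service would otherwise carry itself, here is a minimal Go sketch of retry logic with exponential backoff. It is my own illustration, not code from any mesh; the names retry and flakyCall are hypothetical, and flakyCall merely stands in for a remote service that fails twice before succeeding.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times and doubles the wait after each failure.
func retry(attempts int, wait time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyCall returns a function that fails twice and then succeeds
// (a hypothetical stand-in for a call to another service).
func flakyCall() func() error {
	calls := 0
	return func() error {
		calls++
		if calls < 3 {
			return errors.New("service unavailable")
		}
		return nil
	}
}

func main() {
	err := retry(5, 10*time.Millisecond, flakyCall())
	fmt.Println("result:", err)
}
```

Multiply this by every outbound call in every service (plus timeouts and circuit breaking) and the appeal of moving it out of the application code becomes clear.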

Service mesh

From the ThoughtWorks Technology Radar:

Service mesh is an approach to operating a secure, fast and reliable microservices ecosystem. It has been an important steppingstone in making it easier to adopt microservices at scale. It offers discovery, security, tracing, monitoring and failure handling. It provides these cross-functional capabilities without the need for a shared asset such as an API gateway or baking libraries into each service. A typical implementation involves lightweight reverse-proxy processes, aka sidecars, deployed alongside each service process in a separate container. Sidecars intercept the inbound and outbound traffic of each service and provide cross-functional capabilities mentioned above.

Some of us might remember Aspect Oriented Programming (AOP), where we used to separate cross-cutting concerns from our core business concerns. A service mesh is no different. It isolates these networking and fault-tolerance concerns (in a separate container) from the core capabilities (also running in a container).

Linkerd

There are quite a few service mesh solutions out there, all suitable to run in Kubernetes. I have used Envoy and Istio before. They work great in Kubernetes as well as in VM-hosted clusters. However, I must admit, I have developed a preference for Linkerd since I discovered it. Let’s briefly look at how Linkerd works. Imagine the following two services, Service A and Service B, where Service A talks to Service B.

[Figure: Service A calling Service B]

Once Linkerd is installed, it works like an interceptor for all the communication between services. Linkerd uses the sidecar pattern and proxies the communication by updating the iptables rules of each pod.

[Figure: Linkerd architecture]

Linkerd injects two containers into our pods: an init container and a proxy container. The init container configures iptables so that incoming and outgoing TCP traffic flows through the Linkerd proxy container. The proxy container is the data plane that does the actual interception and provides all the other fault-tolerance goodies.
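
To make the interception idea concrete, here is a minimal Go sketch of my own (it is not Linkerd code). Treat it purely as an analogy: Linkerd's proxy is wired in transparently through iptables, whereas this sketch puts a plain HTTP reverse proxy in front of a stand-in “Service B” and counts the requests passing through. The names meteredProxy and runDemo and the /api/vote path are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

// meteredProxy forwards every request to a target and records it,
// much like a sidecar observes traffic without the app being aware.
type meteredProxy struct {
	proxy    *httputil.ReverseProxy
	requests atomic.Int64
}

func newMeteredProxy(target *url.URL) *meteredProxy {
	return &meteredProxy{proxy: httputil.NewSingleHostReverseProxy(target)}
}

func (m *meteredProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	m.requests.Add(1) // count before forwarding
	start := time.Now()
	m.proxy.ServeHTTP(w, r)
	fmt.Printf("%s %s took %v\n", r.Method, r.URL.Path, time.Since(start))
}

// runDemo starts a stand-in Service B, places the proxy in front of it,
// and lets "Service A" (a plain HTTP client) call B through the proxy.
func runDemo() (string, int64, error) {
	serviceB := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from service B")
	}))
	defer serviceB.Close()

	target, err := url.Parse(serviceB.URL)
	if err != nil {
		return "", 0, err
	}
	sidecar := newMeteredProxy(target)
	front := httptest.NewServer(sidecar)
	defer front.Close()

	resp, err := http.Get(front.URL + "/api/vote")
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0, err
	}
	return string(body), sidecar.requests.Load(), nil
}

func main() {
	body, n, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("response=%q proxied requests=%d\n", body, n)
}
```

The caller never addresses the proxy's bookkeeping; it only sees Service B's response, while the proxy quietly collects the numbers. That separation is the point of the sidecar pattern.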

The primary reasons behind my preference for Linkerd are performance and simplicity. Ivan Sim has benchmarked the performance of Linkerd and Istio:

Both the Linkerd2-meshed setup and Istio-meshed setup experienced higher latency and lower throughput, when compared with the baseline setup. The latency incurred in the Istio-meshed setup was higher than that observed in the Linkerd2-meshed setup. The Linkerd2-meshed setup was able to handle higher HTTP and GRPC ping throughput than the Istio-meshed setup.

Cluster provision

Spinning up AKS is easy as pie these days. We can use an Azure Resource Manager template or Terraform for that. I have used Terraform to generate the cluster.

With this IaC, we can run terraform apply to provision our AKS cluster in Azure. This is going to take a few minutes, and then we have a cluster.

Service deployment

We will use the canonical emojivoto app (“buoyantio/emojivoto-emoji-svc:v8”) to test our Linkerd installation. Here’s the Kubernetes manifest file for that.

Azure Pipeline

Let’s create a pipeline for the service deployment. The easiest way to do that is to create a service connection to our AKS cluster. We go to the project settings in Azure DevOps project, pick Service connections and create a new service connection of type “Kubernetes connection”.

[Screenshot: Kubernetes service connection in Azure DevOps]

Installing Linkerd

We will create a pipeline that installs Linkerd into the AKS cluster. Azure Pipelines now offers “pipeline-as-code”, which is just a YAML file that describes the steps that need to be performed when the pipeline is triggered. We will use the following pipeline-as-code:

We can at this point trigger the pipeline to install Linkerd into the AKS cluster.

[Screenshot: Linkerd installation pipeline run]

Deployment of PODs and services

Let’s create another pipeline as code that deploys all the services and deployment resources to AKS using the following Kubernetes manifest file:

In the Azure Portal we can already see our services running:

[Screenshot: services in the Azure Portal]

And also in the Kubernetes Dashboard:

[Screenshot: services in the Kubernetes Dashboard]

We have got our services running – but they are not really affected by Linkerd yet. We will add another step into the build pipeline to tell Linkerd to do its magic.

Next, we trigger the pipeline and put some traffic onto the service that we have just deployed. The emoji service simulates some service-to-service invocation scenarios, and now it’s time for us to open the Linkerd dashboard to inspect the distributed traces and the many other useful metrics.

[Screenshot: Linkerd dashboard]

We can also see a kind of application map, a graphical way to understand which service is calling which and what the request latencies are.

[Screenshot: Linkerd service graph]

Even more fascinating, Linkerd provides drill-downs into the communications in a Grafana dashboard.

[Animation: Grafana dashboard drill-down]

Conclusion

I really enjoyed setting this up and seeing the outcome, and I wanted to share my experience with it. If you are looking into service meshes and have read this post, I strongly encourage you to give Linkerd a go. It’s awesome!

Thanks for reading.

Azure template to provision Docker swarm mode cluster

What is a swarm?

The cluster management and orchestration features embedded in the Docker Engine are built using SwarmKit. Docker engines participating in a cluster are running in swarm mode. You enable swarm mode for an engine by either initializing a swarm or joining an existing swarm. A swarm is a cluster of Docker engines, or nodes, where you deploy services. The Docker Engine CLI and API include commands to manage swarm nodes (e.g., add or remove nodes), and deploy and orchestrate services across the swarm.

I was recently trying to come up with a script that generates a Docker swarm cluster on Microsoft Azure, ready to take container workloads. I thought Azure Container Service (ACS) would already support that. However, I figured out that’s not the case: Azure doesn’t support Docker swarm mode in ACS yet, at least as of today (25th July 2017). That forced me to come up with my own RM template that does the job.

What’s in it?

The RM template will provision the following resources:

  • A virtual network
  • An availability set for manager nodes
  • 3 virtual machines in the availability set created above (the numbers and names can be parameterized as per your needs)
  • A load balancer with a public port that round-robins to the 3 VMs on port 80, and that allows inbound NAT to the 3 machines via ports 5000, 5001 and 5002 to SSH port 22
  • Configuration of the 3 VMs as Docker swarm mode managers
  • A virtual machine scale set (VMSS) in the same VNET
  • 3 nodes that are joined to the above swarm as workers
  • A load balancer for the VMSS (that allows inbound NAT, starting from port 50000, to SSH port 22 on the VMSS instances)
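
The inbound NAT scheme in the list above boils down to “one public port per machine, all forwarding to SSH”. Here is a small Go sketch of that mapping. It is only an illustration, not part of the RM template; the natRule type, the sshNatRules function and the VM name prefixes are hypothetical.

```go
package main

import "fmt"

// natRule maps a public load balancer port to a port on one backend VM.
type natRule struct {
	VM           string
	FrontendPort int
	BackendPort  int
}

// sshNatRules gives each of count VMs its own public port, starting at
// basePort, with every rule forwarding to SSH (port 22) on the VM.
func sshNatRules(namePrefix string, count, basePort int) []natRule {
	rules := make([]natRule, 0, count)
	for i := 0; i < count; i++ {
		rules = append(rules, natRule{
			VM:           fmt.Sprintf("%s%d", namePrefix, i),
			FrontendPort: basePort + i,
			BackendPort:  22,
		})
	}
	return rules
}

func main() {
	// Managers sit behind ports 5000-5002; workers start at 50000.
	for _, r := range sshNatRules("manager", 3, 5000) {
		fmt.Printf("public port %d -> %s:%d\n", r.FrontendPort, r.VM, r.BackendPort)
	}
	for _, r := range sshNatRules("worker", 3, 50000) {
		fmt.Printf("public port %d -> %s:%d\n", r.FrontendPort, r.VM, r.BackendPort)
	}
}
```

So reaching the second manager over SSH means connecting to the load balancer's public address on port 5001.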

The design can be visualized with the following diagram:

There’s a handy PowerShell script that can help automate provisioning these resources. But you can also just click the “Deploy to Azure” button below.

Thanks!

The entire script collection can be found in this GitHub repo. Feel free to use it as needed!

RabbitMQ High-availability clusters on Azure VM

Background

Recently I had to look into a reliable AMQP solution (publish-subscribe queue model) in order to build a message broker for a large application. I started by comparing Azure Service Bus and RabbitMQ. It didn’t take long to understand that RabbitMQ is much more attractive than Service Bus in terms of efficiency and cost when there is a large number of messages. See the image taken from Mariusz Wojcik’s blog.

Setting up RabbitMQ on a Windows machine is relatively easy; the RabbitMQ web site documents nicely how to do that. However, when it comes to installing a RabbitMQ cluster on cloud VMs, I found Linux (Ubuntu) VMs handier because they boot faster. I hadn’t used a *nix OS for quite a long time, so I found the journey interesting enough to write a post about it.

Spin up VMs on Azure

We need two Linux VMs; both will have RabbitMQ installed as a server, and they will be clustered. The high-level picture of the design looks like the following:

Log in to the Azure portal and create two VM instances based on the Ubuntu Server 14.04 LTS image from the Azure VM depot.

I have named them MUbuntu1 and MUbuntu2. The VMs need to be in the same cloud service and the same availability set to achieve redundancy and high availability. The availability set ensures that the Azure Fabric Controller recognizes this scenario and does not take all the VMs down together when it performs maintenance tasks such as OS patches and updates.

Once the VM instances are up and running, we need to define some endpoints for RabbitMQ, and they need to be load balanced. We go to the MUbuntu1 details in the management portal and add two endpoints: port 5672 for RabbitMQ connections from client applications and port 15672 for the RabbitMQ management portal application. Scott Hanselman has described in detail how to create load-balanced VMs. Once we create them, it will look like the following:

Now we can SSH into both of these machines, (Azure already mapped the SSH port 22 to a port which can be found on the right side of the dashboard page for the VM).

Install RabbitMQ

Once we SSH into the terminals of both of the machines we can install RabbitMQ by executing the following commands:



sudo add-apt-repository 'deb http://www.rabbitmq.com/debian/ testing main'
sudo apt-get update
sudo apt-get -q -y --force-yes install rabbitmq-server

The above apt-get will install Erlang and the RabbitMQ server on both machines. Erlang nodes use a cookie to determine whether they are allowed to communicate with each other; for two nodes to be able to communicate they must have the same cookie. Erlang will automatically create a random cookie file when the RabbitMQ server starts up. The easiest way to proceed is to allow one node to create the file, and then copy it to all the other nodes in the cluster. On our VMs the cookie will typically be located in /var/lib/rabbitmq/.erlang.cookie.

We are going to create the same cookie on both machines by executing the following commands:



echo 'ERLANGCOOKIEVALUE' | sudo tee /var/lib/rabbitmq/.erlang.cookie
sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie
sudo invoke-rc.d rabbitmq-server start
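
The chmod 400 step matters because, as far as I know, Erlang refuses to use a cookie file that group or others can access. Here is a small Go sketch (my own helper, not part of RabbitMQ) that checks this for a given file; the function name ownerOnly is hypothetical, and reading the default path will usually need root.

```go
package main

import (
	"fmt"
	"os"
)

// ownerOnly reports whether a file mode grants no access to group or
// others, which is what chmod 400 achieves for the cookie file.
func ownerOnly(mode os.FileMode) bool {
	return mode.Perm()&0o077 == 0
}

func main() {
	// Default path from the article; pass another path as the first argument.
	path := "/var/lib/rabbitmq/.erlang.cookie"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("cannot stat cookie file:", err)
		return
	}
	fmt.Printf("%s mode=%v ownerOnly=%v\n", path, info.Mode().Perm(), ownerOnly(info.Mode()))
}
```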

Install Management portal for RabbitMQ

Now we can also install the RabbitMQ management portal so we can monitor the queues from a browser. The following commands will install the management plugin:



sudo rabbitmq-plugins enable rabbitmq_management
sudo invoke-rc.d rabbitmq-server stop
sudo invoke-rc.d rabbitmq-server start

So far so good. Now we create a user that we want to use to connect to the queue from the clients and for monitoring. You can manage users anytime later too.



sudo rabbitmqctl add_user <username> <password>
sudo rabbitmqctl set_user_tags <username> administrator
sudo rabbitmqctl set_permissions -p / <username> '.*' '.*' '.*'

Configuring the cluster

So far we have two RabbitMQ servers up and running; it’s time to connect them as a cluster. To do so, we go to the second machine (MUbuntu2) and join it to the node on MUbuntu1. The following commands will do that:


sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@MUbuntu1
sudo rabbitmqctl start_app
sudo rabbitmqctl set_cluster_name RabbitCluster

We can verify if the cluster is configured properly via RabbitMQ management portal:

Or from SSH terminal:

By default, queues within a RabbitMQ cluster are located on a single node. To make them highly available they need to be mirrored across multiple nodes. Each mirrored queue consists of one master and one or more slaves, with the oldest slave being promoted to the new master if the old master disappears for any reason. Messages published to the queue are replicated to all slaves. Consumers are connected to the master regardless of which node they connect to, with slaves dropping messages that have been acknowledged at the master. Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating nodes each do all the work).

This solution requires a RabbitMQ cluster, which means that it will not cope seamlessly with network partitions within the cluster and, for that reason, is not recommended for use across a WAN (though of course, clients can still connect from as near and as far as needed).

Queues have mirroring enabled via policy. Policies can change at any time; it is valid to create a non-mirrored queue, and then make it mirrored at some later point (and vice versa). More on this is documented on the RabbitMQ site. For this example, we will replicate all queues by executing this over SSH:


rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
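
The failover rule described above (the oldest mirror takes over when the master disappears) can be modelled in a few lines of Go. This is a toy model for illustration only, not how RabbitMQ is implemented; the mirroredQueue type and its methods are hypothetical.

```go
package main

import "fmt"

// mirroredQueue lists the nodes holding a copy of a queue.
// Index 0 is the master; mirrors follow in the order they joined (oldest first).
type mirroredQueue struct {
	nodes []string
}

// master returns the current master node, or "" if no node is left.
func (q *mirroredQueue) master() string {
	if len(q.nodes) == 0 {
		return ""
	}
	return q.nodes[0]
}

// nodeDown removes a node. If it was the master, the oldest remaining
// mirror moves to index 0 and thereby becomes the new master.
func (q *mirroredQueue) nodeDown(name string) {
	kept := q.nodes[:0]
	for _, n := range q.nodes {
		if n != name {
			kept = append(kept, n)
		}
	}
	q.nodes = kept
}

func main() {
	q := &mirroredQueue{nodes: []string{"rabbit@MUbuntu1", "rabbit@MUbuntu2"}}
	fmt.Println("master:", q.master())
	q.nodeDown("rabbit@MUbuntu1")
	fmt.Println("master after failover:", q.master())
}
```

With the ha-all policy every node carries a mirror, so in our two-node cluster losing MUbuntu1 simply leaves MUbuntu2 as the master.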

That should be it. The cluster is now up and running, and we can create a quick .NET console application to test it. I have created two console applications and a library that has one class as the message contract. The VS solution looks like this:

We will use EasyNetQ to connect to RabbitMQ, which we can install from NuGet in the publisher and subscriber projects.

In the contract project (class library), we have the following classes in a single code file:


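namespace Contracts
{
    // Shared connection settings for the publisher and the subscriber.
    public class RabbitClusterAzure
    {
        // Replace the placeholders with your load-balanced host and RabbitMQ user.
        public const string ConnectionString =
            @"host=<host>;username=<username>;password=<password>";
    }

    // The message contract exchanged over the bus.
    public class Message
    {
        public string Body { get; set; }
    }
}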

The publisher project has the following code in program.cs


using System;
using Contracts;
using EasyNetQ;

namespace Publisher
{
    class Program
    {
        static void Main(string[] args)
        {
            // The bus (and its connection) is disposed when the loop ends.
            using (var bus = RabbitHutch.CreateBus(RabbitClusterAzure.ConnectionString))
            {
                var input = "";
                Console.WriteLine("Enter a message. 'Quit' to quit.");
                while ((input = Console.ReadLine()) != "Quit")
                {
                    Publish(bus, input);
                }
            }
        }

        // Wraps the console input in the shared message contract and publishes it.
        private static void Publish(IBus bus, string input)
        {
            bus.Publish(new Contracts.Message
            {
                Body = input
            });
        }
    }
}

Finally, the subscriber project has the following code in the program.cs


using System;
using Contracts;
using EasyNetQ;

namespace Subscriber
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var bus = RabbitHutch.CreateBus(RabbitClusterAzure.ConnectionString))
            {
                // The message type has to be given explicitly; "Sample_Topic" is the subscription id.
                var retValue = bus.Subscribe<Contracts.Message>("Sample_Topic", HandleTextMessage);

                Console.WriteLine("Listening for messages. Hit <return> to quit.");
                Console.ReadLine();
            }
        }

        // Prints every received message in red.
        static void HandleTextMessage(Contracts.Message textMessage)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("Got message: {0}", textMessage.Body);
            Console.ResetColor();
        }
    }
}

Now we can run the Publisher and multiple instances of the subscriber, and the messages will be dispatched to them in round-robin fashion (direct exchange). We can also take one of the VMs down and no messages will be lost.

We can also see the traffic to the VMs (and the cluster instance too) directly from the Azure portal.
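
Round-robin dispatch simply means each message goes to the next consumer in turn. Here is a toy Go sketch of that behaviour, assuming all subscriber instances consume from the same queue. It is an illustration only, not RabbitMQ or EasyNetQ code; the roundRobin type and the subscriber names are hypothetical.

```go
package main

import "fmt"

// roundRobin hands each message to the next consumer in turn.
type roundRobin struct {
	consumers []string
	next      int
}

// dispatch returns the consumer that receives msg and advances the cursor.
func (r *roundRobin) dispatch(msg string) string {
	c := r.consumers[r.next]
	r.next = (r.next + 1) % len(r.consumers)
	return c
}

func main() {
	rr := &roundRobin{consumers: []string{"subscriber-1", "subscriber-2"}}
	for _, msg := range []string{"one", "two", "three", "four"} {
		fmt.Printf("%q -> %s\n", msg, rr.dispatch(msg))
	}
}
```

With two subscribers running, the first and third messages land on one instance and the second and fourth on the other.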

Conclusion

I have to admit, I found it extremely easy and convenient to configure and run RabbitMQ clusters. The steps are simple, and the setup just works.