Multi-Stage AKS Deployment in Enterprise scenarios

Background

Multi-Stage Azure pipelines enables writing the build (continuous integration) and deploy (continuous delivery) in Pipeline-as-Code (YAML) that gets stored into a version control (Git repository). However, deploying in multiple environments (test, acceptance, production etc.) needs approvals/control gates. Often different stakeholders (product owners/Operations folks) are involved into that process of approvals. In addition to that, restricting secrets/credentials for higher-order stages (i.e. production) from developers are not uncommon.

Good news is Azure DevOps allows doing all that, with notions called Environment and resources. The idea is environment (e.g. Production) are configured with resources (e.g. Kubernetes, Virtual machine etc.) in them, then “approval policy” configured for the environment. When a pipeline targets environment in deployment stage, it pauses with a pending approval from responsible authorities (i.e. groups or users). Azure DevOps offers awesome UI to create environments, setting up approval policies.

The problem begins when we want to automate environment creations to scale the process.

Problem statement

As of today (while writing this article)- provisioning and setting up approve policies for environments via REST API is not documented and publicly unavailable – there is a feature request awaiting.
In this article, I will share some code that can be used to automate provisioning environment, approval policy management.

Scenario

It’s fairly common (in fact best practice) to logically isolate AKS clusters for separate teams and projects. To minimize the number of physical AKS clusters we deploy to isolate teams or applications.

With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.

When we setup such isolation for multiple teams, it’s crucial to automate the bootstrap of team projects in Azure DevOps– setting up scoped environments, service accounts so teams can’t deploy to namespaces of other teams etc. The need for automation is right there – and that’s all this article is about.

The process I am trying to establish as follows:

  1. Cluster Administrators will provision a namespace for a team (GitOps )
  2. Automatically create an Environment for the team’s namespace and configure approvals
  3. Team 1 will use the environment in their multi-stage pipeline

Let’s do this!

Provision namespace for teams

It all begins with a demand from a team – they need a namespace for development/deployment. The cluster administrators would keep a Git repository that contains the Kubernetes manifest files describing these namespaces. And there is a pipeline that applies them to the cluster each time a new file is added/modified. This repository will be restricted to the Cluster administrators (operation folks) only. Developers could issue a pull request but the PR approvals and commits to master should only be accepted by a cluster administrator or people with similar responsibility.

After that, we will create a service account for each of the namespaces. These are the accounts that will be used later when we will define Azure DevOps environment for each team.

Now the pipeline for this repository essentially applies all the manifests (both for namespaces and services accounts) to the cluster.

trigger:
- master
stages:
- stage: Build
  displayName: Provision namespace and service accounts
  jobs:  
  - job: Build
    displayName: Update namespace and service accounts
    steps:
      <… omitted irrelevant codes …>
      - bash: |
          kubectl apply -f ./namespaces 
        displayName: 'Update namespaces'
      - bash: |
          kubectl apply -f ./ServiceAccounts 
        displayName: 'Update service accounts'   
      - bash: |
          dotnet ado-env-gen.dll
        displayName: 'Provision Azure DevOps Environments'       

At this point, we have service account configured for each namespace that we will use to create the environment, endpoints etc. You might notice that I have created some label for each service account (i.e. purpose=ado-automation), this is to tag along the Azure DevOps Project name to a service account. This will come handy when we will provision environments.

The last task that runs a .net core console app (i.e. ado-env-gen.dll) – which I will described in detail later in this article.

Provisioning Environment in Azure DevOps

NOTE: Provisioning environment via REST api currently is undocumented and might change in coming future – beware of that.

It takes multiple steps to create an Environment to Azure DevOps. The steps are below:

  1. Create a Service endpoint with Kubernetes Service Account
  2. Create an empty environment (with no resources yet)
  3. Connect the service endpoint to the environment as Resource

I’ve used .net (C#) for this, but any REST client technology could do that.

Creating Service Endpoint

Following method creates a service endpoint in Azure DevOps that uses a Service Account scoped to a given namespace.

        public async Task<Endpoint> CreateKubernetesEndpointAsync(
            Guid projectId, string projectName,
            string endpointName, string endpointDescription,
            string clusterApiUri,
            string serviceAccountCertificate, string apiToken)
        {
            return await GetAzureDevOpsDefaultUri()
                .PostRestAsync<Endpoint>(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=6.0-preview.4",
                new
                {
                    authorization = new
                    {
                        parameters = new
                        {
                            serviceAccountCertificate,
                            isCreatedFromSecretYaml = true,
                            apitoken = apiToken
                        },
                        scheme = "Token"
                    },
                    data = new
                    {
                        authorizationType = "ServiceAccount"
                    },
                    name = endpointName,
                    owner = "library",
                    type = "kubernetes",
                    url = clusterApiUri,
                    description = endpointDescription,
                    serviceEndpointProjectReferences = new List<Object>
                    {
                        new
                        {
                            description = endpointDescription,
                            name =  endpointName,
                            projectReference = new
                            {
                                id =  projectId,
                                name =  projectName
                            }
                        }
                    }
                }, await GetBearerTokenAsync());
        }

We will find out how to invoke this method in a moment. Before that, Step 2, let’s create the empty environment now.

Creating Environment in Azure DevOps

        public async Task<PipelineEnvironment> CreateEnvironmentAsync(
            string project, string envName, string envDesc)
        {
            var env = await GetAzureDevOpsDefaultUri()
                .PostRestAsync<PipelineEnvironment>(
                $"{project}/_apis/distributedtask/environments?api-version=5.1-preview.1",
                new
                {
                    name = envName,
                    description = envDesc
                },
                await GetBearerTokenAsync());

            return env;
        }

Now we have environment, but it still empty. We need to add a resource into it and that would be the Service Endpoint – so the environment comes to life.

        public async Task<string> CreateKubernetesResourceAsync(
            string projectName, long environmentId, Guid endpointId,
            string kubernetesNamespace, string kubernetesClusterName)
        {
            var link = await GetAzureDevOpsDefaultUri()
                            .PostRestAsync(
                            $"{projectName}/_apis/distributedtask/environments/{environmentId}/providers/kubernetes?api-version=5.0-preview.1",
                            new
                            {
                                name = kubernetesNamespace,
                                @namespace = kubernetesNamespace,
                                clusterName = kubernetesClusterName,
                                serviceEndpointId = endpointId
                            },
                            await GetBearerTokenAsync());
            return link;
        }

Of course, environment needs to have Approval policies configure. The following method configures a Azure DevOps group as Approver to the environment. Hence any pipeline that reference this environment will be paused and wait for approval from one of the members of the group.

        public async Task<string> CreateApprovalPolicyAsync(
            string projectName, Guid groupId, long envId, 
            string instruction = "Please approve the Deployment")
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/pipelines/checks/configurations?api-version=5.2-preview.1",
                new
                {
                    timeout = 43200,
                    type = new
                    {                                   
                        name = "Approval"
                    },
                    settings = new
                    {
                        executionOrder = 1,
                        instructions = instruction,
                        blockedApprovers = new List<object> { },
                        minRequiredApprovers = 0,
                        requesterCannotBeApprover = false,
                        approvers = new List<object> { new { id = groupId } }
                    },
                    resource = new
                    {
                        type = "environment",
                        id = envId.ToString()
                    }
                }, await GetBearerTokenAsync());
            return response;
        }

So far so good. But we need to stich all these together. Before we do so, one last item needs attention. We would want to create a Service connection to the Azure container registry so the teams can push/pull images to that. And we would do that using Service Principals designated to the teams – instead of the Admin keys of ACR.

Creating Container Registry connection

The following snippet allows us provisioning Service Connection to Azure Container Registry with Service principals – which can have fine grained RBAC roles (i.e. ACRPush or ACRPull etc.) that makes sense for the team.

        public async Task<string> CreateAcrConnectionAsync(
            string projectName, string acrName, string name, string description,
            string subscriptionId, string subscriptionName, string resourceGroup,
            string clientId, string secret, string tenantId)
        {
            var response = await GetAzureDevOpsDefaultUri()
                .PostRestAsync(
                $"{projectName}/_apis/serviceendpoint/endpoints?api-version=5.1-preview.2",
                new
                {
                    name,
                    description,
                    type = "dockerregistry",
                    url = $"https://{acrName}.azurecr.io",
                    isShared = false,
                    owner = "library",
                    data = new
                    {
                        registryId = $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.ContainerRegistry/registries/{acrName}",
                        registrytype = "ACR",
                        subscriptionId,
                        subscriptionName
                    },
                    authorization = new
                    {
                        scheme = "ServicePrincipal",
                        parameters = new
                        {
                            loginServer = $"{acrName}.azurecr.io",
                            servicePrincipalId = clientId,
                            tenantId,
                            serviceprincipalkey = secret
                        }
                    }
                },
                await GetBearerTokenAsync());
            return response;
        }

We came pretty close to a wrap. We’ll stitch all the methods above together. Plan is to create a simple console application will fix everything (using the above methods). Here’s the pseudo steps:

  1. Find all Service Account created for this purpose
  2. For each Service Account: determining the correct Team Project and
    • Create Service Endpoint with the Account
    • Create Environment
    • Connect Service Endpoint to Environment (adding resource)
    • Configure Approval policies
    • Create Azure Container Registry connection

The first step needs to communicate to the cluster – obviously. I have used the official .net client for Kubernetes for that.

Bringing all together

All the above methods are invoked from a simple C# console application. Below is the relevant part of the main method that brings all the above together:

        private static async Task Main(string [] args)
        {
            var clusterApiUrl = Environment.GetEnvironmentVariable("AKS_URI");
            var adoUrl = Environment.GetEnvironmentVariable("AZDO_ORG_SERVICE_URL");
            var pat = Environment.GetEnvironmentVariable("AZDO_PERSONAL_ACCESS_TOKEN");
            var adoClient = new AdoClient(adoUrl, pat);
            var groups = await adoClient.ListGroupsAsync();

            var config = KubernetesClientConfiguration.BuildConfigFromConfigFile();
            var client = new Kubernetes(config);

We started by collecting some secret and configuration data – all from environment variables – so we can run this console as part of the pipeline task and use pipeline variables at ease.

        var accounts = await client
            .ListServiceAccountForAllNamespacesAsync(labelSelector: "purpose=ado-automation");

This gets us the list of all the service accounts we have provisioned specially for this purpose (filtered using the labels).

            foreach (var account in accounts.Items)
            {
                var project = await GetProjectAsync(account.Metadata.Labels["project"], adoClient);
                var secretName = account.Secrets[0].Name;
                var secret = await client
                    .ReadNamespacedSecretAsync(secretName, account.Metadata.NamespaceProperty);

We are iterating all the accounts and retrieving their secrets from the cluster. Next step, creating the environment with these secrets.

                var endpoint = await adoClient.CreateKubernetesEndpointAsync(
                    project.Id,
                    project.Name,
                    $"Kubernetes-Cluster-Endpoint-{account.Metadata.NamespaceProperty}",
                    $"Service endpoint to the namespace {account.Metadata.NamespaceProperty}",
                    clusterApiUrl,
                    Convert.ToBase64String(secret.Data["ca.crt"]),
                    Convert.ToBase64String(secret.Data["token"]));

                var environment = await adoClient.CreateEnvironmentAsync(project.Name,
                    $"Kubernetes-Environment-{account.Metadata.NamespaceProperty}",
                    $"Environment scoped to the namespace {account.Metadata.NamespaceProperty}");

                await adoClient.CreateKubernetesResourceAsync(project.Name, 
                    environment.Id, endpoint.Id,
                    account.Metadata.NamespaceProperty,
                    account.Metadata.ClusterName);

That will give us the environment – correctly configured with the appropriate Service Accounts. Let’s set up the approval policy now:

                var group = groups.FirstOrDefault(g => g.DisplayName
                    .Equals($"[{project.Name}]\\Release Administrators", StringComparison.OrdinalIgnoreCase));
                await adoClient.CreateApprovalPolicyAsync(project.Name, group.OriginId, environment.Id);

We are taking a designated project group “Release Administrators” and set them as approves.

            await adoClient.CreateAcrConnectionAsync(project.Name, 
                Environment.GetEnvironmentVariable("ACRName"), 
                $"ACR-Connection", "The connection to the ACR",
                Environment.GetEnvironmentVariable("SubId"),
                Environment.GetEnvironmentVariable("SubName"),
                Environment.GetEnvironmentVariable("ResourceGroup"),
                Environment.GetEnvironmentVariable("ClientId"), 
                Environment.GetEnvironmentVariable("Secret"),
                Environment.GetEnvironmentVariable("TenantId"));

Lastly created the ACR connection as well.

The entire project is in GitHub – in case you want to have a read!

Verify everything

We have got our orchestration completed. Every time we add a new team, we create one manifest for their namespace and Service account and create a PR to the repository described above. A cluster admin approves the PR and a pipeline gets kicked off.

The pipeline ensures:

  1. All the namespaces and service accounts are created
  2. An environment with the appropriate service accounts are created in the correct team project.

Now a team can create their own pipeline in their repository – referring to the environment. Voila, all starts working nice. All they need is to refer the name of the environment that’s provisioned for their team (for instance “team-1”), as following example:

- stage: Deploy
  displayName: Deploy stage
  dependsOn: Build
  jobs:
  - deployment: Deploy
    condition: and(succeeded(), not(startsWith(variables['Build.SourceBranch'], 'refs/pull/')))
    displayName: Deploy
    pool:
      vmImage: $(vmImageName)
    environment: 'Kubernetes-Cluster-Environment.team-1'
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: kube-manifests
          - task: KubernetesManifest@0
            displayName: Deploy to Kubernetes cluster
            inputs:
              action: deploy
              manifests: |
                $(Pipeline.Workspace)/kube-manifests/all-template.yaml

Now the multi-stage pipeline knows how to talk to the correct namespace in AKS with approval awaiting.

Conclusion

This might appear an overkill for small-scale projects, as it involves quite some overhead of development and maintenance. However, on multiple occasions (especially within large enterprises), I have experienced the need for orchestrations via REST API to onboard teams in Azure DevOps, bootstrapping configurations across multiple teams’ projects etc. If you’re on the same boat, this article might be an interesting read for you!

Thanks for reading!