.net-core · AKS · Architecture · Azure · AzureDevOps · C# · docker · Kubernetes · Pipeline

Elastic self-hosted pool for Azure DevOps (on Kubernetes)

Update

There is a follow-up post with some updates; you can read it here.

Introduction

If you are using Azure Pipelines, then you have surely used Microsoft-hosted agents. With Microsoft-hosted agents, maintenance and upgrades are taken care of for you. However, there are times when self-hosted agents are needed (e.g., customized images, network connectivity requirements, etc.). Pipeline agents can be hosted stand-alone, on Azure virtual machine scale sets, or as Docker containers. Container-based agents are amazingly fast to spin up, which has led many to run self-hosted agents on a Kubernetes cluster. I am sure many have done this before, but I didn’t find a complete solution in my search. Therefore, I decided to build one.

Demo

Architecture

The architecture can be drawn as follows:

Architecture diagram

There is a controller in a designated namespace that keeps an eye on the agent pool in Azure DevOps, and as soon as it sees new job requests queued, it spins up a container agent. It also listens for the Kubernetes events raised when pods finish executing a pipeline job; when such an event arrives, the controller cleans up the pod and unregisters the agent from Azure DevOps. Unfortunately, Azure DevOps doesn’t have a service hook event such as “job queued”. Therefore, the controller uses the REST API to look for incoming job requests. Because there is latency involved in this process, the controller always keeps N “standby” agents on Kubernetes. The standby count can be configured as needed.
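As an illustrative sketch of the polling side (the real controller is written in .NET; the field names “result” and “receiveTime” are my assumptions about the shape of the job-request list the REST API returns, so verify them against your Azure DevOps version):

```python
# Sketch: since Azure DevOps has no "job queued" service-hook event, a
# controller must poll the pool's job-request list and count the requests
# that are still waiting for an agent. Assumption: a request that has a
# "result" is finished, and one with a "receiveTime" is already running.

def count_pending_jobs(job_requests):
    """Count job requests that are still queued (no result, not yet picked up)."""
    return sum(1 for jr in job_requests
               if "result" not in jr and "receiveTime" not in jr)

# Example payload, shaped the way the job-request list might look:
sample = [
    {"requestId": 101, "result": "succeeded"},                   # finished
    {"requestId": 102, "receiveTime": "2022-01-01T10:00:00Z"},   # running
    {"requestId": 103},                                          # queued
    {"requestId": 104},                                          # queued
]
print(count_pending_jobs(sample))  # 2
```

The controller would run this kind of check on a short interval; the standby agents exist precisely to hide the latency of that polling loop.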

How to use

Installing the controller is straightforward. Since the controller dynamically spins up pods, deletes completed pods, etc., it requires a cluster role. It also uses a Custom Resource Definition (CRD) to isolate the agent pod specification from the controller.

Install controller

The following manifest will install the required CRD, cluster role, service account, and cluster role binding in a separate namespace called “octolamp-system”.

# install controller, CRD from GitHub
kubectl apply -f https://raw.githubusercontent.com/cloudoven/azdo-k8s-agents/main/src/kubernetes/install.yaml


Configure agent namespace

You need to create a separate namespace where the Azure DevOps agent pods would be created and observed.

kubectl create namespace azdo-agents


Next, we need to create the container specification for the Azure DevOps agents. Microsoft documents how to create such a container image. I have created my own; however, you should build your own image and install the necessary tools according to your CI/CD needs. We will define this specification as a custom resource. Let’s create a file named agent-spec.yaml with the following content:

apiVersion: "azdo.octolamp.nl/v1"
kind: AgentSpec
metadata:
  namespace: azdo-agents
  name: cloudovenagent
spec:
  prefix: "container-agent"
  image: moimhossain/azdo-agent-linux-x64:latest

The image field needs to point to the container image that you want to use as the pipeline agent. Then apply this manifest to Kubernetes:

kubectl apply -f agent-spec.yaml
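For reference, a custom agent image along the lines of Microsoft’s documented “self-hosted agent in Docker” approach might start from a sketch like this (the package list and the start.sh script are assumptions — adapt them to your CI/CD needs; start.sh is the script from Microsoft’s documentation that downloads the agent package and registers it with the pool):

```dockerfile
# Minimal sketch of a Linux x64 agent image, following the pattern from
# Microsoft's "Run a self-hosted agent in Docker" documentation.
# start.sh (not shown) downloads and registers the agent using
# environment variables for the organization URL, PAT, and pool name.
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates curl git jq libicu66 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /azp
COPY ./start.sh .
RUN chmod +x start.sh

ENTRYPOINT ["./start.sh"]
```

Whatever build tools your pipelines need (SDKs, CLIs, compilers) would be installed on top of this base.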

Next, we will deploy the controller with the details of your Azure DevOps organization. Let’s create a file controller.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: octolamp-agent-controller
  namespace: octolamp-system
spec:
  replicas: 1
  selector:
    matchLabels:
      run: octolamp-agent-controller
  template:
    metadata:
      labels:
        run: octolamp-agent-controller
    spec:
      serviceAccountName: octolamp-service-account
      containers:
      - env:
        - name: AZDO_ORG_URI
          value: https://dev.azure.com/<Organization Name>
        - name: AZDO_TOKEN
          value: <A PAT token that can manage agent pool>
        - name: AZDO_POOLNAME
          value: "k8s-pool"
        - name: TARGET_NAMESPACE
          value: "azdo-agents"
        - name: STANDBY_AGENT_COUNT
          value: "2"
        - name: MAX_AGENT_COUNT
          value: "25"
        - name: APPINSIGHT_CONN_STR
          value: < Application Insight connection string (not instrumentation key!!)>
        image: moimhossain/octolamp-agent-controller:net6-v1.0.0
        imagePullPolicy: Always
        name: octolamp-agent-controller
        resources:
          limits:
            cpu: 100m
            memory: 100Mi

This file needs to be updated with your Azure DevOps organization URL, a personal access token, and the name of an agent pool that exists in your Azure DevOps organization (create one from Organization Settings > Pipelines > Agent pools > Add pool).

Other properties that you can configure are the standby agent count and the maximum number of agents the controller is allowed to create (limiting it to a threshold).
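My reading of how those two settings interact (a sketch of the sizing rule, not the controller’s literal code): the controller tries to keep enough agents for the queued jobs plus the standby buffer, capped at the maximum.

```python
# Sketch of the sizing rule implied by STANDBY_AGENT_COUNT and
# MAX_AGENT_COUNT: enough agents for pending jobs plus the standby
# buffer, but never more than the configured maximum.
def desired_agent_count(pending_jobs, standby, max_agents):
    return min(pending_jobs + standby, max_agents)

print(desired_agent_count(0, 2, 25))   # 2  (idle: only the standby agents)
print(desired_agent_count(10, 2, 25))  # 12
print(desired_agent_count(40, 2, 25))  # 25 (capped at the maximum)
```

The cap matters on AKS: without it, a burst of queued pipelines could drive the cluster autoscaler to add more nodes than you want to pay for.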

Now, apply these changes to the Kubernetes cluster.

kubectl apply -f controller.yaml

That’s it. At this point, you should see container agents spinning up, and they will show up in the Azure DevOps agent pool UI.

Elastic scale

Scale-out

I am using Azure Kubernetes Service (AKS) for this example, and AKS supports autoscaling for node pools. That means when the controller spins up more agents than the AKS node pool has capacity for, AKS adds new nodes to the pool, which in turn gives the controller capacity to spin up more agents (assuming you have a lot of pipelines running in a brief time window).

Scale-down

The controller keeps track of pod-completed events in Kubernetes, and whenever a pod completes a pipeline run, it removes the pod from the cluster and unregisters it from the Azure DevOps agent pool. Therefore, if there are no pipelines waiting, the controller scales all the agents back down to the standby count within ~2 minutes, which eventually triggers the AKS autoscaler, and nodes are scaled down within ~10 minutes.
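As a rough sketch of that event handling (field names are illustrative; the real controller watches Kubernetes pod events and then makes the corresponding Kubernetes-delete and Azure DevOps-unregister API calls):

```python
# Sketch of the scale-down step: a pod in phase "Succeeded" has finished
# its pipeline job, so it should be deleted from the cluster and its
# agent unregistered from the Azure DevOps pool. The dict shape here is
# illustrative, standing in for the pod objects a Kubernetes watch returns.
def pods_to_remove(pods):
    """Return the names of completed agent pods to clean up."""
    return [p["name"] for p in pods if p["phase"] == "Succeeded"]

pods = [
    {"name": "container-agent-1", "phase": "Succeeded"},
    {"name": "container-agent-2", "phase": "Running"},
    {"name": "container-agent-3", "phase": "Succeeded"},
]
print(pods_to_remove(pods))  # ['container-agent-1', 'container-agent-3']
```

Because each agent runs exactly one job and is then discarded, every pipeline run gets a clean environment, and idle capacity naturally drains back to the standby count.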

Windows agents?

This article doesn’t demonstrate Windows-based agents. However, as you have seen, the controller allows you to change the agent image and pod specification (via the custom resource), so you should be able to make that work without much effort.

Conclusion

The entire code can be found on GitHub. The source code is MIT licensed, provided as-is (without any warranties), and you can use and modify it freely. However, I would appreciate it if you acknowledged the author. You are also more than welcome to contribute directly on GitHub.

Have a great weekend!

4 thoughts on “Elastic self-hosted pool for Azure DevOps (on Kubernetes)”

  1. Thanks for the nice blog.
    I replicated the setup as you described. When I try to build Docker images inside the agent pod using a pipeline, I get the following error:
    “Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?”

    It seems like a case of Docker-in-Docker. I just need to build Docker images (or execute some commands). Can you suggest how this can be done in your setup? I want to mount the sock file.


  2. Hi, I like this idea, but would this work with API whitelisting enabled on the cluster? If access to the API is restricted to an office network, how does Azure DevOps communicate with the cluster? And what if the pipeline I am running makes changes to the same cluster that the agent sits on?


    1. Azure DevOps does NOT communicate with the Kubernetes API in this scenario; the controller (container) calls the Azure DevOps API. All you need is to make sure outbound traffic on port 443 is allowed from the containers.
      The pipelines running on these agents can deploy to the same cluster as well.

