Linkerd in Azure Kubernetes Service cluster

In this article I would document my journey on setting up Linkerd Service Mesh on Azure Kubernetes service.

Background

I have a tiny Kubernetes cluster. I run some workload there, some are useful, others are just try-out, fun stuffs. I have few services that need to talk to each other. I do not have a lot of traffic to be honest, but I sometimes curiously run Apache ab to simulate load and see how my services perform under stress. Until very recently I was using a messaging (basically a pub-sub) pattern to create reactive service-to-service communication. Which works great, but often comes with a latency. I can only imagine, if I were to run these service to service communication for a mission critical high-traffic performance-driven scenario (an online game for instance), this model won’t fly well. There comes the need for a service-to-service communication pattern in cluster.

What’s big deal? We can have REST calls between services, even can implement gRPC for that matter. The issue is things behaves different at scale. When many services talks to many others, nodes fail in between, network address of PODs changes, new PODs show up, some goes down, figuring out where the service sits becomes quite a challenging task.

Then Kubernetes comes to rescue, Kubernetes provides “service”, that gives us service discovery out of the box. Which is awesome. Not all issues disappeared though. Services in a cluster need fault-tolerances, traceability and most importantly, “observability”.  Circuit-breakers, retry-logics etc. implementing them for each service is again a challenge. This is exactly the Service Mesh addresses.

Service mesh

From thoughtworks radar:

Service mesh is an approach to operating a secure, fast and reliable microservices ecosystem. It has been an important steppingstone in making it easier to adopt microservices at scale. It offers discovery, security, tracing, monitoring and failure handling. It provides these cross-functional capabilities without the need for a shared asset such as an API gateway or baking libraries into each service. A typical implementation involves lightweight reverse-proxy processes, aka sidecars, deployed alongside each service process in a separate container. Sidecars intercept the inbound and outbound traffic of each service and provide cross-functional capabilities mentioned above.

Some of us might remember Aspect Oriented programming (AOP) – where we used to separate cross cutting concerns from our core-business-concerns. Service mesh is no different. They isolate (in a separate container) these networking and fault-tolerance concerns from the core-capabilities (also running in container).

Linkerd

There are quite several service mesh solutions out there – all suitable to run in Kubernetes. I have used earlier Envoy and Istio. They work great in Kubernetes as well as VM hosted clusters. However, I must admit, I developed a preference for Linkerd since I discovered it. Let’s briefly look at how Linkerd works. Imagine the following two services, Service A and Service B. Service A talks to Service B.

service-2-service

When Linkerd installed, it works like an interceptor between all the communication between services. Linkerd uses sidecar pattern to proxy the communication by updating the KubeProxy IP Table.

Linkerd-architecture.png

Linkerd implants two sidecar containers in our PODs. The init container configures the IP table so the incoming and outgoing TCP traffics flow through the Linkerd Proxy container. The proxy container is the data plane that does the actual interception and all the other fault-tolerance goodies.

Primary reason behind my Linkerd preferences are performance and simplicity. Ivan Sim has done performance benchmarking with Linkerd and Istio:

Both the Linkerd2-meshed setup and Istio-meshed setup experienced higher latency and lower throughput, when compared with the baseline setup. The latency incurred in the Istio-meshed setup was higher than that observed in the Linkerd2-meshed setup. The Linkerd2-meshed setup was able to handle higher HTTP and GRPC ping throughput than the Istio-meshed setup.

Cluster provision

Spinning up AKS is easy as pie these days. We can use Azure Resource Manager Template or Terraform for that. I have used Terraform to generate that.

Service deployment

This is going to take few minutes and then we have a cluster. We will use the canonical emojivoto app (“buoyantio/emojivoto-emoji-svc:v8”) to test our Linkerd installation. Here’s the Kubernetes manifest file for that.

With this IaC – we can run Terraform apply to provision our AKS cluster in Azure.

Azure Pipeline

Let’s create a pipeline for the service deployment. The easiest way to do that is to create a service connection to our AKS cluster. We go to the project settings in Azure DevOps project, pick Service connections and create a new service connection of type “Kubernetes connection”.

Azure DevOps connection

Installing Linkerd

We will create a pipeline that installs Linkerd into the AKS cluster. Azure Pipeline now offers “pipeline-as-code” – which is just an YAML file that describes the steps need to be performed when the pipeline is triggered. We will use the following pipeline-as-code:

We can at this point trigger the pipeline to install Linkerd into the AKS cluster.

Linkerd installation (2)

Deployment of PODs and services

Let’s create another pipeline as code that deploys all the services and deployment resources to AKS using the following Kubernetes manifest file:

In Azure Portal we can already see our services running:

Azure KS

Also in Kubernetes Dashboard:

Kub1

We have got our services running – but they are not really affected by Linkerd yet. We will add another step into the build pipeline to tell Linkerd to do its magic.

Next thing, we trigger the pipeline and put some traffic into the service that we have just deployed. The emoji service is simulating some service to service invocation scenarios and now it’s time for us to open the Linkerd dashboard to inspect all the distributed traces and many other useful matrix to look at.

linkerd-censored

We can also see kind of an application map – in a graphical way to understand which service is calling who and what is request latencies etc.

linkerd-graph

Even fascinating, Linkerd provides some drill-down to the communications in Grafana Dashboard.

ezgif.com-gif-maker.gif

Conclusion

I have enjoyed a lot setting it up and see the outcome and wanted to share my experience with it. If you are looking into Service Mesh and read this post, I strongly encourage to give Linkerd a go, it’s awesome!

Thanks for reading.

CloudOven – Terraform at ease!

TL;DR:

  • URL: CloudOven 

  • Use Google account or sign-up 
  • Google Chrome please! (I’ve not tested on other browsers yet)

e2e

Background

In recent years I have spent fair amount of time in design and implementation of Infrastructure as code in larger enterprise context. Terraform seemed to be a tool of choice when it comes to preserve the uniformity in Infrastructure as code targeting multiple cloud providers. It is rapidly becoming a de facto choice for creating and managing cloud infrastructures by writing declarative definitions. It’s popular because the syntax of its files is quite readable and because it supports several cloud providers while making no attempt to provide an artificial abstraction across those providers. The active community will add support for the latest features from most cloud providers.

However, rolling out Terraform in many enterprises has its own barrier to face. Albeit the syntax (HCL) is neat, but not every developers or Infrastructure operators in organizations finds it easy. There’s a learning curve and often many of us lose momentum discovering the learning effort. I believe if we could make the initial ramp-up easier more people would play with it.

That’s one of my motivation for this post, following is the other one.

Blazor meets Terraform

Lately I was learning Blazor – the new client-side technology from Microsoft. Like many others, I find one effective way learning a new technology by creating/building solution to a problem. I have decided to build a user interface that will help creating terraform scripts easier. I will share my journey in this post.

Resource Discovery in Terraform Providers

Terraform is powerful for its providers. You will find Terraform providers for all major cloud providers (Azure, AWS, Google etc.). The providers then allow us to define “resource” and “data source” in Terraform scripts. These resource and data source have arguments and attributes that one must know while creating terraform files. Luckily, they are documented nicely in Terraform site. However, it still requires us to jump back and forth to the documentation site and terraform file editor (i.e. VSCode).

Azure-Discovery

To make this experience easier, I wrote a crawler application that downloads the terraform providers (I am doing it for Azure, AWS and google for now) and discovers the attributes and arguments for each and every resource and data source. I also try to extract the documentation for every attributes and arguments from the terraform documentation site with a layman parsing (not 100% accurate but works for majority. Something I will improve soon).

GoogleAWS-discovery

This process generates JSON structure for each resource and data source, enriches them with the documentation and stores them in an Azure Blob Storage.

Building Infrastructure as code

Now that I have a structured data store with all resources and data sources for any terraform provider, I can leverage that building a user interface on top of it. To keep things a bit organized, I started with a concept of “project”.

workflow
The workflow

Project

I can start by creating a project (well, it can be a product too, but let’s not get to that debate). Project is merely a logical boundary here.

Blueprint

Within a project I can create Blueprint(s). Blueprint(s) are the entity that retains the elements of the infrastructure that we are aiming to create. For instance, a Blueprint targets to a Cloud provider (i.e. Azure). Then I can create the elements (resource and data sources) within the blueprint (i.e. Azure Web App, Cosmos DB etc.).

provider-configuration

Blueprints keeps the base structure of all the infrastructure elements. It allows defining variables (plain and simple terraform variables) so the actual values can vary in different environments (dev, test, pre-production, production etc.).

Once I am happy with the blueprint, I can download them as a zip – that contains the terraform scripts (main.tf and variable.tf). That’s it, we have our infrastructure as code in Terraform. I can execute them on a local development machine or check them in to source control – whatever I prefer.

storage_account

One can stop here and keep using the blueprint feature to generate Infrastructure as code. That’s what it is for. However, the next features are just to make the overall experience of running terraform a bit easier.

Environments

Next to blueprint, we can create as many environments we want. Again, just a logical entity to keep isolation of actual deployment for different environments.

Deployments

Deployment entity is the glue that ties a blueprint to a specific environment. For instance, I can define a blueprint for “order management” service (or micro-service maybe?), create an environment as “test” and then create a “deployment” for “order management” on “test”. This is where I can define constant values to the blueprint variable that are specific to the test environment.

Terraform State

Perhaps the most important aspect the deployment entity holds is the terraform state management. Terraform must store state about your managed infrastructure and configuration. This state is used by Terraform to map real world resources to your configuration, keep track of metadata, and to improve performance for large infrastructures. This state is stored by default in a local file named “terraform.tfstate”, but it can also be stored remotely, which works better in a team environment. Defining the state properties (varies in different cloud providers) in deployment entity makes the remote state management easier – specifically in team environment. It will configure the remote state to the appropriate remote backend. For instance, when the blueprint cloud provider is set to Azure, it will configure Azure Storage account as terraform state remote backend, for AWS it will pick S3 automatically.

e2e

Terraform plan

Once we have deployment entity configured, we can directly from the user interface run “terraform plan”. The terraform plan command creates an execution plan. Unless explicitly disabled, it performs a refresh, and then determines what actions are necessary to achieve the desired state specified in the blueprint. This command is a convenient way to check whether the execution plan for a set of changes matches your expectations without making any changes to real resources or to the state. For example, terraform plan might be run before committing a change to version control, to create confidence that it will behave as expected.

Terraform apply

The terraform apply command is used to apply the changes required to reach the desired state of the configuration, or the pre-determined set of actions generated by a terraform plan execution plan. Like “plan”, the “apply” command can also be issued directly from the user interface.

Terraform plan and apply both are issued in an isolated docker container and the output is captured and displayed back to the user interface. However, there’s a cost associated running docker containers on cloud, therefore, it’s disabled in the public site.

Final thoughts

It was fun to write a tool like this. I recommend you give it a go. Especially if you are stepping into Terraform. It can also be helpful for experienced Terraform developers – specifically with the on-screen documenation, type inferance and discovery features.

Some features, I have working progress:

  • Ability to define policy for each resources and data types
  • Save a Blueprint as custom module

Stay tuned!

 

Azure template to provision Docker swarm mode cluster

What is a swarm?

The cluster management and orchestration features embedded in the Docker Engine are built using SwarmKit. Docker engines participating in a cluster are running in swarm mode. You enable swarm mode for an engine by either initializing a swarm or joining an existing swarm. A swarm is a cluster of Docker engines, or nodes, where you deploy services. The Docker Engine CLI and API include commands to manage swarm nodes (e.g., add or remove nodes), and deploy and orchestrate services across the swarm.

I was recently trying to come up with a script that generates the docker swarm cluster – ready to take container work loads on Microsoft Azure. I thought, Azure Container Service (ACS) should already have supported that. However, I figured, that’s not the case. Azure doesn’t support docker swarm mode in ACS yet – at least as of today (25th July 2017). Which forced me to come up with my own RM template that does the help.

What’s in it?

The RM template will provision the following resources:

  • A virtual network
  • An availability set for manager nodes
  • 3 virtual machines with the AV set created above. (the numbers, names can be parameterized as per your needs)
  • A load balancer (with public port that round-robins to the 3 VMs on port 80. And allows inbound NAT to the 3 machine via port 5000, 5001 and 5002 to ssh port 22).
  • Configures 3 VMs as docker swarm mode manager.
  • A Virtual machine scale set (VMSS) in the same VNET.
  • 3 Nodes that are joined as worker into the above swarm.
  • Load balancer for VMSS (that allows inbound NATs starts from range 50000 to ssh port 22 on VMSS)

The design can be visualized with the following diagram:

There’s a handly powershell that can help automate provisioing this resources. But you can also just click the “Deploy to Azure” button below.

Thanks!

The entire scripts can be found into this GitHub repo. Feel free to use – as needed!

IAC – Using Azure RM templates

As cloud Software development heavily leverages virtualized systems and developers have started using Continuous Integration (CI), many things have started to change. The number of environment developers have to deal with has gone up significantly. Developers now release much frequently, in many cases, multiple times in a single day. All these releases has to be tested, validated. This brings up a new requirement to spin up an environment fast, which is identical to production.

The need for an automated way of provisioning such environments fast (in a repeatable manner) become obvious and hence IAC (stands for Infrastructure as Code) kicked in.

There are numerous tools (Puppet, Ansible, Vagrant etc.) that help building such coded-environment. Azure Resource Manager Template brings a new way of doing IAC when an application is targeted to build and run on Azure. Most of these tools (including RM template) are even idempotent, which ensures that you can run the same configuration multiple times while achieving the same result.

From Microsoft Azure web site:

Azure applications typically require a combination of resources (such as a database server, database, or website) to meet the desired goals. Rather than deploying and managing each resource separately, you can create an Azure Resource Manager template that deploys and provisions all of the resources for your application in a single, coordinated operation. In the template, you define the resources that are needed for the application and specify deployment parameters to input values for different environments. The template consists of JSON and expressions which you can use to construct values for your deployment.

I was excited the first time I saw this in action in one of the Channel9 Videos. Couldn’t wait to give it a go. The idea of having a template that describes all the Azure resources (Service Bus, SQL Azure, VMs, WebApps etc.) in a template file and having the capability to parameterized it with different values that varies over different environments could be very handy for a CI/CD scenarios. The templates can be nested, which also makes them more modularized and more manageable.

Lately I had the pleasure to dig deeper in Azure RM templates, as we are using it for the project I am working these days. I wanted to come up with a sample template that shows how to use RM template to construct resources that allows me to share my learnings. The Scripts can be found into this GitHub Repo.

One problem that I didn’t know how to handle yet, was the credentials that needed in order to provision the infrastructures. For instance, the VM passwords, SQL passwords etc. I don’t think anybody wants to check-in their passwords, into the source control systems visible in Azure RM parameter JSON files. To address this issue, the solution I came up with for now is, I uploaded the RM parameter JSON files into a private container of a Blob Storage (Note that, the storage account is into the same Azure Subscription where the Infrastructure I intend to provision in). A PowerShell script then download the Shared Access Signature (SAS) token for that Blob storage container and uses that to download the parameters JSON Blob into a PSCustomObject and removes the locally downloaded JSON file. Next step, it converts the PSCustomObject into a Hash Table which is passed through the Azure RM Cmdlet to kick of the provision process. That way, there is no need to have a file checked in to the Source control system that has credentials. Also the Administrators who manages the Azure subscription can Crete a private Blob storage and use the Azure Storage Explorer to create and update his credentials into the RM parameters JSON file. A CI process can download the parameters files just in time before provisioning infrastructures.