I have a tiny Kubernetes cluster. I run some workload there, some are useful, others are just try-out, fun stuffs. I have few services that need to talk to each other. I do not have a lot of traffic to be honest, but I sometimes curiously run Apache ab to simulate load and see how my services perform under stress. Until very recently I was using a messaging (basically a pub-sub) pattern to create reactive service-to-service communication. Which works great, but often comes with a latency. I can only imagine, if I were to run these service to service communication for a mission critical high-traffic performance-driven scenario (an online game for instance), this model won’t fly well. There comes the need for a service-to-service communication pattern in cluster.
What’s big deal? We can have REST calls between services, even can implement gRPC for that matter. The issue is things behaves different at scale. When many services talks to many others, nodes fail in between, network address of PODs changes, new PODs show up, some goes down, figuring out where the service sits becomes quite a challenging task.
Then Kubernetes comes to rescue, Kubernetes provides “service”, that gives us service discovery out of the box. Which is awesome. Not all issues disappeared though. Services in a cluster need fault-tolerances, traceability and most importantly, “observability”. Circuit-breakers, retry-logics etc. implementing them for each service is again a challenge. This is exactly the Service Mesh addresses.
From thoughtworks radar:
Service mesh is an approach to operating a secure, fast and reliable microservices ecosystem. It has been an important steppingstone in making it easier to adopt microservices at scale. It offers discovery, security, tracing, monitoring and failure handling. It provides these cross-functional capabilities without the need for a shared asset such as an API gateway or baking libraries into each service. A typical implementation involves lightweight reverse-proxy processes, aka sidecars, deployed alongside each service process in a separate container. Sidecars intercept the inbound and outbound traffic of each service and provide cross-functional capabilities mentioned above.
Some of us might remember Aspect Oriented programming (AOP) – where we used to separate cross cutting concerns from our core-business-concerns. Service mesh is no different. They isolate (in a separate container) these networking and fault-tolerance concerns from the core-capabilities (also running in container).
There are quite several service mesh solutions out there – all suitable to run in Kubernetes. I have used earlier Envoy and Istio. They work great in Kubernetes as well as VM hosted clusters. However, I must admit, I developed a preference for Linkerd since I discovered it. Let’s briefly look at how Linkerd works. Imagine the following two services, Service A and Service B. Service A talks to Service B.
Linkerd implants two sidecar containers in our PODs. The init container configures the IP table so the incoming and outgoing TCP traffics flow through the Linkerd Proxy container. The proxy container is the data plane that does the actual interception and all the other fault-tolerance goodies.
Primary reason behind my Linkerd preferences are performance and simplicity. Ivan Sim has done performance benchmarking with Linkerd and Istio:
Both the Linkerd2-meshed setup and Istio-meshed setup experienced higher latency and lower throughput, when compared with the baseline setup. The latency incurred in the Istio-meshed setup was higher than that observed in the Linkerd2-meshed setup. The Linkerd2-meshed setup was able to handle higher HTTP and GRPC ping throughput than the Istio-meshed setup.
This is going to take few minutes and then we have a cluster. We will use the canonical emojivoto app (“buoyantio/emojivoto-emoji-svc:v8”) to test our Linkerd installation. Here’s the Kubernetes manifest file for that.
With this IaC – we can run Terraform apply to provision our AKS cluster in Azure.
Let’s create a pipeline for the service deployment. The easiest way to do that is to create a service connection to our AKS cluster. We go to the project settings in Azure DevOps project, pick Service connections and create a new service connection of type “Kubernetes connection”.
We will create a pipeline that installs Linkerd into the AKS cluster. Azure Pipeline now offers “pipeline-as-code” – which is just an YAML file that describes the steps need to be performed when the pipeline is triggered. We will use the following pipeline-as-code:
We can at this point trigger the pipeline to install Linkerd into the AKS cluster.
Deployment of PODs and services
Let’s create another pipeline as code that deploys all the services and deployment resources to AKS using the following Kubernetes manifest file:
In Azure Portal we can already see our services running:
Also in Kubernetes Dashboard:
We have got our services running – but they are not really affected by Linkerd yet. We will add another step into the build pipeline to tell Linkerd to do its magic.
Next thing, we trigger the pipeline and put some traffic into the service that we have just deployed. The emoji service is simulating some service to service invocation scenarios and now it’s time for us to open the Linkerd dashboard to inspect all the distributed traces and many other useful matrix to look at.
We can also see kind of an application map – in a graphical way to understand which service is calling who and what is request latencies etc.
Even fascinating, Linkerd provides some drill-down to the communications in Grafana Dashboard.
I have enjoyed a lot setting it up and see the outcome and wanted to share my experience with it. If you are looking into Service Mesh and read this post, I strongly encourage to give Linkerd a go, it’s awesome!
Thanks for reading.