Last Updated on August 2, 2021
We’re going to compare every Kubernetes service mesh available today and work out who the winner is. You may have already read our Top 10 list of Kubernetes applications, in which case the result may be somewhat predictable.
If you’ve arrived on this page, you probably already understand what a service mesh does. If you don’t, go and quickly read this article and then come back.
In this post we’ll hopefully help you choose between the four options available today. Like all of the content on this site, it will be 100% Kubernetes specific.
To help people decide which service mesh to choose, I’ve put together a quick table of features.
Many of these are so new that the documentation is lagging a little behind. If you can help fill in the gaps, please add a comment to this spreadsheet and I’ll update this blog accordingly.
I used Linkerd extensively on DC/OS and absolutely loved it. However, times have changed, and a couple of fundamental problems have made it a total dead end on Kubernetes.
Linkerd is written in Scala, a JVM language, which means a footprint of 110 MB+ of memory per node agent. This isn’t too bad when you run just one node agent per host, but the world is moving to per-pod proxy sidecars, and I think everyone realised that this is too much overhead.
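To put that overhead in perspective, here is a quick back-of-the-envelope sketch in Go. The cluster size (10 nodes with 30 pods each) is purely an assumption for illustration; the only figure taken from above is the 110 MB per Linkerd proxy.

```go
package main

import "fmt"

// proxyOverheadMB returns the total proxy memory in MB for a cluster.
// With perPod=false it models one node agent per host; with perPod=true it
// models one proxy sidecar per pod.
func proxyOverheadMB(nodes, podsPerNode, mbPerProxy int, perPod bool) int {
	proxies := nodes // node agent model: one proxy per host
	if perPod {
		proxies = nodes * podsPerNode // sidecar model: one proxy per pod
	}
	return proxies * mbPerProxy
}

func main() {
	// Hypothetical cluster: 10 nodes with 30 pods each, using the 110 MB
	// per-proxy figure quoted above as a lower bound.
	fmt.Println("node agents:", proxyOverheadMB(10, 30, 110, false), "MB")
	fmt.Println("sidecars:", proxyOverheadMB(10, 30, 110, true), "MB")
}
```

With those assumptions the node agent model costs about 1.1 GB across the cluster, while the sidecar model costs about 33 GB, because the proxy count now scales with pods rather than hosts.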
Linkerd also doesn’t proxy raw TCP traffic and doesn’t support WebSockets.
On the positive side, Linkerd has absolutely amazing traffic control. Read some of the documentation around Namerd and you’ll see just how advanced and powerful it is. It’s also one of the two service meshes that support connections outside of the cluster.
So, in summary, I’d say that if you only have Kubernetes to worry about, give Linkerd a miss. If you already run Linkerd elsewhere and need to connect services on your Kubernetes cluster to it, then it may be a valid option.
Linkerd2 is a total rewrite of Linkerd specifically for Kubernetes, with a Go control plane and a Rust data plane proxy. Unfortunately, as with every rewrite, you start back at the beginning from a feature and stability perspective, although I’m sure more than a few lessons were learned along the way.
Moving to Rust for the data plane proxy sidecars should help eliminate whole classes of memory-safety bugs and should also solve the memory footprint issues. It also now supports all of the major protocols, which is a big step forward.
One interesting difference compared to other service mesh designs is the tight default coupling between the data plane and control plane services. This simplifies configuration, which I see as a positive. I also like the focus on keeping P99 data plane latency extremely low.
As mentioned above, though, the project just isn’t at a level, feature-wise, where it can compete with something like Istio. To give just one example of something I’d consider fundamental for a service mesh: distributed tracing is still in the planning stage for Linkerd2. Many other features that other blogs have called ‘table stakes’ also still seem to be at the RFC stage.
Having said this, if you try Linkerd2 and are happy with the current feature set, then it seems like a good investment for the future. Many people hate the complexity of Istio, so I think that over time this may become the most compelling option if it stays simple.
Update: recent releases have added sidecar injection, timeouts and retries.
The latest version of Consul now comes with the ‘Connect’ feature, which can be enabled on existing clusters. Like most HashiCorp tools, Consul is a single Go binary that includes both the data plane and the control plane. The main unique selling point seems to be that you can enable Connect across services on Kubernetes and join them to services on VMs outside the cluster that also run Consul. This might be attractive to some organisations. However, based on work I’ve done in the past, I don’t really see it as a big advantage. Usually we leave the legacy systems alone and let them die, then work on new projects or migrate things onto Kubernetes.
Consul does seem to have a slight architectural advantage in that it operates as a full mesh, with no centralised control plane services that could theoretically become a performance bottleneck.
There is also a neat separation between layer 4 and layer 7. I think this separation may keep the Consul service mesh design simple while still allowing the data layer to be split out, and you can already swap the default data layer proxy for Envoy.
The default proxy is, however, quite lacking in features. To get tracing support, or many of the more advanced layer 7 features, you’ll need to make that swap to Envoy or a similar proxy, and the process isn’t very well documented online.
The other area to keep an eye on is control plane configuration. Istio is notoriously complicated to configure at this layer, whereas Consul has a simple ‘service access graph’ feature.
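Consul calls the rules in this graph ‘intentions’: allow or deny entries between a source and a destination service, with `*` as a wildcard. The Go sketch below is a simplified model of that idea under my own assumptions; the types, the precedence order and the default-deny setting are illustrative, not Consul’s API or its exact precedence rules.

```go
package main

import "fmt"

// edge is one directed source -> destination entry; "*" matches any service.
type edge struct{ src, dst string }

// AccessGraph is a hypothetical, simplified model of a service access graph:
// explicit allow/deny rules plus a default for anything not listed.
type AccessGraph struct {
	rules        map[edge]bool // true = allow, false = deny
	defaultAllow bool
}

// Allowed reports whether src may connect to dst. Rules are checked from most
// to least specific (exact pair, any source to dst, src to any destination,
// any to any), and the first match wins.
func (g AccessGraph) Allowed(src, dst string) bool {
	for _, e := range []edge{{src, dst}, {"*", dst}, {src, "*"}, {"*", "*"}} {
		if allow, ok := g.rules[e]; ok {
			return allow
		}
	}
	return g.defaultAllow
}

func main() {
	g := AccessGraph{
		rules: map[edge]bool{
			{"web", "api"}: true,  // web may call api
			{"*", "db"}:    false, // nothing may call db...
			{"api", "db"}:  true,  // ...except api
		},
		defaultAllow: false, // deny anything not listed
	}
	fmt.Println(g.Allowed("web", "api"), g.Allowed("web", "db"),
		g.Allowed("api", "db"), g.Allowed("web", "cache"))
}
```

This prints `true false true false`: web to api is allowed explicitly, web to db hits the wildcard deny, api to db is carved out by a more specific rule, and web to cache falls through to the default.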
HashiCorp have blogged about differentiating in the area of security. Consul ACLs providing host-to-host security are a very nice feature, especially if you want to connect pods inside Kubernetes to services outside the cluster in a secure way. Agent-side caching, especially for auth, reportedly makes communication performance excellent.
So, just like Linkerd2, this is another one to watch. Consul Connect was only released a few weeks ago, so there really aren’t many how-to guides online. If you’re already heavily invested in the HashiCorp toolchain, then I’d trial this and perhaps learn how to swap the default proxy for Envoy.
Istio is stable and feature rich. At the time of writing, Istio has 11.5k GitHub stars and 244 contributors, and is backed by Lyft, Google and IBM. It has pioneered many of the ideas now being emulated by other service meshes.
One such standout feature is automatic sidecar injection, which works amazingly well with Helm charts.
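Injection works well with Helm because it happens when the pod is admitted to the cluster (via a mutating admission webhook), so the chart itself doesn’t need to change. The sketch below models that decision in Go; the `Pod` type and both functions are hypothetical simplifications, though the `istio-injection=enabled` namespace label, the `sidecar.istio.io/inject` annotation and the `istio-proxy` container name are the ones Istio uses.

```go
package main

import "fmt"

// Pod is a hypothetical, heavily trimmed stand-in for a Kubernetes Pod.
type Pod struct {
	Annotations map[string]string
	Containers  []string // container names only
}

// shouldInject applies simplified Istio opt-in/opt-out rules: the namespace
// must be labelled istio-injection=enabled, and the pod must not carry the
// annotation sidecar.istio.io/inject: "false".
func shouldInject(nsLabels map[string]string, pod Pod) bool {
	if nsLabels["istio-injection"] != "enabled" {
		return false
	}
	return pod.Annotations["sidecar.istio.io/inject"] != "false"
}

// inject returns the pod as the webhook would hand it back: with an
// istio-proxy sidecar appended when injection applies, unchanged otherwise.
func inject(nsLabels map[string]string, pod Pod) Pod {
	if !shouldInject(nsLabels, pod) {
		return pod
	}
	out := pod
	// Copy before appending so the caller's slice is never modified.
	out.Containers = append(append([]string(nil), pod.Containers...), "istio-proxy")
	return out
}

func main() {
	ns := map[string]string{"istio-injection": "enabled"}
	// A pod rendered from an unmodified Helm chart: just the app container.
	fromChart := Pod{Containers: []string{"app"}}
	optedOut := Pod{
		Annotations: map[string]string{"sidecar.istio.io/inject": "false"},
		Containers:  []string{"app"},
	}
	fmt.Println(inject(ns, fromChart).Containers) // [app istio-proxy]
	fmt.Println(inject(ns, optedOut).Containers)  // [app]
}
```

Real injection also sets up traffic redirection (an `istio-init` init container, or the Istio CNI plugin) and a full proxy configuration, all of which this sketch leaves out.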
There are of course some negatives, all to do with modularity, pluggability and, ultimately, complexity. You can switch out almost any component of Istio and integrate it with other systems. This all comes at the cost of a steep learning curve and plenty of scope to shoot yourself in the foot.
Surprisingly, however, you can get up and running with Istio very quickly if you stick to the defaults. Setting up a test instance with minikube, Helm and Istio on your laptop takes less than 5 minutes. There are also thousands of articles online about configuring other integrations, which is a stark contrast to the other service meshes.
It’s close, but if you’re starting from scratch on Kubernetes, as many people are, I’d say Istio is probably the best service mesh right now. The complexity is high, but not massively so compared to what you already have to manage with Kubernetes. It has the most features, and it reached version 1.0 and was declared production ready a few months ago. It also has the backing of Google and a massive community churning out cool blogs and integrations.
Edit: As of 2021 I’ve been using Linkerd2 in production successfully. Each time we have recently evaluated Istio, the operational overhead and steep learning curve were a concern for everyone on the team. Linkerd2 is simple, is now production ready, and we have had no problems with it at all.
Or probably not. Perhaps this comparison was premature. It’s nice to have a competitive landscape in software, and hopefully one day I can revisit this list and crown a new winner.
If anything I’ve written is technically inaccurate please drop me a message below.
To stay updated as new service meshes are released, sign up for our newsletter and visit our service mesh category.