Kubernetes Operators

Have you ever been asked to provision a database, or some other piece of software, that you don’t know at an expert level? I believe this happens all the time. The usual response is one of the following.

  • Hope there’s a cloud service you can use
  • Read all of the docs and build competence through failure
  • Try to hire somebody with the skills
  • Purchase a support contract

When there’s no cloud service you’re usually forced to learn, which is expensive in terms of both time and risk. I’ve had to do this in the past with Cassandra, Kafka, Couchbase, RabbitMQ, Hadoop and Solr; the list is almost endless. With each of these services we reached a point where we were proficient, but by no means expert: proficient at housekeeping tasks, scaling, backups and fixing any issues that arose.

What if you could install something like Cassandra on your Kubernetes cluster and not have to worry about becoming an expert? Kubernetes Operators are designed to encapsulate all of the domain-specific expertise within a controller. There could be a utopian future where every service is installed with a one-line Helm command and then manages itself entirely.

The more likely scenario is that you’ll need to become an expert in Linux, Kubernetes, Golang, an Operator SDK and then the service itself. This is certainly true for the Operator pioneers today, but I hold a glimmer of hope for that utopian future.

Imagine that the human race has been wiped out by some kind of zombie virus, and yet Kafka is still running on a solar-powered Raspberry Pi Kubernetes cluster, automatically balancing partitions.

Operators also raise the question: should I hire more support staff, or should I just hire a few SREs who know how to write and troubleshoot Operators? I suspect the latter will become more cost-effective over time.

In this post we’re going to dig a bit deeper into the history, context and use cases for Operators so you get a flavor of their full power. Then we’ll have a look at the 120+ Operators currently available. Finally, we’ll compare ways to create Operators by looking at Kubebuilder, the Operator SDK and Metacontroller.

History, context and use cases for Operators

CoreOS first announced Operators back in 2016 and provided examples for etcd and Prometheus. The Tectonic platform was based on this model and it was apparently working extremely well. It wasn’t long before people started to build their own.

You may be wondering what the difference is between an Operator and a controller in Kubernetes. This GitHub issue does a good job of explaining the subtleties.

By the beginning of 2017 we saw more companies moving to this model in order to package up administrative tasks alongside the software being deployed. An interesting article by Giant Swarm discusses how they provision Kubernetes clusters on-prem using an Operator for KVM.

Operators are extremely valuable for managed service providers looking to standardise and automate application management. Presslabs recently wrote about how they are moving to this model to support WordPress installations, and CloudARK are doing the same thing for Moodle.

Services that were once a bit of a headache to run on Kubernetes also started to get Operators. Vault in particular was always a pain to run, but it is now largely painless since the kind people at Banzai Cloud released their Vault Operator.

We are actually spoilt for choice when it comes to Operators for MySQL, for which at least 3 options exist today. In October 2018 the Elasticsearch Operator was released, and this comment in the launch blog highlights the power of Operators:

“I’m afraid we might not see an official operator from Elastic since it would ‘compete’ with their Elastic Cloud Enterprise product.”

The Elasticsearch Operator supports multiple clusters. This means that if you need another cluster you simply duplicate a YAML file and apply it. A lot of Operators work this way. Often you’ll just need to update a resource to add a new user, database or other kind of object. The Operator will notice the change and update the existing cluster.
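To make the "notice the change and update" idea concrete, here is a minimal sketch of a reconcile step in Python. It is not taken from the Elasticsearch Operator or any real Operator: the ClusterSpec type, its fields and the action strings are all hypothetical. It only illustrates the general pattern of comparing the declared (desired) state with the observed state and working out what to do about the difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical desired state, as it might be declared in a YAML resource."""
    name: str
    replicas: int
    users: frozenset = frozenset()


def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions needed to move the observed state to the desired state.

    Both arguments map a cluster name to a ClusterSpec. Removing users is
    deliberately left out to keep the sketch short.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        current = observed.get(name)
        if current is None:
            # A new resource was applied: create the whole cluster.
            actions.append(f"create cluster {name} with {spec.replicas} replicas")
            current_users = frozenset()
        else:
            if current.replicas != spec.replicas:
                actions.append(
                    f"scale cluster {name} from {current.replicas} "
                    f"to {spec.replicas} replicas"
                )
            current_users = current.users
        # Users that are declared but do not exist yet.
        for user in sorted(spec.users - current_users):
            actions.append(f"add user {user} to cluster {name}")
    # Clusters that exist but are no longer declared.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete cluster {name}")
    return actions


if __name__ == "__main__":
    desired = {
        "logs": ClusterSpec("logs", 3, frozenset({"alice"})),
        "metrics": ClusterSpec("metrics", 2),
    }
    observed = {"logs": ClusterSpec("logs", 3)}
    for action in reconcile(desired, observed):
        print(action)
```

A real Operator would run a step like this every time a watched resource changes, and would call the service’s own admin APIs rather than return strings. The useful property is that running it again once the states match produces no actions at all.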

Some of the cleverer companies like Couchbase have released their own proprietary Operators to manage their databases on Kubernetes.

It has really only been since the beginning of 2018 that we’ve seen the ecosystem for Operators start to pick up pace.

The other launch in October 2018 was the AWS Service Operator. This could be an absolute game changer for anyone running Kubernetes on AWS, including people using EKS.

If you look at the GitHub issues for the AWS Service Operator you can see just how amazing this could be. In the future your EKS control plane could come pre-installed with the AWS Service Operator. You could update a custom resource to configure your worker node pools and the AWS Service Operator would spin up new EC2 instances. Perhaps this explains why there are no managed workers in EKS. If you have the control plane you could use it to configure all of your AWS resources.

In fact, this aligns with what Tim Hockin presented at the cloud native Kubernetes meetup in Warsaw last month. He talked about Kubernetes as an extensible platform that would stay cloud agnostic by moving that logic out into Controllers.

With so much development going on in this space, a growing number of frameworks are emerging, which we’ll discuss at the end of this post.

Another interesting model that is emerging from the Operator pattern is GitOps application deployment. In this model you update a manifest in a Git repository and an Operator running on your cluster pulls down the appropriate Helm chart and performs the deployment. Weave Flux and Keel are very popular examples of deployment Operators. This pull-based approach creates a useful separation between the CI tools that build and test, and the source of truth for which versions run in an environment, which is updated in Git on each successful PR merge. Another benefit is that deployment Operators often encapsulate rollback functionality.
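The pull-based flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not how Weave Flux or Keel are implemented: the sync function, the manifest layout (application name mapped to a version) and the deploy callable are all hypothetical.

```python
def sync(git_manifest: dict, running: dict, deploy) -> dict:
    """Run one pull-based sync pass and return the versions running afterwards.

    git_manifest: versions pinned in Git, the source of truth (hypothetical layout).
    running:      versions currently deployed on the cluster.
    deploy:       hypothetical callable deploy(app, version) -> bool (True on success).
    """
    result = dict(running)
    for app, wanted in sorted(git_manifest.items()):
        previous = running.get(app)
        if previous == wanted:
            continue  # already in sync, nothing to do
        if deploy(app, wanted):
            result[app] = wanted
        elif previous is not None:
            # The deployment failed: roll back to the last known good version.
            deploy(app, previous)
            result[app] = previous
    return result


if __name__ == "__main__":
    def fake_deploy(app, version):
        # Stand-in for a real deployment; versions ending in "-broken" fail.
        print(f"deploying {app} {version}")
        return not version.endswith("-broken")

    in_git = {"web": "1.1.0", "api": "2.0.0-broken"}
    on_cluster = {"web": "1.0.0", "api": "1.9.0"}
    print(sync(in_git, on_cluster, fake_deploy))
```

Notice that the CI system never talks to the cluster in this model. It only has to get a new version merged into Git, and the Operator on the cluster does the rest, including the rollback.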

Every Operator available

The credit for most of this list should go to the awesome operators page. Usually I’d just have linked directly but I wanted to add another column for status and add other Operators as I find them. I was interested to see which Operators could potentially be used in production today.

The table on this page is dynamically generated from this Google sheet. If you would like to add an Operator or change a status, leave a comment on the cell and I’ll update it, or alternatively request access.

Production ready projects have a release version > 1.0 and often declare themselves as such in the readme. Some projects are approaching production readiness but aren’t quite there yet, so they are marked as beta. Many projects have self-declared their status as ‘alpha’ in the readme, so I have marked them as such.

Out of almost 120 projects only 9 are production ready. It is a bit disappointing to see that even the original two Operators (etcd and Prometheus) are still below version 1. It is, however, promising to see a couple of Postgres Operators available today, as well as Jaeger. It’s still early days for Operators, so I’m hopeful.

There are a few that I think are close to production ready. ArangoDB is production ready on GKE, EKS and PKS but not on anything else. Etcd and Prometheus look stable but lack full feature coverage.

Many people are using cert-manager even though it’s in beta. I suspect a few people will be using the Vault Operators too. It’s worth evaluating each of these projects for yourself.

Creating an Operator

You may be wondering how you would create your own Operator. The best place to start is the official Kubernetes sample controller and especially the /docs directory.

For those looking for a way to speed up development there are a few options: Kubebuilder, the Operator SDK and Metacontroller.

A very good blog post comparing all three options was written by Adrien Trouillaud and has been kept updated since.

Kubebuilder has some really awesome documentation and is backed by a Kubernetes SIG, so if you do decide to pick a framework this may be the best choice long term.

Summary and additional reading

The potential awesomeness of Kubernetes Operators should be clear. We have a handful of real-world examples that could be considered production ready today. However, the vast majority of Operators that exist need a lot of work. You’ll need to up-skill your team in order to create and iterate on high-quality Operators, but the pay-off should, in theory, be far fewer manual system administration tasks.

The recent release of Kubebuilder and the ongoing work in the sig-api-machinery group to create a standard platform SDK should help a lot.

If you’d like to continue reading here’s a sheet with some links and descriptions.
