Have you ever been asked to provision a database, or other piece of software, for which you don’t have an expert level of knowledge? I believe this happens all of the time. What you do in response is usually one of the following.
When there’s no cloud service you’re usually forced to learn. This is expensive in terms of time and risk. I’ve had to do this in the past with Cassandra, Kafka, Couchbase, RabbitMQ, Hadoop and Solr; the list is almost endless. With each of these services we got to a point where we were proficient, but by no means expert. This was a proficiency in terms of housekeeping tasks, scaling, backups and fixing any issues that arose.
What if you could install something like Cassandra on your Kubernetes cluster and not have to worry about becoming an expert? Kubernetes Operators are designed to encapsulate all of the domain-specific expertise within a controller. There could be a utopian future where all services are installed with a one-line Helm command and then totally manage themselves.
The more likely scenario is that you’ll need to become an expert in Linux, Kubernetes, Golang, an Operator SDK and then the service itself. Certainly this is true for the Operator pioneers today but I hold a glimmer of hope for that utopian future.
Imagine that the human race has been wiped out by some kind of zombie virus, and yet Kafka is still running on a solar-powered Raspberry Pi Kubernetes cluster, automatically balancing partitions.
This Operator stuff also raises the question: should I hire more support staff, or should I just hire a few SREs who know how to write and troubleshoot Operators? I suspect the latter will become more cost effective over time.
In this blog we’re going to dig a bit deeper into the history, context and use cases for Operators so you get a flavor of their full power. Then we’ll have a look at the 120+ Operators currently available. Finally, we’ll compare ways to create Operators by looking at Kubebuilder vs Operator SDK vs Metacontroller.
It was back in 2016 when CoreOS first announced Operators and provided examples for etcd and Prometheus. The Tectonic platform was based on this model and it was apparently working extremely well. It wasn’t long after that when people started to build their own.
You may be wondering what the difference is between an Operator and a controller in Kubernetes. This GitHub issue does a good job of explaining the subtleties.
By the beginning of 2017 we saw more companies moving to this model in order to package up administrative tasks alongside the software being deployed. An interesting article by Giant Swarm discusses how they provision Kubernetes clusters On-Prem using an Operator for KVM.
Operators are extremely valuable for managed service providers looking to standardise and automate application management. Presslabs recently wrote about how they are moving to this model to support WordPress installations. Similarly, Cloudark are doing the same thing for Moodle.
Services which were once a bit of a headache to run on Kubernetes also started to get Operators. Vault in particular was always a pain to run and this is now largely painless since the kind people at Banzai Cloud released their Vault Operator.
We are actually spoilt for choice when it comes to Operators for MySQL, for which at least 3 options exist today. In October 2018 the Elasticsearch Operator was released, and the comment in the launch blog highlights the power of Operators:
I’m afraid we might not see an official operator from Elastic since it would “compete” with their Elastic Cloud Enterprise product.
The Elasticsearch Operator supports multiple clusters. This means that if you need another cluster you simply duplicate a YAML file and apply it. A lot of the Operators work this way. Often you’ll just need to update a resource to add a new user, or database, or other kind of object. The Operator will notice the change and update the existing cluster.
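The notice-and-update behaviour described above is usually implemented as a reconcile loop: compare the state declared in the resource with what is actually running, then act on the difference. Here is a minimal sketch in Go using only the standard library. The `ClusterSpec` and `ClusterState` types, their fields and the `reconcile` function are hypothetical names chosen for illustration; they are not the API of the Elasticsearch Operator or of any Operator framework.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterSpec is a hypothetical desired state, as it might be declared in a YAML resource.
type ClusterSpec struct {
	Replicas int
	Users    []string
}

// ClusterState is a hypothetical view of what is actually running.
type ClusterState struct {
	Replicas int
	Users    map[string]bool
}

// reconcile compares desired and observed state, changes observed to match,
// and returns a description of each action it took.
func reconcile(desired ClusterSpec, observed *ClusterState) []string {
	var actions []string
	if observed.Users == nil {
		observed.Users = map[string]bool{}
	}

	// Scale when the replica count has drifted.
	if observed.Replicas != desired.Replicas {
		actions = append(actions, fmt.Sprintf("scale %d -> %d", observed.Replicas, desired.Replicas))
		observed.Replicas = desired.Replicas
	}

	// Create users that are declared but missing.
	wanted := map[string]bool{}
	for _, u := range desired.Users {
		wanted[u] = true
		if !observed.Users[u] {
			actions = append(actions, "create user "+u)
			observed.Users[u] = true
		}
	}

	// Delete users that exist but are no longer declared (sorted for stable output).
	var stale []string
	for u := range observed.Users {
		if !wanted[u] {
			stale = append(stale, u)
		}
	}
	sort.Strings(stale)
	for _, u := range stale {
		actions = append(actions, "delete user "+u)
		delete(observed.Users, u)
	}
	return actions
}

func main() {
	observed := &ClusterState{Replicas: 3, Users: map[string]bool{"alice": true}}
	desired := ClusterSpec{Replicas: 5, Users: []string{"alice", "bob"}}

	for _, a := range reconcile(desired, observed) {
		fmt.Println(a)
	}
	// A second pass finds nothing left to do.
	fmt.Println("actions on second pass:", len(reconcile(desired, observed)))
}
```

The useful property is that running the function twice is harmless: once the observed state matches the declared one, there is nothing left to do. A real Operator would replace the in-memory `ClusterState` with calls to the Kubernetes API and to the managed service.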
Some of the cleverer companies like Couchbase have released their own proprietary Operators to manage their databases on Kubernetes.
It has really only been since the beginning of 2018 that we’ve seen the ecosystem for Operators start to pick up pace.
The other launch in October 2018 was the AWS Service Operator. This could be an absolute game changer for anyone running Kubernetes on AWS, including people using EKS.
If you look at the GitHub issues for the AWS Service Operator you can see just how amazing this will be. In the future your EKS control plane could come pre-installed with the AWS Service Operator. You could update a custom resource to configure your worker node pools and the AWS Service Operator would spin up new EC2 instances. Perhaps this explains why there are no managed workers in EKS. If you have the control plane you could use it to configure all of your AWS resources.
In fact, this aligns with what Tim Hockin presented at the cloud native Kubernetes meetup in Warsaw last month. He talked about Kubernetes as an extensible platform that would stay cloud agnostic by moving that logic out into Controllers.
With so much development going on in this space, a growing number of frameworks are emerging, which we’ll discuss at the end of this blog.
Another interesting model that is emerging from the Operator pattern is GitOps application deployment. In this model you update a manifest in a Git repository and an Operator running on your cluster pulls down the appropriate Helm chart and performs the deployment. Weave Flux and Keel are very popular examples of deployment Operators. This pull-based deployment approach creates a useful separation between the CI tools that build and test, and the source of truth for what versions run in an environment, which is updated in Git on each successful PR merge. Another benefit is that deployment Operators often encapsulate rollback functionality.
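The pull-based model can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Release` and `Cluster` types and the `Sync` and `Rollback` methods are hypothetical, the manifest is passed in as a slice rather than fetched from Git, and nothing here reflects how Flux or Keel are actually implemented.

```go
package main

import "fmt"

// Release is a hypothetical manifest entry: which version of an app should run.
type Release struct {
	App     string
	Version string
}

// Cluster is an in-memory stand-in for a real cluster. It tracks the running
// version of each app and the versions that were deployed before it.
type Cluster struct {
	running map[string]string
	history map[string][]string
}

func newCluster() *Cluster {
	return &Cluster{running: map[string]string{}, history: map[string][]string{}}
}

// Sync applies the manifest (what Git says should run) to the cluster and
// returns the apps whose version changed. Apps already at the declared
// version are left alone.
func (c *Cluster) Sync(manifest []Release) []string {
	var changed []string
	for _, r := range manifest {
		current, deployed := c.running[r.App]
		if deployed && current == r.Version {
			continue
		}
		if deployed {
			c.history[r.App] = append(c.history[r.App], current)
		}
		c.running[r.App] = r.Version
		changed = append(changed, r.App)
	}
	return changed
}

// Rollback restores the previously deployed version of an app and reports
// whether there was one to restore.
func (c *Cluster) Rollback(app string) bool {
	h := c.history[app]
	if len(h) == 0 {
		return false
	}
	c.running[app] = h[len(h)-1]
	c.history[app] = h[:len(h)-1]
	return true
}

func main() {
	c := newCluster()
	c.Sync([]Release{{App: "shop", Version: "1.0.0"}})
	fmt.Println("changed:", c.Sync([]Release{{App: "shop", Version: "1.1.0"}}))
	c.Rollback("shop")
	fmt.Println("running after rollback:", c.running["shop"])
}
```

In a real GitOps setup a rollback would also need to be recorded in Git, otherwise the next sync would simply redeploy the newer version.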
The credit for most of this list should go to the awesome operators page. Usually I’d just have linked directly but I wanted to add another column for status and add other Operators as I find them. I was interested to see which Operators could potentially be used in production today.
The table on this page is dynamically generated from this Google sheet. If you would like to add an operator or change the status leave a comment on the cell and I’ll update it, or alternatively request access.
Production-ready projects have a release version of 1.0 or higher and often declare themselves as such in the readme. Some projects are approaching production readiness but aren’t quite there yet, so they are marked as beta. Many projects have self-declared their status as ‘alpha’ in the readme, so I have marked them as such.
|Category||Operator||Status||Description|
|Autoscaling||HPA Operator||Beta||Horizontal Pod Autoscaler operator for Kubernetes. Annotate, and let HPA operator do the rest.|
|Autoscaling||Cluster Autoscaler||Beta||Manage Kubernetes cluster-autoscaler deployments|
|Backup||Ark||Beta||Ark is a utility for managing disaster recovery, this operator manages the backup and restoration of cluster components (pv,pvc,deployments, etc.) to aid in disaster recovery|
|Backup||Kanister||Beta||Kanister is an extensible framework for application-level data management on Kubernetes|
|Big Data||Airflow||Alpha||A Kubernetes operator to manage Apache Airflow.|
|Big Data||MXNet||Alpha||Apache MXNet is a modern open-source deep learning framework used to train, and deploy deep neural networks. This operator manages the tools for ML/MXNet on Kubernetes.|
|Big Data||Spark (GCP)||Alpha||Kubernetes CRD operator for specifying and running Apache Spark applications idiomatically on Kubernetes.|
|Big Data||Spark (radanalytics.io)||Beta||ConfigMap-based operator for deploying ephemeral Apache Spark clusters and intelligent applications that spawn their own Spark clusters natively on Kubernetes and OpenShift.|
|Big Data||Tensorflow||Beta||Tools for ML/Tensorflow on Kubernetes.|
|Big Data||PyTorch||Beta||PyTorch on Kubernetes|
|Column Database||Cassandra (instaclustr)||Alpha||Cassandra is a free and open-source distributed wide column store NoSQL database management system designed to handle large amounts of data. This is a Kubernetes operator for Apache Cassandra.|
|Column Database||Cassandra (vgkowski)||Alpha||Cassandra is a free and open-source distributed wide column store NoSQL database management system designed to handle large amounts of data. This is a Kubernetes operator for Cassandra cluster automation.|
|Column Database||DynamoDB||Alpha||Amazon DynamoDB is a proprietary NoSQL database service that supports key-value and document data structures. This is a Kubernetes operator for DynamoDB|
|Deployment||Keel||Beta||Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates|
|Deployment||Flux||Beta||The GitOps Kubernetes operator|
|Deployment||Environment Operator||Beta||Kubernetes Environment Management|
|Deployment||Chart Operator||Alpha||The chart-operator deploys Helm charts by reconciling against a CNR registry.|
|Document Database||CouchDB||Alpha||Prototype Kubernetes operator for couchDB|
|Document Database||Mongo (kbst)||Alpha||MongoDB Operator for Kubernetes|
|Document Database||Mongo (ultimaker)||Beta||MongoDB Operator for MongoDB Replica Sets and Backups|
|Document Database||MongoDB (Official)||Beta||MongoDB Enterprise Operator for Kubernetes|
|Document Database||MongoDB (Percona)||Beta||A Kubernetes operator for Percona Server for MongoDB|
|Document Database||RethinkDB||Beta||RethinkDB is a free and open-source, distributed document-oriented database. This is a Kubernetes operator to manage RethinkDB instances.|
|Document Database||Couchbase (official)||Production||This is a paid product from Couchbase|
|Graph Database||ArangoDB||Beta||ArangoDB Kubernetes Operator. Start ArangoDB on Kubernetes in 5min.|
|IaaS||svcat||Beta||Service Catalog is a Kubernetes extension API that enables applications running on Kubernetes clusters to connect with service brokers and easily use external managed software offerings. For example, Service Catalog can connect to the Google Cloud Platform (GCP) Service Broker and easily provision GCP services.|
|IaaS||AWS (Giant Swarm)||Production||Manages Giantnetes Kubernetes clusters running on Amazon Web Services|
|IaaS||CloudFormation||Alpha||AWS CloudFormation is a service that helps you model and set up your Amazon Web Services resources. Using this operator run and manage CloudFormation stacks and manage AWS resources from Kubernetes.|
|IaaS||KubeVirt||Beta||Kubernetes Virtualization Operator with API and runtime in order to define and manage virtual machines|
|IaaS||OpenStack||Beta||SAP OpenStack operator creates various resources in OpenStack|
|IaaS||VPC Peering||Beta||A Kubernetes Operator to manage the lifecycle of AWS VPC Peering Connections|
|IaaS||AWS Service Operator (official)||Alpha||AWS Service Operator allows you to create AWS resources using kubectl|
|IaaS||Azure Operator (Giant Swarm)||Production||Azure operator manages Kubernetes clusters running in Giantnetes on Azure|
|IaaS||GCP Operator (paulczar)||Alpha||GCP operator for Kubernetes|
|Other||KVM||Production||Handles Kubernetes clusters running on a Kubernetes cluster with workers and masters in KVMs on bare metal|
|Java||WebLogic||Production||Oracle Weblogic Server Kubernetes Operator|
|Java||WildFly||Alpha||WildFly Operator lets you describe and deploy a JEE application on a WildFly server by creating a Custom Resource Definition in Kubernetes.|
|KV Database||Aerospike||Alpha||Aerospike is a NoSQL distributed database. This Operator manages Aerospike clusters atop Kubernetes, automating their creation and administration.|
|KV Database||Consul||Alpha||A Kubernetes operator for consul|
|KV Database||etcd||Beta||etcd is a distributed key-value (k/v) store. This operator manages etcd k/v database clusters on Kubernetes.|
|KV Database||Infinispan||Alpha||Infinispan is a distributed in-memory key/value data store. This operator deploys and runs an Infinispan cache cluster.|
|KV Database||Memcached||Alpha||A Kubernetes operator for memcached|
|KV Database||Redis (spotahome)||Alpha||Redis Operator creates/configures/manages redis clusters atop Kubernetes|
|KV Database||Redis (jw-s)||Beta||Redis operator for Kubernetes|
|KV Database||Redis Cluster||Alpha||A Kubernetes operator for running Redis in Cluster mode|
|Lucene Database||ElasticSearch||Beta||Elasticsearch is a distributed, RESTful search and analytics engine. This operator manages one or more elastic search clusters on Kubernetes.|
|Mobile||Android SDK||Alpha||A Kubernetes operator to manage Android SDK package synchronization in a persistent volume|
|Mobile||Unifiedpush||Beta||UnifiedPush Server is a server that allows sending push notifications to different (mobile) platforms|
|Monitoring||Icinga2 Operator||Production||Icinga is an open source computer system and network monitoring application. This operator provides alerts for Kubernetes|
|Monitoring||Jaeger (official)||Production||Jaeger Operator for Kubernetes|
|Monitoring||Logging Operator||Alpha||Logging operator for Kubernetes based on Fluentd and Fluent-bit|
|Monitoring||FluentD Operator||Production||Auto-configuration of Fluentd daemon-set based on Kubernetes metadata|
|Monitoring||Prometheus||Beta||Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and deployment and management of Prometheus instances.|
|Monitoring||Prometheus Jmx Exporter||Alpha||This operator uses JMX Exporter to enable Java processes running on Kubernetes Pods to expose metrics collected from MBeans via JMX to Prometheus.|
|Monitoring||InfluxDB (official)||Alpha||InfluxDB is an open-source time series database. This is the Kubernetes operator for InfluxDB and the TICK stack.|
|Monitoring||Sens8||Beta||Kubernetes controller for Sensu checks|
|Monitoring||Alerting Rules||Beta||Kubernetes Operator for teams to create their own alerting rules as CRDs|
|Monitoring||Grafana||Alpha||Grafana Operator for Kubernetes|
|Monitoring||Chronologist||Beta||Continuously annotate Helm releases in Grafana|
|Monitoring||Ingress Monitor||Beta||Kubernetes Operator to create external monitors for selected Ingresses e.g. with Statuscake|
|Monitoring||M3DB (official)||Alpha||Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Metrics Platform|
|Monitoring||Dynatrace||Beta||Kubernetes/Openshift Operator for managing Dynatrace OneAgent deployments|
|Networking||IPAM Operator||Alpha||An operator to assign IP for Kubernetes Namespace.|
|Networking||External DNS||Beta||Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services|
|Networking||Flannel||Beta||The flannel-operator handles flannel for Kubernetes clusters running on Giantnetes|
|Other||KubeDB||Beta||KubeDB Operator. Manages a lot of major databases on Kubernetes|
|Other||Habitat||Beta||A Kubernetes operator for Habitat services|
|Other||MultiCluster-Controller||Beta||Multicluster-controller is a Go library for building Kubernetes controllers that need to watch resources in multiple clusters|
|Other||Pod Reaper||Beta||A kubernetes operator that reaps pods that have reached their lifetime|
|Other||Kube Cleanup Operator||Beta||Kubernetes Operator to automatically delete completed Jobs and their Pods|
|Other||Service Level Operator||Beta||Manage application's SLI and SLO's easily with the application lifecycle inside a Kubernetes cluster|
|Other||Kube Replay||Alpha||Seamless integration of goReplay and Kubernetes|
|Other||Feature Branch Janitor||Alpha||Kubernetes operator used to cleanup feature branch deployments when branch has been deleted|
|Other||Kubechain||Beta||A simple blockchain implementation on top of kubernetes|
|Other||Delete NS||Beta||Delete a Kubernetes Namespace that's older than some hours|
|Other||OpenFaaS||Beta||Operator for OpenFaaS Functions on Kubernetes|
|Other||Flagger||Beta||Istio progressive delivery Kubernetes operator|
|Other||RSS Operator||Beta||A Kubernetes Operator for managing and recovering replicated workflows|
|Other||Jenkins||Alpha||Kubernetes operator for Jenkins CI|
|Other||Jira||Alpha||A Kubernetes operator to manage JIRA instances.|
|Other||Gitea||Beta||An Operator that installs Gitea.|
|Proxy||Envoy||Beta||Envoy is a Microservice Abstraction Layer (also known as an API Gateway, API Middleware or in some cases Service Mesh). Run and manage Envoy on Kubernetes simply and securely.|
|Proxy||Kong||Alpha||Manages Kong clusters on Kubernetes.|
|Queue||Kafka (krallistic)||Alpha||A Kafka Operator for Kubernetes|
|Queue||Kafka (strimzi)||Beta||Operator for running Kafka and Kafka Connect on Kubernetes and OpenShift|
|Queue||NATS||Beta||NATS is an open-source, high-performance, lightweight and secure cloud native messaging system. This operator manages NATS clusters atop Kubernetes, automating their creation and administration.|
|Queue||RocketMQ||Alpha||Create, operate and scale self-healing Rocketmq clusters on Kubernetes.|
|Relational Database||MySQL (grtl)||Alpha||MySQL is an Open Source SQL database management system. This creates a Kubernetes Custom Resource for MySQL.|
|Relational Database||MySQL (Oracle)||Alpha||MySQL is an Open Source SQL database management system. This operator creates, operates, and scales self-healing MySQL clusters in Kubernetes|
|Relational Database||MySQL (Presslabs)||Beta||MySQL is an Open Source SQL database management system. This operator manages all the necessary resources for deploying and managing a highly available MySQL cluster. It provides effortless backups, while keeping the cluster highly available.|
|Relational Database||MySQL Operator (Banzai Cloud)||Alpha||Create, operate and scale self-healing MySQL clusters in Kubernetes.|
|Relational Database||PostgreSQL (Crunchy Data)||Production||PostgreSQL Operator Creates/Configures/Manages PostgreSQL Clusters on Kubernetes|
|Relational Database||PostgreSQL (Zalando)||Production||Create and manage PostgreSQL HA clusters on Kubernetes using Patroni|
|Relational Database||RDS||Alpha||Operator to control RDS DBs in AWS|
|Relational Database||TiDB (aliyx)||Alpha||Tidb-operator creates/configures/manages tidb clusters atop Kubernetes|
|Relational Database||TiDB (Pingcap)||Beta||TiDB operator creates and manages TiDB clusters running in Kubernetes|
|Relational Database||Vitess||Beta||Vitess Operator provides automation that simplifies the administration of Vitess clusters on Kubernetes|
|Relational Database||Percona XtraDB Cluster||Alpha||A Kubernetes operator for Percona XtraDB Cluster based on the Operator SDK|
|Security||cert-manager||Beta||Automatically provision and manage TLS certificates in Kubernetes|
|Security||rbacsync||Beta||Automatically sync groups in Google Cloud into Kubernetes RBAC|
|Security||RBAC Manager||Beta||This operator simplifies the management of RBAC Role Bindings in Kubernetes.|
|Security||Vault (CoreOS)||Beta||*This is now unmaintained*. See https://github.com/coreos/vault-operator/issues/332|
|Security||Vault (Banzai Cloud)||Beta||Vault secures, stores, and tightly controls access to tokens, passwords, certificates, API keys, and other secrets. This operator offers a feature rich HA Vault operator with TLS, external API based re/configuration, several/automatic unseal options and more.|
|Security||Gatekeeper||Beta||Kubernetes Operator to manage Dynamic Admission Controllers using Open Policy Agent|
|Security||Network Policy||Beta||Auto create common network policy in all namespaces|
|Security||Regsecret||Beta||Kubernetes operator to automate imagePullSecrets creation|
|Security||Replicator||Alpha||Kubernetes operator that copies secrets in all namespaces|
|Security||Random Secret Operator||Beta||Creates Secrets containing random data|
|Security||lessor||Alpha||A Kubernetes Operator for managing multi-tenant workloads|
|Security||Cert Operator||Beta||cert-operator creates/manages certificates for Kubernetes clusters running on Giantnetes|
|Security||SSO Operator||Production||Single Sign-On Kubernetes operator for Dex identity provider|
|Security||Dex||Alpha||A Kubernetes operator for Dex|
|Security||Falco||Alpha||Kubernetes operator for Sysdig Falco that allows developers to manage rules for detecting intruders and backdoors|
|Security||AWS Secret Operator||Alpha||A Kubernetes operator that automatically creates and updates Kubernetes secrets according to what is stored in AWS Secrets Manager|
|Security||Externalconfig Operator||Alpha||An operator to fetch configuration data from cloud services and inject it in Kubernetes|
|Security||Certmerge||Alpha||A Kubernetes Operator that can merge many TLS secrets into one Opaque secret|
|Service Discovery||ZooKeeper (Nuance Mobility)||Alpha||ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. This is an operator for ZooKeeper 3.5.x|
|Service Discovery||ZooKeeper (pravega)||Alpha||This operator runs a Zookeeper 3.5 cluster, and uses Zookeeper dynamic reconfiguration to handle node membership.|
|Storage||PVC||Beta||This operator helps to use Kubernetes Persistent Volumes easier on cloud providers by dynamically creating the required accounts, classes and more.|
|Storage||Quobyte||Alpha||Quobyte’s next-generation file system unifies file, block and object storage for enterprise and scientific applications.|
|Storage||Rook||Alpha||File, Block, and Object Storage Services for your Cloud-Native Environment|
|Storage||Gluster||Alpha||A Kubernetes/OpenShift operator to manage Gluster clusters|
|Storage||Minio||Alpha||Minio Operator for k8s|
|Storage||Pravega||Alpha||Pravega Kubernetes Operator|
|Testing||Netperf||Alpha||This is a very simple operator that can be used to test network performance between 2 pods using the netperf tool. It is also a good operator for learning purposes, as the code base is pretty small and it's described in detail in this blog post.|
|Testing||Kaos||Alpha||Kinda Chaos Monkey for Kubernetes|
Of the 126 projects listed, only 11 are marked production ready. It is a bit disappointing to see that even the original two Operators (etcd and Prometheus) still haven’t reached version 1. It is, however, promising to see a couple of Postgres Operators available today, as well as Jaeger. It’s still early days for Operators, so I’m hopeful.
There are a few that I think are close to production ready. ArangoDB is production ready on GKE, EKS and PKS but not on anything else. Etcd and Prometheus look stable but lack full feature coverage.
Many people are using cert-manager even though it’s in beta. I suspect a few people will be using the Vault Operators too. It’s worth evaluating each of these projects individually for yourself.
You may be wondering how you would create your own Operator. The best place to start is the official Kubernetes sample controller and especially the /docs directory.
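The sample controller is built around a work queue: events about a resource add its `namespace/name` key to the queue, and a worker takes keys off the queue and reconciles each one. The sketch below shows that shape in Go using only the standard library. The `keyQueue` type and the `drain` function are hypothetical simplifications for illustration; the real sample controller relies on client-go, which adds informers, rate limiting and retries.

```go
package main

import "fmt"

// keyQueue is a hypothetical, single-threaded stand-in for a controller work
// queue. Keys have the form "namespace/name", and a key that is already
// waiting is not queued a second time.
type keyQueue struct {
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add queues a key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest key, or false when the queue is empty.
func (q *keyQueue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain processes every queued key with the given reconcile function and
// returns the keys for which it reported an error.
func drain(q *keyQueue, reconcile func(key string) error) []string {
	var failed []string
	for {
		key, ok := q.Get()
		if !ok {
			return failed
		}
		if err := reconcile(key); err != nil {
			failed = append(failed, key)
		}
	}
}

func main() {
	q := newKeyQueue()
	// Two events for the same object collapse into a single queue entry.
	for _, key := range []string{"default/db", "default/db", "prod/cache"} {
		q.Add(key)
	}
	drain(q, func(key string) error {
		fmt.Println("reconciling", key)
		return nil
	})
}
```

Queueing keys rather than events means a burst of updates to one object results in a single reconcile, which then reads the latest state of that object.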
For those looking for a way to speed up development there are a few options.
|Framework||Kubebuilder||Operator SDK||Metacontroller|
|Launched||August 2018||May 2018||August 2017|
A very good blog comparing all three options was written by Adrien Trouillaud and has been kept updated since.
Kubebuilder has some really awesome documentation and is backed by a Kubernetes SIG, so if you do decide to pick a framework then this may be the best choice long term.
The potential awesomeness of Kubernetes Operators should be clear. We have a handful of real-world examples that could be considered production ready today. However, the vast majority of Operators that exist need a lot of work. You’ll need to up-skill your team in order to create and iterate on high-quality Operators, but the pay-off should in theory be far fewer manual system administration tasks.
The recent release of Kubebuilder and the work ongoing in the sig-api-machinery group to create a standard platform SDK should help a lot.
If you’d like to continue reading here’s a sheet with some links and descriptions.