We’ll compare Docker Engine vs CRI-O vs CRI Containerd vs gVisor vs CRI-O Kata Containers. The first three are traditional container runtimes that start containers in their own namespace. The latter two are new runtimes that provide extra isolation. Here’s a quick overview of the differences.
The term container runtime itself is a little ambiguous. For practical purposes we’re going to define a container runtime as ‘the option somebody selects’ for their Kubernetes cluster. If you’d like to dig a little deeper and understand the subtle differences this is a good blog to read. Essentially, a container runtime is responsible for pulling down images from a registry, expanding them on disk and then executing them in a namespace.
In this article we’ll divide up the comparison into three. First we’ll compare Docker Engine vs CRI-O vs CRI Containerd since these all use the same underlying technology (runC) to start containers.
Then we’ll take a look at gVisor and CRI-O Kata Containers separately as these work in a different way.
I’d hazard a guess that almost everyone reading this article is using Docker Engine for their container runtime. It’s not a bad runtime and has certainly matured over the past few years. Everyone is familiar with Docker.
However, the days of Docker Engine as a native runtime on Kubernetes are coming to an end. Kubernetes is pushing the CRI plugin system. A few distributions have already started to migrate to CRI-O, which is a stripped down runtime created by Red Hat. There’s no big Docker daemon in CRI-O, although it uses runC under the covers, which Docker created and later donated to the Open Container Initiative.
OpenShift is the most notable of the Kubernetes distributions to move to CRI-O. This probably isn’t surprising since Red Hat created both OpenShift and CRI-O. All the reviews that I’ve read have been positive, stating that it’s easy to install and stable. As with all of the CRI plugins you’ll need to use the new command line utility ‘crictl’ to interact with containers on the workers. The arguments are mostly the same as with Docker and it comes with some extra commands that let you list pods and other Kubernetes native entities.
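To make the ‘mostly the same arguments’ point concrete, here is a minimal sketch that rewrites a few Docker commands into their crictl equivalents. The helper name `to_crictl` and the mapping table are made up for illustration, and only subcommands that exist in both tools are listed. Flags are passed through untouched, which is an assumption: some of them differ between the two CLIs.

```python
import shlex

# Subcommands that exist in both docker and crictl.
# The table is illustrative and deliberately incomplete.
DOCKER_TO_CRICTL = {
    "ps": "ps",
    "images": "images",
    "logs": "logs",
    "exec": "exec",
    "inspect": "inspect",
    "pull": "pull",
    "stats": "stats",
}


def to_crictl(docker_command):
    """Rewrite e.g. 'docker logs abc123' as 'crictl logs abc123'.

    Hypothetical helper: arguments after the subcommand are copied
    unchanged, so flags may still need adjusting by hand.
    """
    parts = shlex.split(docker_command)
    if len(parts) < 2 or parts[0] != "docker":
        raise ValueError("expected a command of the form 'docker <subcommand> ...'")
    subcommand = parts[1]
    if subcommand not in DOCKER_TO_CRICTL:
        raise ValueError(f"no crictl equivalent listed for 'docker {subcommand}'")
    return shlex.join(["crictl", DOCKER_TO_CRICTL[subcommand], *parts[2:]])


if __name__ == "__main__":
    for cmd in ("docker ps -a", "docker logs abc123", "docker images"):
        print(f"{cmd:20} ->  {to_crictl(cmd)}")
```

The Kubernetes-aware extras have no Docker counterpart at all. `crictl pods`, for example, lists the pod sandboxes on a worker.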
Some might say that the CRI Containerd plugin is the natural upgrade path from Docker Engine. The latest version of Docker Engine ships with the CRI plugin by default. You can simply install the CRI plugin along with the latest Docker Engine on your workers and point your Kubelets at the Containerd plugin. Docker on the worker then serves both use cases: Kubernetes starts containers via the Kubelet, and you can still start Docker containers on the host as usual. It does some clever namespacing to keep both methods of starting a container working at the same time.
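As a rough sketch of what ‘pointing the Kubelet at the plugin’ means, the snippet below builds the Kubelet flags for a given CRI runtime. The function name `kubelet_runtime_flags` is hypothetical. The socket paths are the usual defaults for Containerd and CRI-O, but your packages may place them elsewhere, so treat them as assumptions. The `--container-runtime=remote` flag applies to Kubelets of this era; newer releases no longer need it.

```python
from typing import List

# Usual default CRI socket locations; verify them on your own hosts.
CRI_SOCKETS = {
    "containerd": "unix:///run/containerd/containerd.sock",
    "cri-o": "unix:///var/run/crio/crio.sock",
}


def kubelet_runtime_flags(runtime: str) -> List[str]:
    """Return the Kubelet flags that select a CRI runtime by its socket."""
    if runtime not in CRI_SOCKETS:
        raise ValueError(f"unknown runtime: {runtime}")
    return [
        "--container-runtime=remote",  # only needed on older Kubelets
        f"--container-runtime-endpoint={CRI_SOCKETS[runtime]}",
    ]


if __name__ == "__main__":
    print(" ".join(["kubelet", *kubelet_runtime_flags("containerd")]))
```

Switching a worker from Containerd to CRI-O is then mostly a matter of changing the endpoint and restarting the Kubelet.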
Google seem to be supporting the Containerd project and have recently announced beta support on GKE. Containerd was started by Docker but has been spun out as a completely independent entity. It no longer requires the Docker daemon to run on the worker nodes.
Some interesting benchmarks point to Containerd being faster than CRI-O.
What about rkt? I was quite sad to see that this is now largely dead. This is a massive shame because rkt was arguably the most elegant technical solution. It didn’t require a daemon and instead the client would initiate containers directly under systemd. It had no need for pause containers and it understood pods natively. While I’m not sure exactly what happened, I can see no updates to rktlet, which is the rkt CRI plugin, since early this year. I suspect that since Red Hat acquired CoreOS and also created CRI-O they chose to back CRI-O.
From a security perspective none of these options are great. Isolation between containers involves managing seccomp, SELinux or AppArmor, which very few companies do given the complexity. As long as all containers share the host kernel you are always one kernel bug away from being totally compromised.
In May 2018 Google released gVisor which is supposed to address many of the security concerns around container isolation. You can enable the runsc runtime in Kubernetes which will start containers with their own user space kernel.
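On Kubernetes versions that provide the RuntimeClass API (which arrived after gVisor’s initial release), enabling runsc for a pod looks roughly like the sketch below. The function names, the RuntimeClass name `gvisor`, the pod name and the image are all illustrative. The handler value `runsc` is an assumption too: it has to match whatever name the runtime was registered under in your Containerd or CRI-O configuration.

```python
import json


def runtime_class(name, handler):
    """Build a RuntimeClass manifest that maps a name to a CRI handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # must match the runtime name in the CRI config
    }


def sandboxed_pod(pod_name, image, runtime_class_name):
    """Build a Pod manifest that asks for a specific RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": pod_name, "image": image}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be applied directly.
    print(json.dumps(runtime_class("gvisor", "runsc"), indent=2))
    print(json.dumps(sandboxed_pod("web", "nginx", "gvisor"), indent=2))
```

Pods that omit `runtimeClassName` keep using the cluster’s default runtime.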
I found a very interesting design document that details the motivation, use cases and various trade-offs between isolation solutions. Essentially, Google wanted to add two additional layers of security around the host Kernel. The first layer being a user space kernel that all syscalls would be filtered through. The second layer being well defined seccomp policies around the user space kernel. Attackers would effectively need to break out of the container, and then out of the protection around the container.
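The two layers are easier to picture with a toy model. The sketch below is purely conceptual: the syscall sets are invented and tiny, the function names are hypothetical, and nothing here reflects how gVisor is actually implemented. It only shows the ordering described above, where an application call is first handled by the user space kernel and anything that still needs the host has to pass a seccomp allowlist.

```python
import errno

# Hypothetical, deliberately tiny tables. NOT gVisor's real syscall lists.
USER_SPACE_KERNEL_IMPLEMENTS = {"getpid", "read", "write"}
HOST_SECCOMP_ALLOWLIST = {"read", "write", "futex"}


def host_syscall(name):
    """Layer 2: the host only accepts calls on the seccomp allowlist."""
    if name not in HOST_SECCOMP_ALLOWLIST:
        raise PermissionError(f"seccomp blocked host syscall: {name}")
    return 0


def application_syscall(name):
    """Layer 1: every application syscall lands in the user space kernel."""
    if name not in USER_SPACE_KERNEL_IMPLEMENTS:
        return -errno.ENOSYS  # unsupported call never reaches the host
    if name == "getpid":
        return 0  # answered entirely in user space in this toy model
    return host_syscall(name)  # needs the host, so it must pass layer 2


if __name__ == "__main__":
    print(application_syscall("getpid"))  # served by layer 1
    print(application_syscall("mount"))   # rejected by layer 1
    try:
        host_syscall("ptrace")  # what a compromised layer 1 might attempt
    except PermissionError as exc:
        print(exc)
```

In this model an attacker who takes over the user space kernel still cannot issue `ptrace` against the host, which is the ‘protection around the container’ idea.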
In theory gVisor should be very low overhead, adding a negligible amount of start time, and no increase in memory footprint. The trade-off for security comes at the cost of syscall performance. The Linux kernel has approximately 400 syscalls and at the time of writing gVisor has support for approximately 60% of them. This includes the most popular ones so most applications work, although there are some exceptions like Postgres listed on the project readme as not being supported yet.
For some applications the syscall performance can be terrible. Memory and CPU should operate at host performance. However, storage and network performance can be 10 – 30x slower. Given that Kubernetes runs mostly stateless web applications the only IO that is important is the network IO. gVisor has a network passthrough mode that can be enabled to provide host level access to the network namespace. This comes at the cost of reduced container isolation.
For some background context, gVisor is actually the same code that Google have been using internally to isolate their containers. It is also currently in use on Google App Engine, which is the Google cloud serverless offering. I would say that gVisor is still very much beta software when we’re talking about installing it on Kubernetes.
Back in May 2015 some Intel greybeards posted an article about speeding virtual machines up so they match the speed of containers. Everyone ignored them because Docker was the new hotness. However, these guys will end up being proved correct soon.
The initial article stated that they had managed to boot a virtual machine in 150ms with a memory overhead of 15 – 20 MB. They anticipated that this could be reduced further with some work. That work has now been done and start times are below 100ms.
Kata Containers is a merger of two projects that were trying to achieve the same goal: Clear Containers by Intel and Hyper Containers by HyperHQ. HyperHQ also had their own container orchestration product called Hyper.sh, which supported Kata Containers but has unfortunately closed down.
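To put that memory figure in context, here is a back-of-the-envelope calculation. It assumes the 15 – 20 MB range from the original article still holds per virtual machine, and the pod count of 100 is a made-up example rather than a measurement.

```python
def vm_overhead_mb(vm_count, per_vm_mb=(15, 20)):
    """Return the (low, high) total memory overhead in MB for vm_count VMs."""
    low, high = per_vm_mb
    return vm_count * low, vm_count * high


if __name__ == "__main__":
    low, high = vm_overhead_mb(100)
    print(f"100 VMs add roughly {low} to {high} MB of overhead")
```

So a worker running 100 isolated workloads would give up something like 1.5 to 2 GB of memory to the virtual machines themselves, before any further optimisation.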
With Kata Containers each pod starts up in its own lightweight virtual machine, managed by a hypervisor, and runs one or more containers inside it. You get the benefit of full security isolation and the trade-off on start time and memory consumption isn’t all that high.
The fastest virtualization layer on Linux x86 is Qemu-lite which includes several optimizations to reduce start time and memory footprint. You can read more about the Kata Container architecture here.
It’s true that Clear Containers and Hyper may not compare in popularity to Docker as we know it, but according to the Kata Containers FAQ they have been used massively in China, for example by “JD.com, China’s largest ecommerce company”. That’s huge.
Kata Containers only went to v1 in May this year. You can run Kata Containers side by side with normal containers on the same hosts. I suspect all we need is more time for people to try it out and report success or failures.
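Here is a hedged sketch of what running both kinds of workload side by side could look like on a cluster with the RuntimeClass API. It assumes a RuntimeClass named `kata` has already been created and wired up to the Kata runtime on the workers. That name, the helper `pod_for` and the example workloads are all hypothetical.

```python
import json

# Assumed name of a pre-existing RuntimeClass backed by Kata Containers.
KATA_RUNTIME_CLASS = "kata"


def pod_for(name, image, trusted):
    """Build a Pod manifest, isolating untrusted workloads in a Kata VM."""
    spec = {"containers": [{"name": name, "image": image}]}
    if not trusted:
        # Untrusted code gets the VM boundary; trusted pods omit the field
        # and therefore run on the cluster's default runtime.
        spec["runtimeClassName"] = KATA_RUNTIME_CLASS
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    workloads = [
        ("internal-api", "nginx", True),
        ("customer-upload-scanner", "busybox", False),
    ]
    for name, image, trusted in workloads:
        print(json.dumps(pod_for(name, image, trusted), indent=2))
```

Both pods can be scheduled onto the same host; only the second one pays the virtual machine overhead.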
In a more recent development AWS open sourced their Firecracker MicroVM which is used behind the scenes for Lambda. Kata Containers will likely move to this in future to replace the Qemu layer.
There are other plugins that run under CRI, but when I started to read about them I found they were so immature that it’s not worth reviewing them yet.
Docker Engine isn’t a terribly bad solution and probably isn’t worth the hassle of migrating until Kubernetes decides to rip out the native integration. For those starting new clusters I’d probably look into CRI-O as this is stable, simple and removes the Docker daemon from your hosts.
Many, including Google, are betting on the Containerd CRI plugin as the potential winner in this space. Containerd was spun out from Docker in 2017 and is itself very stable. It has gone through a few iterations of packaging on Kubernetes and is now a plugin that runs under CRI. It’s worth evaluating this in a development environment. You could end up being ahead of the curve if this does end up becoming the most stable and performant runtime.
As for gVisor, I’d probably watch it for a while. The companies who are invested in serverless hosting will undoubtedly iterate on gVisor because it’s in their interest to binpack as many web app containers onto a host as possible. Perhaps in 6 – 12 months this will become a viable solution for the rest of us.
Kata Containers are definitely worth trialling now. Although the Kubernetes interface is new, the underlying technology under Kata Containers is really old and stable. I can very easily imagine a future where clusters run a mix of containers and Kata Containers depending on security context. The overhead is much smaller than I had anticipated and the benefits are extremely good.
As ever, if I have got anything wrong or missed any option you’d like added to this blog please leave a message below or contact me via the contact form.
This topic is both immense and incredibly interesting. Here are some extra blogs to read. Be warned, the landscape moves so fast that even blogs less than 6 months old are now out of date.