90 days of AWS EKS in Production

I’ve been using EKS in production for a few months now and so far, so good. I’m really impressed by how simple it is to get a cluster up and running and ready for workloads. AWS provide a great Getting Started Guide on their website, which is super duper for getting your head around the components and glue required to get EKS stood up.

EKS is a very vanilla service, giving users a cluster that conforms to CNCF standards, which Kubernetes purists will be very happy with. However, don’t think that because AWS provides Kubernetes as a service, you no longer have to worry about getting your nodes optimised and ready for heavy workloads. You should consider an EKS worker node to be the same as a standard, out-of-the-box EC2 instance. If you commonly make optimisations, harden your machines, or install software your company requires for its standards, you should still do all of that on EKS.

Fortunately, AWS provides the means to do that in a very straightforward way. The AMIs AWS provides for standing up EKS workers contain a bootstrap script at /etc/eks/bootstrap.sh, which is called from UserData when you boot a node or when an AutoScalingGroup launches one. You can use that UserData, in your LaunchConfiguration, to edit the arguments passed to this script.
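For illustration, the stock worker UserData boils down to little more than a call to that script. A minimal sketch, with the cluster name and label as placeholders:

    #!/bin/bash -xe
    # The EKS-optimised AMI ships /etc/eks/bootstrap.sh; UserData is simply
    # where you pass your own arguments to it.
    /etc/eks/bootstrap.sh my-cluster --kubelet-extra-args '--node-labels=role=worker'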

A busy kubelet has a lot of things to do. Not only is it running your actual, very mixed workloads; it’s also collecting data and metadata from your applications, dealing with security and auth, managing your network stack and, of course, telling Docker how and when to run containers.

So, with no further ado, these are optimisations, suggestions and considerations for people looking at getting EKS into production. I can’t take credit for having the brain power to come up with a lot of these things, so credit is given inline for specific things I’ve found in the wild.

Reserving Resources For The System and Kubelet

Viewing kubectl get nodes during a busy period, I noticed nodes in a NotReady state, which I believed was caused by Docker itself being starved of system resources. I needed to set a few flags when starting the kubelet to ensure there was always enough memory, CPU and disk for vital system processes, and for Docker itself, to run. The figures suggested by the Kubernetes docs guided us here and seemed completely reasonable.
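The reservations boil down to three kubelet flags. The values below are illustrative, in the spirit of the examples in the Kubernetes reserve-compute-resources documentation; size them for your own instance types:

    --kube-reserved=cpu=250m,memory=1Gi,ephemeral-storage=1Gi
    --system-reserved=cpu=250m,memory=256Mi,ephemeral-storage=1Gi
    --eviction-hard=memory.available<500Mi,nodefs.available<10%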

You’re about to start reserving non-trivial amounts of RAM before the kubelet gets a shot at running your applications, so it’s probably best not to use anything smaller than a large instance type.

If you want to use smaller instances, you could write some bash to parse /proc/meminfo and calculate the reservation based on how much RAM the system has.
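A rough sketch of that approach; the 6% ratio and the 256Mi floor are assumptions of mine for illustration, not figures from any official guidance:

    #!/usr/bin/env bash
    # Size the kubelet memory reservation from the node's total RAM.
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    total_mb=$((total_kb / 1024))

    # Reserve roughly 6% of RAM, with a floor of 256Mi (illustrative values).
    reserve_mb=$((total_mb * 6 / 100))
    [ "$reserve_mb" -lt 256 ] && reserve_mb=256

    echo "--kube-reserved=cpu=250m,memory=${reserve_mb}Mi,ephemeral-storage=1Gi"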

Either way, the resulting flags can be placed into the UserData as arguments for bootstrap.sh.

Network Stack Optimisation

Rather than reinventing the wheel I went hunting the web for good examples.

A great resource I found was here:

https://blog.codeship.com/running-1000-containers-in-docker-swarm/

Docker Swarm, at scale, yeah, that works for my purposes.  Thank you, Tit Petric.
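Translated to a worker node, that style of tuning amounts to a handful of sysctls set at boot. The keys and values below are my summary of that sort of guidance rather than the article’s exact numbers, so treat them as starting points:

    # Network stack tuning, run from UserData (illustrative values)
    sysctl -w fs.file-max=1000000
    sysctl -w net.core.somaxconn=16384
    sysctl -w net.core.netdev_max_backlog=4096
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    sysctl -w net.ipv4.tcp_tw_reuse=1
    # Bigger ARP/neighbour tables help when lots of pod IPs live on one node
    sysctl -w net.ipv4.neigh.default.gc_thresh1=4096
    sysctl -w net.ipv4.neigh.default.gc_thresh2=8192
    sysctl -w net.ipv4.neigh.default.gc_thresh3=16384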

DNS lookup scaling

Out of the box, AWS provides a kube-dns Deployment scaled to a single pod. After a week or so in production, I was skimming our logs and came across something that reinforced what I had already seen in our exception handling system.

Wow, we were doing 150 DNS queries/sec?? As it turns out, we were. The resolv.conf I found in our containers explained why.
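It looked roughly like this. This is illustrative rather than a verbatim copy: the nameserver IP and the VPC/company search domains are placeholders, while the three cluster.local entries come from Kubernetes and the rest from the VPC DHCP options:

    nameserver 172.20.0.10
    search default.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal example.internal
    options ndots:5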

For each DNS lookup in our busy applications, we needed to do A and AAAA lookups against each entry in the search field. I counted 10 lookups for each resolution, followed by 2 more for the actual, recursive lookup. tcpdump confirmed this.

I undertook the following actions, some of which will annoy the purists, I’m sure.

  • Scale up the kube-dns deployment (see the sketch after this list).

  • Knowing that we don’t use Kubernetes short names in our cluster, we curated our own resolv.conf. Kubernetes merges the system DNS config into its own resolv.conf, and we decided to shrink it to use only the Kubernetes-generated config rather than the externally defined search path from our VPC DHCP config. This shortened the list in the search line to 3.

  • Fully qualified DNS names where we could, meaning the getaddrinfo() call would not have to walk the full resolution stack.

  • Where we were happy for it to happen, we updated the dnsPolicy inside Deployments that we knew had a high frequency of resolutions.

  • Disabled IPv6 (this won’t disable AAAA resolution attempts, but it brought to our attention that IPv6 may confuse things in our fairly old applications).
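A sketch of the scaling and resolv.conf curation mentioned above; the replica count, cluster DNS IP and ndots value are placeholders for your own environment. Scaling kube-dns is a one-liner:

    kubectl -n kube-system scale deployment kube-dns --replicas=3

Curating resolv.conf per workload can be done with dnsPolicy: None plus a dnsConfig block (beta as of Kubernetes 1.10), shown here on a Pod but equally valid in a Deployment’s pod template:

    apiVersion: v1
    kind: Pod
    metadata:
      name: dns-example
    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 172.20.0.10              # cluster DNS service IP (placeholder)
        searches:
          - default.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      containers:
        - name: app
          image: nginx:stable

For workloads that mostly resolve external names, switching dnsPolicy to Default (which uses the node’s own resolver) is another lever, and is the kind of change the dnsPolicy item above refers to.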

Extra Bootstrapping Steps

Authentication

When you create an EKS cluster, the user that actually creates the cluster is, by default, the only one that can access it. Even IAM Administrator users can’t log in. This seems to be due to the way AWS bootstraps the cluster behind the scenes in AWS-land. Once the cluster is up, you can add users as you would in any Kubernetes setup.

My recommendation, in a production environment, is to use an IAM role to create the cluster.
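One way to express that mapping is an entry in the aws-auth ConfigMap in kube-system, applied (kubectl apply -f aws-auth.yaml) while you still have access as the cluster creator. This is a sketch: the account ID and role names are placeholders, and the worker-node entry is whatever your node group already uses:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        - rolearn: arn:aws:iam::111122223333:role/mycluster-worker-node
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes
        - rolearn: arn:aws:iam::111122223333:role/mycluster-admin
          username: mycluster-admin
          groups:
            - system:masters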

Once the cluster is created, this auth configuration creates an admin role called mycluster-admin inside your cluster, mapped to an IAM role outside your cluster.

Once you’ve done this, should you have a catastrophe and lose user credentials, you’ll still be able to get access to the cluster by assuming that role.
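To exercise that escape hatch, anyone who can assume the IAM role can ask aws-iam-authenticator for a token against it; the cluster name and role ARN below are placeholders:

    aws-iam-authenticator token -i mycluster -r arn:aws:iam::111122223333:role/mycluster-admin

The same command slots into the users section of a kubeconfig, so kubectl picks up the role automatically.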

Extra Kubelet Args

As mentioned, the UserData is the place to do your extra kubelet configuration, by appending your bespoke arguments to the /etc/eks/bootstrap.sh call.

Ours ended up looking something like the following, using some of the options mentioned above. I add a good few labels to make sure pods end up in the right place.
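A sketch only; the label keys and values are placeholders for whatever scheduling dimensions you care about:

    /etc/eks/bootstrap.sh my-cluster --kubelet-extra-args \
      '--node-labels=role=worker,lifecycle=on-demand,environment=production --kube-reserved=cpu=250m,memory=1Gi,ephemeral-storage=1Gi --system-reserved=cpu=250m,memory=256Mi,ephemeral-storage=1Gi --eviction-hard=memory.available<500Mi,nodefs.available<10%'

Labels like these can then be targeted with nodeSelector or node affinity in your Deployments.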

Subnet Design

EKS uses the amazon-vpc-cni-k8s network plugin, which assigns each pod running on a node an IP address from the host’s ENIs (Amazon lingo for network interfaces). There are a couple of things to consider.

The instance type you use determines the number of ENIs available and therefore the maximum number of pods. Roughly speaking, the maximum pods you can schedule on an instance is the number of interfaces multiplied by the IP addresses per interface. The kubelet bootstrap script uses a file at /etc/eks/eni-max-pods.txt to make sure your kubelet doesn’t try to run more pods than there are IPs available.
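As a worked example (the ENI limits come from AWS’s published per-instance-type figures, so double check them for your own types): an m5.large offers 3 ENIs with 10 IPv4 addresses each, and eni-max-pods.txt caps it at 29 pods, i.e. 3 × (10 − 1) + 2, because each ENI’s primary IP isn’t handed out to pods and the two host-network pods (aws-node and kube-proxy) don’t consume ENI IPs.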

Part of the design of the CNI plugin means there’s a cooling-off period after pod termination before IPs are returned to the pool and become available again. In your design, you should specify subnets for your workers that are a good deal larger than the default /24 range. A decent plan might be a /16 VPC, with /19 ranges for your worker-node subnets and /21 ranges for your public-facing ELB/ALB subnets. This should give you plenty of scope to run pods you haven’t even thought of running yet.

The takeaway here is that you’ll use more IPs than you think.

Our cluster subnets look something like the layout below. I quite like the idea of having a data subnet: I don’t assign it any internet route, so anything I don’t want to have internet access can go in there.
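Purely illustrative CIDRs, following the /16 + /19 + /21 plan above across three availability zones:

    VPC                 10.10.0.0/16
    private-workers-a   10.10.0.0/19     route to NAT; worker nodes and pods
    private-workers-b   10.10.32.0/19    route to NAT; worker nodes and pods
    private-workers-c   10.10.64.0/19    route to NAT; worker nodes and pods
    public-elb-a        10.10.96.0/21    internet gateway; ELBs/ALBs only
    public-elb-b        10.10.104.0/21   internet gateway; ELBs/ALBs only
    public-elb-c        10.10.112.0/21   internet gateway; ELBs/ALBs only
    data-a              10.10.120.0/21   no internet route
    data-b              10.10.128.0/21   no internet route
    data-c              10.10.136.0/21   no internet route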

UserData Configuration In Full

The final user-data in our launch configuration for the EKS worker nodes pulled all of the above together.
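The script below is a sketch assembled from the pieces discussed in this post rather than a verbatim copy of our launch configuration; the cluster name, labels, reservations and sysctl values are all placeholders to adapt:

    #!/bin/bash -xe

    # Network stack tuning (see the sysctls discussed earlier; values illustrative)
    sysctl -w net.core.somaxconn=16384
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    sysctl -w net.ipv4.neigh.default.gc_thresh1=4096
    sysctl -w net.ipv4.neigh.default.gc_thresh2=8192
    sysctl -w net.ipv4.neigh.default.gc_thresh3=16384

    # Reserve headroom for the system and Docker, evict before the node wedges,
    # and label the node so pods land in the right place.
    KUBELET_EXTRA_ARGS="--kube-reserved=cpu=250m,memory=1Gi,ephemeral-storage=1Gi"
    KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --system-reserved=cpu=250m,memory=256Mi,ephemeral-storage=1Gi"
    KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --eviction-hard=memory.available<500Mi,nodefs.available<10%"
    KUBELET_EXTRA_ARGS="$KUBELET_EXTRA_ARGS --node-labels=role=worker,environment=production"

    # Join the cluster
    /etc/eks/bootstrap.sh my-cluster --kubelet-extra-args "$KUBELET_EXTRA_ARGS"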

In Summary…

EKS has worked out well for us over the past few months and we’ve had no problems with stability or control plane performance. I know that many people are waiting for EKS to improve before jumping on. AWS has a history of releasing early and iterating based on feedback.

Some major concerns that we had were:

  • A proven upgrade process
  • Access to the control plane logs
  • Managed workers
  • Better integration with IAM

EKS only supports Kubernetes 1.10, but AWS has released two platform updates since launch. The first added the aggregation APIs so that the HPA and metrics server would work. The second fixed the critical API server vulnerability. Both were seamless, hands-off upgrades and I’m hopeful that when Kubernetes 1.11 is available it will be just as easy.

Not having access to the master or etcd (or equivalent) logs is a bit annoying although, that said, I’ve coped without them thus far. AWS have stated that they will make these available via CloudWatch soon.

As written already in this blog, I did need to perform a bit of manual worker tuning that perhaps isn’t required on GKE and AKS. These are all settings that could theoretically be added to the EKS optimised AMI or are trivially added via launch configuration.  You could even build your own!  Overall the worker management isn’t too difficult and certainly isn’t a showstopper.

We’re using the IAM Authenticator to authenticate to the API server and kube2iam to give pods access to AWS resources. We’re also using the ALB ingress controller which makes configuring routes into the cluster easy. This is all basic but is enough to get by for now.

About the Author

This is a guest post by Graham Moore, a senior DevOps and certified AWS architect who has worked on contracts for numerous high profile technology companies in and around London. Add him on LinkedIn if you’d like to discuss cloud consulting projects.
