This chart bootstraps a Kiam deployment on a Kubernetes cluster using the Helm package manager. Kiam runs as an agent on each node in your Kubernetes cluster and allows cluster users to associate IAM roles with Pods.

Features

  • No client SDK modifications are needed: Kiam intercepts Metadata API requests.
  • Separated Agent and Server processes. Allows user workloads to run on nodes without sts:AssumeRole permissions to enhance cluster security.
  • Denies access to all other AWS Metadata API paths by default (but can be whitelisted via flag)
  • AWS credentials are prefetched to allow fast responses (and avoid problems with races between Pods requesting credentials and the Kubernetes client caches being aware of the Pod)
  • Multi-account IAM support. Pods can assume roles from any AWS account assuming trust relationships permit it
  • Prometheus and StatsD metrics
  • Uses the Kubernetes Events API to record IAM errors against the Pod so that cluster users can more readily diagnose IAM problems (via kubectl describe pod …)
  • Text and JSON log formats

Iterating for Security and Reliability

Kiam bridges Kubernetes’ Pods with Amazon’s Identity and Access Management (IAM). It makes it easy to assign short-lived AWS security credentials to your application.

We created Kiam in 2017 to quickly address correctness issues we had running kube2iam in our production clusters. We’ve made a number of changes to its original design to make it more secure, more reliable and easier to operate. This article covers a little of the story that led to us creating Kiam and more about what makes it novel.

Kiam Improvements

Kiam now offers:

  • Increased security by splitting the process into two: an agent and server. Only the server process needs to be permitted to perform sts:AssumeRole. Cluster operators can place user workloads on nodes with only essential IAM policy necessary for kubelet. This guards against privilege escalation from an application compromise.
  • Prefetching credentials from AWS. This reduces response times as observed by SDK clients which, given very restrictive default timeouts, would otherwise cause clients to fail.
  • Load-balancing requests across multiple servers. This helps deploy updates without breaking agents. We observed SDK behaviour in production where applications would fail as soon as the proxy was restarted even when applications held valid, unexpired credentials. It also protects against errors interacting with an individual server.

These changes are the most significant parts where Kiam deviates from kube2iam. Most are also largely a result/benefit of separating Kiam into two processes: server and agent. I’ll now go through each in a little more detail.

What makes Kiam novel

I’d now like to explain a little more about how Kiam’s novel design mitigates these issues.

The necessity of Prefetching

Kiam uses Kubernetes’ client-go cache package to create a process which uses two mechanisms (via the ListerWatcher interface) for tracking pods:

  • Watcher: the client tells the API server which resources it’s interested in tracking and the server will stream updates as they’re available. Think of these as deltas to some state.
  • Lister: this performs a (relatively expensive) List which retrieves details about all running pods. It takes longer to return but ensures you pick up details about all running pods, not just a delta.

As Kiam becomes aware of Pods they’re stored in a cache and indexed using the client libraries’ Indexers type. Kiam uses an index to be able to identify pods from their IP address: when an SDK client connects to Kiam’s HTTP proxy it uses the client’s IP address to identify the Pod in this cache.
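
To make the lookup concrete, here is a minimal sketch of a pod cache keyed by IP address. It is not Kiam’s code and it doesn’t use client-go’s Indexers: `Pod`, `PodCache`, `Upsert` and `FindByIP` are hypothetical names, and a plain map guarded by a mutex stands in for the real indexed store.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod holds only the fields this sketch needs; the type is hypothetical.
type Pod struct {
	Name string
	IP   string
	Role string
}

// PodCache indexes pods by IP so a proxy can map a caller's address to a pod.
type PodCache struct {
	mu   sync.RWMutex
	byIP map[string]Pod
}

func NewPodCache() *PodCache {
	return &PodCache{byIP: make(map[string]Pod)}
}

// Upsert records a pod learned from a watch delta or from a full list.
func (c *PodCache) Upsert(p Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[p.IP] = p
}

// Delete forgets whichever pod held the given IP.
func (c *PodCache) Delete(ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byIP, ip)
}

// FindByIP returns the pod for an IP and whether the cache knows about it.
func (c *PodCache) FindByIP(ip string) (Pod, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.byIP[ip]
	return p, ok
}

func main() {
	cache := NewPodCache()
	cache.Upsert(Pod{Name: "web-1", IP: "10.0.0.7", Role: "web-reader"})

	if p, ok := cache.FindByIP("10.0.0.7"); ok {
		fmt.Println(p.Name, "uses role", p.Role)
	}
	if _, ok := cache.FindByIP("10.0.0.9"); !ok {
		fmt.Println("no pod known for 10.0.0.9 yet")
	}
}
```

The second lookup is the interesting case: a miss doesn’t necessarily mean the pod doesn’t exist, only that the cache hasn’t heard about it yet.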

It’s important, then, that when an SDK client attempts to connect to Kiam the Pod cache already holds the details for the running Pod. Based on the Java client code we saw above, Kiam has up to 5 seconds to respond with the configured role and so, by extension, Kiam has 5 seconds to track a running Pod.

If Kiam can’t find the Pod details in the cache it’s possible the details from the watcher haven’t yet been delivered (but may eventually be). Inside the agent we include some retry and backoff behaviour that will keep checking for the pod details in the cache up until the SDK client disconnects. Ideally the pod details will be filled in by either the watcher or the lister in time.

Kiam’s retries and backoffs use the Context package to propagate cancellation from the incoming HTTP request down through the chain of child calls that Kiam makes. This cancellable context lets us wait as long as possible for the operations to succeed and has been hugely helpful for writing a system that honours timeouts and retries.
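
A rough sketch of that pattern, assuming a doubling backoff with a cap (the article doesn’t say which backoff policy Kiam uses): `retryUntilDone` and `lookupAfter` are hypothetical names, and a timeout context stands in for the request context that an HTTP handler would get from `r.Context()`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errNotFound = errors.New("pod not found in cache")

// retryUntilDone keeps calling lookup until it succeeds or ctx is cancelled,
// doubling the wait between attempts up to maxWait.
func retryUntilDone(ctx context.Context, initial, maxWait time.Duration, lookup func() (string, error)) (string, error) {
	wait := initial
	for {
		role, err := lookup()
		if err == nil {
			return role, nil
		}
		select {
		case <-ctx.Done():
			// The caller went away or ran out of time: stop retrying.
			return "", ctx.Err()
		case <-time.After(wait):
		}
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
}

// lookupAfter simulates a cache that only learns about the pod on the nth
// lookup, with the whole operation bounded by timeoutMillis.
func lookupAfter(n int, timeoutMillis int) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	calls := 0
	return retryUntilDone(ctx, 5*time.Millisecond, 40*time.Millisecond, func() (string, error) {
		calls++
		if calls < n {
			return "", errNotFound
		}
		return "web-reader", nil
	})
}

func main() {
	role, err := lookupAfter(3, 500) // pod shows up on the third attempt
	fmt.Println(role, err)

	_, err = lookupAfter(1000, 20) // pod never shows up before the deadline
	fmt.Println(err)
}
```

Because the only exit on failure is the context, the loop waits exactly as long as the client is willing to and no longer.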

Credential prefetching

Alongside maintaining the Pod cache, the other responsibility of the Server process is to maintain a cache of AWS credentials retrieved by calling sts:AssumeRole on behalf of the running pods.

Originally, to keep things simple and obvious, Kiam requested credentials on demand. When a client connected we would make a request to AWS in-band, store the fetched credentials in a cache and then keep refreshing them as long as the Pod was still running. But, as we saw above, AWS SDK clients expect the metadata API to return very quickly. Kiam and Kube2iam both use Amazon STS to retrieve credentials, which is quite a bit slower than the metadata API.

We have metrics tracking the completion times for the sts:AssumeRole operation and generally it’s very stable: typical 99th percentile times of 550 milliseconds. Every now and then though it goes beyond that. Below is a plot from our monitoring showing an increase in both max (yellow) and 95th percentile (blue) to well over a few seconds. Although this was a few months ago this isn’t atypical (it’s just taken me a long time to write this article :).

Millisecond durations for AWS sts:AssumeRole operations

It’s quite normal for such spikes to happen but it’s problematic if we hit slow responses from AWS when fetching credentials in-band: a slow response from STS would cause us to propagate a failure to our SDK clients given their strict retry and timeout policies. To mitigate this Kiam prefetches credentials.

When a Pod is tracked through an update from a Kubernetes API watcher or from a full sync it’s added to a buffered channel with the prefetcher on the other side. The prefetcher requests credentials and stores them in the credentials cache ahead of the client requesting them.
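
The shape of that pipeline might look something like the sketch below. It’s a simplification under assumptions: roles are plain strings, `fakeAssumeRole` stands in for the real sts:AssumeRole call, the channel capacity of 8 is arbitrary, and all of the names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// credentialCache is a stand-in for the server's credentials cache.
type credentialCache struct {
	mu    sync.Mutex
	creds map[string]string
}

func newCredentialCache() *credentialCache {
	return &credentialCache{creds: make(map[string]string)}
}

func (c *credentialCache) put(role, cred string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.creds[role] = cred
}

func (c *credentialCache) get(role string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cred, ok := c.creds[role]
	return cred, ok
}

// fakeAssumeRole stands in for sts:AssumeRole; no AWS call is made.
func fakeAssumeRole(role string) string {
	return "creds-for-" + role
}

// prefetchAll announces each role on a buffered channel and lets a single
// prefetcher goroutine fill the cache before any client asks.
func prefetchAll(roles []string) *credentialCache {
	cache := newCredentialCache()
	announced := make(chan string, 8) // buffer absorbs bursts of new pods

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for role := range announced {
			cache.put(role, fakeAssumeRole(role))
		}
	}()

	for _, role := range roles {
		announced <- role // the pod tracker's side of the channel
	}
	close(announced)
	wg.Wait()
	return cache
}

func main() {
	cache := prefetchAll([]string{"web-reader", "queue-writer"})
	cred, ok := cache.get("web-reader")
	fmt.Println(cred, ok)
}
```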

Prefetching is an optimisation: if a pod requests credentials for a role and the credentials cache doesn’t already have them they’ll be requested again. We also use a Future to wrap around the AssumeRole operations within the cache to avoid requesting credentials for the same role repeatedly while waiting for credentials to be issued.
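
Here is one way such a Future could be put together in Go. It’s a sketch rather than Kiam’s implementation: `future`, `futureCache` and `countFetches` are hypothetical names, the fetch function is a stub, and expiry and eviction of failed results are left out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// future holds the eventual result of one fetch; done is closed once the
// result fields are safe to read.
type future struct {
	done chan struct{}
	cred string
	err  error
}

// futureCache ensures at most one fetch is started per role. A real cache
// would also expire entries and drop failed ones; this sketch does neither.
type futureCache struct {
	mu      sync.Mutex
	futures map[string]*future
	fetch   func(role string) (string, error)
}

func newFutureCache(fetch func(string) (string, error)) *futureCache {
	return &futureCache{futures: make(map[string]*future), fetch: fetch}
}

// Get returns credentials for role, starting a fetch only if none exists yet.
func (c *futureCache) Get(role string) (string, error) {
	c.mu.Lock()
	f, ok := c.futures[role]
	if !ok {
		f = &future{done: make(chan struct{})}
		c.futures[role] = f
		go func() {
			f.cred, f.err = c.fetch(role)
			close(f.done)
		}()
	}
	c.mu.Unlock()

	<-f.done // every caller for this role waits on the same future
	return f.cred, f.err
}

// countFetches issues concurrent requests for one role and reports how many
// upstream fetches were actually made.
func countFetches(callers int) int64 {
	var calls int64
	cache := newFutureCache(func(role string) (string, error) {
		atomic.AddInt64(&calls, 1)
		return "creds-for-" + role, nil
	})

	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("web-reader")
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&calls)
}

func main() {
	fmt.Println("upstream calls for 10 concurrent requests:", countFetches(10))
}
```

Because the future is registered while the lock is held, ten concurrent callers for the same role result in a single upstream call.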

Prefetching helps us to smooth out interactions with the STS API and return responses to clients far quicker. To see just how much of a difference it makes, there’s another plot a little later that covers the same period as above.

Reduced AssumeRole API growth

Prefetching was one of the drivers for why we chose to split Kiam into server and agent processes: it would’ve been too costly for us, in terms of time and volume of AWS API calls, to do when running as a homogeneous process where each process on each node would request credentials for all roles.

In the original homogeneous model the number of sts:AssumeRole API calls would grow as Nodes * Roles: adding an additional node or role results in more than 1 additional call and, on a large dynamic cluster, this could be quite significant. Historically we’d also seen our STS operation durations suffer when we hammered it.

One solution is for each node to only request credentials that it’s currently using. However, if the proxy process restarted (because of an upgrade, failure etc.) it would lose all of its credentials and refilling the caches could be slow. We’d observed previously that SDK clients immediately raised exceptions in such a situation, causing applications to error.

Kiam’s novel Agent/Server separation causes sts:AssumeRole calls to grow as Servers * Roles but, given Servers are normally pinned to a subset of nodes with a near constant number of replicas, API growth can be simplified to linear with regard to Roles.
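
To put some numbers on the two growth models, the sketch below compares them for a made-up cluster. The figures (100 nodes, 3 servers, 50 roles) are purely illustrative and the function names are hypothetical.

```go
package main

import "fmt"

// homogeneousCalls is the Nodes * Roles model: every node fetches every role.
func homogeneousCalls(nodes, roles int) int {
	return nodes * roles
}

// splitCalls is the Servers * Roles model: only servers fetch credentials.
func splitCalls(servers, roles int) int {
	return servers * roles
}

func main() {
	const nodes, servers, roles = 100, 3, 50

	fmt.Println("homogeneous:", homogeneousCalls(nodes, roles)) // 5000
	fmt.Println("agent/server:", splitCalls(servers, roles))    // 150

	// Adding a node costs `roles` extra calls in the first model and none in
	// the second, as long as the number of servers stays fixed.
	fmt.Println("extra calls per added node:",
		homogeneousCalls(nodes+1, roles)-homogeneousCalls(nodes, roles), // 50
		"vs", splitCalls(servers, roles)-splitCalls(servers, roles))     // 0
}
```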

Fetching credentials out-of-band before a client requests them reduces the impact of a slow AWS response. Running multiple servers that fetch the same credentials provides a degree of resilience to an individual server failing to track a pod or fetch credentials: the agent can take credentials from the first server that responds successfully and can retry the operation safely across servers.
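
As an illustration of why redundant servers make retries safe, here is a deliberately simplified failover loop. Kiam’s agents talk to servers over load-balanced gRPC, so this isn’t how Kiam does it: the `server` function type, `fetchFromAny` and `tryServers` are hypothetical, and the servers are tried one after another for clarity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// server is a hypothetical stand-in for one credentials server.
type server func(ctx context.Context, role string) (string, error)

// fetchFromAny tries each server in turn and returns the first successful
// response. Any server can answer because they all hold the same credentials.
func fetchFromAny(ctx context.Context, servers []server, role string) (string, error) {
	lastErr := errors.New("no servers configured")
	for _, s := range servers {
		if err := ctx.Err(); err != nil {
			return "", err // the client gave up, so stop trying
		}
		cred, err := s(ctx, role)
		if err == nil {
			return cred, nil
		}
		lastErr = err
	}
	return "", lastErr
}

// tryServers builds a list of failing servers followed by healthy ones and
// asks it for credentials.
func tryServers(failing, healthy int) (string, error) {
	var servers []server
	for i := 0; i < failing; i++ {
		servers = append(servers, func(context.Context, string) (string, error) {
			return "", errors.New("server unavailable")
		})
	}
	for i := 0; i < healthy; i++ {
		servers = append(servers, func(_ context.Context, role string) (string, error) {
			return "creds-for-" + role, nil
		})
	}
	return fetchFromAny(context.Background(), servers, "web-reader")
}

func main() {
	cred, err := tryServers(2, 1)
	fmt.Println(cred, err)

	_, err = tryServers(2, 0)
	fmt.Println(err)
}
```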

Performance Summary

Running multiple servers with redundant caches and prefetching credentials improves the chances the Kiam HTTP proxy performs within the expectations of the AWS SDK client.

Earlier I showed a plot highlighting a period of severely slow responses from the STS API that would cause any SDK client with default timeouts to fail. The plot below covers the same period as before but shows the Kiam agent’s response times. The max (yellow) is just over 1 second (when AWS’ response was over 10 seconds) and the 95th percentile (blue) time is closer to 47ms. Both are well within the limits of the SDK clients.

Increased Security

As mentioned earlier, Kiam and Kube2iam both followed the same deployment model at the beginning: a DaemonSet that installed itself via iptables on every node in the cluster. Such a deployment requires all nodes running user pods to have an IAM policy attached that permits the sts:AssumeRole call. Having all nodes able to assume any role, though, is problematic in the event of a node being exploited: a host exploit would open access to any role. This is especially undesirable when we want, as far as possible, to run soft multi-tenant, multi-environment clusters.

Carving out a separate Server process meant we could run the server processes on a subset of machines and only they would need sts:AssumeRole. Nodes running user workloads would only require IAM permissions needed for kubelet and nodes running the server processes wouldn’t be permitted to run any user processes.

gRPC communications between the agents and servers are protected with TLS and mutual verification to allow only the agent, and the server’s health checker, to interact with the server.
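
For readers unfamiliar with mutual TLS, the sketch below shows the server half of the idea using only Go’s standard library. It isn’t Kiam’s configuration: `mutualTLSServerConfig` and `demoCAPEM` are hypothetical helpers, the CA is generated on the fly purely so the example runs, and the server’s own certificate is left for the caller to add.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// mutualTLSServerConfig returns a TLS config that rejects any client which
// doesn't present a certificate signed by the given CA. The caller still
// needs to set Certificates to the server's own key pair.
func mutualTLSServerConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM data")
	}
	return &tls.Config{
		ClientCAs:  pool,
		ClientAuth: tls.RequireAndVerifyClientCert,
		MinVersion: tls.VersionTLS12,
	}, nil
}

// demoCAPEM creates a throwaway self-signed CA certificate for the example.
func demoCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := demoCAPEM()
	if err != nil {
		fmt.Println("could not create demo CA:", err)
		return
	}
	cfg, err := mutualTLSServerConfig(caPEM)
	if err != nil {
		fmt.Println("could not build TLS config:", err)
		return
	}
	fmt.Println("client certificates required:", cfg.ClientAuth == tls.RequireAndVerifyClientCert)
}
```

A config like this would typically be handed to whatever transport the server uses; the client side needs the mirror image, with its own certificate and the CA used to verify the server.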

Deploying updates

When kiam was still deployed as a single-process DaemonSet we had problems rolling out updates.

Pods would, via their SDK client, have fetched temporary credentials successfully but, because they’re session-based credentials, would also continually attempt to refresh them in the background. When this background refresh failed because the Kiam process was unavailable, the SDKs would immediately invalidate the credentials being used and throw errors.

Applications would immediately see their AWS operations fail despite having valid credentials. This problem hit us a number of times and caused a lot of pain for the various service teams that use the clusters.

Running separate Server and Agent processes made it easier for us to deploy changes independently. If the servers are updated we can often deploy them without affecting the agent processes at all. If the agent needs to be updated, however, it can restart far quicker as it no longer needs to rebuild any caches of credentials or pod data: any such requests are immediately forwarded to the server processes.
