Kubernetes

Walk-through of the Skyscrapers’ Kubernetes cluster

Here you’ll find user-level documentation of our Kubernetes reference solution. You’ll learn how to access the cluster, how to deploy applications and services and how to monitor them. You’ll also find some tips & tricks on how to get the most from your Kubernetes cluster.

The best place to start is this walk-through. After that you can read more in-depth documentation on specific topics in their respective files in this folder.

If you are new to Kubernetes, check the getting started page for more information.

Requirements

You'll need kubectl to interact with the cluster; the awscli requirement is covered in the Authentication section below. Optionally you can install some other tools to make your life easier. These are documented in the tools page.

Authentication

To gain access to an EKS cluster you need to authenticate via AWS IAM and configure your kubeconfig accordingly. To do this you’ll need a recent version of awscli (>= 2.13.0). If you don’t have the AWS CLI yet, you can install it by following the AWS instructions or via Homebrew/Linuxbrew:

brew install awscli

You’ll first need to authenticate to the AWS account where the EKS cluster is deployed (or your Admin account if you use delegation). Depending on how you configured your awscli config, --region and --profile are optional.

Make sure to replace <my_assumed_role_arn> with the correct role for your access level. The roles you can assume are documented in your customer-specific documentation.

aws eks update-kubeconfig --name <cluster_name> --alias <my_alias> [--role-arn <my_assumed_role_arn>] [--region <aws_region>] [--profile <my_aws_profile>]

# For example:
aws eks update-kubeconfig --name production-eks-example-com --alias production --role-arn arn:aws:iam::123456789012:role/developer

Deploying on Kubernetes with the Helm Package Manager

After rolling out a Kubernetes cluster, it can be tempting to start executing numerous kubectl create or kubectl apply commands to get stuff deployed on the cluster.

Running such commands is a good way to learn how deployments are done on Kubernetes, but it is not the appropriate approach for constructing fully self-contained application deployments. The Kubernetes community came up with a separate tool for that:

The Helm Package Manager

With Helm, you create self-contained packages for a specific piece of a deployment, e.g. a web stack. Such packages are called Charts in Helm terminology.

You probably want such a stack to be deployed in a specific Kubernetes namespace, with a specific configuration (ConfigMap), defining a Kubernetes Service referring to a Deployment. To make this setup reproducible, you need a way to parameterize it. Helm combines a template engine, a Go function library and a values.yaml file, so you can build a template of a specific piece and re-use it for multiple deployments.
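
To make the templating model concrete, here is a minimal, hypothetical chart fragment (all names and values are made up for illustration). In a real chart these would be two separate files: values.yaml supplies overridable defaults, and the template under templates/ references them through Helm's Go template engine.

```yaml
# values.yaml -- default values, overridable per release
replicaCount: 2
image:
  repository: nginx
  tag: "1.27"
---
# templates/deployment.yaml -- rendered by Helm's template engine
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

Overriding replicaCount or the image tag at install time (e.g. with --set or a custom values file) is what makes the same Chart reusable across environments.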

The Helm documentation is quite good and explanatory, and the best practices section highlights some of the important topics around chart development.

Good examples always help out a lot. Here is a list of existing git Chart repositories:

The above Chart repositories contain Charts that serve as building blocks for bigger composite installations.

Ingress

See Ingress specific documentation for more information on how to use Ingress controllers on your Kubernetes cluster.
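
As a rough sketch of what an Ingress resource looks like, the manifest below exposes a hypothetical myapp Service on a made-up hostname. The ingressClassName depends on which controller runs in your cluster; check the Ingress documentation referenced above for the correct value.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
spec:
  ingressClassName: nginx # hypothetical; depends on your cluster's controller
  rules:
    - host: myapp.example.com # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp # hypothetical Service
                port:
                  number: 80
```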

IAM Roles

EKS IAM Roles for Service Accounts (IRSA) is used by default.

Your deployments can be assigned specific IAM roles to grant them fine-grained permissions to AWS services. To do that you'll need to create a Service Account for your Pod and annotate it with the IAM role to use. For example:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::889180461196:role/kube/staging-eks-example-com-myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp
      containers:
        - name: myapp
          image: myregistry/myapp:latest # hypothetical image
      # Important to set the correct fsGroup, depending on which user your app
      # is running as.
      # https://github.com/aws/amazon-eks-pod-identity-webhook/issues/8
      securityContext:
        fsGroup: 1001
        # For completeness you can add the following too (not required)
        # runAsNonRoot: true
        # runAsGroup: 1001
        # runAsUser: 1001

It’s important to use a recent AWS SDK in your application for IRSA support.

You can find more examples and technical documentation in the official documentation: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html

For Java-based applications IRSA does not work out of the box; you need to make the following change to your application, quoting from https://pablissimo.com/1068/getting-your-eks-pod-to-assume-an-iam-role-using-irsa:

> you need to add an instance of STSAssumeRoleWithWebIdentitySessionCredentialsProvider to a credentials chain, and pass that custom chain to your SDK init code via the withCredentials builder method. This class doesn’t automatically come as part of the credentials chain. Nor does it automatically initialise itself from environment variables the same way other providers do. You’ll have to pass in the web identity token file, region name and role ARN to get it running

> [!NOTE]
> Usually a Skyscrapers engineer will create the required IAM roles and policies for you. It’s important that we match your ServiceAccount to the IAM policy’s Condition. If you manage these policies yourself, it’s important to set up the IAM role with the correct federated trust relationship. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${namespace}:${service-account-name}"
        }
      }
    }
  ]
}
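
The value checked by the Condition's sub claim is composed from the Pod's namespace and ServiceAccount name; the annotation on your ServiceAccount only takes effect if this string matches exactly. A quick sketch, with hypothetical namespace and ServiceAccount values:

```shell
# The "sub" claim in the trust policy's Condition is composed as
# system:serviceaccount:<namespace>:<service-account-name>.
# The values below are hypothetical, for illustration only.
NAMESPACE=staging
SERVICE_ACCOUNT=myapp
echo "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
# → system:serviceaccount:staging:myapp
```

A typo in either part (or moving the workload to another namespace without updating the policy) results in an AccessDenied on sts:AssumeRoleWithWebIdentity.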

Storage

Persistent Volumes

Persistent Volumes in our cluster are backed by AWS EBS volumes. Besides the obvious scheduling caveat (an EBS volume is limited to a single AZ), there's also a more silent and harder-to-predict caveat.

Depending on the EC2 instance type, most instances support a maximum of only 28 attachments, shared between network interfaces, EBS volumes, and NVMe instance store volumes. This means that only a limited number of EBS volumes can be used per K8s node, especially since our CNI uses multiple network interfaces.

Kubernetes limits the maximum number of volumes attached to a node to 25 for the M5, C5, R5, T3 and Z1D instance families, but even that limit is often too high, depending on how many network interfaces are in use by the CNI.
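
A back-of-the-envelope sketch of the attachment math (the numbers below are illustrative assumptions, not exact values for any specific instance type):

```shell
# Rough estimate of how many EBS volumes a node can still attach.
TOTAL_ATTACHMENTS=28   # shared limit: ENIs + EBS volumes + NVMe volumes
ENIS_IN_USE=4          # network interfaces allocated by the CNI (varies)
ROOT_VOLUME=1          # the node's own root EBS volume
echo $(( TOTAL_ATTACHMENTS - ENIS_IN_USE - ROOT_VOLUME ))
# → 23, i.e. below the 25-volume limit Kubernetes assumes
```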

Unfortunately AWS doesn't throw an error when this happens. Instead the volume stays stuck in the Attaching state and your Pod fails to launch. After ~5 minutes Kubernetes taints the node with NodeWithImpairedVolumes.

We have added a Prometheus alert to catch this taint and you can follow the actions described in the runbook when this happens.

Local NVMe Instance Storage

Certain AWS EC2 instance types come with fast local NVMe Instance Storage and can usually be recognized by the d suffix (eg. m5d.large). Our platform will automatically mount these volumes under the /ephemeralX paths (eg. /ephemeral0, /ephemeral1, …).

You can use this storage by mounting it via a hostPath volume in your Pod spec, for example:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: k8s.gcr.io/test-webserver
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
      volumeMounts:
        - name: ephemeral
          mountPath: /fastdata
          subPathExpr: $(POD_NAME)
  volumes:
    - name: ephemeral
      hostPath:
        path: /ephemeral0

It is important to note that the example uses the K8s Downward API with subPath expansion so that each Pod uses its own subfolder under the /ephemeral0 path.

Monitoring

Cluster and application monitoring is a quite extensive topic by itself, so there’s a specific document for it here.

Logs

Cluster and application logging is a quite extensive topic by itself, so there’s a specific document for it here.

Cluster updates and rollouts

As part of our responsibilities, we continuously roll out improvements (upgrades, updates, bug fixes and new features). Depending on the type of improvement, the impact on platform usage and applications varies from none to (a small) downtime. Below is an overview of the most common types of improvements. More exceptional types will be handled separately.

| Type of improvement | Description | Expected impact on your workloads |
| --- | --- | --- |
| Add-ons (non-breaking) | Improvements expected to not have an impact on the current usage of the cluster or application behaviour. These are rolled out automatically at any time during the day. You are informed during the updates. | No impact |
| Add-ons (non-breaking but disruptive) | Improvements to add-ons that may lead to temporary unavailability of platform functionalities (monitoring, logging, dashboard, etc) but that do not impact application workloads. These are rolled out automatically at any time during the day. You are informed before and during the updates. | No impact |
| Add-ons (breaking) | These improvements may need changes or intervention by you before they can be rolled out. We will reach out to you to discuss what's needed and how the improvement will be rolled out. | In some cases: minimal planned downtime |
| Cluster improvements | Low-frequency improvements to the foundations of the cluster. Usually these involve rolling updates leading to nodes being recycled. These are rolled out automatically at any time during the day. You are informed before and during the updates. | Cluster-aware workloads: no impact. Other workloads: potential minimal downtime |

To minimize the impact on your workloads, we suggest you implement cluster-aware workloads as much as possible (i.e. workloads that tolerate nodes being replaced: multiple replicas, graceful termination, no reliance on node-local state) and implement PodDisruptionBudgets. There's more information on this here.
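
A PodDisruptionBudget can be as small as the sketch below; the name, selector and threshold are hypothetical, so tune them to your workload. During a node drain, the eviction API will refuse to evict Pods that would push the matching workload below minAvailable.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp # hypothetical
spec:
  minAvailable: 1 # keep at least one replica up during voluntary disruptions
  selector:
    matchLabels:
      app: myapp # must match your Deployment's Pod labels
```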

Cronjobs

Kubernetes can run cronjobs for you. More information/examples about cronjobs can be found here.
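
As a minimal sketch, a CronJob manifest could look like the following; the name, schedule and image are made up for illustration. The history limits shown here are explained in the Clean up section below.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report # hypothetical
spec:
  schedule: "0 2 * * *" # every day at 02:00
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: myregistry/report:latest # hypothetical image
```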

Cronjob Monitoring

Monitoring for cronjobs is implemented by default. This is done with Prometheus, which will alert when the last run of a cronjob has failed.

The following alerts cover the different failure cases:

  • KubeJobCompletion: warning alert after 1 hour if any Job doesn’t succeed or doesn’t run at all.
  • KubeJobFailed: warning alert after 1 hour if any Job failed.
  • KubeCronJobRunning: warning alert after 1 hour if a CronJob keeps on running.

Clean up

Starting from Kubernetes 1.7, scheduled Jobs don't get cleaned up automatically. Make sure to add the following two lines to the spec section of your CronJob:

successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3

This keeps only the last 3 successful and the last 3 failed Jobs, cleaning up everything older.

Accessing cluster resources and services locally

One of the main challenges developers and operators face when using Kubernetes is communication between cluster services and services running on a local workstation. This is sometimes needed to test a new version of a service, for example, or to access a cluster service that isn't exposed to the internet.

There are multiple solutions to overcome this, depending on the use-case and the specific requirements. A nice all-round tool that covers most use-cases is Telepresence.

Telepresence creates a proxy tunnel to a Kubernetes cluster, allowing you to directly communicate with cluster services and Pods as if they were running in your local network. Head over to the documentation to know more on how it works and how to use it.

Telepresence works out of the box with our managed Kubernetes clusters that are not behind VPN and you can start using it right away on such clusters. For those private clusters that are behind OpenVPN, there’s an issue affecting DNS resolution when using Telepresence. We’re looking into that issue and we’ll update this documentation once we find a solution for it.

Note that the first time Telepresence is used on a cluster, it automatically installs the required cluster components. This requires permissions to create Namespaces, ServiceAccounts, ClusterRoles, ClusterRoleBindings, Secrets, Services and MutatingWebhookConfigurations, as well as to create the traffic-manager Deployment, which is typically done by a full cluster administrator. After that initial setup, these components keep running in the cluster for future Telepresence usage. A user running Telepresence then only needs the minimum cluster permissions necessary to create a Telepresence intercept, and is otherwise unable to affect Kubernetes resources.

If you have trouble running Telepresence for the first time, please contact your Customer Lead or a colleague that has the necessary permissions.
