Kubernetes
Walk-through of the Skyscrapers’ Kubernetes cluster
Here you’ll find user-level documentation of our Kubernetes reference solution. You’ll learn how to access the cluster, how to deploy applications and services and how to monitor them. You’ll also find some tips & tricks on how to get the most from your Kubernetes cluster.
The best place to start is this walk-through. After that you can read more in-depth documentation on specific topics in their respective files in this folder.
If you are new to Kubernetes, check the getting started page for more information.
Requirements
Optionally you can install some other tools to make your life easier. These are documented in the tools page.
Authentication
To gain access to an EKS cluster you need to authenticate via AWS IAM and configure your kubeconfig accordingly. To do this you’ll need a recent version of awscli (>= 2.13.0). If you don’t have the AWS CLI yet, you can install it by following the AWS instructions or via Homebrew/Linuxbrew:
brew install awscli
You’ll first need to authenticate to the AWS account where the EKS cluster is deployed (or your Admin account if you use delegation). Depending on how you configured your awscli config, --region and --profile are optional.
Make sure to replace <my_assumed_role_arn> with a correct role depending on your access level. Which roles you can assume are documented in your customer-specific documentation.
aws eks update-kubeconfig --name <cluster_name> --alias <my_alias> [--role-arn <my_assumed_role_arn>] [--region <aws_region>] [--profile <my_aws_profile>]
# For example:
aws eks update-kubeconfig --name production-eks-example-com --alias production --role-arn arn:aws:iam::123456789012:role/developer
Deploying on Kubernetes with the Helm Package Manager
After the rollout of a Kubernetes cluster, it can be tempting to start running numerous kubectl create or kubectl apply commands to get stuff deployed on the cluster.
Running such commands is a good way to learn how deployments are done on Kubernetes, but they are not the appropriate tool for building fully self-contained application deployments. The Kubernetes community came up with a separate tool for that: Helm.
With Helm, you create self-contained packages for a specific piece of a deployment, e.g. a web stack. Such packages are called Charts in Helm terminology.
You probably want such a stack to be deployed in a specific Kubernetes namespace, with a specific configuration (ConfigMap), defining a Kubernetes Service referring to a Deployment. But if you want to have this setup reproducible, you need a way to parameterize this.
By combining a template engine, a library of Go template functions and a values.yaml file, you can build a template of a specific piece and reuse it for multiple deployments.
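As a rough illustration of that parameterization, the sketch below shows a values file and one template from a hypothetical chart. The chart name (webstack), the file layout and the keys under config are assumptions made for this example and do not come from an existing chart. The built-in objects .Release.Name, .Release.Namespace and .Values, and the quote function, are standard Helm.

```yaml
# File: webstack/values.yaml (hypothetical chart and keys)
# Default values; each deployment can override them.
config:
  logLevel: info
  greeting: "hello from the default values"

---
# File: webstack/templates/configmap.yaml
# Helm renders this template once per release, filling in the
# release name, the target namespace and the values above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-config
  namespace: {{ .Release.Namespace }}
data:
  LOG_LEVEL: {{ .Values.config.logLevel | quote }}
  GREETING: {{ .Values.config.greeting | quote }}
```

With a layout like this, the same chart could be installed several times with different settings, for example with helm install staging ./webstack --namespace staging --set config.logLevel=debug.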
The Helm documentation is quite good and explanatory, and the best practices section highlights some of the important topics around chart development.
Good examples always help out a lot. Here is a list of existing git Charts repositories:
The above Chart repositories contain Charts that serve as building blocks for bigger composite installations.
Ingress
See Ingress specific documentation for more information on how to use Ingress controllers on your Kubernetes cluster.
IAM Roles
EKS IAM Roles for Service Accounts (IRSA) is used by default.
Your deployments can be assigned specific IAM roles to grant them fine-grained permissions to AWS services. To do that you’ll need to create a Service Account for your Pod and annotate it with the IAM role to use. For example:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::889180461196:role/kube/staging-eks-example-com-myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels: {}
spec:
  selector:
    matchLabels: {}
  template:
    metadata: {}
    spec:
      serviceAccountName: myapp
      # Important to set correct fsGroup, depending on which user your app is
      # running as.
      # https://github.com/aws/amazon-eks-pod-identity-webhook/issues/8
      securityContext:
        fsGroup: 1001
        # For completeness you can add the following too (not required)
        #runAsNonRoot: true
        #runAsGroup: 1001
        #runAsUser: 1001
It’s important to use a recent AWS SDK in your application for IRSA support.
You can find more examples and technical documentation in the official documentation: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html
For Java-based applications, IRSA does not work out of the box. You need to make the following change to your application, quoting from https://pablissimo.com/1068/getting-your-eks-pod-to-assume-an-iam-role-using-irsa :
> you need to add an instance of STSAssumeRoleWithWebIdentitySessionCredentialsProvider to a credentials chain, and pass that custom chain to your SDK init code via the withCredentials builder method. This class doesn’t automatically come as part of the credentials chain. Nor does it automatically initialise itself from environment variables the same way other providers do. You’ll have to pass in the web identity token file, region name and role ARN to get it running

> [!NOTE]
> Usually a Skyscrapers engineer will create the required IAM roles and policies for you. It’s important that we match your ServiceAccount to the IAM policy’s Condition.

If you manage these policies yourself, it’s important to set up the IAM role with the correct federated trust relationship. For example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:${namespace}:${service-account-name}"
}
}
}
]
}
Storage
Persistent Volumes
Persistent Volumes in our cluster are backed by AWS EBS volumes. Besides the obvious scheduling caveat (a volume is limited to a single AZ), there is also a quieter, harder-to-predict one.
Most EC2 instance types support a maximum of only 28 attachments, including network interfaces, EBS volumes and NVMe instance store volumes. This means that only a limited number of EBS volumes can be used per K8s node, especially since our CNI uses multiple network interfaces.
Kubernetes limits the number of volumes that can be attached to a Node to 25 for M5, C5, R5, T3 and Z1D instances. However, this limit is often not strict enough, depending on how many network interfaces the CNI has in use. For example, a node with 4 network interfaces and one root EBS volume has only 28 - 5 = 23 attachment slots left, fewer than the 25 volumes Kubernetes assumes.
Unfortunately AWS doesn’t throw an error either when this happens. Instead a Volume will stay stuck in the Attaching state and your Pod will fail to launch. After ~5 minutes Kubernetes will taint the node with NodeWithImpairedVolumes.
We have added a Prometheus alert to catch this taint and you can follow the actions described in the runbook when this happens.
Local NVMe Instance Storage
Certain AWS EC2 instances come with fast local NVMe Instance Storage and can usually be recognized by the d suffix (e.g. m5d.large). Our platform will automatically mount these volumes under the /ephemeralX paths (e.g. /ephemeral0, /ephemeral1, …).
You can use this storage by mounting it via a hostPath volume in your Pod spec, for example:
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: k8s.gcr.io/test-webserver
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
      volumeMounts:
        - name: ephemeral
          mountPath: /fastdata
          subPathExpr: $(POD_NAME)
  volumes:
    - name: ephemeral
      hostPath:
        path: /ephemeral0
Note that the example uses the K8s Downward API with subPath expansion (subPathExpr), so each Pod uses its own subfolder under the /ephemeral0 path.
Monitoring
Cluster and application monitoring is quite an extensive topic in itself, so there’s a specific document for it here.
Logs
Cluster and application logging is quite an extensive topic in itself, so there’s a specific document for it here.
Cluster updates and rollouts
As part of our responsibilities, we continuously roll out improvements (upgrades, updates, bug fixes and new features). Depending on the type of improvement, the impact on platform usage and applications ranges from none to a small amount of downtime. Below is an overview of the most common types of improvements. More exceptional types will be handled separately.
| Type of improvement | Description | Expected impact on your workloads |
|---|---|---|
| Add-ons (non-breaking) | Improvements expected to not have an impact on the current usage of the cluster or application behaviour. These are rolled out automatically at any time during the day. You are informed during the updates. | No impact |
| Add-ons (non-breaking but disruptive) | Improvements to add-ons that may lead to temporary unavailability of platform functionalities (monitoring, logging, dashboard, etc.) but that do not impact application workloads. These are rolled out automatically at any time during the day. You are informed before and during the updates. | No impact |
| Add-ons (breaking) | These improvements may need changes or intervention by you before they can be rolled out. We will reach out to you to discuss what’s needed and how the improvement will be rolled out. | In some cases: minimal planned downtime |
| Cluster improvements | Low-frequency improvements to the foundations of the cluster. Usually these involve rolling updates leading to nodes being recycled. These are rolled out automatically at any time during the day. You are informed before and during the updates. | Cluster-aware workloads: no impact. Other workloads: potential minimal downtime |
To minimize the impact on your workloads, we suggest making your workloads cluster-aware as much as possible (TODO: define cluster-aware) and implementing PodDisruptionBudgets. There’s more information on this here.
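As a minimal sketch of such a PodDisruptionBudget, the manifest below allows at most one Pod of a workload to be evicted at a time during a node recycle. The name myapp and the app: myapp label are placeholders for this example. It assumes the workload runs more than one replica and that its Pods carry that label.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp            # placeholder name
spec:
  # Voluntary disruptions (such as node drains during a rolling update)
  # may take down at most one matching Pod at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp         # must match the labels on your Pods
```

With a single replica, a budget like this cannot keep the workload available while its node is drained, so it is most useful in combination with multiple replicas.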
Cronjobs
Kubernetes can run cronjobs for you. More information/examples about cronjobs can be found here.
Cronjob Monitoring
Monitoring for cronjobs is implemented by default. This is done with Prometheus and will alert when the last run of the cronjob has failed.
The following alerts cover the different failure cases:
KubeJobCompletion: Warning alert after 1 hour if any Job doesn’t succeed or doesn’t run at all.
KubeJobFailed: Warning alert after 1 hour if any Job failed.
KubeCronJobRunning: Warning alert after 1 hour if a CronJob keeps on running.
Clean up
Starting from Kubernetes 1.7 the scheduled jobs don’t get automatically cleaned up. So make sure that you add the following two lines to the spec section of your cronjob.
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
This will clean up all jobs except the last 3, both for successful and failed jobs.
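To show where these two fields sit, here is a sketch of a complete CronJob manifest. The name, schedule, image and command are placeholders chosen for this example. Only the two history limit fields come from the text above.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report              # placeholder name
spec:
  schedule: "0 2 * * *"             # placeholder: every day at 02:00
  # Keep only the 3 most recent successful and failed Jobs.
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox:1.36   # placeholder image
              command: ["sh", "-c", "echo generating report"]
```

Both limits belong in the CronJob's own spec, next to schedule, and not inside jobTemplate.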
Accessing cluster resources and services locally
One of the main challenges developers and operators face when using Kubernetes is communication between cluster services and those running on a local workstation. This is sometimes needed to test new versions of a service, for example, or to access a cluster service that’s not exposed to the internet.
There are multiple solutions to overcome this, depending on the use-case and the specific requirements, although a nice all-round tool that covers most of the use-cases is Telepresence.
Telepresence creates a proxy tunnel to a Kubernetes cluster, allowing you to directly communicate with cluster services and Pods as if they were running in your local network. Head over to the documentation to learn more about how it works and how to use it.
Telepresence works out of the box with our managed Kubernetes clusters that are not behind VPN and you can start using it right away on such clusters. For those private clusters that are behind OpenVPN, there’s an issue affecting DNS resolution when using Telepresence. We’re looking into that issue and we’ll update this documentation once we find a solution for it.
Note that the first time Telepresence is used on a cluster, it automatically installs the required cluster components. This requires permissions to create Namespaces, ServiceAccounts, ClusterRoles, ClusterRoleBindings, Secrets, Services and MutatingWebhookConfigurations, and to create the traffic-manager deployment, which is typically done by a full cluster administrator. After that initial setup, these components will keep running in the cluster for future Telepresence usage. A user running Telepresence is expected to only have the minimum cluster permissions necessary to create a Telepresence intercept, and otherwise be unable to affect Kubernetes resources.
If you have trouble running Telepresence for the first time, please contact your Customer Lead or a colleague who has the necessary permissions.