Changelog

This changelog lists all updates, improvements and new features our Engineering team develops for our Skyscrapers Reference Developer Platform. These are rolled out automatically to all DevOps-as-a-Service customers.

2020 Q2

2020-04-01 Maintenance
Help fight COVID-19 with your Kubernetes cluster
In the context of the current global situation regarding the COVID-19 pandemic, we’re making it easy for ourselves and our customers to commit part of our spare infrastructure resources to the Folding@Home project. In short, Folding@Home uses distributed …
#kubernetes #folding@home #covid-19 #distributed-computing

2020 Q1

2020-03-31 Maintenance
Upgrades to monitoring components
We’ve rolled out some minor updates to the monitoring components. List of updated components: grafana v6.6.2 –> v6.7.1, with several enhancements and bug fixes; prometheus v2.14.0 –> v2.15.2, with several enhancements and bug fixes, including …
#kubernetes #eks #add-on #grafana #monitoring #prometheus #alertmanager
2020-03-27 Maintenance
Cluster addons upgrades
Over the past weeks we’ve rolled out a bunch of updates to our Kubernetes addons stack for all staging and production clusters. List of updated components: cert-manager: v0.9.0 –> v0.13.1. During the process we also made it through the major …
#kubernetes #eks #add-on #grafana #monitoring #loki #cert-manager #nginx-ingress #dex #oauth-proxy #external-dns
2020-03-27 Maintenance
Alert and documentation for NodeWithImpairedVolumes
Using EBS-backed Persistent Volumes on Kubernetes comes with some caveats. Among those is the (silent) limit on the maximum number of volume attachments per EC2 instance. For more information about this issue, you can check the documentation. We have also added an alert to …
#kubernetes #monitoring #aws #pv #ebs
2020-03-26 Maintenance
Use NetworkPolicies
We have deployed Calico to our EKS setups as a network policy engine. By default, Pods are non-isolated and thus accept traffic from any source. By specifying NetworkPolicies you can isolate Pods from each other and thus have more fine-grained K8s …
#kubernetes #eks #networkpolicy #calico #cni
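With Calico enforcing them, NetworkPolicies are ordinary Kubernetes objects. The Python sketch below builds two of them as plain dictionaries. The first is a default-deny policy that isolates every Pod in a namespace for inbound traffic. The second re-opens a single path. The namespace `shop`, the `app` labels and port 8080 are hypothetical values for illustration only.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Isolate every Pod in `namespace` for inbound traffic.

    The empty podSelector selects all Pods in the namespace; declaring the
    "Ingress" policy type without any ingress rules allows no inbound traffic.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_app(namespace: str, target: str, source: str, port: int) -> dict:
    """Allow Pods labelled app=<source> to reach Pods labelled app=<target> on a TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source}-to-{target}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": target}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": source}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace, labels and port, for illustration only.
    policies = [
        default_deny_ingress("shop"),
        allow_from_app("shop", target="backend", source="frontend", port=8080),
    ]
    # A v1 List wrapping both objects can be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

NetworkPolicies are additive. Once both objects are applied, the only inbound traffic the `backend` Pods accept is TCP 8080 from `frontend` Pods. All other Pods in the namespace accept no inbound traffic at all.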
2020-03-26 Maintenance
Upgrade of core EKS components
We have upgraded the core cluster components running in kube-system to their latest recommended versions (for EKS 1.14): AWS VPC CNI from 1.5.3 to 1.5.5, CoreDNS from 1.3.1 to 1.6.6 and KubeProxy from 1.14.7 to 1.14.9. These are minor updates, bringing some …
#kubernetes #eks #upgrade #coredns #cni #kube-proxy
2020-03-25 Maintenance
Upgrade Concourse to version 5.8.1
We rolled out Concourse version 5.8.1 to all our setups. This is a security upgrade that patches an edge case of CVE-2018-15798. You can check out the full Concourse changelog here.
#concourse #cicd #upgrade #pipelines #cve
2020-03-23 Maintenance
Upgrade Caddy to version 1.0.4 with ACMEv2
We rolled out version 1.0.4 of the Caddy web server to all our setups that use on-demand “whitelabel” domains. All these certificates are now being requested and renewed against the ACMEv2 API. ACMEv1 has been deprecated for a while …
#caddy #ssl #whitelabel
2020-03-17 Maintenance
Upgrade Concourse to version 5.8.0
We rolled out Concourse version 5.8.0 to all our setups. This is a minor version upgrade, coming from version 5.7.2. It includes the first step towards spaces in Concourse, plus a handful of fixes and smaller features. You can check out the full …
#concourse #cicd #upgrade #pipelines
2020-02-25 Maintenance
Vault on K8s
We can now deploy Vault on our reference solution out of the box. Previously, we set up Vault on a two-node EC2 cluster with a DynamoDB backend. This approach had some downsides and made it harder for us to maintain and upgrade the …
#vault #kubernetes
2020-02-21 Maintenance
All secret data is now encrypted in our Kubernetes definition files
As you may know, we define our Kubernetes clusters’ desired state in a YAML file, which is stored in the customer’s private Git repository. That file is then fed into our CI, which is responsible for rolling out the cluster. That cluster …
#kubernetes #kms #terraform #secrets #encryption
2020-02-11 Maintenance
Bugfix - Velero backups failing on some clusters
We use Velero as our solution to back up complete K8s cluster workloads (both K8s resources and Persistent Volumes). However, we discovered and resolved two bugs in our implementation which could lead to failed K8s resource backups: An error in our policies …
#kubernetes #bug #velero #backup
2020-02-11 Maintenance
Bugfix - Raise fs.inotify limits
During our migrations from KOPS to EKS clusters, some customer Pods had issues launching because they hit the fs.inotify.max_user_instances and/or fs.inotify.max_user_watches limits. It turns out these sysctls had been raised from their defaults for the KOPS base …
#kubernetes #bug #eks #ami
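On Linux, both limits named above are exposed under /proc/sys/fs/inotify. They are usually raised with `sysctl -w` or a file in /etc/sysctl.d/. The sketch below checks a node's current values against a set of desired minimums. The target numbers are hypothetical and are not the values used in our node image. The read step only runs where procfs is available.

```python
from pathlib import Path
from typing import Dict, List

INOTIFY_DIR = Path("/proc/sys/fs/inotify")
# The two limits named in the changelog entry.
INOTIFY_KEYS = ("max_user_instances", "max_user_watches")


def read_inotify_limits(directory: Path = INOTIFY_DIR) -> Dict[str, int]:
    """Read the current fs.inotify limits from procfs (Linux only)."""
    return {key: int((directory / key).read_text().strip()) for key in INOTIFY_KEYS}


def too_low(current: Dict[str, int], minimum: Dict[str, int]) -> List[str]:
    """Return the sysctl keys whose current value is below the desired minimum."""
    return [key for key, wanted in minimum.items() if current.get(key, 0) < wanted]


if __name__ == "__main__" and INOTIFY_DIR.is_dir():
    # Hypothetical targets for illustration; not the values baked into the node image.
    targets = {"max_user_instances": 8192, "max_user_watches": 524288}
    for key in too_low(read_inotify_limits(), targets):
        print(f"fs.inotify.{key} is below {targets[key]}; raise it with sysctl")
```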
2020-01-16 Maintenance
Documentation on how to have feature environments on Concourse
The Concourse team is working hard on an implementation to accommodate feature environments in Concourse. However, this is still a work in progress, and at our customers’ request we researched a way to have feature environments with Concourse. …
#concourse #documentation #feature-environments
2020-01-16 Maintenance
Allow running K8s nodes and Concourse workers in public subnets
We now make it possible to run (part of) your Kubernetes and/or Concourse worker nodes in public subnets, if the situation requires it. However, our default is still to deploy these instances in private subnets. These nodes will get a public IP assigned, …
#kubernetes #concourse
2020-01-15 Maintenance
Bugfix - loki-promtail wasn't scheduled on tainted nodes
We offer Grafana Loki as our default logging solution. It relies on the Promtail DaemonSet to gather logs on each K8s node and ship them to Loki. However, the Promtail pods weren’t scheduled on Kubernetes nodes with a Taint in place. This is …
#kubernetes #bug #loki
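The rule behind this bug can be sketched in Python. The code below is a simplified model of how Kubernetes matches a Pod's tolerations against a node's taints. A Pod may only land on a node if every NoSchedule and NoExecute taint is tolerated. A toleration with operator `Exists` and no key matches every taint, and this is a common way to let node-level agents such as log shippers run on all nodes. The changelog does not say which toleration was added to Promtail, so the catch-all toleration and the `dedicated=ci` taint below are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""  # empty key (with operator Exists) matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes toleration/taint matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # value is ignored
    if tol.operator == "Equal":
        return tol.value == taint.value
    return False


def can_schedule(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """A Pod fits only if every hard (NoSchedule/NoExecute) taint is tolerated."""
    tols = list(tolerations)
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tols) for t in hard)


if __name__ == "__main__":
    # Hypothetical taint on a dedicated node pool.
    tainted_node = [Taint("dedicated", "NoSchedule", "ci")]
    print(can_schedule([], tainted_node))  # False: the taint is not tolerated
    print(can_schedule([Toleration(operator="Exists")], tainted_node))  # True
```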
2020-01-15 Maintenance
Bugfix - Grafana instability, increased memory request/limit
For some customers with more complex dashboards, Grafana recently became unstable at times because it hit our configured memory limits. Instead of using a fixed value, Grafana’s memory request and limit are now configurable on a per-cluster basis.
#kubernetes #bug #grafana
2020-01-14 Maintenance
Teleport setups and updates are now fully automated
In our quest to automate most of the components of our infrastructure, we’ve set up CI/CD pipelines to automate the rollout of Teleport servers and their nodes. As you might know, we use Teleport to access the instances of our (and your) …
#teleport #automation #cicd #concourse

2019 Q4

2019-12-12 Maintenance
Upgrade Concourse to version 5.7.2
During the coming days, we’ll roll out Concourse version 5.7.2 to all our setups. This is a minor version upgrade, coming from version 5.5.3. Here are some highlighted features you might be interested in: A new experimental …
#concourse #cicd #upgrade #pipelines
2019-12-12 Maintenance
Spot termination alerts on Slack
We have updated the alert routing of k8s-spot-termination-handler so that it notifies our shared Slack channel, to increase visibility. We’ve rolled this change out to all our clusters over the last couple of days. The k8s-spot-termination-handler is …
#kubernetes #monitoring #slack #notifications
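k8s-spot-termination-handler reacts to EC2 Spot interruption notices. The sketch below shows the general flow in Python and is not the handler's actual code. It reads the instance metadata endpoint `spot/instance-action`, which returns a small JSON document with `action` and `time` once an interruption is scheduled and HTTP 404 otherwise. It then formats a Slack incoming-webhook message. The sketch assumes IMDSv1 access. `SLACK_WEBHOOK_URL` and `NODE_NAME` are hypothetical environment variable names.

```python
import json
import os
import urllib.error
import urllib.request
from typing import Optional

# EC2 instance metadata endpoint for Spot interruption notices (IMDSv1 assumed).
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_instance_action(timeout: float = 2.0) -> Optional[dict]:
    """Return the pending interruption notice, or None if none is scheduled (HTTP 404)."""
    try:
        with urllib.request.urlopen(INSTANCE_ACTION_URL, timeout=timeout) as resp:
            return json.loads(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def slack_payload(node_name: str, notice: dict) -> dict:
    """Build a Slack incoming-webhook body describing the interruption."""
    return {
        "text": f":warning: Spot instance behind node {node_name} "
        f"will {notice['action']} at {notice['time']}"
    }


def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the JSON payload to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5):
        pass


if __name__ == "__main__" and os.environ.get("SLACK_WEBHOOK_URL"):
    # Only runs on an EC2 instance with the hypothetical env vars set.
    notice = fetch_instance_action()
    if notice is not None:
        node = os.environ.get("NODE_NAME", "unknown-node")
        post_to_slack(os.environ["SLACK_WEBHOOK_URL"], slack_payload(node, notice))
```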
2019-12-12 Maintenance
Improved Prometheus-based ElasticSearch monitoring
It has come to our attention that in certain cases our Prometheus-based ElasticSearch monitoring wasn’t correctly detecting issues and sending alerts. This problem arose when ElasticSearch was deployed through Helm with a long release name. Kubernetes object …
#kubernetes #monitoring #prometheus #elasticsearch
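The entry above is cut off, so the exact cause is not stated here. A frequent source of this kind of breakage is the "fullname" helper that most Helm charts scaffold. It joins the release and chart names, truncates the result to 63 characters (the limit for Service names and label values), and trims one trailing `-`. The Python sketch below approximates that helper. `looks_like_es_service` is a hypothetical suffix-based matcher that shows how a monitoring rule can silently stop matching once a long release name gets truncated. The release names are made up.

```python
MAX_NAME_LEN = 63  # limit for Service names (DNS-1123 labels) and label values


def chart_fullname(release: str, chart: str) -> str:
    """Approximate the common Helm helper: `... | trunc 63 | trimSuffix "-"`."""
    name = release if chart in release else f"{release}-{chart}"
    name = name[:MAX_NAME_LEN]
    if name.endswith("-"):  # trimSuffix removes a single trailing dash
        name = name[:-1]
    return name


def looks_like_es_service(object_name: str, chart: str = "elasticsearch") -> bool:
    """Hypothetical name-suffix matcher, fragile in the face of truncation."""
    return object_name.endswith(f"-{chart}")


if __name__ == "__main__":
    # Hypothetical release names: one short, one long enough to be truncated.
    for release in ("logs", "customer-production-search-cluster-eu-west-1-primary"):
        name = chart_fullname(release, "elasticsearch")
        print(len(name), name, looks_like_es_service(name))
```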
2019-12-12 Maintenance
Fixed Kubernetes cluster-autoscaler ASG auto-detection
Earlier changes to how we label our AWS Auto Scaling Groups (ASGs), and to which labels the Kubernetes cluster-autoscaler uses to detect these ASGs automatically, caused the autoscaler to stop working properly. This could result in clusters not automatically …
#kubernetes #autoscaling #cluster-autoscaler
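The entry doesn't name the labels involved. As an assumption, the Python sketch below uses the tag keys commonly documented for the cluster-autoscaler's AWS auto-discovery: `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<cluster-name>`. In the tag-key form of `--node-group-auto-discovery`, an ASG is picked up only if it carries every listed key. An ASG that misses one key is ignored without any error, which is how a labelling change can quietly disable scaling. Cluster and ASG names are hypothetical.

```python
from typing import Dict, List


def autodiscovery_tag_keys(cluster_name: str) -> List[str]:
    """Tag keys commonly used for cluster-autoscaler ASG auto-discovery on AWS."""
    return [
        "k8s.io/cluster-autoscaler/enabled",
        f"k8s.io/cluster-autoscaler/{cluster_name}",
    ]


def discovery_flag(cluster_name: str) -> str:
    """Render the corresponding cluster-autoscaler command-line flag."""
    return "--node-group-auto-discovery=asg:tag=" + ",".join(autodiscovery_tag_keys(cluster_name))


def discovered_asgs(asg_tags: Dict[str, Dict[str, str]], cluster_name: str) -> List[str]:
    """Return the ASGs carrying every required tag key (tag values are ignored)."""
    required = autodiscovery_tag_keys(cluster_name)
    return sorted(name for name, tags in asg_tags.items() if all(key in tags for key in required))


if __name__ == "__main__":
    # Hypothetical ASGs: the second one lacks the cluster-specific tag.
    asgs = {
        "prod-workers-a": {
            "k8s.io/cluster-autoscaler/enabled": "true",
            "k8s.io/cluster-autoscaler/prod": "owned",
        },
        "prod-workers-b": {"k8s.io/cluster-autoscaler/enabled": "true"},
    }
    print(discovery_flag("prod"))
    print(discovered_asgs(asgs, "prod"))  # only prod-workers-a is discovered
```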
2019-12-05 Maintenance
Several other K8s Reference Solution improvements
To complement today’s barrage of changelog updates, here are some miscellaneous additions that didn’t make it into another post 😁: Usage of auth-proxy for accessing the Kubernetes Dashboard on EKS clusters, removing the need to re-generate a …
#kubernetes #eks #add-ons
2019-12-05 Maintenance
Kubernetes monitoring upgrades
In the past week we’ve rolled out a bunch of updates to our Kubernetes cluster-monitoring stack. Besides the updated components (see below), we have also added a couple of new features: Allow enabling of custom Grafana plugins through the cluster …
#kubernetes #eks #add-ons #kops #prometheus #grafana #monitoring
2019-12-05 Maintenance
Heavily reduced resource reservations for our default cluster Add-ons
In the past weeks we’ve revisited the resource reservations (requests and limits) we made for running all the cluster Add-ons. These Add-ons provide you with features like Ingress, DNS, certificates, monitoring and logging, backups and so forth. …
#kubernetes #eks #add-ons #kops #resources
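Requests and limits are written as Kubernetes quantities such as `100m` (a tenth of a CPU core) or `128Mi` (128 mebibytes). To compare reservations before and after a change like this one, they first have to be converted to plain numbers. The Python sketch below handles only the common suffixes, not exponent notation. The add-on names and values in it are made up for illustration and are not our actual reservations.

```python
from decimal import Decimal
from typing import Dict, Tuple

# Suffixes from the Kubernetes quantity format (exponent notation not handled).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(quantity: str) -> Decimal:
    """Convert '250m', '128Mi', '1G' or '2' into a plain number (cores or bytes)."""
    if quantity[-2:] in _BINARY:
        return Decimal(quantity[:-2]) * _BINARY[quantity[-2:]]
    if quantity[-1] in _DECIMAL:
        return Decimal(quantity[:-1]) * _DECIMAL[quantity[-1]]
    return Decimal(quantity)


def total_requests(addons: Dict[str, Dict[str, str]]) -> Tuple[Decimal, Decimal]:
    """Sum CPU (cores) and memory (bytes) requests across add-ons."""
    cpu = sum((parse_quantity(r["cpu"]) for r in addons.values()), Decimal(0))
    memory = sum((parse_quantity(r["memory"]) for r in addons.values()), Decimal(0))
    return cpu, memory


if __name__ == "__main__":
    # Hypothetical add-on requests, for illustration only.
    addons = {
        "ingress-controller": {"cpu": "100m", "memory": "128Mi"},
        "external-dns": {"cpu": "10m", "memory": "32Mi"},
        "loki-promtail": {"cpu": "50m", "memory": "64Mi"},
    }
    cpu, memory = total_requests(addons)
    print(f"{cpu} cores, {memory / 2**20} MiB requested")
```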