Changelog
This changelog lists all updates, improvements and new features our Engineering team develops for our Skyscrapers Reference Developer Platform. These are rolled out automatically to all DevOps-as-a-Service customers.
2019 Q3
- 2019-07-16
Maintenance
Cluster and Persistent Volume backups with Velero 1.0
Staging Kubernetes clusters are now backed up through Heptio Velero. Production rollout will follow in the coming days. By default, backups are taken each night (0:00 UTC) and retained for 10 days; both settings are configurable. Backups …
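As a rough illustration of the default schedule described above (a nightly backup at 0:00 UTC, retained for 10 days), the following Python sketch lists which nightly backups would still be retained at a given moment. The function name and its parameters are hypothetical and are not part of Velero; the sketch only models the date arithmetic, not how Velero itself enforces expiry.

```python
from datetime import datetime, timedelta, timezone


def retained_backups(now, retention_days=10):
    """Return start times of nightly 0:00 UTC backups not yet expired at `now`.

    Hypothetical helper for illustration only; not a Velero API.
    `now` must be a timezone-aware datetime.
    """
    ttl = timedelta(days=retention_days)
    # Most recent midnight (UTC) at or before `now`.
    latest = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    backups = []
    taken = latest
    # Walk backwards one night at a time while the backup is within its TTL.
    while now - taken < ttl:
        backups.append(taken)
        taken -= timedelta(days=1)
    return backups


if __name__ == "__main__":
    for ts in retained_backups(datetime.now(timezone.utc)):
        print(ts.isoformat())
```

With the assumed defaults, ten nightly backups are available at any time; raising `retention_days` models a longer configured retention.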
- 2019-07-09
Maintenance
SSO / OAuth2 overhaul
We’ve completely updated our clusters’ Single Sign-On setup, adding new features and fixing some long-standing bugs. What has changed: DEX, which we use as a single Identity Service for all authentication within the cluster, has been separated …
2019 Q2
- 2019-06-06
Maintenance
Support for Cognito in ElasticSearch
In v2.3.8 we added support for Cognito and its options to our terraform-awselasticsearch module.
- 2019-06-04
Maintenance
Adding Prometheus monitoring for Elasticsearch on ECS
Our ECS monitoring solution now supports monitoring Elasticsearch clusters using Elasticsearch Exporter, Prometheus and AlertManager, so we are notified via Slack (critical/warnings) and via OpsGenie (critical) of any issues with ES. This is similar …
- 2019-04-17
Maintenance
Move to the AWS-provided Kibana
We’re in the process of removing our Kibana deployment from all the Staging clusters and replacing it with the AWS-provided Kibana setup that comes with the AWS ElasticSearch service. Production clusters will follow. This change will free up some …
- 2019-04-16
Maintenance
Update kube2iam to 0.10.7
We’ve updated kube2iam to the latest version (0.10.7) on all clusters. For context, kube2iam is the component that provides IAM credentials to containers running in your Kubernetes clusters without the need to distribute secrets. This new version of …
- 2019-04-10
Maintenance
Upgrade Concourse to version 5
During the coming days, we’ll roll out Concourse version 5.0.1 to all our setups. This is a major version upgrade from version 4.2.3, and it includes some important new features and fixes. The most relevant change for Concourse users is …
- 2019-04-08
Maintenance
Increased monitoring alerts visibility
During the following days we’re going to roll out some changes in how Kubernetes monitoring notifications are delivered. From now on, all notifications coming from the production k8s monitoring system will be shown in our shared Slack channel, that …
2019 Q1
- 2019-03-29
Maintenance
Upgrade to Kubernetes 1.11.9 [CVE-2019-1002100, CVE-2019-9946, CVE-2019-3874, CVE-2019-1002101]
We are in the process of upgrading our managed Kubernetes clusters from v1.11.6 to v1.11.9. Next to some general bugfixes and improvements, full details of which you can find in the Kubernetes changelog, this rollout comes with several high and medium …
- 2019-03-19
Maintenance
Create simple AWS resources from K8s via the AWS Service Operator
We’ve made the AWS Service Operator available for deployment on our managed Kubernetes clusters. This Operator allows you to manage some AWS resources, like ECR repositories and S3 buckets, by using Kubernetes Custom Resource Definitions. For …
- 2019-03-18
Maintenance
Support for cronjob monitoring
Update (18-03-2019): We found that the existing default alerts already cover all cases of cronjob failures. The following alerts cover the different failure cases: KubeJobCompletion: Warning alert after 1 hour if any Job doesn’t …
- 2019-03-06
Maintenance
Upgrade Kubernetes components
We are in the process of upgrading the components of our staging Kubernetes clusters to the latest stable releases. Production clusters will follow in 1 to 2 weeks (to be announced) after we have confirmed there are no issues with our customers’ workloads. …
- 2019-03-06
Maintenance
Improved monitoring alerts on Slack
We have updated the format of the monitoring Slack notifications. You might have already noticed that the monitoring messages in your Slack channels now contain more useful information and are more structured. We’ve already started rolling out the …
- 2019-02-21
Maintenance
MongoDB monitoring and dashboards
We have updated the clusters to support MongoDB monitoring, alerts and dashboards. If you have a MongoDB cluster, you will see that there is now a MongoDB dashboard in Grafana and that we added specific alert rules for MongoDB in Prometheus.
- 2019-02-19
Maintenance
Improved etcd backups
We’ve upgraded all the k8s clusters with a new etcd backup implementation. The old backup solution relied on daily snapshots taken by a service running on the master nodes. We’ve decided to take a new approach by using AWS Data Lifecycle …
- 2019-02-18
Maintenance
CVE-2019-5736 - Rolling out patched runc
Update: Added other affected services next to Kubernetes. Last week a new vulnerability in Docker’s runc was announced: CVE-2019-5736. You can read more about this specific vulnerability and how it affects Kubernetes users in the Kubernetes blog: …
- 2019-01-21
Maintenance
Use encrypted EBS volumes for etcd storage and (optionally) encrypt k8s node root volumes
We’re rolling out a major update for our Kubernetes etcd clusters so that they use encrypted EBS volumes for storing all of the Kubernetes state. As an optional feature, it’s also possible to have the Kubernetes nodes’ root volumes encrypted. If this …
- 2019-01-15
Maintenance
Move to CoreDNS DNS server and add gp2-encrypted StorageClass
We’re updating our Kubernetes staging clusters with CoreDNS, the new DNS server that replaces KubeDNS. After in-depth analysis and testing, we’ve verified that the performance and stability of the two solutions are almost identical. …
- 2019-01-11
Maintenance
Upgrade Vault to 1.0.1
A Vault upgrade for our setups was long overdue. We’ve upgraded our Vault installation tools from version 0.9.3 to 1.0.1, which is the latest Vault version available at the moment. As Vault is set up as HA, the downtime of the upgrade will be …
- 2019-01-03
Maintenance
Upgrade to Kubernetes 1.11.6 [updated]
Update: Changed Kubernetes update from 1.10.12 to 1.11.6. We’ve upgraded our Kubernetes staging clusters to the latest stable version, bumping from version 1.10.10 to 1.11.6. There are some nice additions to the 1.11 release, like Pod priority …
2018 Q4
- 2018-12-03
Maintenance
Upgrade to Kubernetes 1.10.11 [updated]
Update 2 (2018-12-03): Since our last update, the people at Kubernetes updated their documentation to add an important fix in the 1.10.11 changelog: CVE-2018-1002105: Fix critical security issue in kube-apiserver upgrade request proxy handler (#71411, …
- 2018-11-27
Maintenance
Set resource reservations for kubelet and other system processes
Following our efforts to improve the overall stability of our Kubernetes clusters, we’ve now set resource reservations for kubelet and other system processes. This will ensure that these critical processes always have enough CPU and memory available …
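To make the effect of such reservations concrete, here is a small Python sketch of the Node Allocatable arithmetic that Kubernetes documents: what remains for pods is the node capacity minus the kubelet reservation, the system reservation and the eviction threshold. The function name and all numbers are hypothetical examples and are not the values configured on our clusters.

```python
def allocatable(capacity, kube_reserved, system_reserved, eviction_threshold=0):
    """Resources left for pods on a node after reservations.

    Illustrative helper only; all arguments share one unit
    (for example millicores for CPU or MiB for memory).
    """
    return capacity - kube_reserved - system_reserved - eviction_threshold


# Hypothetical node: 4000 millicores of CPU and 16384 MiB of memory.
cpu_millicores = allocatable(4000, kube_reserved=200, system_reserved=200)
memory_mib = allocatable(
    16384, kube_reserved=512, system_reserved=512, eviction_threshold=100
)

print(cpu_millicores, memory_mib)
```

Pods can then only be scheduled against the remaining amounts, which is what keeps CPU and memory available for the reserved processes.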
- 2018-11-27
Maintenance
Adding Prometheus monitoring for ECS
We’ve deployed a Prometheus monitoring system on all our ECS-managed staging clusters. This gives us better monitoring of our ECS nodes and adds the opportunity to create custom metrics to monitor your applications. Thanks to alert …
- 2018-11-21
Maintenance
Updated Prometheus & Grafana monitoring stack - update
As announced in our previous update, we have migrated our cluster-monitoring stack by using the new stable/prometheus-operator as base chart. By now these updates have already been rolled out across staging clusters. Initially we planned to do a phased …
- 2018-11-19
Maintenance
Updated Prometheus & Grafana monitoring stack
Our cluster monitoring stack is based on the prometheus-operator developed by the people at CoreOS; more concretely, we used kube-prometheus as a starting point for a complete setup. This project has seen numerous changes and improvements, like the …