Cluster add-on upgrades, with Loki, Traefik and monitoring refreshed

We rolled out another round of cluster add-on upgrades. It is shipping to non-production clusters first, with production to follow in the coming week. Most of it is routine version hygiene, but three charts got a major refresh (Loki, Traefik, and the kube-prometheus-stack monitoring stack), and your oauth2-proxy now reports metrics to your cluster monitoring.

What changed

oauth2-proxy metrics are now scraped. A casing typo in our config meant the oauth2-proxy ServiceMonitor was never created, so its metrics were not collected. That is fixed: oauth2-proxy now appears as a Prometheus target and its metrics are available in Grafana.
Loki, Traefik and kube-prometheus-stack moved up a major chart version each. These are mostly internal chart changes (Helm value renames, refreshed Prometheus Operator CRDs). The one thing you’ll notice: Loki’s Grafana dashboards now come from the upstream chart, so you’ll see a refreshed set of Loki dashboards, with app_instance as the cluster selector instead of cluster.

Add-on upgrades

alloy v1.17.0 (chart v1.10.0)
amazon-eks-ami v20260618
aws-ebs-csi-driver v1.62.0-eksbuild.1
aws-efs-csi-driver v3.3.0-eksbuild.1
aws-mountpoint-s3-csi-driver v2.7.0-eksbuild.1
aws-load-balancer-controller v3.4.0
cert-manager v1.20.3
coredns v1.14.3-eksbuild.3
dex v2.44.0 (chart v0.24.1)
eks-node-monitoring-agent v1.6.6-eksbuild.1
fluent-bit v5.0.7 (chart v0.57.7)
karpenter v1.13.0
keda v2.20.1
kube-prometheus-stack v87.3.0 (Prometheus Operator v0.92.0)
- Major chart bump across two versions; bundles a newer Grafana (v12.7.1) and updated Prometheus Operator CRDs. No dashboards or alerts change for you.
kube-proxy v1.35.3-eksbuild.13 (1.35 clusters)
grafana-loki v3.7.3 (chart v18.2.0)
- Major chart bump. The chart restructured its monitoring config upstream. Your Loki alerts are unchanged, but the Loki dashboards in Grafana now come from the upstream chart instead of our own copies: expect a refreshed set of Loki dashboards whose cluster selector is now app_instance rather than cluster.
metrics-server v0.8.1-eksbuild.11
nvidia-device-plugin v0.19.3
oauth2-proxy v7.15.3 (chart v10.7.0) — ServiceMonitor now created (see above)
prometheus-blackbox-exporter v0.28.0 (chart v11.15.0)
secrets-store-csi-driver-provider-aws v3.1.1
grafana-tempo v2.10.7 (chart v2.2.3)
traefik v3.7.5 (chart v41.0.1)
- Major chart bump. Internal Helm value renames only (logging config keys); no change to routing, middlewares or your IngressRoutes.
velero v1.18.1 (chart v12.1.0)
vertical-pod-autoscaler v1.7.0 (chart v0.10.0)
- Adds an InPlace update mode that resizes pod resources without evicting the pod.
wg-easy v15.3.0