# Grafana

Grafana is deployed as part of the monitoring stack and wired to Prometheus, Loki, CloudWatch, Alertmanager, and (optionally) Tempo by default. It's exposed at `https://grafana.<cluster_fqdn>` and authenticated through DEX.
This page collects the day-to-day tasks you may need to perform on Grafana: adjusting configuration via the cluster definition, shipping dashboards, and firing alerts.
## Configure Grafana via the cluster definition

All Grafana configuration lives under `spec.cluster_monitoring.grafana` in your cluster definition. The schema descriptions in `.vscode/cluster-definition-schema-eks.yaml` are authoritative for per-field details; the recipes below cover the common cases.
## Ship dashboards as ConfigMaps

Beyond creating dashboards directly in the Grafana UI, you can version-control them as ConfigMaps — useful if you want them deployed alongside your application. Grafana picks up any ConfigMap labeled `grafana_dashboard` (any value will do):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: application
  name: grafana-dashboard-myapplication
  namespace: mynamespace
data:
  grafana-dashboard-myapplication.json: |
    { <Grafana dashboard json> }
```

Make sure dashboards reference `Prometheus` (or one of your declared datasources) as the datasource UID.
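Before shipping a dashboard, you can lint it for datasource UIDs that aren't provisioned in the cluster. A minimal sketch — the helper name and the set of known UIDs are illustrative; check which UIDs your cluster actually declares:

```python
import json

# Hypothetical helper: collect every datasource UID referenced by a
# dashboard's panels and query targets.
def referenced_datasource_uids(dashboard: dict) -> set:
    uids = set()
    for panel in dashboard.get("panels", []):
        ds = panel.get("datasource")
        if isinstance(ds, dict) and "uid" in ds:
            uids.add(ds["uid"])
        for target in panel.get("targets", []):
            ds = target.get("datasource")
            if isinstance(ds, dict) and "uid" in ds:
                uids.add(ds["uid"])
    return uids

# Minimal dashboard fragment for illustration.
dashboard = json.loads("""
{
  "title": "myapplication",
  "panels": [
    {"title": "Requests",
     "datasource": {"type": "prometheus", "uid": "Prometheus"},
     "targets": [{"datasource": {"type": "prometheus", "uid": "Prometheus"}}]}
  ]
}
""")

# Assumed UID list; replace with the UIDs declared in your cluster.
known_uids = {"Prometheus", "Loki", "CloudWatch", "Alertmanager"}
unknown = referenced_datasource_uids(dashboard) - known_uids
print(unknown)  # an empty set means every panel points at a declared datasource
```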
You can get inspired by some of the dashboards already deployed in the cluster:
```shell
kubectl get configmaps -l grafana_dashboard --all-namespaces
```

## Add custom datasources
Beyond the built-in Prometheus/Loki/CloudWatch/Alertmanager/Tempo datasources, you can add your own in two ways.
### As a ConfigMap (declarative)

Similar to dashboards — create a ConfigMap labeled `grafana_datasource` (any value will do) containing a Grafana datasource provisioning YAML. The Grafana sidecar picks it up and writes it into Grafana's provisioning directory.
Simple example — point at an Elasticsearch cluster reachable from the EKS cluster (e.g. an AWS OpenSearch domain):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-logs-datasource
  namespace: mynamespace
  labels:
    grafana_datasource: "1"
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Elasticsearch-Logs
        type: elasticsearch
        uid: elasticsearch-logs
        url: https://vpc-logs.eu-west-1.es.amazonaws.com
        access: proxy
        database: "logs-*"
        editable: false
        jsonData:
          esVersion: "8.0.0"
          timeField: "@timestamp"
          logMessageField: message
          logLevelField: level
```

More advanced — multi-tenant Loki, where each tenant gets its own datasource with a distinct `X-Scope-OrgID` header, optionally scoped to a specific Grafana org:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-tenants-datasource
  namespace: mynamespace
  labels:
    grafana_datasource: "1"
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki-TenantA
        type: loki
        uid: loki-tenant-a
        url: http://loki-read:3100
        access: proxy
        orgId: 2
        editable: false
        jsonData:
          httpHeaderName1: X-Scope-OrgID
        secureJsonData:
          httpHeaderValue1: tenant-a
```

Each entry follows Grafana's datasource provisioning schema, so any field (`jsonData`, `secureJsonData`, `orgId`, …) is supported.
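With many tenants it can be easier to generate the provisioning document than to hand-write it. A sketch under stated assumptions — the helper name, tenant list, and Loki URL are illustrative; since JSON is valid YAML, the printed output can be pasted into the ConfigMap's `datasources.yaml` key as-is:

```python
import json

# Illustrative helper: build one Loki datasource entry per tenant, each
# sending its own X-Scope-OrgID header, mirroring the ConfigMap above.
def loki_tenant_datasources(tenants, url="http://loki-read:3100"):
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": f"Loki-{tenant.title()}",
                "type": "loki",
                "uid": f"loki-{tenant}",
                "url": url,
                "access": "proxy",
                "editable": False,
                "jsonData": {"httpHeaderName1": "X-Scope-OrgID"},
                "secureJsonData": {"httpHeaderValue1": tenant},
            }
            for tenant in tenants
        ],
    }

doc = loki_tenant_datasources(["tenant-a", "tenant-b"])
print(json.dumps(doc, indent=2))
```

Remember the warning below still applies: the generated `secureJsonData` values end up in the ConfigMap in plaintext.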
**Important**

The datasource sidecar runs as an `initContainer` only (live reload is incompatible with our OAuth setup), so Grafana must be restarted to pick up a new or changed ConfigMap.
```shell
kubectl rollout restart statefulset cluster-monitoring-grafana -n infrastructure
```

**Warning**
Values in `secureJsonData` are stored in the ConfigMap in plaintext. Don't commit real secrets to git — encrypt sensitive ConfigMaps at rest (sops, sealed-secrets) or add them manually via the UI instead.
### Manually via the Grafana UI

Add a datasource under **Connections → Data sources** in the Grafana UI. Grafana's database lives on the StatefulSet's PVC (`cluster_monitoring.grafana.persistence` is enabled by default), so manually-created datasources persist across pod restarts.
This is convenient for quick exploration or for datasources that need real secrets you don’t want in git. Caveat: it’s not declarative — the datasource won’t survive a PVC loss and won’t appear in another cluster; prefer ConfigMaps for anything long-lived.
## Promote specific users to GrafanaAdmin

The DEX-backed `[auth.generic_oauth]` section is preset with the minimum required keys (`enabled`, `client_id`, `scopes`, OAuth URLs, etc.). To extend it — typically to map OAuth claims onto Grafana roles — use `generic_oauth_extras`:
```yaml
spec:
  cluster_monitoring:
    grafana:
      generic_oauth_extras:
        role_attribute_path: "contains(email, 'alice@example.com') && 'GrafanaAdmin' || contains(email, 'bob@example.com') && 'GrafanaAdmin' || 'Editor'"
        allow_assign_grafana_admin: true
```

The expression elevates two specific users to GrafanaAdmin and defaults everyone else to Editor. See Grafana's generic OAuth reference for the full list of supported keys.
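`role_attribute_path` is a JMESPath expression: `a && b || c` yields `b` when `a` is truthy and `c` otherwise, and `contains()` on a string is substring containment. A plain-Python sketch of what the expression evaluates to (the function name is illustrative, the emails come from the example above):

```python
# Plain-Python equivalent of the JMESPath role_attribute_path above.
# JMESPath: contains(email, 'alice@...') && 'GrafanaAdmin' || ... || 'Editor'
def grafana_role(email: str) -> str:
    admins = {"alice@example.com", "bob@example.com"}
    # Each `contains(...) && 'GrafanaAdmin'` branch short-circuits to
    # GrafanaAdmin for a matching email; the final `|| 'Editor'` is the default.
    return "GrafanaAdmin" if any(a in email for a in admins) else "Editor"

print(grafana_role("alice@example.com"))  # GrafanaAdmin
print(grafana_role("carol@example.com"))  # Editor
```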
**Important**

Don't override keys already set by the stack (`enabled`, `auto_login`, `client_id`, `scopes`, `auth_url`, `token_url`, `api_url`, `skip_org_role_sync`) via this field — doing so would break OAuth.
## Add an extra OAuth provider

`custom_auth_config` adds a new `grafana.ini` auth section alongside the built-in DEX flow (`auth.generic_oauth`). For the provider's secret, use `custom_auth_secrets`, which creates KMS-decrypted env vars in the Grafana pod.
Example for Entra ID (Azure AD):
```yaml
spec:
  cluster_monitoring:
    grafana:
      custom_auth_config:
        auth.azuread:
          name: "Entra ID"
          enabled: true
          auto_login: false
          client_id: CLIENT_ID
          scopes: "openid email profile groups"
          auth_url: "https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/authorize"
          token_url: "https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/token"
          allowed_organizations: TENANT_ID
          use_pkce: true
      custom_auth_secrets:
        GF_AUTH_AZUREAD_CLIENT_SECRET: a-kms-encrypted-payload
```

Each value in `custom_auth_secrets` must be KMS-encrypted with the context `k8s_stack=secrets`.
## Enable feature toggles

Experimental Grafana features are enabled via `featureToggles` (merged into the `[feature_toggles]` section of `grafana.ini`):
```yaml
spec:
  cluster_monitoring:
    grafana:
      featureToggles:
        provisioning: true
        kubernetesDashboards: true
```

See Grafana's feature-toggles reference for the full catalog.
## Install plugins
```yaml
spec:
  cluster_monitoring:
    grafana:
      plugins:
        - grafana-piechart-panel
```

## Enable extra built-in dashboards
Skyscrapers bundles a set of optional dashboards you can enable by name:
```yaml
spec:
  cluster_monitoring:
    grafana:
      extra_dashboards:
        - sqs
```

## Fire alerts from Grafana
Beyond `PrometheusRule`, you can define alerts using Grafana's alerting system, which is useful for multi-datasource queries (for example, correlating Loki logs with Prometheus metrics). The Skyscrapers-managed Grafana ships with a Main Alertmanager Contact Point wired to the cluster's Alertmanager — use it so your Grafana alerts get the same routing as Prometheus alerts.

Start by building the alert in the Grafana UI, then export it as YAML and ship it as a ConfigMap labeled `grafana_alert` (any value will do). The following example triggers when unexpected HTTP status codes appear in the public nginx ingress logs:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: http-errors-nginx-ingress-alert-rule
  namespace: infrastructure
  labels:
    grafana_alert: "logs-nginx-http-errors"
data:
  rules.yaml: |
    groups:
      - name: logs-nginx-http-errors
        folder: Ingress Alerts
        interval: 5m
        rules:
          - title: Unexpected HTTP status codes discovered in the nginx-ingress logs
            condition: A
            data:
              - refId: A
                queryType: range
                relativeTimeRange:
                  from: 600
                  to: 0
                datasourceUid: P8E80F9AEF21F6940
                model:
                  datasource:
                    type: loki
                    uid: P8E80F9AEF21F6940
                  editorMode: code
                  expr: |-
                    sum by () (
                      count_over_time(
                        {app_kubernetes_io_name="nginx-ingress"}
                          | json
                          | httpRequest_status !~ "^2\\d\\d$"
                      [5m])
                    )
                  instant: true
                  intervalMs: 1000
                  maxDataPoints: 43200
                  queryType: range
                  refId: A
            noDataState: OK
            execErrState: KeepLast
            for: 5m
            annotations:
              summary: Unexpected HTTP status codes are found in the logs of our public nginx ingress
            labels:
              severity: "critical"
            isPaused: false
            notification_settings:
              receiver: Main Alertmanager
```

**Note**

Grafana alerts live in Grafana's SQLite DB on the StatefulSet's PVC, so they persist across pod restarts as long as `cluster_monitoring.grafana.persistence` stays enabled (default).
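The LogQL label filter in the example counts every log line whose `httpRequest_status` does not match the regex `^2\d\d$`, i.e. every non-2xx response. The regex itself can be sanity-checked locally before shipping the rule (the function name is illustrative):

```python
import re

# Same pattern as in the LogQL `!~ "^2\d\d$"` label filter above.
non_2xx = re.compile(r"^2\d\d$")

def is_unexpected(status: str) -> bool:
    # True when the status code would be counted by the alert query
    return not non_2xx.match(status)

print([s for s in ["200", "204", "301", "404", "503"] if is_unexpected(s)])
# → ['301', '404', '503']
```

Note that redirects (3xx) also count as "unexpected" under this filter; widen the regex if that's too noisy for your ingress.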