Skip to content

Monitoring

We use a Prometheus, Alertmanager and Grafana stack for a complete monitoring setup of both our clusters and applications running on them.

For multi-cluster setups one can optionally use Thanos to aggregate metrics from multiple Prometheus instances and store them in a single place. This allows us to have a single Grafana instance to visualize metrics from all clusters. Prometheus is used with remote write.

We make use of the CoreOS operator principle: by deploying prometheus-operator we define new Kubernetes Custom Resource Definitions (CRD) for Prometheus, Alertmanager, ServiceMonitor and PrometheusRule which are responsible for deploying Prometheus and Alertmanager setups, Prometheus scrape targets and alerting rules, respectively.

As of this writing there isn't an operator setup yet for Grafana, but you can add custom dashboards dynamically via ConfigMaps.

Accessing the monitoring dashboards

Prometheus, Alertmanager and Grafana are accessible via their respective dashboards. These dashboards are exposed via Ingresses, either private or public, depending on your cluster's configuration. If the dashboards are set to be private, you'll need to be connected to the cluster's VPN in order to access them.

These monitoring dashboards are secured via Dex, which adds an authentication layer. Dex can be configured with multiple identity providers to provide a seamless authentication experience. The most typical providers that we use are GitHub and Azure, which are explained in more detail below. Dex configuration is specific to each cluster, and is currently managed by Skyscrapers, although some integration details might need to be provided by you (the customer).

Github

The GitHub integration uses the GitHub OAuth2 flow, and it's pretty straight-forward to configure. In most cases, we will set up this integration for you, creating the OAuth application in our GitHub organization. For reference, this is the official Dex documentation to configure this integration.

Microsoft Azure

Dex Microsoft connector also works with Azure accounts and tenants. You'll need to register a new application in your Azure tenant. To do so, go to the Azure Active Directory portal for the correct tenant, go to App Registrations in the side bar and click New registration. You can give it the name you want, something significant like DEX - <cluster name>. Once the application is created, you'll need to create a client secret by going to Certificates & secrets in the side bar and clicking the New client secret button.

Once the application is registered and configured, you'll have to provide us with the following information:

  • application client id
  • application secret value
  • tenant id
  • the list of groups to authorize in Dex. Users belonging to these groups will be granted access.

In some cases, you might need to give the application special admin consent so Dex is able to list groups on behalf of logged in user. If that is the case, you'll need to add an explicit Directory.Read.All permission to the list of Delegated Permissions and then open the following link in your browser and log in under organization administrator account: https://login.microsoftonline.com/<tenant>/adminconsent?client_id=<dex client id>. You'll get a page similar to this one:

azure auth

Note

after you click Accept and grant the necessary permissions, you might get an error page from Dex. That's ok and you can ignore it and close it.

Kubernetes application monitoring

You can also use Prometheus to monitor your application workloads and get alerts when something goes wrong. In order to do that you'll need to define your own ServiceMonitors and PrometheusRules. There are two requirementes though:

  • All ServiceMonitors and PrometheusRules you define need to have the prometheus label (any value will do).
  • The namespace where you create your ServiceMonitors and PrometheusRules need to have the prometheus label too (any value will do).
kubectl label namespace yournamespace prometheus=true

Custom Grafana Dashboards can also be added to this same namespace by creting a ConfigMap with a grafana_dashboard label (any value will do), containing the dashboard's json data. Please make sure that "datasources" are set to "Prometheus" in your dashboards!

Note

even if these objects are created in a specific namespace, Prometheus can scrape metric targets in all namespaces.

It is possible to configure alternative routes and receivers for Alertmanager. This is done in your cluster definition file under the addons section. Example:

spec:
 cluster_monitoring:
   alertmanager:
     custom_routes: |
       - match:
           namespace: my-namespace
         receiver: custom-receiver
spec:
 cluster_monitoring:
   alertmanager:
     custom_receivers_payload:
       | # (The whole yaml block should be encrypted via KMS with the context 'k8s_stack=secrets')
       - name: custom-receiver
         webhook_configs:
           - send_resolved: true
             url: <opsgenie_api_url>

Note

This configuration can be made by creating a PR to your repo (optional), and/or communicated to your Customer Lead because this needs to be rolled out to the cluster.

Example ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    prometheus: application
  name: myapplication-php-fpm-exporter
  namespace: mynamespace
spec:
  endpoints:
    - targetPort: 8080
      interval: 30s
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: application
      component: php

You can find more examples in the official Prometheus Operator documentation or get inspired by the ones already deployed in the cluster:

kubectl get servicemonitors --all-namespaces

Example PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: application
  name: myapplication-alert-rules
  namespace: mynamespace
spec:
  groups:
  - name: myapplication.rules
    rules:
    - alert: MyApplicationPhpFpmDown
      expr: phpfpm_up{job="application"} == 0
      for: 10m
      labels:
        severity: critical
        namespace: namespace
      annotations:
        description: '{{`{{ $value }}`}}% of {{`{{ $labels.job }}`}} PHP-FPM are down!'
        summary: PHP-FPM down
        runbook_url: 'https://github.com/myorg/myapplication/tree/master/runbook.md#alert-name-myapplicationphpfpmdown'

You can find more examples in the official Prometheus Operator documentation or get inspired by the ones already deployed in the cluster:

kubectl get prometheusrules --all-namespaces

Note

We use the namespace label in the alerts, to distinguish between infrastructure and application alerts, and distribute them to the appropriate receivers. If the namespace label is set to any of the namespaces we are responsible for (e.g. infrastructure, cert-manager, keda, istio-system, ...), or if the alert doesn't have a namespace label (a sort of catch-all), we consider them infrastructure, and are routed to our on-call system and Slack channels. Otherwise the alerts are considered application alerts and routed to the #devops_<customername>_alerts Slack channel (and any other receivers that you define in the future). This label can be "hardcoded" as an alert label in the alert definition, or exposed from the alert expression. This is important since some alerts span multiple namespaces and it's desirable to get the namespace from which the alert originated.

Tip

we highly recommend including a runbook_url annotation to all alerts so the engineer that handles those alerts has all the needed information and can troubleshoot issues faster.

Using Grafana to fire alerts

You leverage the Grafana alerting system to fire alerts based on the data you have in Prometheus. This is done by creating a Grafana dashboard (or setting up Alert Rules) and setting up alerts on it.

By default Skyscrapers managed Grafana has a built in integration with Alertmanager as a Contact Point. Make sure to use the Main Alertmanager as the contact point for your alerts. To make sure that Grafana persistently stores the alerts, you need to make sure the Grafana workload has persistent storage. You can do this by setting the spec.cluster_monitoring.grafana.persistence.enabled to true in your cluster definition file.

This means that you can set up alerts in Grafana and they will be sent to Alertmanager, which will then route them to the correct receiver.

Example Grafana Dashboard

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: application
  name: grafana-dashboard-myapplication
  namespace: mynamespace
data:
  grafana-dashboard-myapplication.json: |
{ <Grafana dashboard json> }

You can get inspired by some of the dashboards already deployed in the cluster:

kubectl get configmaps -l grafana_dashboard --all-namespaces

AWS services monitoring

AWS services can also be monitored via Prometheus and Alertmanager, like the rest of the cluster. To do so we use the Prometheus cloudwatch-exporter, which imports CloudWatch metrics into Prometheus. From there we can build alerts that trigger when some conditions happen.

The cloudwatch-exporter is not deployed by default as a base component of the reference solution, as it's highly dependent on the customer needs.

We provide pre-made Helm charts for some AWS resources, like RDS, Redshift and Elasticsearch, which deploy the cloudwatch-exporter and some predefined alerts, but additional cloudwatch-exporters can be deployed to import any other metrics needed. Be aware that exporting data from Cloudwatch is quite costly, and every additional exported metric requires additional API calls, so make sure you only export the metrics you'll use.

Normally, you'll deploy a different cloudwatch-exporter for each AWS service that you want to monitor, as each of them will probably require different period configurations.

You'll also need an IAM role for the cloudwatch-exporter with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "tag:GetResources"
      ],
      "Resource": "*"
    }
  ]
}

Note

A single role can be used for all cloudwatch exporters deployed on the same cluster.

Recommendations and best practices

Prometheus labels

As the Prometheus documentation states:

Use labels to differentiate the characteristics of the thing that is being measured:

  • api_http_requests_total - differentiate request types: type="create|update|delete"
  • api_request_duration_seconds - differentiate request stages: stage="extract|transform|load"

Caution

Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Not following that advise can cause the whole Prometheus setup to become unstable, go out of memory and eventually cause collateral damage on the node that's running on. A good example can be found in this Github issue.

This is how Prometheus would perform with "controlled" metric labels:

prometheus-healthy

Vs. a Prometheus with "uncontrolled" metric labels:

prometheus-unhealthy

Prometheus scrapers for common technologies

Here's a list of Prometheus scrapers already available for common frameworks.

PHP

Prometheus has a native client library for PHP. This is easy to implement inside your application, but by default doesn't expose standard metrics. This is ideal if you want to expose your own metrics (can also be business metrics).

PHP-FPM

If you want to get PHP-FPM metrics, we recommend using this exporter. It's being actively maintained and the documentation is reasonably good. You have to setup the exporter as a sidecar container in your pods/task-definition, then it'll access the PHP-FPM socket to read statistics and expose them as Prometheus metrics.

You first need to expose the metrics in PHP-FPM. You can do this by adding the following config to your PHP-FPM image.

pm.status_path = /status

Then you'll need to add the php-fpm-exporter as a sidecar container to your pod/task-definition.

Here is an example for k8s:

- name: {{ template "app.fullname" . }}-fpm-exporter
  image: hipages/php-fpm_exporter:2.2.0
  env:
    - name: PHP_FPM_SCRAPE_URI
      value: "tcp://127.0.0.1:{{ .Values.app.port }}/status"
  ports:
    - name: prom
      containerPort: 9253
      protocol: TCP
  livenessProbe:
    tcpSocket:
      port: prom
    initialDelaySeconds: 10
    periodSeconds: 5
  readinessProbe:
    tcpSocket:
      port: prom
    initialDelaySeconds: 10
    timeoutSeconds: 5
  resources:
    limits:
      cpu: 30m
      memory: 32Mi
    requests:
      cpu: 10m
      memory: 10Mi

Note

You'll need to adjust {{ template "app.fullname" . }} and {{ .Values.app.port }} to the correct helm variables. The first one represents the app name we want to monitor. The second is the php-fpm port of the application.*

Ruby

Prometheus has a native client library for Ruby. This is a really good library when you run your ruby application as single process. Unfortunately a lot of applications use Unicorn or another multi-process integration. When using a multi-process Ruby you can best use the fork of GitLab.

You have to integrate this library in your application and expose it as an endpoint. Once that is done, you can add a ServiceMonitor to scrape it.

Workers

Workers by default don't expose a webserver to scrape from. This will have to change and every worker will need to expose a simple webserver so that Prometheus can scrape its metrics.

It is really discouraged to use Pushgateway for this. For more info why this is discouraged, see the Pushgateway documentation.

Nginx

Nginx has an exporter through the Nginx VTS module. We forked the official Docker Nginx image and added the VTS module to nginx. We had to fork it because the VTS module needs to be compiled with Nginx.

Setup

  1. Use our Nginx Docker image instead of the upstream Nginx.
  2. Add the exporter as sidecar to the pod/task-definition.

Example for k8s:

- name: {{ template "app.fullname" . }}-exporter
  image: "sophos/nginx-vts-exporter:v0.10.0"
  env:
    - name: NGINX_STATUS
      value: http://localhost:9999/status/format/json
  ports:
    - name: prom
      containerPort: 9913

Note

You will need to adjust {{ template "app.fullname" . }} to the correct helm variables. It represents the app name you want to monitor.

RabbitMQ

Starting from version 3.8.0, RabbitMQ ships with built-in Prometheus & Grafana support. The Prometheus metric collector needs to be enabled via the rabbitmq_prometheus plugin. Head to the official documentation to know more on how to enable and use it.

Once the rabbitmq_prometheus plugin is enabled, the metrics port needs to be exposed in the RabbitMQ Pods and Service. RabbitMQ uses TCP port 15692 by default.

Then a ServiceMonitor is needed to instruct Prometheus to scrape the RabbitMQ service. Follow the instructions above to set up the correct ServiceMonitor.

At this point the RabbitMQ metrics should already be available in Prometheus. We can also deploy a RabbitMQ overview dashboard in Grafana, which displays detailed graphs and metrics from the data collected in Prometheus. Reach out to your Customer Lead in case you'd be interested in such dashboard.

Thanos

Architecture

We opt for this architecture: Deployment via Receive in order to scale out or integrate with other remote write-compatible sources, so that we can use the Prometheus remote write feature to send metrics to Thanos.

image:thanos

For a more in depth explanation of all the components.

How to enable Thanos in a cluster?

A Skyscrapers engineer can help you to enable Thanos and/or you can update your cluster definition file, through pull request:

spec:
  thanos:
    enabled: true
    receive:
      tenants:
        - <name of leaf cluster>
      pv_size: 100Gi
      tsdb_retention: 6h
    retention_1h: 30d
    retention_5m: 14d
    retention_raw: 14d

The receive pv_size and tsdb_retention are used to configure the thanos receive component. The tsdb retention is the amount of time that the data will be stored in the receive component. The retention_1h, retention_5m and retention_raw are used to configure the thanos store component. The retention_1h is the amount of time that the data will be stored in the store component for 1h resolution. The retention_5m is the amount of time that the data will be stored in the store component for 5m resolution. The retention_raw is the amount of time that the data will be stored in the store component for raw resolution.

How to enable Prometheus remote write?

A Skyscrapers engineer can help you to enable Prometheus remote write and/or you can update your cluster definition file, through pull request:

  cluster_monitoring:
    prometheus:
      remote_write:
        enabled: true
        name: thanos
        url: https://receive.thanos.<FQDN of Thanos cluster>/api/v1/receive

How to add Thanos rules?

  1. Create a yaml file with the rules you want to add. In the example we'll name it thanos-rules.yml. It will be reused in the next step. NOTE: The file needs to end with *.yml otherwise it will not be picked up by the thanos rules.
groups:
  - name: test-thanos-alertmanager
    rules:
      - alert: TestAlertForThanos
        expr: vector(1)
        labels:
          severity: warning
        annotations:
          summary: Test alert
          description: This is a test alert to see if alert manager is working properly with Thanos
  1. Overwrite the existing thanos-ruler-configmap in the infrastructure namespace with the new yaml file.
kubectl create configmap thanos-ruler-config --from-file=thanos-rules.yml -n infrastructure --dry-run=client -o yaml | kubectl replace -f -