Monitoring

Overview

By default, we use a Prometheus, Alertmanager and Grafana stack for a complete monitoring setup of both our clusters and the applications running on them.

We make use of the CoreOS operator principle: by deploying prometheus-operator we get new Kubernetes Custom Resource Definitions (CRDs) for Prometheus, Alertmanager, ServiceMonitor, PodMonitor, PrometheusRule, etc., which are used to deploy Prometheus and Alertmanager setups, define Prometheus scrape targets and manage alerting rules.
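You can verify which of these CRDs are available on your cluster with kubectl (a quick check, assuming you have access to the cluster):

kubectl api-resources --api-group=monitoring.coreos.com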

As of this writing there isn’t an operator setup yet for Grafana, but you can add custom dashboards dynamically via ConfigMaps.

  • Prometheus does the service monitoring and keeps time-series data.
  • Alertmanager is responsible for handling alerts, based on rules from Prometheus. In our case Alertmanager is responsible for making sure alerts end up in Slack and our escalation system(s).
  • Grafana provides visualization and dashboards of the Prometheus time-series data, as well as external resources (like AWS CloudWatch).

Accessing the monitoring dashboards

Prometheus, Alertmanager and Grafana are accessible via their respective dashboards. These dashboards are exposed via Ingresses, either private or public, depending on your cluster’s configuration. If the dashboards are set to be private, you’ll need to be connected to the cluster’s VPN in order to access them. These dashboards also require authentication, which is set up using an Identity Provider of your choice (via DEX) during the initial Platform setup.

These dashboards can be reached via the following URLs (make sure to use the correct cluster FQDN):

  • Grafana: https://grafana.production.eks.example.com
  • Prometheus: https://prometheus.production.eks.example.com
  • Alertmanager: https://alertmanager.production.eks.example.com

Your environment-specific URLs will also be documented in your own documentation repository.

Shared responsibility and alerting overview

Our monitoring stack comes with a big set of built-in alerts for both the platform infrastructure and any workloads you deploy on it. Based on our shared responsibility model, these alerts are routed to either our on-call engineers or to your DevOps team.

Generally speaking, Skyscrapers is responsible for platform (AWS, EKS) monitoring, while you are responsible for running and monitoring the workloads you deploy on the platform.

Practically, from a setup point of view, we centrally define our built-in alerts via PrometheusRules which evaluate metrics for the whole EKS cluster, including Pod metrics for customer workloads. Depending on the namespace where the metrics originate from, our system routes the alerts to the appropriate receivers in Alertmanager.

Everything originating from the platform namespaces (see below) is routed to the Skyscrapers on-call system, while all other namespaces are routed to the customer receiver. This customer receiver is by default a Slack channel of your choice, but we can help you set up a better escalation system and alert routing.

Skyscrapers automatically responds to Critical platform alerts originating from clusters and cloud resources covered by the production SLA, through our on-call system. Application workload related alerts, by default sent to the customer’s Slack channel, are not responded to or handled by Skyscrapers. However, as a customer you can decide you need help resolving an issue and use our escalation path to get Skyscrapers’ assistance.

At the moment of writing, the following namespaces are considered platform namespaces:

  • arc-system
  • cert-manager
  • concourse
  • flux-system
  • hnc-system
  • infrastructure
  • istio-system
  • keda
  • kube-node-lease
  • kube-system
  • nvidia-device-plugin
  • observability
  • sks-mgmt
  • vault

We can optionally include more namespaces, for example for custom customer resources maintained by Skyscrapers (e.g. Neo4j).

There are also a handful of alerts that don’t originate from any specific namespace, like Kubernetes API server related alerts, which are routed to the Skyscrapers on-call system as well.

Many alerts also have a runbook URL attached to them, which can be used by both Skyscrapers and customer engineers to help debug and resolve the issue. General runbooks can be found in the upstream kube-prometheus documentation, while Skyscrapers also maintains specific runbooks for our custom alerts in this documentation site. Our ongoing goal is to gradually have runbooks defined for every alert we provide.

Kubernetes application monitoring

You can also use Prometheus to monitor custom metrics from your application workloads and get alerts when something goes wrong. In order to do that you’ll need to define your own ServiceMonitors and PrometheusRules. There are two requirements though:

  • All ServiceMonitors and PrometheusRules you define need to have the prometheus label (any value will do).

  • The namespace where you create your ServiceMonitors and PrometheusRules needs to have the prometheus label too (any value will do).

    kubectl label namespace yournamespace prometheus=true

Custom Grafana dashboards can be created through the Grafana UI or by creating a ConfigMap with a grafana_dashboard label (any value will do), containing the dashboard’s JSON data (see the example ConfigMap below). Please make sure that the datasources are set to Prometheus in your dashboards!

Note

Even if these objects are created in a specific namespace, Prometheus can scrape metric targets in all namespaces.

It is possible to configure alternative routes and receivers for Alertmanager. This is done in your cluster definition file under the addons section. Example:

spec:
  cluster_monitoring:
    alertmanager:
      custom_routes: |
        - match:
            namespace: my-namespace
          receiver: custom-receiver
  • Upstream documentation

spec:
  cluster_monitoring:
    alertmanager:
      custom_receivers_payload: | # (The whole yaml block should be encrypted via KMS with the context 'k8s_stack=secrets')
        - name: custom-receiver
          webhook_configs:
            - send_resolved: true
              url: <opsgenie_api_url>
  • Upstream documentation

Note

This configuration change can be made by creating a PR to your repo (optional) and/or communicated to your Customer Lead, as it needs to be rolled out to the cluster.

Example ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    prometheus: application
  name: myapplication-php-fpm-exporter
  namespace: mynamespace
spec:
  endpoints:
    - targetPort: 8080
      interval: 30s
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: application
      component: php

You can find more examples in the official Prometheus Operator documentation or get inspired by the ones already deployed in the cluster:

kubectl get servicemonitors --all-namespaces

Example PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: application
  name: myapplication-alert-rules
  namespace: mynamespace
spec:
  groups:
  - name: myapplication.rules
    rules:
    - alert: MyApplicationPhpFpmDown
      expr: phpfpm_up{job="application"} == 0
      for: 10m
      labels:
        severity: critical
        namespace: mynamespace
      annotations:
        description: '{{`{{ $value }}`}}% of {{`{{ $labels.job }}`}} PHP-FPM are down!'
        summary: PHP-FPM down
        runbook_url: 'https://github.com/myorg/myapplication/tree/master/runbook.md#alert-name-myapplicationphpfpmdown'

You can find more examples in the official Prometheus Operator documentation or get inspired by the ones already deployed in the cluster:

kubectl get prometheusrules --all-namespaces

Note

We use the namespace label in the alerts to distinguish between infrastructure and application alerts, and to distribute them to the appropriate receivers. If the namespace label is set to any of the namespaces we are responsible for (e.g. infrastructure, cert-manager, keda, istio-system, …), or if the alert doesn’t have a namespace label at all (a sort of catch-all), we consider it an infrastructure alert and route it to our on-call system and Slack channels. Otherwise the alert is considered an application alert and routed to the #devops_<customername>_alerts Slack channel (and any other receivers that you define in the future).

The namespace label can be “hardcoded” as an alert label in the alert definition, or exposed from the alert expression, as shown in the sketch below. The latter is important since some alerts span multiple namespaces and it’s desirable to get the namespace from which the alert originated.

Tip

We highly recommend including a runbook_url annotation on all alerts, so the engineer that handles those alerts has all the needed information and can troubleshoot issues faster.
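To illustrate exposing the namespace from the alert expression, a rule like the following (a minimal sketch; the metric comes from kube-state-metrics and the names and URL are placeholders) aggregates with sum by (namespace), so the resulting alert carries the namespace label of the workload it fired for:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: application
  name: myapplication-restart-rules
  namespace: mynamespace
spec:
  groups:
    - name: myapplication.restarts
      rules:
        - alert: MyApplicationPodRestarting
          # sum by (namespace) keeps the namespace label from the metric itself,
          # so Alertmanager can route the alert based on where it originated
          expr: sum by (namespace) (increase(kube_pod_container_status_restarts_total{namespace="mynamespace"}[15m])) > 3
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Pods are restarting frequently
            runbook_url: 'https://github.com/myorg/myapplication/tree/master/runbook.md#alert-name-myapplicationpodrestarting'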

Using Grafana to fire alerts

Aside from PrometheusRules, you can also leverage the Grafana alerting system to fire alerts based on the data you have in Prometheus or in external data sources. This is done by creating a Grafana dashboard (or setting up Alert Rules) and configuring alerts on it.

You can start with creating the alert in the Grafana UI, and once you’re happy with it you can export it as yaml and put it in a ConfigMap and deploy it automatically together with your application (see this section on how to do so).

By default the Skyscrapers-managed Grafana has a built-in integration with Alertmanager as a Contact Point. Make sure to use the Main Alertmanager as the contact point for your alerts. For Grafana to persistently store the alerts, the Grafana workload needs persistent storage. You can enable this by setting spec.cluster_monitoring.grafana.persistence.enabled to true in your cluster definition file.
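In the cluster definition file that would look something like this (a sketch following the same structure as the Alertmanager examples above):

spec:
  cluster_monitoring:
    grafana:
      persistence:
        enabled: true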

This means that you can set up alerts in Grafana and they will be sent to Alertmanager, which will then route them to the correct receiver.

Example Grafana Alert

For Grafana to pick up your alert ConfigMap, you just need to label it with grafana_alert (any value will do):

The following example alert will trigger when unexpected HTTP status codes are found in the logs of our public nginx ingress:

kind: ConfigMap
apiVersion: v1
metadata:
  name: http-errors-nginx-ingress-alert-rule
  namespace: infrastructure
  labels:
    grafana_alert: "logs-nginx-http-errors"
data:
  rules.yaml: |
    groups:
      - name: logs-nginx-http-errors
        folder: Ingress Alerts
        interval: 5m
        rules:
          - title: Unexpected HTTP status codes discovered in the nginx-ingress logs
            condition: A
            data:
              - refId: A
                queryType: range
                relativeTimeRange:
                  from: 600
                  to: 0
                datasourceUid: P8E80F9AEF21F6940
                model:
                  datasource:
                    type: loki
                    uid: P8E80F9AEF21F6940
                  editorMode: code
                  expr: |-
                    sum by () (
                      count_over_time(
                        {app_kubernetes_io_name="nginx-ingress"}
                        | json
                        | httpRequest_status !~ "^2\\d\\d$"
                      [5m])
                    )
                  instant: true
                  intervalMs: 1000
                  maxDataPoints: 43200
                  queryType: range
                  refId: A
            noDataState: OK
            execErrState: KeepLast
            for: 5m
            annotations:
              summary: Unexpected HTTP status codes are found in the logs of our public nginx ingress
            labels:
              severity: "critical"
            isPaused: false
            notification_settings:
              receiver: Main Alertmanager

Example Grafana Dashboard

You can create Grafana dashboards directly in the Grafana UI as well as create them as ConfigMaps in your Kubernetes cluster. The ConfigMap method is useful if you want to version control your dashboards, or if you want to deploy them automatically together with your application.

For Grafana to pick up your dashboard ConfigMap, you just need to label it with grafana_dashboard (any value will do):

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: application
  name: grafana-dashboard-myapplication
  namespace: mynamespace
data:
  grafana-dashboard-myapplication.json: |
    { <Grafana dashboard json> }

You can get inspired by some of the dashboards already deployed in the cluster:

kubectl get configmaps -l grafana_dashboard --all-namespaces

AWS services monitoring

AWS services can also be monitored via Prometheus and Alertmanager, just like the rest of the cluster. To do so we use the Prometheus cloudwatch-exporter, which imports CloudWatch metrics into Prometheus. From there we can build alerts that trigger when certain conditions are met.

The cloudwatch-exporter is not deployed by default as a base component of the reference solution, as it’s highly dependent on the customer’s needs.

We provide pre-made Helm charts for some AWS resources, like RDS, Redshift and Elasticsearch, which deploy the cloudwatch-exporter and some predefined alerts, but additional cloudwatch-exporters can be deployed to import any other metrics needed. Be aware that exporting data from CloudWatch is quite costly, and every additional exported metric requires additional API calls, so make sure you only export the metrics you’ll use.

Normally, you’ll deploy a different cloudwatch-exporter for each AWS service that you want to monitor, as each of them will probably require different period configurations.
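As an illustration, a cloudwatch-exporter configuration for RDS could look roughly like this (a minimal sketch using the upstream prometheus/cloudwatch_exporter configuration format; region, metrics and statistics are examples only):

region: eu-west-1
period_seconds: 300
metrics:
  # Only export the metrics you actually alert or dashboard on,
  # as every metric adds extra CloudWatch API calls (and cost)
  - aws_namespace: AWS/RDS
    aws_metric_name: CPUUtilization
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: FreeStorageSpace
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Minimum]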

You’ll also need an IAM role for the cloudwatch-exporter with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "tag:GetResources"
      ],
      "Resource": "*"
    }
  ]
}

Note

A single role can be used for all cloudwatch exporters deployed on the same cluster.
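On EKS this role is typically linked to the exporter through IAM Roles for Service Accounts (IRSA). As a sketch (account ID, role name and namespace are placeholders), the cloudwatch-exporter’s ServiceAccount would be annotated like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-exporter
  namespace: infrastructure
  annotations:
    # IRSA: the exporter Pod assumes this IAM role to call the CloudWatch and tagging APIs
    eks.amazonaws.com/role-arn: arn:aws:iam::<account_id>:role/<cloudwatch-exporter-role>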

Recommendations and best practices

Prometheus labels

As the Prometheus documentation states:

Use labels to differentiate the characteristics of the thing that is being measured:

  • api_http_requests_total - differentiate request types: type=“create|update|delete”
  • api_request_duration_seconds - differentiate request stages: stage=“extract|transform|load”

Caution

Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Not following that advice can cause the whole Prometheus setup to become unstable, run out of memory and eventually cause collateral damage on the node it’s running on. A good example can be found in this GitHub issue.
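To make this concrete, compare the following exposed metrics (an illustrative sketch in the Prometheus text exposition format; metric and label names are examples):

# Bounded cardinality: only a handful of possible label values
api_http_requests_total{type="create"} 42
api_http_requests_total{type="delete"} 7

# Unbounded cardinality: one new time series per user (avoid this)
api_http_requests_total{user_id="b91c6e1a-7c2f-4d1e-9a3b-0f1e2d3c4b5a"} 1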

This is how Prometheus would perform with “controlled” metric labels:

[figure: prometheus-healthy]

Vs. a Prometheus with “uncontrolled” metric labels:

[figure: prometheus-unhealthy]

Prometheus scrapers for common technologies

Here’s a list of Prometheus scrapers already available for common frameworks.

PHP

Prometheus has a native client library for PHP. It’s easy to implement inside your application, but it doesn’t expose standard metrics by default. This makes it ideal if you want to expose your own metrics (which can also be business metrics).

PHP-FPM

If you want to get PHP-FPM metrics, we recommend using this exporter. It’s actively maintained and the documentation is reasonably good. You have to set up the exporter as a sidecar container in your pods/task-definition; it will then access the PHP-FPM socket to read statistics and expose them as Prometheus metrics.

You first need to expose the metrics in PHP-FPM. You can do this by adding the following config to your PHP-FPM image.

pm.status_path = /status

Then you’ll need to add the php-fpm-exporter as a sidecar container to your pod/task-definition.

Here is an example for k8s:

- name: {{ template "app.fullname" . }}-fpm-exporter
  image: hipages/php-fpm_exporter:2.2.0
  env:
    - name: PHP_FPM_SCRAPE_URI
      value: "tcp://127.0.0.1:{{ .Values.app.port }}/status"
  ports:
    - name: prom
      containerPort: 9253
      protocol: TCP
  livenessProbe:
    tcpSocket:
      port: prom
    initialDelaySeconds: 10
    periodSeconds: 5
  readinessProbe:
    tcpSocket:
      port: prom
    initialDelaySeconds: 10
    timeoutSeconds: 5
  resources:
    limits:
      cpu: 30m
      memory: 32Mi
    requests:
      cpu: 10m
      memory: 10Mi

Note

You’ll need to adjust {{ template "app.fullname" . }} and {{ .Values.app.port }} to the correct Helm variables. The first one represents the name of the app we want to monitor, the second the PHP-FPM port of the application.

Ruby

Prometheus has a native client library for Ruby. This is a really good library when you run your Ruby application as a single process. Unfortunately a lot of applications use Unicorn or another multi-process setup. When running a multi-process Ruby application, your best option is GitLab’s fork of the library.

You have to integrate this library in your application and expose it as an endpoint. Once that is done, you can add a ServiceMonitor to scrape it.

Workers

Workers by default don’t expose a web server to scrape from. This will have to change: every worker will need to expose a simple web server so that Prometheus can scrape its metrics.

Using the Pushgateway for this is strongly discouraged. For more info on why, see the Pushgateway documentation.

RabbitMQ

Starting from version 3.8.0, RabbitMQ ships with built-in Prometheus & Grafana support. The Prometheus metric collector needs to be enabled via the rabbitmq_prometheus plugin. Head to the official documentation to learn more about how to enable and use it.

Once the rabbitmq_prometheus plugin is enabled, the metrics port needs to be exposed in the RabbitMQ Pods and Service. RabbitMQ uses TCP port 15692 by default.

Then a ServiceMonitor is needed to instruct Prometheus to scrape the RabbitMQ service. Follow the instructions above to set up the correct ServiceMonitor.
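As a sketch (service name, labels and namespace are placeholders), exposing the metrics port on the Service and scraping it with a ServiceMonitor could look like this:

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: mynamespace
  labels:
    app: rabbitmq
spec:
  selector:
    app: rabbitmq
  ports:
    - name: prometheus
      port: 15692        # default rabbitmq_prometheus metrics port
      targetPort: 15692
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq
  namespace: mynamespace
  labels:
    prometheus: application   # required label (any value will do)
spec:
  endpoints:
    - port: prometheus
      interval: 30s
  namespaceSelector:
    matchNames:
      - mynamespace
  selector:
    matchLabels:
      app: rabbitmq

Remember that the namespace also needs the prometheus label, as described in the Kubernetes application monitoring section above.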

At this point the RabbitMQ metrics should already be available in Prometheus. We can also deploy a RabbitMQ overview dashboard in Grafana, which displays detailed graphs and metrics from the data collected in Prometheus. Reach out to your Customer Lead in case you’d be interested in such a dashboard.
