Grafana Tempo

This guide explains how to use Grafana Tempo for distributed tracing in your applications running on Kubernetes. It’s written for Skyscrapers customers who want to use tracing.

Overview

Grafana Tempo is a highly scalable, cost-effective tracing backend that integrates seamlessly with Grafana. It supports the OpenTelemetry Protocol (OTLP), making it easy to instrument your applications.

How Tempo is Set Up (High Level)

The Tempo setup in your Kubernetes cluster consists of:

  • Alloy as Collector: Applications send traces via the OpenTelemetry Protocol (OTLP) to the Alloy endpoint on the cluster.
  • Tempo backend: deployed as a single-tenant service inside your Kubernetes cluster with S3 storage.
  • Grafana integration: Traces can be visualized in Grafana and linked with logs and metrics.

You can enable this setup by setting the observability.tempo.enabled flag in your cluster definition file.
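
A minimal sketch of what that looks like (the exact surrounding structure depends on your setup):

observability:
  tempo:
    enabled: true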

Application Instrumentation

Demo Application

Want to see tracing in action? Deploy the Mythical Beasts demo application in your cluster. It comes pre-instrumented with OpenTelemetry and is a great way to explore tracing.

You can do this by following the instructions in the Grafana Tempo documentation.

Note

Adjust the tracing endpoint in the mythical-beasts-deployment.yaml file to point to your Alloy service: alloy.observability.svc.cluster.local. You will also have to remove the AUTHUSER (grafanaopsuser) and AUTHPASSWORD environment variables from the file. If you want to run the demo application in a namespace other than default, create a temporary namespace and adjust the namespace on all the resources in the manifests.
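
For example, to run the demo in its own temporary namespace (the namespace name below is just an example):

kubectl create namespace tracing-demo
# Resources that hard-code a namespace inside the manifests still need to be edited by hand.
kubectl apply -n tracing-demo -f mythical-beasts-deployment.yaml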

Install OpenTelemetry SDK/Agent

Choose the OpenTelemetry SDK or agent for your language/framework from the official OpenTelemetry documentation.
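
For example (these are the upstream OpenTelemetry package names; pick the ones that match your stack):

# Node.js
npm install @opentelemetry/api @opentelemetry/auto-instrumentations-node

# Python
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install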

Configure Exporter

Each SDK/agent needs to know where to send traces:

OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy.observability.svc.cluster.local:4317
OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
OTEL_RESOURCE_ATTRIBUTES=service.name=<my-app>,service.namespace=<my-namespace> # optional

You can set these as environment variables in your Kubernetes Deployment manifest.

Example (Kubernetes snippet):

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://alloy.observability.svc.cluster.local:4317"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.name=<my-app>,service.namespace=<my-namespace>"

Auto vs Manual Instrumentation

👉 Recommended approach: Start with auto-instrumentation to get quick visibility. Then, gradually add manual instrumentation where you need deeper insights.

Auto-Instrumentation

What it is: Automatically hooks into common libraries and frameworks (HTTP servers, gRPC, database clients, message queues, etc.).
When to use: Great for getting started quickly or when you want broad coverage with minimal code changes.

How to enable:

  • Java: Run your app with the OpenTelemetry Java agent: java -javaagent:opentelemetry-javaagent.jar -jar app.jar
  • Python: Use the CLI wrapper: opentelemetry-instrument python app.py
  • Node.js: Load the instrumentation before your app starts: node -r @opentelemetry/auto-instrumentations-node/register app.js

Pros: Fast to adopt, no or minimal code changes.
Cons: Limited control over which spans are created and what metadata is added.
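
For example, with the Node.js auto-instrumentation package already installed in your container image, it can be switched on entirely from the Deployment manifest. A minimal sketch (the service name is a placeholder):

env:
  - name: NODE_OPTIONS
    value: "--require @opentelemetry/auto-instrumentations-node/register"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://alloy.observability.svc.cluster.local:4317"
  - name: OTEL_SERVICE_NAME
    value: "<my-app>"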

Manual Instrumentation

What it is: Adding explicit tracing code in your application.
When to use: Useful when you need detailed insights into specific functions, business logic, or performance-sensitive areas.
How to add (example in Go):

import "go.opentelemetry.io/otel"
import "go.opentelemetry.io/otel/trace"

var tracer = otel.Tracer("my-service")

func handleRequest(ctx context.Context) {
    ctx, span := tracer.Start(ctx, "handleRequest")
    defer span.End()

    // Your application logic here
}

Pros: Full control over what gets traced, ability to add meaningful custom attributes.
Cons: Requires code changes and ongoing maintenance.
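
For instance, spans can carry custom attributes and error details. The sketch below reuses the tracer from the snippet above; the processOrder/chargeCustomer functions and the order.id attribute are purely illustrative:

import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
)

func processOrder(ctx context.Context, orderID string) error {
    ctx, span := tracer.Start(ctx, "processOrder")
    defer span.End()

    // Attach business context to the span so it is searchable in Grafana.
    span.SetAttributes(attribute.String("order.id", orderID))

    // chargeCustomer is a placeholder for your own business logic.
    if err := chargeCustomer(ctx, orderID); err != nil {
        // Record the failure on the span and mark it as errored.
        span.RecordError(err)
        span.SetStatus(codes.Error, "charge failed")
        return err
    }
    return nil
}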

Viewing Traces in Grafana

  1. Open Grafana (provided by Skyscrapers).
  2. Go to Explore and select Tempo from the dropdown.
  3. Search for recent traces.
  4. Drill down into spans to see:
  • Latency breakdown
  • Parent/child relationships
  • Linked logs and metrics

Grafana Tempo View

Use cases

Tempo provides distributed tracing capabilities for your applications, allowing you to:

  • Track requests as they flow through your distributed system
  • Identify performance bottlenecks and slow operations
  • Visualize service dependencies and understand how services communicate
  • Debug errors by seeing the complete context of failed requests
  • Monitor application behavior during deployments and in production

Traces show you the complete journey of a request, including timing information for each service interaction, making it easier to understand system behavior and troubleshoot issues.

Example trace query

In Grafana Explore with Tempo selected:

  1. Use Search to find traces by service name, operation, or tags
  2. Click on a trace to see the complete request flow
  3. Analyze timing, errors, and service dependencies
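
If you prefer querying directly, the Tempo data source also accepts TraceQL in the Explore query field, for example (the service name is a placeholder):

{ resource.service.name = "<my-app>" && status = error }
{ resource.service.name = "<my-app>" && duration > 500ms }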

Troubleshooting

Tempo pods not starting

Check pod status and logs:

kubectl describe pod -n observability <tempo-pod-name>
kubectl logs -n observability <tempo-pod-name>

Common issues:

  • Insufficient resources: Increase resources.requests in your configuration
  • Volume mounting issues: Verify volumeSize and storage class availability

No traces appearing in Grafana

  1. Verify Alloy is running and collecting traces:

    kubectl get pods -n observability | grep alloy
  2. Check that your applications are instrumented to send traces to Alloy

  3. Verify Tempo data source is configured in Grafana:

    • Go to Configuration > Data Sources in Grafana
    • Ensure Tempo is listed and the connection is successful

High memory usage

If Tempo pods are using excessive memory:

  1. Reduce the retention period to store fewer traces
  2. Increase memory limits in the resources configuration
  3. Consider increasing replicas to distribute the load
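
As a rough sketch, these settings would live alongside the observability.tempo.enabled flag in your cluster definition file. The exact key names and values below are assumptions; check your cluster definition or ask Skyscrapers support:

observability:
  tempo:
    enabled: true
    retention: 168h          # assumed key: a shorter retention stores fewer traces
    replicas: 2              # assumed key: spreads ingestion load across more pods
    resources:
      requests:
        memory: 1Gi          # assumed values: size these to your trace volume
      limits:
        memory: 2Gi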
