How to use the ECR Pull-Through Cache

This guide covers how to enable the ECR Pull-Through Cache for your clusters.

The default tier (quay.io and registry.k8s.io) is fully managed by Skyscrapers; there is nothing for you to do. The opt-in tiers (docker.io, ghcr.io, registry.gitlab.com) require credentials, which AWS only accepts in Secrets Manager. Skyscrapers does not provision Skyscrapers-owned credentials into customer accounts, so the customer creates and owns the credential secrets, and shares the ARN with Skyscrapers. Skyscrapers then configures the ARN into the platform.

The sections below split the work along those lines.


For the customer: opt-in tier credentials

Only follow this section if you want to enable one or more of the opt-in tiers (Docker Hub, ghcr.io, GitLab). If you only need the default tier, you can skip ahead; there is nothing for you to do.

Decide which tiers you need

| Tier | Caches | When to enable |
| docker.io | Velero, Grafana, Loki, Alloy, Tempo | Recommended: Docker Hub rate-limits anonymous pulls and Velero in particular has lost Verified Publisher status. |
| ghcr.io | Flux controllers, KEDA, Dex, Fluent-bit, WireGuard, kube-green, GHA Runner controller, Tailscale, Karpenter EIP-assigner | Recommended: ghcr.io has no public rate limit today, but caching insulates you from outages. |
| registry.gitlab.com | None of the platform system images. Only enable if your own application workloads pull from GitLab Container Registry. | Optional, application-driven. |

Generate the access tokens

For each tier you enable, generate a credential in the source platform.

Docker Hub

A free Docker Hub account is sufficient for the platform system images. Use a Pro or Team account if you also intend to route images for your application workloads through the cache.

  1. Sign in at hub.docker.com.
  2. Account Settings > Personal access tokens > Generate new token.
  3. Give it a descriptive name (e.g. ecr-ptc-<environment>) and Read-only scope.
  4. Copy the token; you will not be able to view it again.

GitHub Container Registry (ghcr.io)

  1. Sign in at github.com with a service account or a real user that has access to all the public images you need.
  2. Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token (classic).
  3. Scope: only read:packages. No other scopes are needed for read-only image pulls.
  4. Set a 1-year (or shorter) expiration and copy the token.

Note

Use a classic PAT, not a fine-grained one. Fine-grained tokens currently don’t support the GitHub Container Registry. Ref: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry

GitLab Container Registry

  1. In GitLab, Edit profile > Access tokens > Add new token.
  2. Scope: only read_registry.
  3. Set an expiration and copy the token.

Create the Secrets Manager secret(s)

Create one secret per tier in your SharedTooling AWS account (usually called Shared, Tools, SharedTools, or SharedTooling), in the same region as your EKS clusters. The cache module will reference these secrets by ARN. If you’re unsure which account to use, a Skyscrapers engineer can help guide you.

# Docker Hub
aws secretsmanager create-secret \
  --profile CustomerSharedTooling \
  --name ecr-pullthroughcache/docker-hub \
  --description "Docker Hub credentials for ECR pull-through cache" \
  --secret-string '{"username":"<docker-hub-username>","accessToken":"<docker-hub-token>"}'

# ghcr.io
aws secretsmanager create-secret \
  --profile CustomerSharedTooling \
  --name ecr-pullthroughcache/ghcr \
  --description "GitHub Container Registry credentials for ECR pull-through cache" \
  --secret-string '{"username":"<github-username>","accessToken":"<github-pat>"}'

# GitLab
aws secretsmanager create-secret \
  --profile CustomerSharedTooling \
  --name ecr-pullthroughcache/gitlab \
  --description "GitLab Container Registry credentials for ECR pull-through cache" \
  --secret-string '{"username":"<gitlab-username>","accessToken":"<gitlab-pat>"}'

Important

The secret name must start with ecr-pullthroughcache/; AWS only allows pull-through cache rules to reference secrets matching this prefix. Any other prefix (ecr/, pullthroughcache/, etc.) will be rejected at apply time.

The JSON payload must use username and accessToken as the keys; password is not accepted by ECR.
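The two rules above are easy to check before calling the AWS CLI. The following is a minimal sketch, not part of the platform or of AWS tooling: the function names ptc_secret_name_ok and ptc_secret_payload are hypothetical, and the payload helper assumes the username and token contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Succeeds only when the secret name starts with the prefix
# required for pull-through cache credentials.
ptc_secret_name_ok() {
  case "$1" in
    ecr-pullthroughcache/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints a JSON payload using the two keys ECR expects
# (username and accessToken, never password).
# Assumption: neither value needs JSON escaping.
ptc_secret_payload() {
  printf '{"username":"%s","accessToken":"%s"}\n' "$1" "$2"
}
```

For example, ptc_secret_name_ok "ecr/docker-hub" fails, while ptc_secret_name_ok "ecr-pullthroughcache/docker-hub" succeeds; the output of ptc_secret_payload can be passed to --secret-string in the commands above.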

Share the ARNs with Skyscrapers

Send the secret ARN(s) to Skyscrapers via your usual support channel. Each ARN looks like:

arn:aws:secretsmanager:<region>:<sharedtooling-account>:secret:ecr-pullthroughcache/docker-hub-XXXXXX

The ARN itself is not sensitive. It is only a pointer; the credential lives inside Secrets Manager and never leaves your account. Skyscrapers needs the ARN to configure it into the cache module.
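Before sending an ARN, it can help to confirm it has the expected shape. The sketch below is illustrative only: is_ptc_secret_arn is a hypothetical helper, and it assumes the standard aws partition (not aws-cn or aws-us-gov) and the usual six-character suffix that Secrets Manager appends to secret names.

```shell
#!/usr/bin/env bash
# Look up the ARN of an existing secret (run against the SharedTooling account):
#   aws secretsmanager describe-secret \
#     --secret-id ecr-pullthroughcache/docker-hub \
#     --query ARN --output text

# Hypothetical helper: succeeds when the argument looks like a
# Secrets Manager ARN for a secret under ecr-pullthroughcache/.
is_ptc_secret_arn() {
  local re='^arn:aws:secretsmanager:[a-z0-9-]+:[0-9]{12}:secret:ecr-pullthroughcache/.+-[A-Za-z0-9]{6}$'
  [[ "$1" =~ $re ]]
}
```

A bare secret name such as ecr-pullthroughcache/docker-hub fails this check, which is a useful reminder that Skyscrapers needs the full ARN and not just the name.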

Rotate the credentials when they expire

When a token expires, you regenerate it in the source platform and update the existing Secrets Manager secret in place, keeping the same name and ARN so no Skyscrapers-side changes are needed:

aws secretsmanager put-secret-value \
  --profile CustomerSharedTooling \
  --secret-id ecr-pullthroughcache/docker-hub \
  --secret-string '{"username":"<username>","accessToken":"<new-token>"}'

ECR picks up the new credentials on the next pull-through fetch: no Terraform re-apply, no platform restart.


For Skyscrapers: platform setup

This section covers the platform-side work: deploying the cache, threading any customer-provided secret ARNs in, and enabling the per-cluster config.

Deploy or update the ecr-pull-through-cache Terraform module

Note

For new customers this step will be handled during onboarding.

The module lives in skyscrapers/ecr-stack and is consumed via Terragrunt under the customer’s SharedTooling account. The module deploys the pull-through cache rules, repository creation templates, and registry-level cross-account policy.

Key inputs:

cluster_account_ids = ["111111111111", "222222222222", "333333333333"] # dev, staging, prod

# From the customer (omit any tier the customer hasn't opted into)
docker_hub_credentials_secret_arn = "arn:aws:secretsmanager:<region>:<sharedtooling-account>:secret:ecr-pullthroughcache/docker-hub-XXXXXX"
ghcr_credentials_secret_arn       = "arn:aws:secretsmanager:<region>:<sharedtooling-account>:secret:ecr-pullthroughcache/ghcr-XXXXXX"
gitlab_credentials_secret_arn     = "arn:aws:secretsmanager:<region>:<sharedtooling-account>:secret:ecr-pullthroughcache/gitlab-XXXXXX"

# Optional: extra principals beyond `*-workers` / `*-fargate` roles in cluster_account_ids
# (e.g. CI runner IRSA roles that warm the cache during builds).
additional_pull_through_principal_arns = [
  "arn:aws:iam::<account>:role/gitlab-runner-irsa",
]

The default tier (quay.io, registry.k8s.io) is enabled unconditionally; no input is needed. Each opt-in tier is enabled the moment a credential ARN is set; leave the variable null to disable it.

The module’s registry_url output is the value to use in the cluster definition (next step).

Enable the cache in the cluster definition

In the customer’s cluster definition YAML, add:

spec:
  ecr_pull_through_cache:
    enabled: true
    registry: "<sharedtooling-account>.dkr.ecr.<region>.amazonaws.com"
    docker_hub:
      enabled: true   # only when docker_hub_credentials_secret_arn is set above
    ghcr:
      enabled: true   # only when ghcr_credentials_secret_arn is set above

The registry value is the registry_url output of the cache module.

docker_hub.enabled and ghcr.enabled are independent toggles per cluster; a single cache can serve clusters that have different opt-in profiles. The GitLab tier has no per-cluster toggle because the platform system images do not use it; customer application workloads reference <registry>/gitlab/... directly from their own manifests.

Re-apply eks-addons for each cluster. On the next Flux reconcile, system component image references switch to <registry>/<prefix>/....
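The <registry>/<prefix>/... form can be illustrated with a small sketch. This is not the platform's rewrite logic: ptc_image_ref is a hypothetical helper, the registry value in the example is a placeholder, and the function assumes the upstream image reference starts with an explicit registry host. The gitlab prefix is the one mentioned above for application workloads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ptc_image_ref <registry> <prefix> <upstream-image>
# Replaces the upstream host with <registry>/<prefix>.
# Assumption: <upstream-image> begins with a registry host followed by "/".
ptc_image_ref() {
  local registry="$1" prefix="$2" image="$3"
  # Drop everything up to and including the first slash (the upstream host).
  local path="${image#*/}"
  printf '%s/%s/%s\n' "$registry" "$prefix" "$path"
}

# Example (placeholder account, region, and image):
#   ptc_image_ref 111111111111.dkr.ecr.eu-west-1.amazonaws.com gitlab \
#     registry.gitlab.com/mygroup/myapp:1.0
# prints
#   111111111111.dkr.ecr.eu-west-1.amazonaws.com/gitlab/mygroup/myapp:1.0
```

Customer manifests that pull from GitLab through the cache would use the rewritten form directly, since the platform does not rewrite application workloads.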

Verify the cluster is using the cache

Use audit-ecr-ptc.sh (in skyscrapers-tools/bin/) against the cluster’s kubeconfig:

./audit-ecr-ptc.sh <sharedtooling-account>.dkr.ecr.<region>.amazonaws.com

The script categorises every container, init container, and ephemeral container image into three buckets:

  • via PTC: images served from <registry>/<prefix>/...
  • exempt: images on public.ecr.aws or any other *.dkr.ecr.*.amazonaws.com (AWS-managed addons, customer app repos)
  • direct: images still pulling straight from upstream

A non-zero exit code indicates at least one image is bypassing the cache. Expected direct entries on a freshly-rolled cluster are:

  • HNC (gcr.io/k8s-staging-multitenancy/hnc-manager): not supported by ECR PTC.
  • Customer application workloads: these are not rewritten by the platform.

Anything else is a bug; open an issue against skyscrapers/kubernetes-stack.
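The three buckets can be expressed as a short classification function. This is a sketch of the idea under stated assumptions, not the contents of audit-ecr-ptc.sh: classify_image is a hypothetical name, and the matching is done purely on the image string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_image <ptc-registry> <image>
# Prints "via PTC", "exempt", or "direct".
classify_image() {
  local registry="$1" image="$2"
  case "$image" in
    # Must come first: the cache registry is itself a *.dkr.ecr.* host.
    "$registry"/*)               echo "via PTC" ;;
    public.ecr.aws/*)            echo "exempt" ;;
    *.dkr.ecr.*.amazonaws.com/*) echo "exempt" ;;
    *)                           echo "direct" ;;
  esac
}
```

With the cache registry as the first argument, an image such as gcr.io/k8s-staging-multitenancy/hnc-manager is reported as direct, matching the expected HNC entry listed above.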

Disable

To temporarily turn the cache off (e.g. while debugging), set enabled: false in the cluster definition and re-apply eks-addons. All image references revert to their upstream URLs and pulls go direct to quay.io / registry.k8s.io / docker.io / ghcr.io. The ECR cache itself stays in place; re-enabling is a single-line change.

To remove the cache entirely, set enabled: false on every cluster sharing it, then destroy the ecr-pull-through-cache Terraform module in the SharedTooling account. Customer-owned credential secrets stay untouched (Skyscrapers does not own those).
