Terragrunt
Welcome! If you’re here, it means you’re interested in contributing to our infrastructure as code. We encourage and support your involvement. Should you have any questions or need further clarification, please don’t hesitate to reach out for assistance and training.
- Folder structure
- Terragrunt configuration
- Formatting
- Naming
- AWS authentication
- Usage
- Atlantis
- Install
- Secrets
- Modules
Folder structure
Terragrunt projects should be organized using the following structure:
<repository root>
└── k8s-clusters # Cluster definitions and the YAML files they publish
    └── (cluster-name).yaml # EksCluster CRD-like definition consumed by the cluster stack
    └── (cluster-name)-platform-info.yaml # Auto-generated; published by the cluster stack on apply
└── terraform
    └── live # Contains the live representation of the infrastructure
        └── default # Generic and cross-environment stacks
            └── bootstrap # Base stack added via template
                └── (opentofu files)
                └── terragrunt.hcl # Terragrunt configuration
            └── sso # AWS IAM/SSO configuration
                └── (opentofu files)
                └── terragrunt.hcl # Terragrunt configuration
        └── production
            └── my-first-application # Application / workload stack
                └── terragrunt.hcl # Terragrunt configuration
            └── my-other-application # Application / workload stack
                └── terragrunt.hcl # Terragrunt configuration
            └── eks
                └── (cluster-name)
                    └── addons
                        └── terragrunt.hcl # Terragrunt configuration
                    └── cluster
                        └── terragrunt.hcl # Terragrunt configuration
                    └── custom-addons
                        └── terragrunt.hcl # Terragrunt configuration
            └── networking
                └── base
                    └── terragrunt.hcl # Terragrunt configuration
                └── peering
                    └── terragrunt.hcl # Terragrunt configuration
            └── route53
                └── (opentofu files)
                └── terragrunt.hcl # Terragrunt configuration
            └── env.hcl # Contains environment-specific Terragrunt config
            └── ...
        └── ...
        └── .gitignore
        └── aws_provider.hcl # Terragrunt configuration that defines the common AWS opentofu provider
        └── eks_provider.hcl # Terragrunt configuration that defines the common EKS opentofu provider
        └── terragrunt.hcl # Main Terragrunt configuration
    └── modules
        └── my-first-application # Application / workload module
            └── (opentofu files)
        └── my-other-application # Application / workload module
            └── (opentofu files)
        └── networking-peering # VPC peering module
            └── (opentofu files)
Standards we use
- If code is re-used across multiple environments it should be made into a module. See also modules.
- Don’t hardcode IDs, ARNs, or other values that are produced by another stack. Read them from the appropriate source — see Consuming outputs from other stacks.
Terragrunt configuration
Terragrunt configuration is defined in a terragrunt.hcl file. This uses the same HCL syntax as OpenTofu itself.
Example for a terragrunt file that creates an RDS in a Skyscrapers-managed VPC and EKS cluster:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

include "env" {
  path   = find_in_parent_folders("env.hcl")
  expose = true
}

include "aws_provider" {
  path = find_in_parent_folders("aws_provider.hcl")
}

terraform {
  source = "${get_path_to_repo_root()}/terraform/modules/rds"
}

locals {
  platform_info = yamldecode(file("${get_repo_root()}/k8s-clusters/${include.env.inputs._eks_provider_cluster_name}-platform-info.yaml"))
}

inputs = {
  vpc_id                = local.platform_info.networking.vpcId
  db_subnet_ids         = local.platform_info.networking.privateDbSubnetIds
  cluster_workers_sg_id = local.platform_info.eksCluster.nodes.securityGroupId
  engine_mode           = "provisioned"
  engine_version        = "5.7.mysql_aurora.2.11.1"
  instance_class        = "db.r5.xlarge"
  instances             = { 1 = {} }
}

Consuming outputs from other stacks
There are three ways to read values produced by another stack — and they aren’t interchangeable. Pick the one that matches the kind of stack you’re reading from.
Cluster and networking metadata: platform-info
For VPC IDs, subnet IDs, EKS worker security groups, OIDC provider ARNs, NAT gateway IPs and the like — anything published by a Skyscrapers-managed eks-cluster or networking-stack apply — read from the *-platform-info.yaml files at the repo root. The cluster stack writes these on every apply, so the file in master is always in sync with reality.
locals {
  platform_info = yamldecode(file("${get_repo_root()}/k8s-clusters/${include.env.inputs._eks_provider_cluster_name}-platform-info.yaml"))
}

inputs = {
  vpc_id                = local.platform_info.networking.vpcId
  private_db_subnets    = local.platform_info.networking.privateDbSubnetIds
  cluster_workers_sg_id = local.platform_info.eksCluster.nodes.securityGroupId
}

Note
The include "env" block needs expose = true so the locals block can read include.env.inputs._eks_provider_cluster_name. The cluster name is set per-environment in <env>/env.hcl.
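As a minimal sketch of what this note implies, an env.hcl could publish the cluster name through its inputs so that stacks can read it via include.env.inputs. The file contents and the cluster name production-eks-example are assumptions for illustration; check the real env.hcl of your environment.

```hcl
# terraform/live/production/env.hcl (illustrative contents, not the actual file)
inputs = {
  # Hypothetical cluster name. It has to match the <cluster-name> part of
  # k8s-clusters/<cluster-name>-platform-info.yaml for the lookup to resolve.
  _eks_provider_cluster_name = "production-eks-example"
}
```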
A non-exhaustive map of what’s available:
| Use case | Path under local.platform_info |
|---|---|
| VPC ID | networking.vpcId |
| VPC CIDR | networking.vpcCidrBlock |
| Private app subnets | networking.privateAppSubnetIds |
| Private DB subnets | networking.privateDbSubnetIds |
| Public LB subnets | networking.publicLbSubnetIds |
| Private/public route tables | networking.privateRouteTableIds / publicRouteTableIds |
| NAT gateway IPs | networking.natGatewayIps |
| Availability zones | networking.availabilityZones |
| EKS worker SG ID | eksCluster.nodes.securityGroupId |
| EKS worker private subnets | eksCluster.nodes.privateSubnetIds |
| EKS OIDC provider ARN | eksCluster.oidc.providerArn |
| Cluster FQDN / endpoint / version | eksCluster.fqdn / endpoint / version |
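The paths in the table can be combined freely in a stack's inputs. The following sketch assumes the same locals block as above (so local.platform_info is already defined); the input names on the left are hypothetical and depend on the module being called.

```hcl
# terragrunt.hcl (sketch; assumes local.platform_info is defined as shown earlier)
inputs = {
  # Hypothetical input names; adapt them to the variables of your module.
  oidc_provider_arn   = local.platform_info.eksCluster.oidc.providerArn
  app_subnet_ids      = local.platform_info.networking.privateAppSubnetIds
  allowed_cidr_blocks = [local.platform_info.networking.vpcCidrBlock]

  # try() falls back to an empty list if a key is absent from the file.
  nat_gateway_ips = try(local.platform_info.networking.natGatewayIps, [])
}
```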
If a stack needs values from more than one cluster (a shared service that two clusters’ workers ingress to, for example), pass the file paths in via inputs and decode in TF code:
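# terragrunt.hcl
inputs = {
  cluster_platform_info_files = [
    "${get_repo_root()}/k8s-clusters/development-eks-example-platform-info.yaml",
    "${get_repo_root()}/k8s-clusters/production-eks-example-platform-info.yaml",
  ]
}

# main.tf
variable "cluster_platform_info_files" {
  description = "Paths to the *-platform-info.yaml files of the clusters to read from"
  type        = list(string)
}

locals {
  # Decode every platform-info file, then collect the worker security group of each cluster
  cluster_platform_info = [for path in var.cluster_platform_info_files : yamldecode(file(path))]
  worker_sg_ids         = [for c in local.cluster_platform_info : c.eksCluster.nodes.securityGroupId]
}

Customer-local stacks: terragrunt dependency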
For stacks that live inside the same customer repo and produce outputs another stack needs — application-to-application wiring, an SES identity ARN consumed by a few apps, an OIDC provider, a Route53 zone ID, etc. — use a normal terragrunt dependency block:
dependency "ses" {
  config_path = "../ses"
}

inputs = {
  ses_identity_arn = dependency.ses.outputs.domain_identity_arn
}

This pattern requires the upstream stack’s source to be cloneable from the machine running terragrunt, so it’s the right choice only when the upstream is in your own repo. Don’t point a dependency at a Skyscrapers-managed stack; use platform-info instead.
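When the upstream stack has not been applied yet, a dependency has no outputs to read and a plan of the consumer fails. One possible way around this, sketched here with Terragrunt's mock_outputs attributes, is to supply placeholder values for read-only commands. The ARN below is a made-up placeholder.

```hcl
dependency "ses" {
  config_path = "../ses"

  # Placeholder values, used only while the upstream has no real outputs yet.
  mock_outputs = {
    domain_identity_arn = "arn:aws:ses:eu-west-1:123456789012:identity/example.com"
  }

  # Only allow the mocks for read-only commands, so they never reach an apply.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  ses_identity_arn = dependency.ses.outputs.domain_identity_arn
}
```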
Reading raw state: terraform_remote_state
When the upstream is a stack you operate yourself but you don’t want a hard dependency (typically because the upstream is consumed by many siblings and you want each consumer to evaluate independently, or because the upstream sources from a private repo you don’t want to clone every time you plan), read its state directly from S3.
This is purely an OpenTofu construct, so it goes in your .tf files, not in terragrunt.hcl:
data "terraform_remote_state" "shared_networking" {
  backend = "s3"
  config = {
    profile = var.tf_state_aws_profile
    region  = var.tf_state_aws_region
    bucket  = var.tf_state_bucket
    key     = "shared/networking/base"
  }
}

# Use:
# data.terraform_remote_state.shared_networking.outputs.vpc_id

tf_state_aws_profile, tf_state_aws_region and tf_state_bucket are already injected as terragrunt inputs from root.hcl; you only need to declare them as variables in your stack.
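Declaring those three inputs could look like the sketch below. The variable names come from the text above; the descriptions are assumptions.

```hcl
# variables.tf (sketch): declarations for the inputs injected from root.hcl
variable "tf_state_aws_profile" {
  description = "AWS profile used to read the OpenTofu state bucket"
  type        = string
}

variable "tf_state_aws_region" {
  description = "AWS region of the OpenTofu state bucket"
  type        = string
}

variable "tf_state_bucket" {
  description = "Name of the S3 bucket holding the OpenTofu state files"
  type        = string
}
```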
If you’re consuming a remote state from inside a module that’s used by several stacks, take a <thing>_state_key variable so each caller can point at its own upstream:
variable "networking_state_key" {
  description = "S3 state key of the networking/base stack to read VPC info from"
  type        = string
}

data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    profile = var.tf_state_aws_profile
    region  = var.tf_state_aws_region
    bucket  = var.tf_state_bucket
    key     = var.networking_state_key
  }
}

# in the caller's terragrunt.hcl
inputs = {
  networking_state_key = "shared/networking/base" # or "production/networking/base", etc.
}

Choosing between the three
| Reading from… | Use |
|---|---|
| A Skyscrapers-managed cluster or networking stack (anything that publishes *-platform-info.yaml) | platform-info via yamldecode(file(...)) |
| Another stack in your own customer repo (apps, SES, OIDC, Route53, etc.) | terragrunt dependency block |
| A stack you run yourself but where you don’t want a dependency constraint (e.g. shared-VPC infrastructure read by many consumers) | data "terraform_remote_state" in .tf code |
Rule of thumb: if the upstream is not code you and your team can git clone and run, you must not use dependency against it — fall back to platform-info or terraform_remote_state.
Formatting
OpenTofu code should always be formatted by tofu fmt -recursive ./. This command will take care of all indentation, alignment, …
Note
Having the Terraform plugin installed also helps with this.
Variables and outputs should have a clear description of what they are for and the expected format. For example:
variable "client_sg_ids" {
  description = "Security group IDs for client access to the Structr instance(s)"
  type        = list(string)
  default     = null
}

output "elb_dns_name" {
  description = "ELB DNS name for the frontend"
  value       = module.elb.elb_dns_name
}

Naming
Resources, variables and outputs should use _ as a separator.
Other than the general naming guidelines, OpenTofu resource names should:
- be truncated automatically if they are longer than the maximum allowed length
- not be suffixed with the type (e.g. "aws_iam_role" "billing" vs "aws_iam_role" "billing_role"), as this is already redundant with the resource type. This also lets you keep names shorter, making it less likely to hit the character limit
And OpenTofu variables and outputs should:
- end with the type they’re referring to. For example, if the output is an instance ID, its name should be my_instance_id, not my_instance. This makes it much clearer what the actual output is.
- be singular if they’re a single string or number, and plural if they’re a list. For example, if an output contains a list of instance IDs, its name should be my_instance_ids.
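Put together, the naming rules above could look like this in practice. This is an illustrative sketch: the resource, variable and output names are hypothetical and not taken from an existing stack.

```hcl
# Singular name for a single string value
variable "vpc_id" {
  description = "ID of the VPC to create the security group in"
  type        = string
}

# Plural name for a list, ending with the type it refers to (security group IDs)
variable "client_sg_ids" {
  description = "Security group IDs that are allowed to connect"
  type        = list(string)
  default     = []
}

# Resource name is "billing", not "billing_sg": the type is already in "aws_security_group"
resource "aws_security_group" "billing" {
  name_prefix = "billing-"
  description = "Billing application"
  vpc_id      = var.vpc_id
}

# Output name ends with the type of the value it returns
output "billing_sg_id" {
  description = "ID of the billing security group"
  value       = aws_security_group.billing.id
}
```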
AWS authentication
To authenticate OpenTofu to AWS, we use a delegated access approach. Instead of directly accessing an “ops” account with some set of credentials, we authenticate with an “admin” account and configure the OpenTofu AWS provider to assume an admin role in the target “ops” account. See the diagram below.
                  1. User with
                     access to
                     admin account
                        +
                        |
                        |
                        v
                  +-----+-----+
                  |           |
                  | OpenTofu  +-------------+
                  |           |             |
                  +---+-+-----+             |
                      | |                   |
                      | |                   |
                      | |                   |  Direct access to the
 3. Assumed role      | | 2. Assume         |  OpenTofu state S3
    with temp.        | |    role in        |  bucket and DynamoDB table
    credentials       | |    ops staging    |
            +---------+ |                   |
            |           v                   |
            |     +-----+------+            |
            |     |            |            |
            |     |   Admin    +<-----------+
            |     |  account   |
            |     |            |
            |     +------------+
            |
            v
      +-----+------+      +------------+
      |            |      |            |
      |    Ops     |      |    Ops     |
      |  staging   |      | production |
      |  account   |      |  account   |
      |            |      |            |
      +------------+      +------------+

Each customer has an “admin” account and at least one “ops” account. The “admin” account is where the OpenTofu state is stored and where all the IAM users that need access to the infrastructure are created. The “ops” accounts are the ones containing the actual operational resources, like EC2 instances, load balancers, etc. Ideally, the “ops” accounts don’t have IAM users with direct access; instead, they have multiple IAM roles with different sets of capabilities, which can be assumed by users from the “admin” account.
Following a least-privilege approach, the user running OpenTofu should have a set of credentials configured to access the “admin” account, with just the following permissions:
- access to the S3 bucket containing the OpenTofu state files
- access to the DynamoDB table containing the OpenTofu state locks
- permission to assume a more privileged role in the target “ops” accounts
These are some of its benefits:
- we don’t have to manage and secure static credentials with direct admin access to each “ops” account.
- the credentials provided by the assumed role last for just an hour, so a leaked credential is only usable for a short time.
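As a rough sketch of the delegated access described above, an AWS provider configuration that assumes a role in an “ops” account could look like the following. The profile name, region, account ID and session name are assumptions for illustration; the real values live in the shared provider configuration of the customer repository.

```hcl
provider "aws" {
  region = "eu-west-1" # illustrative region

  # Credentials for the "admin" account (hypothetical profile name)
  profile = "customer-admin"

  # Role assumed in the target "ops" account; OpenTofu then works with
  # the temporary credentials returned by this call.
  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/ops/admin"
    session_name = "opentofu"
  }
}
```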
Usage
Terragrunt is a layer on top of OpenTofu, so all OpenTofu commands can also be passed to the terragrunt command. More info: https://terragrunt.gruntwork.io/docs/getting-started/quick-start/
Atlantis
Install
You can install OpenTofu and Terragrunt through the package manager that you use. More info:
- https://opentofu.org/docs/intro/install/
- https://terragrunt.gruntwork.io/docs/getting-started/install/
Caution
We lock the version of OpenTofu in order not to accidentally apply breaking changes. This is done in the code in the required_providers section.
Caution
Make sure you don’t have Terraform installed on your machine.
If you have, you can set TERRAGRUNT_TFPATH=tofu so Terragrunt will use the OpenTofu binary instead of the Terraform binary.
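For reference, version locking in OpenTofu code is usually expressed in the terraform block. In general, required_version constrains the OpenTofu CLI itself, while required_providers constrains provider versions. The constraints below are illustrative assumptions, not the versions used in any specific repository.

```hcl
terraform {
  # Constrains the version of the OpenTofu CLI (illustrative constraint)
  required_version = "~> 1.8.0"

  # Constrains the provider versions (illustrative constraint)
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```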
Secrets
All secrets such as passwords, certificates, … must be encrypted. You can do this using:
- KMS; see the official docs for how.
- SOPS with a KMS backend; see the official docs for how.
You can re-use the KMS key used for OpenTofu encryption documented in the customer’s documentation. Usually this key is created through Terragrunt in the general stack.
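For the plain KMS option, one possible approach is the aws_kms_secrets data source of the AWS provider, which decrypts a ciphertext at plan/apply time. This is a sketch only: the secret name is hypothetical and the payload is a placeholder for a base64-encoded ciphertext that you produce yourself with the KMS key.

```hcl
data "aws_kms_secrets" "rds" {
  secret {
    # Hypothetical name, used as the key in the plaintext map below
    name = "master_password"
    # Placeholder: replace with the base64-encoded ciphertext from KMS
    payload = "<base64 ciphertext>"
  }
}

# The decrypted value is then available as:
# data.aws_kms_secrets.rds.plaintext["master_password"]
```

Keep in mind that the decrypted value ends up in the OpenTofu state, so access to the state bucket should stay restricted.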
SOPS
In order to work with SOPS in Terragrunt stacks the following components are needed:
A .sops.yaml file in the corresponding environment folder (e.g. terraform/live/production/.sops.yaml) with the following config:
---
creation_rules:
  - kms: "arn:aws:kms:eu-west-1:123456789012:key/11111111-111-1111-1111-111111111111"
    role: arn:aws:iam::123456789012:role/ops/admin
    aws_profile: <customer>SharedTooling

The following lines need to be added to the terragrunt.hcl file where you want to load these secrets:
locals {
  secret_vars = yamldecode(sops_decrypt_file("./secrets.yaml"))
}

inputs = merge({
  ...
}, local.secret_vars)

To create the SOPS secret file, run sops secrets.yaml in the folder where you want the secret to be saved. The content of this file is best structured as YAML.
Important
Don’t forget to allow the (SOPS-encrypted) secrets.yaml file in the .gitignore file, for example with a negated pattern, so it can be committed to the git repository.
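Merging the whole decrypted file into inputs passes every key to the module. If only a few values are needed, an alternative is to pick them individually, as in this sketch. The key db_password and the input name db_master_password are hypothetical; get_terragrunt_dir() resolves the path relative to the current terragrunt.hcl.

```hcl
locals {
  # Decrypt secrets.yaml next to this terragrunt.hcl and parse it as YAML
  secret_vars = yamldecode(sops_decrypt_file("${get_terragrunt_dir()}/secrets.yaml"))
}

inputs = {
  # "db_password" is a hypothetical key inside secrets.yaml
  db_master_password = local.secret_vars.db_password
}
```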
Modules
If you can re-use a set of OpenTofu code, consider adding it as a module.
By default we try to use the upstream modules available in the Terraform and OpenTofu communities: https://github.com/opentofu/registry/tree/main?tab=readme-ov-file and https://registry.terraform.io/browse/modules. Where no upstream module is available, we have created some modules ourselves. You can find them all on GitHub: https://github.com/skyscrapers?utf8=%E2%9C%93&q=terraform-&type=&language=hcl.
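An upstream registry module can be referenced directly from a terragrunt.hcl with Terragrunt's tfr:// source syntax. The sketch below is only an illustration: the module and the pinned version are assumptions, so pick the module and version that fit your stack.

```hcl
terraform {
  # tfr:/// with an empty host uses the default public registry.
  # Module name and version are illustrative.
  source = "tfr:///terraform-aws-modules/vpc/aws?version=3.3.0"
}
```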
Each module must have a README.md consisting of:
- A description of what it does.
- The requirements of the module.
- Configuration parameter documentation (autogenerated).
Documentation
You should use terraform-docs to automatically generate a variable table from OpenTofu variables for use in documentation.
Use the following parameters:
terraform-docs markdown --sort-by required --escape=false <folder>
You can easily create a function for this which also copies the output to your clipboard. For example:
tf-docs () { terraform-docs markdown --sort-by required --escape=false "$1" | <your OS's clipboard>; }