Platform & Application Responsibility Definitions

This document defines the operational boundary between what Skyscrapers manages as platform and what customers own as application. It clarifies monitoring, alerting, and response responsibilities for both parties.

For a visual overview of how these responsibilities are distributed across service tiers, see the Shared Responsibility Model.

The three layers

We distinguish three layers in every environment we manage. Each layer has a clear owner responsible for its operation, monitoring, and incident response.

Infrastructure

The foundational cloud resources that the platform runs on.

Owner: Skyscrapers (provisioned and managed via IaC)

This includes the AWS account structure, VPCs, subnets, NAT gateways, IAM roles and policies, and managed technology components such as RDS instances, OpenSearch clusters, S3 buckets, and other resources provisioned through Skyscrapers’ infrastructure-as-code. These resources carry the maintainer=skyscrapers tag in AWS.

Infrastructure resources are managed at the cloud provider level and have different alerting characteristics from platform components running inside Kubernetes. We separate the two layers because the monitoring approach, tooling, and response procedures differ, even though Skyscrapers operates both.

Platform

The Kubernetes cluster and all Skyscrapers-managed add-ons that provide a production-ready runtime for customer workloads.

Owner: Skyscrapers

This includes the K8s control plane, worker nodes and node pools (managed via Karpenter), and all components deployed in Skyscrapers-managed namespaces (identifiable by the maintainer=skyscrapers label). Concrete examples are Traefik, cert-manager, external-dns, the monitoring stack (Prometheus, Alertmanager, Grafana), the logging stack (Fluent Bit, Loki), Vault, Dex, VPA, Velero, CoreDNS, and any other cluster add-on Skyscrapers operates. The full list is available here: Skyscrapers Services.

Skyscrapers provides and maintains these components as a runtime platform. Customers consume the capabilities they expose (e.g. creating Ingress resources, deploying applications, requesting certificates, defining PodMonitors) but do not operate the underlying components.

Application

Everything the customer deploys onto the platform to serve their end users.

Owner: Customer

This includes all workloads deployed by the customer (e.g. Deployments, StatefulSets, Jobs, CronJobs); application-level Kubernetes resources such as Ingresses, Services, ConfigMaps, Secrets, HPA/VPA configurations, and NetworkPolicies; application endpoints and health checks; CI/CD pipelines and their configurations; and any cloud resources not provisioned or managed by Skyscrapers.

Monitoring & alerting responsibilities

How ownership is determined

The boundary between platform and application alerting is determined by namespace labels, not by the type of metric or resource. All Kubernetes namespaces carrying the maintainer=skyscrapers label are classified as platform. Alerts originating from these namespaces are routed to Skyscrapers’ internal alert pipeline (infra-alerts). Everything else is routed to the customer’s alert channel (app-alerts).

You can verify which namespaces are platform-managed on your cluster:

kubectl get ns -l maintainer=skyscrapers

For managed technology components outside Kubernetes (such as RDS, OpenSearch, S3), ownership is determined by the maintainer=skyscrapers AWS resource tag.
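The ownership rule described here comes down to one lookup on the maintainer label (for namespaces) or tag (for AWS resources). The Python sketch below illustrates that rule only; the helper name `owner_of` and its return values are hypothetical and are not part of any Skyscrapers tooling.

```python
from typing import Mapping

# Label/tag key and value as stated in this document.
MAINTAINER_KEY = "maintainer"
PLATFORM_VALUE = "skyscrapers"


def owner_of(labels: Mapping[str, str]) -> str:
    """Classify a namespace (by its labels) or an AWS resource (by its tags).

    Hypothetical helper: returns "skyscrapers" when the maintainer
    label/tag carries the expected value, and "customer" otherwise.
    """
    if labels.get(MAINTAINER_KEY) == PLATFORM_VALUE:
        return "skyscrapers"
    return "customer"
```

For example, `owner_of({"maintainer": "skyscrapers"})` classifies the resource as Skyscrapers-owned, while a namespace without that label, or with a different value, falls to the customer.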

A full list of all active alerts and their current state can be found in the Prometheus UI available on each cluster. For details on specific alerts, including descriptions and recommended actions, see the Runbook.

What Skyscrapers monitors and responds to

Skyscrapers monitors and responds to alerts concerning the availability and health of platform components and managed infrastructure.

Concretely, Skyscrapers actively responds to:

Component availability: platform add-ons being down or unhealthy (e.g. ingress controller, cert-manager, Prometheus, Loki, CoreDNS)
Cluster and node availability: EKS control plane issues, nodes not ready, node pool scaling failures
Disk space: node filesystem space filling up, persistent volume space for platform components
Managed technology component health: RDS instance availability, OpenSearch cluster health, backup job failures
Platform error rates: elevated error rates on platform components

For production environments, critical alerts go through Skyscrapers’ on-call escalation system (24/7) and are acted upon by a Skyscrapers engineer. Warning-level alerts are tracked in Skyscrapers’ internal monitoring channel during business hours, but are not guaranteed to be acted upon.

For non-production environments (staging, development), alerts are logged in Slack and handled on a best-effort basis during business hours. There is no 24/7 on-call coverage for non-production environments.
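As a compact restatement of the two paragraphs above, the following Python sketch maps a platform alert's environment and severity to the kind of response described. The function name and the returned strings are illustrative assumptions, not identifiers from Skyscrapers' systems.

```python
def platform_response(environment: str, severity: str) -> str:
    """Describe how a platform alert is handled (hypothetical helper).

    Only the "warning" and "critical" severities discussed in the text
    are covered; anything else raises ValueError.
    """
    if severity not in ("warning", "critical"):
        raise ValueError(f"unsupported severity: {severity!r}")
    if environment != "production":
        # Staging, development, etc.: no 24/7 coverage.
        return "best effort, business hours"
    if severity == "critical":
        return "24/7 on-call escalation"
    # Production warnings are tracked but not guaranteed to be acted upon.
    return "tracked during business hours, no guaranteed action"
```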

What Skyscrapers does NOT respond to

Skyscrapers does not monitor or respond to alerts that fall within the application layer. Examples include:

Resource usage alerts: high CPU or memory consumption on application pods, K8s node CPU usage driven by application workloads, RDS CPU usage
Application scaling limits: HPA reaching maximum replicas, pod pending due to insufficient requested resources
Application-level failures: failing health checks on customer endpoints, failing service monitors, certificate errors on customer-deployed Ingresses, application crash loops
Single OOMKill events: individual out-of-memory kills on application pods are not acted upon by Skyscrapers. Customers are responsible for tuning resource requests and limits for their workloads. For platform components, Skyscrapers has separate alerts in place (such as SystemPodCrashLooping and SystemTargetDown) to detect sustained service degradation.
Queue and messaging backlogs: SQS queue depth growing, message processing delays
Application performance: slow response times, elevated error rates on customer services
Customer-managed cloud resources: any AWS/cloud resource not provisioned by Skyscrapers

The supporting role

While application monitoring is the customer’s responsibility, Skyscrapers plays a supporting role. Customers can escalate application issues to Skyscrapers through the normal support process when they need help with troubleshooting, root cause analysis, or platform-related investigation. See the Support Process for details.

Alert routing

Skyscrapers configures two default alert routes in Alertmanager:

| Route | Scope | Severity: Info | Severity: Warning | Severity: Critical |
| --- | --- | --- | --- | --- |
| `infra-alerts` | Namespaces with `maintainer=skyscrapers` | Not routed | Slack (Skyscrapers internal; customer notified if potential impact) | On-call escalation + incident management |
| `app-alerts` | All other namespaces | Not routed | Customer alert channel | Customer alert channel |

Customers are expected to define an appropriate escalation path for their own alerts. During onboarding, Skyscrapers configures the customer’s preferred alert destination (the default is a shared Slack channel). Customers can request custom routes and endpoints for different severity levels. If you want to configure this further, get in touch with us.
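The default routing can be read as a function of the namespace labels and the alert severity. The sketch below models that decision in Python for illustration; it is not the Alertmanager configuration itself, and the destination strings and the `route_alert` helper are hypothetical names chosen for this example.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical destination names standing in for the real receivers.
INFRA_DESTINATIONS = {
    "warning": "skyscrapers-internal-slack",
    "critical": "skyscrapers-on-call",
}
CUSTOMER_DESTINATION = "customer-alert-channel"


def route_alert(
    namespace_labels: Mapping[str, str], severity: str
) -> Tuple[str, Optional[str]]:
    """Return (route, destination); destination is None when not routed."""
    if severity not in ("info", "warning", "critical"):
        raise ValueError(f"unknown severity: {severity!r}")

    # The route depends only on the namespace label, not on the metric type.
    is_platform = namespace_labels.get("maintainer") == "skyscrapers"
    route = "infra-alerts" if is_platform else "app-alerts"

    if severity == "info":
        return route, None  # info-level alerts are not routed on either route
    if is_platform:
        return route, INFRA_DESTINATIONS[severity]
    return route, CUSTOMER_DESTINATION
```

Custom routes requested by a customer would extend this mapping per severity level; the sketch covers only the defaults.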

Summary

| | Platform (Skyscrapers) | Application (Customer) |
| --- | --- | --- |
| Scope | K8s platform + add-ons in `maintainer=skyscrapers` namespaces, managed technology components with `maintainer=skyscrapers` AWS tags | All customer-deployed workloads, customer-managed cloud resources |
| Monitors | Component health, cluster/node availability, disk space, backup success, platform error rates | Application health, endpoint availability, queue depths, business metrics, application performance |
| Responds to | Platform outages, node failures, disk pressure, managed service unavailability | Application errors, scaling limits, misconfigurations, performance degradation |
| Does NOT respond to | Application workload issues, customer resource misconfigurations, single OOMKill events | Platform component failures (escalate to Skyscrapers) |
| Escalation | Internal on-call (24/7 for critical, production). Non-production: business hours, best-effort. | Customer-defined escalation path, with Skyscrapers as support escalation |

Last updated on February 26, 2026
