From 8871c6ac3b86a88f8f38761c20ad2194396935e2 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Thu, 28 May 2026 16:47:07 +0100
Subject: [PATCH 01/12] chore: ask to delete placeholders when adapting

---
 REPO_INSTRUCTIONS.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index d138c115..81bf8d96 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -113,6 +113,15 @@ look at ../sandbox and tell me how to deploy
 - in this repo, default that assumption to a Lambda-backed API unless the user asks for ECS, long-running workers, containers, or another specific runtime
 - state that assumption and ask for confirmation before making changes when backend choice materially affects infrastructure shape, cost, or security
 
+## Replacement Requests
+
+- when the user asks to deploy or adapt an external app and says to proceed, determine whether the work is additive or a replacement unless the intent is already clear
+- if the user says the work is a replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
+- do not keep unused demo capabilities just because they existed in the template
+- still confirm before removing expensive or shared infrastructure capabilities unless the user explicitly names them for removal
+- when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
+- record durable replacement decisions in `BOOTSTRAP_DECISIONS.md` when they affect future app-shaping work
+
 ## New Repo Bootstrap Requests
 
 - when a request suggests the user is adapting this repo as a fresh app or new project, first determine whether this is a new repo/bootstrap scenario or a change to an existing app

From a26049f4e7f7bf1bd87de885b48373c9558642ec Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Thu, 28 May 2026 16:52:11 +0100
Subject: [PATCH 02/12] chore: app shaping flow

---
 REPO_INSTRUCTIONS.md | 46 ++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index 81bf8d96..0c3ec8d3 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -113,31 +113,35 @@ look at ../sandbox and tell me how to deploy
 - in this repo, default that assumption to a Lambda-backed API unless the user asks for ECS, long-running workers, containers, or another specific runtime
 - state that assumption and ask for confirmation before making changes when backend choice materially affects infrastructure shape, cost, or security
 
-## Replacement Requests
+## App Shaping Flow
 
-- when the user asks to deploy or adapt an external app and says to proceed, determine whether the work is additive or a replacement unless the intent is already clear
-- if the user says the work is a replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
+Use this shared flow when the user is adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo.
+
+- first determine whether the work is additive or replacement unless the intent is already clear
+- when the target repo is empty or effectively empty, enter this flow immediately; treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
+- determine the selected app capabilities, such as frontend, backend API, batch/worker runtime, database, auth, messaging, containers/ECS, Lambda, scheduled jobs, or static hosting
+- ask only the missing app-shaping questions that are not already answered in `BOOTSTRAP_DECISIONS.md`
+- persist durable bootstrap, simplification, replacement, and capability-selection answers in `BOOTSTRAP_DECISIONS.md` so they do not need to be asked repeatedly
+- before asking a recorded app-shaping question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
+- if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
+- if the user says the work is replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
 - do not keep unused demo capabilities just because they existed in the template
-- still confirm before removing expensive or shared infrastructure capabilities unless the user explicitly names them for removal
+- do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it; replacement intent or a recorded decision must be clear
+- keep or remove unused capabilities based on the recorded decision, and do not assume unmentioned capabilities should stay forever
+- still confirm before removing expensive or shared infrastructure capabilities, such as load balancers, ECS clusters, databases, Cognito, Route53/CloudFront, or messaging, unless the user explicitly names them for removal
 - when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
-- record durable replacement decisions in `BOOTSTRAP_DECISIONS.md` when they affect future app-shaping work
-
-## New Repo Bootstrap Requests
-
-- when a request suggests the user is adapting this repo as a fresh app or new project, first determine whether this is a new repo/bootstrap scenario or a change to an existing app
-- when the target repo is empty or effectively empty, enter bootstrap flow immediately
-- treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
-- if it appears to be a new repo/bootstrap scenario, ask whether the user wants to keep or remove the boilerplate/example application code before making broad changes
-- treat clearly labeled example, demo, sample, or boilerplate code as removable only after confirming with the user
-- do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it
-- for potentially expensive infrastructure such as load balancers, ECS clusters, or other shared runtime components, ask whether the user wants to keep them for future use or remove them entirely before changing that footprint
-- do not assume expensive infrastructure should be deployed, retained, or removed without explicit user confirmation when the request is a bootstrap or simplification scenario
-- persist bootstrap-specific questions and user answers in `BOOTSTRAP_DECISIONS.md` so the same questions do not need to be asked repeatedly
-- before asking a bootstrap-related clarifying question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
-- if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
-- always consider security during bootstrap and simplification work; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
+- align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
+- always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
 - do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
-- at the end of a bootstrap or simplification flow, explicitly name any infrastructure that would remain but no longer be used by the proposed app shape, and ask whether the user wants to remove it or keep it for future use
+- before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
+
+## Bootstrap Operations
+
+- at the end of app-shaping work, offer the next operational bootstrap steps needed to make the selected app shape real end to end
+- for AWS-backed deployments, this usually includes creating or updating GitHub OIDC roles, applying foundational stacks in dependency order, deploying initial infrastructure, publishing first runtime artifacts, running migrations, and seeding initial users when Cognito is enabled
+- do not run AWS-mutating bootstrap commands without explicit user approval
+- when offering OIDC setup, name the exact commands, for example `just tg ci aws/oidc apply`, `just tg dev aws/oidc apply`, or `just tg prod aws/oidc apply`
+- when offering first environment setup, separate infra bootstrap from code deployment and call out any prerequisite shared resources such as VPCs, tagged subnets, hosted zones, ECR images, code buckets, or Terraform state
 
 ## CI OIDC Scope
 

From 58b2a830c406dd862fc19d50e7e24c8248dc89d8 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Thu, 28 May 2026 16:56:29 +0100
Subject: [PATCH 03/12] chore: add checks for aws resources + role

---
 REPO_INSTRUCTIONS.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index 0c3ec8d3..303b112e 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -131,6 +131,8 @@ Use this shared flow when the user is adapting an external app, replacing the pl
 - still confirm before removing expensive or shared infrastructure capabilities, such as load balancers, ECS clusters, databases, Cognito, Route53/CloudFront, or messaging, unless the user explicitly names them for removal
 - when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
 - align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
+- for AWS-backed deployment shapes, offer to check required deployment prerequisites at the point the selected environment/domain is known; expected checks include the VPC, tagged subnets, and Route53 hosted zone required by the selected domain
+- before relying on a hosted zone, confirm the intended hosted zone name with the user and verify it matches the selected `domain_name`/frontend domain shape
 - always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
 - do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
 - before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
@@ -139,6 +141,7 @@ Use this shared flow when the user is adapting an external app, replacing the pl
 
 - at the end of app-shaping work, offer the next operational bootstrap steps needed to make the selected app shape real end to end
 - for AWS-backed deployments, this usually includes creating or updating GitHub OIDC roles, applying foundational stacks in dependency order, deploying initial infrastructure, publishing first runtime artifacts, running migrations, and seeding initial users when Cognito is enabled
+- before the first plan, apply, prerequisite check, or other AWS interaction in a task, confirm which AWS role, user, and account will be used
 - do not run AWS-mutating bootstrap commands without explicit user approval
 - when offering OIDC setup, name the exact commands, for example `just tg ci aws/oidc apply`, `just tg dev aws/oidc apply`, or `just tg prod aws/oidc apply`
 - when offering first environment setup, separate infra bootstrap from code deployment and call out any prerequisite shared resources such as VPCs, tagged subnets, hosted zones, ECR images, code buckets, or Terraform state

From 1f6cd806ec0a6facd584ac4802c4cc955da1af19 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Thu, 28 May 2026 19:27:58 +0100
Subject: [PATCH 04/12] chore: nat or public subnet choice

---
 REPO_INSTRUCTIONS.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index 303b112e..0ed4e32c 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -113,6 +113,16 @@ look at ../sandbox and tell me how to deploy
 - in this repo, default that assumption to a Lambda-backed API unless the user asks for ECS, long-running workers, containers, or another specific runtime
 - state that assumption and ask for confirmation before making changes when backend choice materially affects infrastructure shape, cost, or security
 
+## Runtime Network Placement
+
+- do not assume ECS services must run in private subnets
+- when adapting an app that needs outbound internet access, explicitly ask whether the runtime should run in public subnets or private subnets before recommending NAT gateways
+- only recommend NAT gateways when private subnet placement is required, explicitly chosen, or otherwise necessary for the selected security model
+- if a service can safely run in public subnets, call out that public subnet placement with task public IPs may be the lower-cost deployment shape and explain the security implications
+- for public-subnet ECS services, require a clear ingress model before implementation: public load balancer or API Gateway path, security group restrictions, authentication requirements, and whether tasks should receive public IPs
+- for scraper, polling, webhook, or external-API-heavy services, treat subnet placement as an app-shaping decision because outbound connectivity affects architecture, cost, and security
+- do not list NAT as an AWS prerequisite unless the selected runtime placement uses private subnets and needs outbound internet access
+
 ## App Shaping Flow
 
 Use this shared flow when the user is adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo.

From 52cfa9dbf192e17cc08a8b356a0d61d13c62cb55 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Fri, 29 May 2026 10:28:00 +0100
Subject: [PATCH 05/12] chore: readme refactor

---
 README.md | 120 +-------------
 REPO_INSTRUCTIONS.md | 29 +---
 docs/agent/app-shaping.md | 23 +++
 docs/get-started-locally.md | 107 ++++++++++++
 infra/README.md | 109 +-----------
 infra/docs/deployment-model.md | 46 +++++
 infra/docs/terragrunt-graph-helpers.md | 66 +++++++
 infra/modules/aws/_shared/lambda/README.md | 39 +----
 .../lambda/docs/provisioned-concurrency.md | 40 +++++
 infra/modules/aws/_shared/service/README.md | 163 +-----------------
 .../_shared/service/docs/connection-types.md | 28 +++
 .../service/docs/deployment-strategies.md | 41 +++++
 .../_shared/service/docs/rollout-and-drift.md | 35 ++++
 .../_shared/service/docs/scaling-patterns.md | 75 ++++++++
 14 files changed, 485 insertions(+), 436 deletions(-)
 create mode 100644 docs/agent/app-shaping.md
 create mode 100644 docs/get-started-locally.md
 create mode 100644 infra/docs/deployment-model.md
 create mode 100644 infra/docs/terragrunt-graph-helpers.md
 create mode 100644 infra/modules/aws/_shared/lambda/docs/provisioned-concurrency.md
 create mode 100644 infra/modules/aws/_shared/service/docs/connection-types.md
 create mode 100644 infra/modules/aws/_shared/service/docs/deployment-strategies.md
 create mode 100644 infra/modules/aws/_shared/service/docs/rollout-and-drift.md
 create mode 100644 infra/modules/aws/_shared/service/docs/scaling-patterns.md

diff --git a/README.md b/README.md
index 95e4d33b..f7c94eef 100644
--- a/README.md
+++ b/README.md
@@ -8,10 +8,7 @@ Lambda + ECS with CodeDeploy rollouts, plus provisioned concurrency controls for
 - [Overview](#overview)
 - [Using This Template With An AI Agent](#using-this-template-with-an-ai-agent)
 - [Bootstrap-Friendly Plans](#bootstrap-friendly-plans)
-- [Prerequisites](#prerequisites)
-- [Setup](#setup)
-- [Common Tasks](#common-tasks)
-- [Local Development](#local-development)
+- [Get Started Locally](#get-started-locally)
 - [Infra Deployment Use Cases](#infra-deployment-use-cases)
 - [Reference](#reference)
 - [Read This Next](#read-this-next)
@@ -48,117 +45,9 @@ See [infra/README.md](infra/README.md#dependency-notes) for the dependency strat
 
 Use [CONTRIBUTING.md](CONTRIBUTING.md) for expectations when changing the repo itself.
 
-## Prerequisites
+## Get Started Locally
 
-The AWS account must already have the landing-zone or StackSet network in place before deploying this repo.
-
-- the Terraform in this repo reads the VPC and subnets with `data` sources rather than creating them
-- the expected VPC and subnets must therefore already exist
-- the private subnets must be tagged so the module lookups can find them, for example with names matching `*private*`
-- if you plan to deploy the frontend custom domain, the matching Route53 hosted zone must also already exist
-- the S3 Terraform state bucket should have bucket versioning enabled, because the repo uses the [Terraform S3 backend](https://developer.hashicorp.com/terraform/language/backend/s3) lockfile path rather than DynamoDB state locking
-
-If those shared network or DNS resources do not exist yet, the infra applies in this repo will fail during data lookup or certificate/DNS creation.
-
-Required shared prerequisites before a full environment deploy:
-
-- pre-existing VPC
-- tagged private subnets that the data lookups can resolve
-- Route53 hosted zone for the deployed frontend domain when using the frontend custom domain path
-
-## Setup
-
-### One-Time CI Role Bootstrap
-
-Before GitHub Actions can plan, apply, or deploy, bootstrap the GitHub OIDC roles once per environment:
-
-```sh
-just tg ci aws/oidc apply
-just tg dev aws/oidc apply
-just tg prod aws/oidc apply
-```
-
-Run these with local AWS credentials that can create or update IAM roles and policies.
-
-After the roles exist, normal CI/CD workflows assume them through GitHub OIDC, and CI can update the roles when the OIDC module, trust policy, or allowed AWS permissions change.
-
-The `ci` OIDC role is intentionally narrower than the `dev` and `prod` roles.
-
-Detailed scope:
-
-- [infra/modules/aws/_shared/oidc/README.md](infra/modules/aws/_shared/oidc/README.md)
-
-Routing and runtime feasibility contracts:
-
-- [infra/modules/aws/network/README.md](infra/modules/aws/network/README.md)
-- [infra/modules/aws/frontend/README.md](infra/modules/aws/frontend/README.md)
-- [infra/modules/aws/_shared/service/README.md](infra/modules/aws/_shared/service/README.md)
-- [infra/modules/aws/_shared/task/README.md](infra/modules/aws/_shared/task/README.md)
-
-## Common Tasks
-
-The root [`justfile`](justfile) keeps local developer commands.
-
-Split recipe files:
-
-- CI-only helpers: [`justfile.ci`](justfile.ci)
-- CI build/deploy helpers: [`justfile.deploy`](justfile.deploy)
-
-Run split files locally with `--justfile`:
-
-```sh
-just --justfile justfile.ci tf-lint-check
-just --justfile justfile.deploy lambda-get-version
-just --justfile justfile.deploy frontend-build
-```
-
-### Local Plan Some Infra
-
-Given a Terragrunt file is found at `infra/live/dev/aws/lambda_api/terragrunt.hcl`
-
-```sh
-just tg dev aws/lambda_api plan
-```
-
-Detailed Terragrunt graph and saved-plan helper commands live in [infra/README.md](infra/README.md#terragrunt-graph-helpers).
-
-Placeholder app runtime tasks live with the code that owns them:
-
-- Lambda API message publishing: [lambdas/lambda_api/README.md](lambdas/lambda_api/README.md)
-- Lambda worker queue publishing: [lambdas/lambda_worker/README.md](lambdas/lambda_worker/README.md)
-- ECS worker publishing, database verification, and debug shells: [containers/worker/README.md](containers/worker/README.md)
-- Database migration runtime and invocation: [lambdas/migrations/README.md](lambdas/migrations/README.md)
-- Frontend auth and API proxy behavior: [frontend/README.md](frontend/README.md)
-
-## Local Development
-
-Start the local stack:
-
-```sh
-just start
-```
-
-This starts local PostgreSQL, queue emulation, Lambda/ECS runtimes, migrations, the frontend dev server, and log tailing.
-
-Stop the local stack and remove Compose volumes:
-
-```sh
-just stop
-```
-
-Run only the frontend dev server:
-
-```sh
-just frontend
-```
-
-Local service notes:
-
-- frontend dev server and local API proxy: [frontend/README.md](frontend/README.md)
-- Lambda runtime layout and local watch behavior: [lambdas/README.md](lambdas/README.md)
-- ECS runtime layout and local watch behavior: [containers/README.md](containers/README.md)
-- Lambda worker local queue publishing: [lambdas/lambda_worker/README.md](lambdas/lambda_worker/README.md)
-- ECS worker local queue publishing and database verification: [containers/worker/README.md](containers/worker/README.md)
+Local stack commands, common `just` tasks, AWS prerequisites, and OIDC bootstrap commands live in [Get Started Locally](docs/get-started-locally.md).
 
 ## Infra Deployment Use Cases
 
@@ -183,7 +72,7 @@ For ECS scaling patterns and `scaling_strategy` examples, see:
 
 For the deployment model, runtime rollout split, and strategy overview, see:
 
-- [infra/README.md](infra/README.md#deployment-model)
+- [infra/docs/deployment-model.md](infra/docs/deployment-model.md)
 
 ## Read This Next
 
@@ -199,3 +88,4 @@ For the deployment model, runtime rollout split, and strategy overview, see:
 - Frontend auth contract: [infra/modules/aws/cognito/README.md](infra/modules/aws/cognito/README.md)
 - Frontend hosting contract: [infra/modules/aws/frontend/README.md](infra/modules/aws/frontend/README.md)
 - Runtime log dashboard: [infra/modules/aws/observability/README.md](infra/modules/aws/observability/README.md)
+- Get started locally, prerequisites, and bootstrap commands: [docs/get-started-locally.md](docs/get-started-locally.md)
diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index 0ed4e32c..e0561fbf 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -76,6 +76,7 @@ These instructions apply to the entire repository.
 | `frontend/**` | `frontend/README.md`, plus `infra/modules/aws/frontend/README.md` and `infra/modules/aws/cognito/README.md` when deployed hosting or auth changes |
 | `justfile.ci`, `justfile.deploy`, or reusable workflow behavior | `.github/docs/README.md`, then `reusable-workflows.md`, `artifacts-and-plans.md`, or `discovery-and-matrices.md` as relevant |
 | `justfile.destroy` | `.github/docs/README.md` and `.github/docs/destroy.md` before editing |
+| external app adaptation, placeholder replacement, template simplification, or app bootstrapping | `docs/agent/app-shaping.md` |
 
 ## Task Interpretation
 
@@ -125,27 +126,15 @@ look at ../sandbox and tell me how to deploy
 
 ## App Shaping Flow
 
-Use this shared flow when the user is adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo.
-
-- first determine whether the work is additive or replacement unless the intent is already clear
-- when the target repo is empty or effectively empty, enter this flow immediately; treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
-- determine the selected app capabilities, such as frontend, backend API, batch/worker runtime, database, auth, messaging, containers/ECS, Lambda, scheduled jobs, or static hosting
-- ask only the missing app-shaping questions that are not already answered in `BOOTSTRAP_DECISIONS.md`
-- persist durable bootstrap, simplification, replacement, and capability-selection answers in `BOOTSTRAP_DECISIONS.md` so they do not need to be asked repeatedly
-- before asking a recorded app-shaping question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
-- if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
-- if the user says the work is replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
-- do not keep unused demo capabilities just because they existed in the template
-- do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it; replacement intent or a recorded decision must be clear
-- keep or remove unused capabilities based on the recorded decision, and do not assume unmentioned capabilities should stay forever
-- still confirm before removing expensive or shared infrastructure capabilities, such as load balancers, ECS clusters, databases, Cognito, Route53/CloudFront, or messaging, unless the user explicitly names them for removal
-- when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
+When the user is adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo, read and follow `docs/agent/app-shaping.md` before proposing or editing the app shape.
+
+Keep this high-level contract in mind even before loading the detailed flow:
+
+- determine additive versus replacement intent unless it is already clear
+- determine selected capabilities and list major unused capabilities rather than assuming they should stay forever
+- record durable app-shaping answers in `BOOTSTRAP_DECISIONS.md`
 - align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
-- for AWS-backed deployment shapes, offer to check required deployment prerequisites at the point the selected environment/domain is known; expected checks include the VPC, tagged subnets, and Route53 hosted zone required by the selected domain
-- before relying on a hosted zone, confirm the intended hosted zone name with the user and verify it matches the selected `domain_name`/frontend domain shape
-- always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
-- do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
-- before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
+- always surface public exposure, authentication, cost, and bootstrap implications before closing the task
 
 ## Bootstrap Operations
 
diff --git a/docs/agent/app-shaping.md b/docs/agent/app-shaping.md
new file mode 100644
index 00000000..b616d3e7
--- /dev/null
+++ b/docs/agent/app-shaping.md
@@ -0,0 +1,23 @@
+# App Shaping Flow
+
+Use this shared flow when adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo.
+
+- first determine whether the work is additive or replacement unless the intent is already clear
+- when the target repo is empty or effectively empty, enter this flow immediately; treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
+- determine the selected app capabilities, such as frontend, backend API, batch/worker runtime, database, auth, messaging, containers/ECS, Lambda, scheduled jobs, or static hosting
+- ask only the missing app-shaping questions that are not already answered in `BOOTSTRAP_DECISIONS.md`
+- persist durable bootstrap, simplification, replacement, and capability-selection answers in `BOOTSTRAP_DECISIONS.md` so they do not need to be asked repeatedly
+- before asking a recorded app-shaping question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
+- if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
+- if the user says the work is replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
+- do not keep unused demo capabilities just because they existed in the template
+- do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it; replacement intent or a recorded decision must be clear
+- keep or remove unused capabilities based on the recorded decision, and do not assume unmentioned capabilities should stay forever
+- still confirm before removing expensive or shared infrastructure capabilities, such as load balancers, ECS clusters, databases, Cognito, Route53/CloudFront, or messaging, unless the user explicitly names them for removal
+- when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
+- align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
+- for AWS-backed deployment shapes, offer to check required deployment prerequisites at the point the selected environment/domain is known; expected checks include the VPC, tagged subnets, and Route53 hosted zone required by the selected domain
+- before relying on a hosted zone, confirm the intended hosted zone name with the user and verify it matches the selected `domain_name`/frontend domain shape
+- always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
+- do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
+- before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
diff --git a/docs/get-started-locally.md b/docs/get-started-locally.md
new file mode 100644
index 00000000..db141f08
--- /dev/null
+++ b/docs/get-started-locally.md
@@ -0,0 +1,107 @@
+# Get Started Locally
+
+Use this for local stack commands, common local commands, AWS prerequisites, and runtime task links.
+
+## Local Stack
+
+Start the local stack:
+
+```sh
+just start
+```
+
+This starts local PostgreSQL, queue emulation, Lambda/ECS runtimes, migrations, the frontend dev server, and log tailing.
+
+Stop the local stack and remove Compose volumes:
+
+```sh
+just stop
+```
+
+Run only the frontend dev server:
+
+```sh
+just frontend
+```
+
+Local service notes:
+
+- frontend dev server and local API proxy: [frontend](../frontend/README.md)
+- Lambda runtime layout and local watch behavior: [lambdas](../lambdas/README.md)
+- ECS runtime layout and local watch behavior: [containers](../containers/README.md)
+- Lambda worker local queue publishing: [lambdas/lambda_worker](../lambdas/lambda_worker/README.md)
+- ECS worker local queue publishing and database verification: [containers/worker](../containers/worker/README.md)
+
+## Prerequisites
+
+The AWS account must already have the landing-zone or StackSet network in place before deploying this repo.
+
+- the Terraform in this repo reads the VPC and subnets with `data` sources rather than creating them
+- the expected VPC and subnets must therefore already exist
+- the private subnets must be tagged so the module lookups can find them, for example with names matching `*private*`
+- if you plan to deploy the frontend custom domain, the matching Route53 hosted zone must also already exist
+- the S3 Terraform state bucket should have bucket versioning enabled, because the repo uses the [Terraform S3 backend](https://developer.hashicorp.com/terraform/language/backend/s3) lockfile path rather than DynamoDB state locking
+
+If those shared network or DNS resources do not exist yet, the infra applies in this repo will fail during data lookup or certificate/DNS creation.
+ +Required shared prerequisites before a full environment deploy: + +- pre-existing VPC +- tagged private subnets that the data lookups can resolve +- Route53 hosted zone for the deployed frontend domain when using the frontend custom domain path + +## One-Time CI Role Bootstrap + +Before GitHub Actions can plan, apply, or deploy, bootstrap the GitHub OIDC roles once per environment: + +```sh +just tg ci aws/oidc apply +just tg dev aws/oidc apply +just tg prod aws/oidc apply +``` + +Run these with local AWS credentials that can create or update IAM roles and policies. + +After the roles exist, normal CI/CD workflows assume them through GitHub OIDC, and CI can update the roles when the OIDC module, trust policy, or allowed AWS permissions change. + +The `ci` OIDC role is intentionally narrower than the `dev` and `prod` roles. Detailed scope lives in [OIDC module docs](../infra/modules/aws/_shared/oidc/README.md). + +Routing and runtime feasibility contracts: + +- [network](../infra/modules/aws/network/README.md) +- [frontend](../infra/modules/aws/frontend/README.md) +- [shared ECS service](../infra/modules/aws/_shared/service/README.md) +- [shared ECS task](../infra/modules/aws/_shared/task/README.md) + +## Common Tasks + +The root [`justfile`](../justfile) keeps local developer commands. + +Split recipe files: + +- CI-only helpers: [`justfile.ci`](../justfile.ci) +- CI build/deploy helpers: [`justfile.deploy`](../justfile.deploy) + +Run split files locally with `--justfile`: + +```sh +just --justfile justfile.ci tf-lint-check +just --justfile justfile.deploy lambda-get-version +just --justfile justfile.deploy frontend-build +``` + +Given a Terragrunt file is found at `infra/live/dev/aws/lambda_api/terragrunt.hcl`: + +```sh +just tg dev aws/lambda_api plan +``` + +Terragrunt graph and saved-plan helper commands live in [Terragrunt Graph Helpers](../infra/docs/terragrunt-graph-helpers.md). 
+ +Placeholder app runtime tasks live with the code that owns them: + +- Lambda API message publishing: [lambdas/lambda_api](../lambdas/lambda_api/README.md) +- Lambda worker queue publishing: [lambdas/lambda_worker](../lambdas/lambda_worker/README.md) +- ECS worker publishing, database verification, and debug shells: [containers/worker](../containers/worker/README.md) +- Database migration runtime and invocation: [lambdas/migrations](../lambdas/migrations/README.md) +- Frontend auth and API proxy behavior: [frontend](../frontend/README.md) diff --git a/infra/README.md b/infra/README.md index 4faf4db9..ace57680 100644 --- a/infra/README.md +++ b/infra/README.md @@ -152,49 +152,9 @@ That `containers/lib` directory is helper code only and is not treated as a depl ## Deployment Model -- infrastructure apply and feature-code rollout are intentionally decoupled in this boilerplate -- infra workflows create or update infrastructure stacks -- infra workflows create the stable runtime shape, including the Lambda and ECS CodeDeploy applications and deployment groups used later for real rollouts -- `*_infra` workflows apply infrastructure only -- build workflows produce Lambda zips and container images -- `*_code` workflows deploy feature code only -- code deploy workflows publish the real Lambda versions and ECS task revisions into that pre-created deploy surface -- `*_infra` wrappers need the inputs required to apply infra safely, such as directory-derived stack matrices and any artifact-derived bootstrap references -- in `prod`, the `*_infra` wrappers read shared artifact resources from `ci` but only apply service and task stacks in `prod` -- saved `plan` / `apply_plan` artifacts live in GitHub Actions artifacts keyed by workflow run id, with one run-level metadata artifact plus one per-stack plan artifact -- saved plan artifacts are time-limited; the run-level metadata artifact is retained for 14 days, so apply-from-plan must happen before artifact expiry -- each 
saved-plan stack always uploads `terragrunt.plan.meta.json`; the binary `terragrunt.tfplan` and rendered `terragrunt.plan.txt` are uploaded only when the plan contains real changes -- Code artifact retention and infra-plan retention are configured separately in the shared code bucket module -- rerunning infrastructure apply does not roll out new feature code -- the shared Lambda and ECS module READMEs are the canonical source for bootstrap, rollout, and rollback details for each runtime shape -- detailed workflow contracts, reusable-workflow inputs, repo-local action behavior, and `justfile_path` rules live in [.github/docs/README.md](../.github/docs/README.md) -- see [lambdas/README.md](../lambdas/README.md) and [containers/README.md](../containers/README.md) for runtime source layout, build behavior, and boilerplate patterns -- deploy workflows: - - publish Lambda versions and use Lambda CodeDeploy - - optionally invoke the `migrations` Lambda when it is part of the Lambda deploy matrix - - register ECS task revisions - - then either: - - use ECS CodeDeploy for load-balanced services - - or use native ECS rolling updates for internal services - - ECS task rollout is not implicitly blocked on Lambda or migration jobs; add that ordering only where a caller actually needs it - -### Deployment Overview - -```mermaid -flowchart TD - start["Choose Runtime Shape"] --> lambda["Lambda"] - start --> ecs["ECS"] - - lambda --> lambda_bg["Background / low-risk"] - lambda --> lambda_api["User-facing / request-serving"] - lambda_bg --> lambda_all["all_at_once"] - lambda_api --> lambda_canary["canary or linear"] - - ecs --> ecs_internal["internal"] - ecs --> ecs_lb["internal_dns or vpc_link"] - ecs_internal --> ecs_roll["rolling"] - ecs_lb --> ecs_cd["all_at_once / canary / linear / blue_green"] -``` +Infra applies create the stable runtime shape. Code deploy workflows publish and roll out feature code into that pre-created surface. 
+ +Read [Deployment Model](docs/deployment-model.md) for the full infra/code split, saved-plan artifact behavior, and runtime rollout overview. ## Infra Deployment Use Cases @@ -230,68 +190,7 @@ just --justfile justfile.deploy frontend-build ### Terragrunt Graph Helpers -To return the direct dependencies for every module as a JSON object: - -```sh -just tg-all-module-dependencies dev -``` - -To test the wave-matrix processor locally through the same split used by CI, run: - -```sh -just tg-graph-waves dev -``` - -If you only need the raw Terragrunt graph output: - -```sh -just tg-graph dev > graph.txt -``` - -That runs the same non-interactive Terragrunt graph command used in CI: - -```sh -cd infra/live/dev/aws -terragrunt run-all graph-dependencies \ - --terragrunt-non-interactive \ - --terragrunt-include-external-dependencies -``` - -To process that saved graph file into compact dependency JSON: - -```sh -just tg-graph-process graph.json dev -``` - -To return only changed saved-plan items as an object array, set the saved-plan env vars and run: - -```sh -BUCKET_NAME=700060376888-eu-west-2-aws-serverless-github-deploy-tfplan \ -TG_GRAPH_METADATA_PLAN_RUN_ID=26105102715 \ -just tg-graph-changed-items graph.json dev -``` - -To join the processed graph with saved-plan metadata for one plan run, set the saved-plan env vars before running the processing command: - -```sh -BUCKET_NAME=700060376888-eu-west-2-aws-serverless-github-deploy-tfplan \ -TG_GRAPH_METADATA_PLAN_RUN_ID=26105102715 \ -just tg-graph-process graph.json dev -``` - -For a local saved-plan run, pass the Terragrunt operation as one quoted argument: - -```sh -just tg dev aws/oidc 'plan -out=terragrunt.tfplan' -``` - -The `tg` recipe treats the final argument as the Terragrunt operation string, so quoting lets you pass flags such as `-out=...` through the wrapper. The workflow saved-plan path expects the binary plan filename to be `terragrunt.tfplan`. 
- -To apply that same saved plan later, reuse the same run id: - -```sh -just tg dev aws/oidc 'apply terragrunt.tfplan' -``` +Graph, wave, and saved-plan helper commands live in [Terragrunt Graph Helpers](docs/terragrunt-graph-helpers.md). ## Naming Conventions diff --git a/infra/docs/deployment-model.md b/infra/docs/deployment-model.md new file mode 100644 index 00000000..7b295aa5 --- /dev/null +++ b/infra/docs/deployment-model.md @@ -0,0 +1,46 @@ +# Deployment Model + +Infrastructure apply and feature-code rollout are intentionally decoupled in this boilerplate. + +- infra workflows create or update infrastructure stacks +- infra workflows create the stable runtime shape, including the Lambda and ECS CodeDeploy applications and deployment groups used later for real rollouts +- `*_infra` workflows apply infrastructure only +- build workflows produce Lambda zips and container images +- `*_code` workflows deploy feature code only +- code deploy workflows publish the real Lambda versions and ECS task revisions into that pre-created deploy surface +- `*_infra` wrappers need the inputs required to apply infra safely, such as directory-derived stack matrices and any artifact-derived bootstrap references +- in `prod`, the `*_infra` wrappers read shared artifact resources from `ci` but only apply service and task stacks in `prod` +- saved `plan` / `apply_plan` artifacts live in GitHub Actions artifacts keyed by workflow run id, with one run-level metadata artifact plus one per-stack plan artifact +- saved plan artifacts are time-limited; the run-level metadata artifact is retained for 14 days, so apply-from-plan must happen before artifact expiry +- each saved-plan stack always uploads `terragrunt.plan.meta.json`; the binary `terragrunt.tfplan` and rendered `terragrunt.plan.txt` are uploaded only when the plan contains real changes +- Code artifact retention and infra-plan retention are configured separately in the shared code bucket module +- rerunning infrastructure 
apply does not roll out new feature code +- the shared Lambda and ECS module READMEs are the canonical source for bootstrap, rollout, and rollback details for each runtime shape +- detailed workflow contracts, reusable-workflow inputs, repo-local action behavior, and `justfile_path` rules live in [CI docs](../../.github/docs/README.md) +- see [Lambda source layout](../../lambdas/README.md) and [container source layout](../../containers/README.md) for runtime source layout, build behavior, and boilerplate patterns + +Deploy workflows: + +- publish Lambda versions and use Lambda CodeDeploy +- optionally invoke the `migrations` Lambda when it is part of the Lambda deploy matrix +- register ECS task revisions +- then either use ECS CodeDeploy for load-balanced services or native ECS rolling updates for internal services +- ECS task rollout is not implicitly blocked on Lambda or migration jobs; add that ordering only where a caller actually needs it + +## Runtime Overview + +```mermaid +flowchart TD + start["Choose Runtime Shape"] --> lambda["Lambda"] + start --> ecs["ECS"] + + lambda --> lambda_bg["Background / low-risk"] + lambda --> lambda_api["User-facing / request-serving"] + lambda_bg --> lambda_all["all_at_once"] + lambda_api --> lambda_canary["canary or linear"] + + ecs --> ecs_internal["internal"] + ecs --> ecs_lb["internal_dns or vpc_link"] + ecs_internal --> ecs_roll["rolling"] + ecs_lb --> ecs_cd["all_at_once / canary / linear / blue_green"] +``` diff --git a/infra/docs/terragrunt-graph-helpers.md b/infra/docs/terragrunt-graph-helpers.md new file mode 100644 index 00000000..533343d9 --- /dev/null +++ b/infra/docs/terragrunt-graph-helpers.md @@ -0,0 +1,66 @@ +# Terragrunt Graph Helpers + +Use these commands when debugging stack ordering, workflow wave generation, or saved-plan metadata joins. 
+ +To return the direct dependencies for every module as a JSON object: + +```sh +just tg-all-module-dependencies dev +``` + +To test the wave-matrix processor locally through the same split used by CI: + +```sh +just tg-graph-waves dev +``` + +If you only need the raw Terragrunt graph output: + +```sh +just tg-graph dev > graph.txt +``` + +That runs the same non-interactive Terragrunt graph command used in CI: + +```sh +cd infra/live/dev/aws +terragrunt run-all graph-dependencies \ + --terragrunt-non-interactive \ + --terragrunt-include-external-dependencies +``` + +To process that saved graph file into compact dependency JSON: + +```sh +just tg-graph-process graph.json dev +``` + +To return only changed saved-plan items as an object array, set the saved-plan env vars and run: + +```sh +BUCKET_NAME=700060376888-eu-west-2-aws-serverless-github-deploy-tfplan \ +TG_GRAPH_METADATA_PLAN_RUN_ID=26105102715 \ +just tg-graph-changed-items graph.json dev +``` + +To join the processed graph with saved-plan metadata for one plan run, set the saved-plan env vars before running the processing command: + +```sh +BUCKET_NAME=700060376888-eu-west-2-aws-serverless-github-deploy-tfplan \ +TG_GRAPH_METADATA_PLAN_RUN_ID=26105102715 \ +just tg-graph-process graph.json dev +``` + +For a local saved-plan run, pass the Terragrunt operation as one quoted argument: + +```sh +just tg dev aws/oidc 'plan -out=terragrunt.tfplan' +``` + +The `tg` recipe treats the final argument as the Terragrunt operation string, so quoting lets you pass flags such as `-out=...` through the wrapper. The workflow saved-plan path expects the binary plan filename to be `terragrunt.tfplan`. 
+ +To apply that same saved plan later, reuse the same run id: + +```sh +just tg dev aws/oidc 'apply terragrunt.tfplan' +``` diff --git a/infra/modules/aws/_shared/lambda/README.md b/infra/modules/aws/_shared/lambda/README.md index 8895bb10..a12db237 100644 --- a/infra/modules/aws/_shared/lambda/README.md +++ b/infra/modules/aws/_shared/lambda/README.md @@ -110,41 +110,4 @@ Use this when you want Lambda infra and Lambda rollout behavior managed together ## Provisioned Concurrency Patterns -Use `provisioned_config` to choose the Lambda warm-capacity shape. - -### No provisioned concurrency - -- best for background jobs and lower-frequency work where cold-start lag is acceptable - -```hcl -provisioned_config = { - fixed = 0 - reserved_concurrency = 2 -} -``` - -### Fixed provisioned concurrency - -- best for predictable request volume where you want a known warm pool - -```hcl -provisioned_config = { - fixed = 10 - reserved_concurrency = 50 -} -``` - -### Autoscaled provisioned concurrency - -- best for request-serving Lambdas where you want baseline warm capacity and cost control above that baseline - -```hcl -provisioned_config = { - auto_scale = { - max = 3 - min = 1 - trigger_percent = 70 - cool_down_seconds = 60 - } -} -``` +Use [provisioned concurrency patterns](docs/provisioned-concurrency.md) for no-concurrency, fixed, and autoscaled examples. diff --git a/infra/modules/aws/_shared/lambda/docs/provisioned-concurrency.md b/infra/modules/aws/_shared/lambda/docs/provisioned-concurrency.md new file mode 100644 index 00000000..19c0db2b --- /dev/null +++ b/infra/modules/aws/_shared/lambda/docs/provisioned-concurrency.md @@ -0,0 +1,40 @@ +# Lambda Provisioned Concurrency + +Use `provisioned_config` to choose the Lambda warm-capacity shape. 
+ +## No Provisioned Concurrency + +- best for background jobs and lower-frequency work where cold-start lag is acceptable + +```hcl +provisioned_config = { + fixed = 0 + reserved_concurrency = 2 +} +``` + +## Fixed Provisioned Concurrency + +- best for predictable request volume where you want a known warm pool + +```hcl +provisioned_config = { + fixed = 10 + reserved_concurrency = 50 +} +``` + +## Autoscaled Provisioned Concurrency + +- best for request-serving Lambdas where you want baseline warm capacity and cost control above that baseline + +```hcl +provisioned_config = { + auto_scale = { + max = 3 + min = 1 + trigger_percent = 70 + cool_down_seconds = 60 + } +} +``` diff --git a/infra/modules/aws/_shared/service/README.md b/infra/modules/aws/_shared/service/README.md index 9efa3943..98740de8 100644 --- a/infra/modules/aws/_shared/service/README.md +++ b/infra/modules/aws/_shared/service/README.md @@ -45,64 +45,11 @@ Bootstrap and real task deploys use the same app health path, such as `/health` ## Decision Rules -Choose deployment strategy based on connection type and whether the service is load-balanced in this repo's model. 
- -### `rolling` - -- use for ECS services that are not load-balanced in this repo's model, such as internal workers without `internal_dns` or `vpc_link` -- this uses native ECS rolling updates rather than ECS CodeDeploy - -### `all_at_once` - -- use for load-balanced ECS services when you want CodeDeploy but do not need gradual traffic shifting - -```hcl -deployment_strategy = "all_at_once" -``` - -### `canary` - -- use for load-balanced ECS services where you want partial traffic shifting before full promotion - -```hcl -deployment_strategy = "canary" -``` - -### `linear` - -- use for load-balanced ECS services where you want a gradual, repeated traffic shift - -```hcl -deployment_strategy = "linear" -``` - -### `blue_green` - -- use when you want explicit blue/green intent in the service configuration -- in the current repo shape this maps to the ECS CodeDeploy all-at-once traffic switch - -```hcl -deployment_strategy = "blue_green" -``` +Use [deployment strategies](docs/deployment-strategies.md) when choosing `rolling`, `all_at_once`, `canary`, `linear`, or `blue_green`. ## Connection Types -### `internal` - -- use for internal services without API Gateway or shared-ALB traffic switching -- prefer `rolling` -- this shape is not compatible with this repo's ECS CodeDeploy path - -### `internal_dns` - -- use for load-balanced internal services that should be addressable through the shared internal ALB and DNS path -- supports ECS CodeDeploy in this repo - -### `vpc_link` - -- use for HTTP services exposed through the shared API Gateway via VPC link -- supports ECS CodeDeploy in this repo -- if JWT auth is enabled, the shared API Gateway authorizer is attached in this service shape +Use [connection types](docs/connection-types.md) when deciding between `internal`, `internal_dns`, and `vpc_link`. 
## Feasibility Constraints @@ -114,110 +61,10 @@ deployment_strategy = "blue_green" ## Scaling Patterns -Use `desired_task_count` as the steady-state baseline and `scaling_strategy` when you want autoscaling above that baseline. - -### Fixed task count - -- use for predictable or low-volume services where a fixed number of tasks is enough -- leave `scaling_strategy = {}` - -```hcl -desired_task_count = 1 - -scaling_strategy = {} -``` - -### CPU scaling - -- use when task CPU is the best leading signal for scale pressure -- best fit for internal workers or APIs whose load correlates with compute saturation - -```hcl -desired_task_count = 1 - -scaling_strategy = { - max_scaled_task_count = 4 - cpu = { - scale_out_threshold = 70 - scale_in_threshold = 30 - scale_out_adjustment = 1 - scale_in_adjustment = -1 - cooldown_out = 120 - cooldown_in = 300 - } -} -``` - -### SQS scaling - -- use for queue-driven workers -- scale decisions are based on the named queue's visible-message count - -```hcl -desired_task_count = 1 - -scaling_strategy = { - max_scaled_task_count = 6 - sqs = { - queue_name = "my-worker-queue" - scale_out_threshold = 10 - scale_in_threshold = 0 - scale_out_adjustment = 1 - scale_in_adjustment = -1 - cooldown_out = 60 - cooldown_in = 300 - } -} -``` - -### ALB request scaling - -- use for load-balanced HTTP services -- scale decisions are based on target requests per task behind the ALB - -```hcl -desired_task_count = 2 - -scaling_strategy = { - max_scaled_task_count = 6 - alb = { - target_requests_per_task = 100 - cooldown_out = 60 - cooldown_in = 300 - } -} -``` +Use [scaling patterns](docs/scaling-patterns.md) for fixed task count, CPU, SQS, and ALB request examples. 
## CI / Deploy Expectations -- infrastructure applies create the stable service shape and any CodeDeploy wiring needed for load-balanced services -- deploy workflows register and promote real `task_*` revisions -- the deployment workflow applies the new task revision, uses CodeDeploy for load-balanced services, and uses native rolling deploys for internal services -- the shared module accepts `codedeploy_alarm_names` for automatic rollback - -## Rollback - -Use CloudWatch alarms with `codedeploy_alarm_names` when you want ECS CodeDeploy to roll back a load-balanced service deployment automatically. - -```hcl -codedeploy_alarm_names = [ - local.api_5xx_alarm_name -] -``` - -The alarm resources themselves are owned by the caller. This shared module consumes the alarm names and wires them into the ECS deployment group. - -## Drift / Ownership Rules - -The ECS service ignores: - -- `task_definition` -- `load_balancer` -- dedicated-listener `default_action` - -Reason: +Infrastructure applies create the stable service shape and deploy workflows own real task rollouts. -- deploy workflows own the live revision -- infra owns the stable service shape -- CodeDeploy ECS services reject `load_balancer` updates via `UpdateService` -- CodeDeploy also owns the live target-group switch on dedicated listeners +Read [rollout and drift](docs/rollout-and-drift.md) for CI deploy expectations, rollback alarms, and ignored-drift ownership. diff --git a/infra/modules/aws/_shared/service/docs/connection-types.md b/infra/modules/aws/_shared/service/docs/connection-types.md new file mode 100644 index 00000000..9d7ffcde --- /dev/null +++ b/infra/modules/aws/_shared/service/docs/connection-types.md @@ -0,0 +1,28 @@ +# ECS Connection Types + +Choose connection type based on how the ECS service should be reached. 
+ +## `internal` + +- use for internal services without API Gateway or shared-ALB traffic switching +- prefer `rolling` +- this shape is not compatible with this repo's ECS CodeDeploy path + +## `internal_dns` + +- use for load-balanced internal services that should be addressable through the shared internal ALB and DNS path +- supports ECS CodeDeploy in this repo + +## `vpc_link` + +- use for HTTP services exposed through the shared API Gateway via VPC link +- supports ECS CodeDeploy in this repo +- if JWT auth is enabled, the shared API Gateway authorizer is attached in this service shape + +## Feasibility Notes + +- ECS CodeDeploy requires a load-balanced service shape in this repo +- in practice that means `connection_type` must be `internal_dns` or `vpc_link` for CodeDeploy-backed ECS deploys +- in this repo, subpath ECS services need a dedicated ALB listener if they are meant to use CodeDeploy blue/green +- if `connection_type = "internal"`, prefer `rolling` +- for internal non-load-balanced services, the deploy workflow falls back to native ECS rolling updates diff --git a/infra/modules/aws/_shared/service/docs/deployment-strategies.md b/infra/modules/aws/_shared/service/docs/deployment-strategies.md new file mode 100644 index 00000000..1f63117a --- /dev/null +++ b/infra/modules/aws/_shared/service/docs/deployment-strategies.md @@ -0,0 +1,41 @@ +# ECS Deployment Strategies + +Choose deployment strategy based on connection type and whether the service is load-balanced in this repo's model. 
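The pairing between connection type and deployment strategy described in these docs can be sketched as a small lookup. This is only an illustration of the documented rule; the function names are hypothetical, and the shared module is not known to perform this validation itself.

```sh
#!/usr/bin/env sh
# Illustrative only: mirrors the documented pairing of connection_type and
# deployment_strategy. These helpers are hypothetical, not part of the module.

# Print the rollout mechanism implied by a connection_type.
rollout_mechanism() {
  case "$1" in
    internal_dns|vpc_link) echo "codedeploy" ;;  # load-balanced shapes
    internal)              echo "rolling" ;;     # native ECS rolling update
    *) echo "unknown connection_type: $1" >&2; return 1 ;;
  esac
}

# Succeed if the deployment_strategy fits the connection_type.
strategy_allowed() {
  connection_type="$1"
  strategy="$2"
  mechanism="$(rollout_mechanism "${connection_type}")" || return 1
  case "${mechanism}:${strategy}" in
    rolling:rolling) return 0 ;;
    codedeploy:all_at_once|codedeploy:canary|codedeploy:linear|codedeploy:blue_green) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `strategy_allowed internal canary` fails because `internal` services are not load-balanced in this repo's model, while `strategy_allowed vpc_link canary` succeeds.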
+ +## `rolling` + +- use for ECS services that are not load-balanced in this repo's model, such as internal workers without `internal_dns` or `vpc_link` +- this uses native ECS rolling updates rather than ECS CodeDeploy + +## `all_at_once` + +- use for load-balanced ECS services when you want CodeDeploy but do not need gradual traffic shifting + +```hcl +deployment_strategy = "all_at_once" +``` + +## `canary` + +- use for load-balanced ECS services where you want partial traffic shifting before full promotion + +```hcl +deployment_strategy = "canary" +``` + +## `linear` + +- use for load-balanced ECS services where you want a gradual, repeated traffic shift + +```hcl +deployment_strategy = "linear" +``` + +## `blue_green` + +- use when you want explicit blue/green intent in the service configuration +- in the current repo shape this maps to the ECS CodeDeploy all-at-once traffic switch + +```hcl +deployment_strategy = "blue_green" +``` diff --git a/infra/modules/aws/_shared/service/docs/rollout-and-drift.md b/infra/modules/aws/_shared/service/docs/rollout-and-drift.md new file mode 100644 index 00000000..e4b384cb --- /dev/null +++ b/infra/modules/aws/_shared/service/docs/rollout-and-drift.md @@ -0,0 +1,35 @@ +# ECS Rollout And Drift + +## CI / Deploy Expectations + +- infrastructure applies create the stable service shape and any CodeDeploy wiring needed for load-balanced services +- deploy workflows register and promote real `task_*` revisions +- the deployment workflow applies the new task revision, uses CodeDeploy for load-balanced services, and uses native rolling deploys for internal services +- the shared module accepts `codedeploy_alarm_names` for automatic rollback + +## Rollback + +Use CloudWatch alarms with `codedeploy_alarm_names` when you want ECS CodeDeploy to roll back a load-balanced service deployment automatically. + +```hcl +codedeploy_alarm_names = [ + local.api_5xx_alarm_name +] +``` + +The alarm resources themselves are owned by the caller. 
This shared module consumes the alarm names and wires them into the ECS deployment group. + +## Drift / Ownership Rules + +The ECS service ignores: + +- `task_definition` +- `load_balancer` +- dedicated-listener `default_action` + +Reason: + +- deploy workflows own the live revision +- infra owns the stable service shape +- CodeDeploy ECS services reject `load_balancer` updates via `UpdateService` +- CodeDeploy also owns the live target-group switch on dedicated listeners diff --git a/infra/modules/aws/_shared/service/docs/scaling-patterns.md b/infra/modules/aws/_shared/service/docs/scaling-patterns.md new file mode 100644 index 00000000..1abf8af4 --- /dev/null +++ b/infra/modules/aws/_shared/service/docs/scaling-patterns.md @@ -0,0 +1,75 @@ +# ECS Scaling Patterns + +Use `desired_task_count` as the steady-state baseline and `scaling_strategy` when you want autoscaling above that baseline. + +## Fixed Task Count + +- use for predictable or low-volume services where a fixed number of tasks is enough +- leave `scaling_strategy = {}` + +```hcl +desired_task_count = 1 + +scaling_strategy = {} +``` + +## CPU Scaling + +- use when task CPU is the best leading signal for scale pressure +- best fit for internal workers or APIs whose load correlates with compute saturation + +```hcl +desired_task_count = 1 + +scaling_strategy = { + max_scaled_task_count = 4 + cpu = { + scale_out_threshold = 70 + scale_in_threshold = 30 + scale_out_adjustment = 1 + scale_in_adjustment = -1 + cooldown_out = 120 + cooldown_in = 300 + } +} +``` + +## SQS Scaling + +- use for queue-driven workers +- scale decisions are based on the named queue's visible-message count + +```hcl +desired_task_count = 1 + +scaling_strategy = { + max_scaled_task_count = 6 + sqs = { + queue_name = "my-worker-queue" + scale_out_threshold = 10 + scale_in_threshold = 0 + scale_out_adjustment = 1 + scale_in_adjustment = -1 + cooldown_out = 60 + cooldown_in = 300 + } +} +``` + +## ALB Request Scaling + +- use for 
load-balanced HTTP services +- scale decisions are based on target requests per task behind the ALB + +```hcl +desired_task_count = 2 + +scaling_strategy = { + max_scaled_task_count = 6 + alb = { + target_requests_per_task = 100 + cooldown_out = 60 + cooldown_in = 300 + } +} +``` From 1b4bc89a227edcbfe0224c6e99a7b483d45d68b6 Mon Sep 17 00:00:00 2001 From: chrispsheehan Date: Fri, 29 May 2026 10:30:02 +0100 Subject: [PATCH 06/12] chore: rm main sections --- README.md | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/README.md b/README.md index f7c94eef..c0e05f93 100644 --- a/README.md +++ b/README.md @@ -3,16 +3,6 @@ **Terraform + GitHub Actions for AWS serverless deployments.** Lambda + ECS with CodeDeploy rollouts, plus provisioned concurrency controls for Lambda — driven by clean module variables and `just` recipes. -## Sections - -- [Overview](#overview) -- [Using This Template With An AI Agent](#using-this-template-with-an-ai-agent) -- [Bootstrap-Friendly Plans](#bootstrap-friendly-plans) -- [Get Started Locally](#get-started-locally) -- [Infra Deployment Use Cases](#infra-deployment-use-cases) -- [Reference](#reference) -- [Read This Next](#read-this-next) - ## Overview - Terraform/Terragrunt stacks for a typical AWS application shape: APIs, workers, frontend, database, auth, and messaging From a3197e43b4eb20172a3002a93b4330f2f267ec17 Mon Sep 17 00:00:00 2001 From: chrispsheehan Date: Fri, 29 May 2026 10:33:26 +0100 Subject: [PATCH 07/12] chore: offer to rename project at bootstrap flow --- REPO_INSTRUCTIONS.md | 2 +- docs/agent/app-shaping.md | 5 +++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md index e0561fbf..b4366612 100644 --- a/REPO_INSTRUCTIONS.md +++ b/REPO_INSTRUCTIONS.md @@ -134,7 +134,7 @@ Keep this high-level contract in mind even before loading the detailed flow: - determine selected capabilities and list major unused capabilities rather than assuming they should stay forever 
 - record durable app-shaping answers in `BOOTSTRAP_DECISIONS.md`
 - align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
-- always surface public exposure, authentication, cost, and bootstrap implications before closing the task
+- always surface public exposure, authentication, cost, bootstrap implications, and any needed README/context refresh before closing the task

 ## Bootstrap Operations

diff --git a/docs/agent/app-shaping.md b/docs/agent/app-shaping.md
index b616d3e7..f980a1f6 100644
--- a/docs/agent/app-shaping.md
+++ b/docs/agent/app-shaping.md
@@ -21,3 +21,8 @@ Use this shared flow when adapting an external app, replacing the placeholder ap
 - always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
 - do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
 - before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
+- at the end of a bootstrap, simplification, or replacement flow, offer to update the README and related context docs so they describe the selected app rather than the original template
+- when replacement or bootstrap intent is confirmed, the root README title should become the app/product/repo name rather than a template name; if the right title is not obvious, ask the user to confirm it before renaming the title
+- remove or rewrite stale references to "template", "placeholder", "boilerplate", demo apps, and unused capabilities in human-facing docs unless the reference is still intentionally describing this repo's reusable scaffolding behavior
+- review all relevant README/docs at that point for human readability and agent parsability: clear title, short purpose statement, current capability list, accurate "read next" links, no stale runtime paths, and enough ownership/routing detail for future agents to load only the needed context
+- when doc titles, product name, or app positioning are subjective, check the proposed title or naming with the user before making broad doc rewrites

From 3f3c84000fc622ff64314c65a81e88741dc0d896 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Fri, 29 May 2026 10:47:15 +0100
Subject: [PATCH 08/12] chore: update CONTRIBUTING.md

---
 CONTRIBUTING.md | 83 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 66 insertions(+), 17 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index dcc8336b..b8277c6e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,26 +1,75 @@
-# Contributing
+# Changing And Deploying Safely

-## Docs Expectations
+This repo separates infrastructure changes from feature-code rollouts. Treat that split as the default working model.

-Keep documentation aligned with code changes:
+## Before Changing Anything

-- CI/CD behavior
-- Terraform module inputs or outputs
-- deployment strategy
-- bootstrap behavior
-- operator-facing commands
+- Read the nearest owning README before editing code, Terraform, Terragrunt, workflows, or runtime behavior.
+- Keep changes narrow: one infrastructure concern, runtime, workflow contract, or deployment path per PR when possible.
+- Update docs in the same PR when behavior, commands, module inputs/outputs, workflow contracts, bootstrap order, or operator actions change.
+- Prefer focused validation over broad run-all commands. Name any validation you could not run and why.

-Also update the affected module `README.md` files under `infra/modules/**` whenever module responsibilities, dependencies, inputs, or outputs change.
+## Safe Infrastructure Changes

-## AI-Assisted Changes
+Use infrastructure workflows for Terraform/Terragrunt shape changes only. Applying infra should create or update the stable deploy surface; it should not be the mechanism that rolls out new feature code.

-AI-assisted changes should follow the same repo contracts as manual changes:
+Recommended flow:

-- read the nearest owning README before changing code
-- keep docs aligned with workflow/module/runtime changes
-- when HCL or Terraform dependencies change, run the smallest relevant `just tg plan` or `validate` when feasible (or call out why it could not be run)
+1. Make the smallest module/live-stack change that owns the behavior.
+2. Check dependency edges and mock outputs if a stack consumes another stack through Terragrunt `dependency`.
+3. Run the smallest relevant local plan or validate when feasible, for example:

-## Working Style
+```sh
+just tg dev aws/lambda_api plan
+just tg dev aws/service_api plan
+```

-- keep module READMEs short and operational
-- prefer updating existing docs in the same PR rather than leaving follow-up documentation tasks
+4. For workflow-managed environments, prefer saved-plan review before apply:
+   - `dev_infra_plan.yml`
+   - `dev_infra_apply_from_plan.yml`
+   - `prod_infra_plan.yml`
+   - `prod_infra_apply_from_plan.yml`
+5. Use no-plan applies only when the change is low risk or already reviewed through another path:
+   - `dev_infra_apply_no_plan.yml`
+   - `prod_infra_apply_no_plan.yml`
+
+Saved plans are apply-intent artifacts. Do not reuse a saved plan if upstream real outputs have changed, if it captured mock outputs, or if artifact retention may have expired.
+
+## Deploying Code Without Changing Infra
+
+Use code deploy workflows for Lambda zips, ECS task images, and frontend assets. These workflows publish artifacts and roll them into infrastructure that already exists.
+
+- Dev code deploy: `dev_code_deploy.yml`
+- Prod code deploy: `prod_code_deploy.yml`
+- Release build and publish: `release.yml`
+
+For an individual runtime, deploy only the relevant artifact/version where the workflow input supports it. Typical targets are:
+
+- Lambda function code under `lambdas/`
+- ECS service images under `containers/`
+- frontend assets under `frontend`
+
+Do not bundle unrelated infra changes into a code-only deploy. If a code change needs a new environment variable, IAM permission, route, queue, table, database object, or service shape, apply the infra change first, then deploy code.
+
+## Runtime-Specific Checks
+
+- Lambda changes: confirm the matching live stack exists and the Lambda deploy matrix will include the function.
+- ECS changes: confirm the `containers/` directory has matching `task_` and `service_` live stacks when it is a service runtime.
+- Frontend changes: confirm the frontend artifact is published before deploying assets to the live bucket and invalidating CloudFront.
+- Migration changes: keep migration invocation explicit; do not rely on unrelated runtime deploys to imply database changes.
+
+## PR Expectations
+
+PRs should make the rollout path obvious:
+
+- state whether the change is infra, code deploy, docs-only, or a combination
+- list the exact local commands or workflows used for validation
+- call out any skipped plan, skipped deploy, missing AWS access, or manual follow-up
+- include docs updates for changed operator behavior
+
+The workflow docs own deeper CI contract detail:
+
+- entrypoints: `.github/docs/workflow-entrypoints.md`
+- saved plans: `.github/docs/artifacts-and-plans.md`
+- discovery and matrices: `.github/docs/discovery-and-matrices.md`
+- reusable workflow contracts: `.github/docs/reusable-workflows.md`

From e75267843ec87e7c58687bbf45070d9d2fd5faf8 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Fri, 29 May 2026 10:48:32 +0100
Subject: [PATCH 09/12] chore: add links

---
 CONTRIBUTING.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b8277c6e..d990b810 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -69,7 +69,7 @@ PRs should make the rollout path obvious:

 The workflow docs own deeper CI contract detail:

-- entrypoints: `.github/docs/workflow-entrypoints.md`
-- saved plans: `.github/docs/artifacts-and-plans.md`
-- discovery and matrices: `.github/docs/discovery-and-matrices.md`
-- reusable workflow contracts: `.github/docs/reusable-workflows.md`
+- entrypoints: [.github/docs/workflow-entrypoints.md](.github/docs/workflow-entrypoints.md)
+- saved plans: [.github/docs/artifacts-and-plans.md](.github/docs/artifacts-and-plans.md)
+- discovery and matrices: [.github/docs/discovery-and-matrices.md](.github/docs/discovery-and-matrices.md)
+- reusable workflow contracts: [.github/docs/reusable-workflows.md](.github/docs/reusable-workflows.md)

From b026fcdfae790a6644322bd92b7a0dad824fa489 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Fri, 29 May 2026 11:08:42 +0100
Subject: [PATCH 10/12] chore: question/answer flow preferred

---
 REPO_INSTRUCTIONS.md      | 3 +++
 docs/agent/app-shaping.md | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index b4366612..f31f69e2 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -132,6 +132,9 @@ Keep this high-level contract in mind even before loading the detailed flow:

 - determine additive versus replacement intent unless it is already clear
 - determine selected capabilities and list major unused capabilities rather than assuming they should stay forever
+- guide the human through unresolved app-shaping decisions step by step instead of presenting a long questionnaire
+- prefer selectable options with a recommended default when the interface supports them, and explain the consequence of each option briefly
+- keep the human oriented during app-shaping and bootstrap work with short updates about what is being inspected, decided, changed, or run
 - record durable app-shaping answers in `BOOTSTRAP_DECISIONS.md`
 - align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
 - always surface public exposure, authentication, cost, bootstrap implications, and any needed README/context refresh before closing the task
diff --git a/docs/agent/app-shaping.md b/docs/agent/app-shaping.md
index f980a1f6..bccf694b 100644
--- a/docs/agent/app-shaping.md
+++ b/docs/agent/app-shaping.md
@@ -6,9 +6,15 @@ Use this shared flow when adapting an external app, replacing the placeholder ap
 - when the target repo is empty or effectively empty, enter this flow immediately; treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
 - determine the selected app capabilities, such as frontend, backend API, batch/worker runtime, database, auth, messaging, containers/ECS, Lambda, scheduled jobs, or static hosting
 - ask only the missing app-shaping questions that are not already answered in `BOOTSTRAP_DECISIONS.md`
+- prefer a staged question-and-answer flow over asking every open question at once
+- ask the smallest useful next question, usually one decision at a time, and only group questions when they are tightly coupled
+- when the interface supports selectable options, present 2-3 concrete choices with a recommended option first; include an escape hatch for a custom answer when possible
+- explain the practical effect of each option in one short sentence so the human can answer without already knowing this repo's architecture
+- after each answer, restate the recorded decision briefly, then continue to the next unresolved decision
 - persist durable bootstrap, simplification, replacement, and capability-selection answers in `BOOTSTRAP_DECISIONS.md` so they do not need to be asked repeatedly
 - before asking a recorded app-shaping question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
 - if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
+- while in app-shaping or bootstrap flow, keep the human oriented with short progress updates that say what context is being inspected, what decision is being resolved, or what operation is about to run
 - if the user says the work is replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
 - do not keep unused demo capabilities just because they existed in the template
 - do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it; replacement intent or a recorded decision must be clear

From 0a796a026e36a0be7ebf24246cf521a595edac5e Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Fri, 29 May 2026 12:49:43 +0100
Subject: [PATCH 11/12] chore: rm agent bootstrap flow (mv'd to skill)

---
 BOOTSTRAP_DECISIONS.md    |  9 ------
 README.md                 | 16 +++-------
 REPO_INSTRUCTIONS.md      | 64 ++--------------------------------------
 docs/agent/app-shaping.md | 34 ---------------------
 4 files changed, 7 insertions(+), 116 deletions(-)
 delete mode 100644 BOOTSTRAP_DECISIONS.md
 delete mode 100644 docs/agent/app-shaping.md

diff --git a/BOOTSTRAP_DECISIONS.md b/BOOTSTRAP_DECISIONS.md
deleted file mode 100644
index bd0df30d..00000000
--- a/BOOTSTRAP_DECISIONS.md
+++ /dev/null
@@ -1,9 +0,0 @@
-# Bootstrap Decisions
-
-Use this file to record bootstrap-specific questions and confirmed user answers so they do not need to be re-asked.
-
-If a later user answer changes a recorded decision, warn about the change and update this file.
-
-## Decisions
-
-- No bootstrap decisions recorded yet.
diff --git a/README.md b/README.md
index c0e05f93..f602c820 100644
--- a/README.md
+++ b/README.md
@@ -10,23 +10,17 @@ Lambda + ECS with CodeDeploy rollouts, plus provisioned concurrency controls for
 - shared deployment patterns for Lambda and ECS, with repo-local `just` commands for local and CI operations
 - runtime and infrastructure layouts designed to be extended without having to rediscover the whole repo each time

-## Using This Template With An AI Agent
+## Using This As A Reference Template

-If you are using an AI coding agent, start with plain-language requests like:
+To bootstrap another repo from this one, use the `repo-reference-scaffold` skill.

-```text
-add a new environment called qa
-```
-
-```text
-Give me a site with a backend and a database
-```
+Placeholder prompt:

 ```text
-look at ../sandbox and tell me how to deploy it with this repo
+Use $repo-reference-scaffold with this repo as the reference.
 ```

-The agent instructions live in [REPO_INSTRUCTIONS.md](REPO_INSTRUCTIONS.md); these examples are human-friendly starting prompts.
+The local repo instructions live in [REPO_INSTRUCTIONS.md](REPO_INSTRUCTIONS.md).

 ## Bootstrap-Friendly Plans

diff --git a/REPO_INSTRUCTIONS.md b/REPO_INSTRUCTIONS.md
index f31f69e2..42758ab2 100644
--- a/REPO_INSTRUCTIONS.md
+++ b/REPO_INSTRUCTIONS.md
@@ -8,15 +8,6 @@ These instructions apply to the entire repository.
 - if the user mentions another local repository or folder, treat it as external reference material unless the user explicitly says to move the work there
 - do not assume another repository inherits instructions from this repository

-## Template Role
-
-- treat this repository as the deployable template and implementation target unless the user explicitly says otherwise
-- when the user supplies a path to different source code, treat that code as reference input by default and make changes in this repository unless the user explicitly redirects the work
-- when the user points to another repository, inspect that repository to understand the app shape, product behavior, and capability needs, then propose or implement the corresponding changes in this repository
-- prefer translating the external app into this repository's existing platform patterns rather than copying code across verbatim
-- treat external repositories as read-only unless the user explicitly requests edits there
-- when the external app shape does not map cleanly to this repository, explain the gap, state the closest repo-native deployment shape, and ask the user to confirm before making broad changes
-
 ## Keep `AGENTS.md` and `CLAUDE.md` identical

 `REPO_INSTRUCTIONS.md` is the shared source of truth for repo guidance.
@@ -76,78 +67,27 @@ These instructions apply to the entire repository.
 | `frontend/**` | `frontend/README.md`, plus `infra/modules/aws/frontend/README.md` and `infra/modules/aws/cognito/README.md` when deployed hosting or auth changes |
 | `justfile.ci`, `justfile.deploy`, or reusable workflow behavior | `.github/docs/README.md`, then `reusable-workflows.md`, `artifacts-and-plans.md`, or `discovery-and-matrices.md` as relevant |
 | `justfile.destroy` | `.github/docs/README.md` and `.github/docs/destroy.md` before editing |
-| external app adaptation, placeholder replacement, template simplification, or app bootstrapping | `docs/agent/app-shaping.md` |

 ## Task Interpretation

 - interpret brief requests using this repo's existing patterns and contracts rather than taking them literally
-- when a request mentions external source code and asks how to build, make ready, or deploy it, interpret that as "understand the external app, then answer in terms of how this repository should implement or deploy it" unless the user explicitly redirects the work
 - read the relevant local contract docs before editing and follow them
 - prefer the smallest complete change that matches existing repo patterns
 - remove stale code, temporary helpers, and abandoned experiment residue as part of the same change rather than leaving dead paths behind
 - verify related workflows, infra, docs, and downstream dependencies when the request affects shared behavior
 - state material assumptions when the intended shape is not fully explicit
 - when ambiguity is material or a wrong assumption could cause the repo shape or contract to drift, ask the user a clarifying question before editing
-- for broad product or app-shaping requests, provide a short pre-implementation summary of the inferred app shape, likely capability choices, major assumptions, important questions, and notable cost or security implications before making changes
-
-Example requests to interpret through these repo-native rules:
-
-```text
-add a new environment called qa
-```
-
-```text
-Give me a site with a backend and a database
-```
-
-```text
-look at ../sandbox and tell me how to deploy
-```
-
-## Capability Selection
-
-- treat this repo as a menu of optional platform capabilities, not just a single fixed app shape
-- infer which capabilities the user is selecting from the request, and which existing capabilities fall outside that target shape
-- when the requested shape uses only a subset of the repo's current capabilities, explicitly list the major unused capabilities and ask whether they should be kept for future use or removed
-- do not assume that unmentioned capabilities should stay forever, and do not remove them without confirmation
-- when a user asks for a website or frontend with a backend but does not specify the backend runtime, prefer the simplest repo-native backend shape as the default starting assumption
-- in this repo, default that assumption to a Lambda-backed API unless the user asks for ECS, long-running workers, containers, or another specific runtime
-- state that assumption and ask for confirmation before making changes when backend choice materially affects infrastructure shape, cost, or security

 ## Runtime Network Placement

 - do not assume ECS services must run in private subnets
-- when adapting an app that needs outbound internet access, explicitly ask whether the runtime should run in public subnets or private subnets before recommending NAT gateways
+- when a service needs outbound internet access, explicitly ask whether the runtime should run in public subnets or private subnets before recommending NAT gateways
 - only recommend NAT gateways when private subnet placement is required, explicitly chosen, or otherwise necessary for the selected security model
 - if a service can safely run in public subnets, call out that public subnet placement with task public IPs may be the lower-cost deployment shape and explain the security implications
 - for public-subnet ECS services, require a clear ingress model before implementation: public load balancer or API Gateway path, security group restrictions, authentication requirements, and whether tasks should receive public IPs
-- for scraper, polling, webhook, or external-API-heavy services, treat subnet placement as an app-shaping decision because outbound connectivity affects architecture, cost, and security
+- for scraper, polling, webhook, or external-API-heavy services, treat subnet placement as an explicit architecture decision because outbound connectivity affects architecture, cost, and security
 - do not list NAT as an AWS prerequisite unless the selected runtime placement uses private subnets and needs outbound internet access

-## App Shaping Flow
-
-When the user is adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo, read and follow `docs/agent/app-shaping.md` before proposing or editing the app shape.
-
-Keep this high-level contract in mind even before loading the detailed flow:
-
-- determine additive versus replacement intent unless it is already clear
-- determine selected capabilities and list major unused capabilities rather than assuming they should stay forever
-- guide the human through unresolved app-shaping decisions step by step instead of presenting a long questionnaire
-- prefer selectable options with a recommended default when the interface supports them, and explain the consequence of each option briefly
-- keep the human oriented during app-shaping and bootstrap work with short updates about what is being inspected, decided, changed, or run
-- record durable app-shaping answers in `BOOTSTRAP_DECISIONS.md`
-- align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
-- always surface public exposure, authentication, cost, bootstrap implications, and any needed README/context refresh before closing the task
-
-## Bootstrap Operations
-
-- at the end of app-shaping work, offer the next operational bootstrap steps needed to make the selected app shape real end to end
-- for AWS-backed deployments, this usually includes creating or updating GitHub OIDC roles, applying foundational stacks in dependency order, deploying initial infrastructure, publishing first runtime artifacts, running migrations, and seeding initial users when Cognito is enabled
-- before the first plan, apply, prerequisite check, or other AWS interaction in a task, confirm which AWS role, user, and account will be used
-- do not run AWS-mutating bootstrap commands without explicit user approval
-- when offering OIDC setup, name the exact commands, for example `just tg ci aws/oidc apply`, `just tg dev aws/oidc apply`, or `just tg prod aws/oidc apply`
-- when offering first environment setup, separate infra bootstrap from code deployment and call out any prerequisite shared resources such as VPCs, tagged subnets, hosted zones, ECR images, code buckets, or Terraform state
-
 ## CI OIDC Scope

 - treat `infra/live/ci/aws/oidc/terragrunt.hcl` as intentionally narrow
diff --git a/docs/agent/app-shaping.md b/docs/agent/app-shaping.md
deleted file mode 100644
index bccf694b..00000000
--- a/docs/agent/app-shaping.md
+++ /dev/null
@@ -1,34 +0,0 @@
-# App Shaping Flow
-
-Use this shared flow when adapting an external app, replacing the placeholder app, simplifying the template, or bootstrapping a new app from this repo.
-
-- first determine whether the work is additive or replacement unless the intent is already clear
-- when the target repo is empty or effectively empty, enter this flow immediately; treat a repo as effectively empty when it has no meaningful app, infra, runtime, or workflow code beyond placeholders, starter files, or minimal scaffolding
-- determine the selected app capabilities, such as frontend, backend API, batch/worker runtime, database, auth, messaging, containers/ECS, Lambda, scheduled jobs, or static hosting
-- ask only the missing app-shaping questions that are not already answered in `BOOTSTRAP_DECISIONS.md`
-- prefer a staged question-and-answer flow over asking every open question at once
-- ask the smallest useful next question, usually one decision at a time, and only group questions when they are tightly coupled
-- when the interface supports selectable options, present 2-3 concrete choices with a recommended option first; include an escape hatch for a custom answer when possible
-- explain the practical effect of each option in one short sentence so the human can answer without already knowing this repo's architecture
-- after each answer, restate the recorded decision briefly, then continue to the next unresolved decision
-- persist durable bootstrap, simplification, replacement, and capability-selection answers in `BOOTSTRAP_DECISIONS.md` so they do not need to be asked repeatedly
-- before asking a recorded app-shaping question, check `BOOTSTRAP_DECISIONS.md` first and reuse the recorded answer unless the user changes it
-- if the user gives an answer that conflicts with an existing entry in `BOOTSTRAP_DECISIONS.md`, warn that the recorded decision is changing, then update the file
-- while in app-shaping or bootstrap flow, keep the human oriented with short progress updates that say what context is being inspected, what decision is being resolved, or what operation is about to run
-- if the user says the work is replacement, remove placeholder/demo code, docs, local services, infra stacks, workflow surface, and stale runtime paths that no longer serve the selected app shape
-- do not keep unused demo capabilities just because they existed in the template
-- do not delete or replace template/example code solely because a new feature request could be implemented more cleanly without it; replacement intent or a recorded decision must be clear
-- keep or remove unused capabilities based on the recorded decision, and do not assume unmentioned capabilities should stay forever
-- still confirm before removing expensive or shared infrastructure capabilities, such as load balancers, ECS clusters, databases, Cognito, Route53/CloudFront, or messaging, unless the user explicitly names them for removal
-- when removal would affect major capabilities, briefly list what would remain and what would be removed before editing
-- align local development, workflows, infra stacks, runtime code, docs, and verification commands with the selected app shape
-- for AWS-backed deployment shapes, offer to check required deployment prerequisites at the point the selected environment/domain is known; expected checks include the VPC, tagged subnets, and Route53 hosted zone required by the selected domain
-- before relying on a hosted zone, confirm the intended hosted zone name with the user and verify it matches the selected `domain_name`/frontend domain shape
-- always consider security during app shaping; if a proposed API would be exposed to the public internet, say that explicitly and suggest at least one more secure option
-- do not assume a public unauthenticated API is acceptable just because it is the simplest technical shape
-- before closing an app-shaping task, explicitly name what remains, what was removed, what still needs operational setup, and any bootstrap commands the user should run
-- at the end of a bootstrap, simplification, or replacement flow, offer to update the README and related context docs so they describe the selected app rather than the original template
-- when replacement or bootstrap intent is confirmed, the root README title should become the app/product/repo name rather than a template name; if the right title is not obvious, ask the user to confirm it before renaming the title
-- remove or rewrite stale references to "template", "placeholder", "boilerplate", demo apps, and unused capabilities in human-facing docs unless the reference is still intentionally describing this repo's reusable scaffolding behavior
-- review all relevant README/docs at that point for human readability and agent parsability: clear title, short purpose statement, current capability list, accurate "read next" links, no stale runtime paths, and enough ownership/routing detail for future agents to load only the needed context
-- when doc titles, product name, or app positioning are subjective, check the proposed title or naming with the user before making broad doc rewrites

From 9b94ea56cc289ed89166c628e91f826f6c066875 Mon Sep 17 00:00:00 2001
From: chrispsheehan
Date: Mon, 1 Jun 2026 10:58:01 +0100
Subject: [PATCH 12/12] chore: state lock instructions

---
 infra/README.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/infra/README.md b/infra/README.md
index 55b657f6..484dba2a 100644
--- a/infra/README.md
+++ b/infra/README.md
@@ -46,6 +46,55 @@ stores state at:

 `dev/aws/task_worker/terraform.tfstate`

+## Clearing State Locks
+
+This repo uses the Terraform S3 backend `use_lockfile = true` setting rather than DynamoDB locking.
+That means each live stack lock is an S3 object next to the state key:
+
+```text
+///terraform.tfstate.tflock
+```
+
+Only clear a lock after confirming no Terraform or Terragrunt command is still running for that stack.
+Do not remove a lock just to bypass an active apply, plan, or destroy.
+
+The `infra/root.hcl` init hook prints both values during Terragrunt init:
+
+```text
+STATE:/ LOCKFILE:.tflock
+```
+
+For example, the dev security stack lock is:
+
+```text
+s3://700060376888-eu-west-2-aws-serverless-github-deploy-tfstate/dev/aws/security/terraform.tfstate.tflock
+```
+
+Check whether the lock exists:
+
+```sh
+aws s3api head-object \
+  --bucket 700060376888-eu-west-2-aws-serverless-github-deploy-tfstate \
+  --key dev/aws/security/terraform.tfstate.tflock \
+  --region eu-west-2
+```
+
+If the command returns `404 Not Found`, there is no lock object to clear.
+If the lock is stale and no Terragrunt/Terraform process is active, remove it:
+
+```sh
+aws s3api delete-object \
+  --bucket 700060376888-eu-west-2-aws-serverless-github-deploy-tfstate \
+  --key dev/aws/security/terraform.tfstate.tflock \
+  --region eu-west-2
+```
+
+For another stack, keep the same bucket pattern and replace the key with:
+
+```text
+/aws//terraform.tfstate.tflock
+```
+
 ## Module Types

 - `_shared/*`