Merged
9 changes: 0 additions & 9 deletions BOOTSTRAP_DECISIONS.md

This file was deleted.

83 changes: 66 additions & 17 deletions CONTRIBUTING.md
@@ -1,26 +1,75 @@
# Contributing
# Changing And Deploying Safely

## Docs Expectations
This repo separates infrastructure changes from feature-code rollouts. Treat that split as the default working model.

Keep documentation aligned with code changes:
## Before Changing Anything

- CI/CD behavior
- Terraform module inputs or outputs
- deployment strategy
- bootstrap behavior
- operator-facing commands
- Read the nearest owning README before editing code, Terraform, Terragrunt, workflows, or runtime behavior.
- Keep changes narrow: one infrastructure concern, runtime, workflow contract, or deployment path per PR when possible.
- Update docs in the same PR when behavior, commands, module inputs/outputs, workflow contracts, bootstrap order, or operator actions change.
- Prefer focused validation over broad run-all commands. Name any validation you could not run and why.

Also update the affected module `README.md` files under `infra/modules/**` whenever module responsibilities, dependencies, inputs, or outputs change.
## Safe Infrastructure Changes

## AI-Assisted Changes
Use infrastructure workflows for Terraform/Terragrunt shape changes only. Applying infra should create or update the stable deploy surface; it should not be the mechanism that rolls out new feature code.

AI-assisted changes should follow the same repo contracts as manual changes:
Recommended flow:

- read the nearest owning README before changing code
- keep docs aligned with workflow/module/runtime changes
- when HCL or Terraform dependencies change, run the smallest relevant `just tg <env> <module> plan` or `validate` when feasible (or call out why it could not be run)
1. Make the smallest module/live-stack change that owns the behavior.
2. Check dependency edges and mock outputs if a stack consumes another stack through Terragrunt `dependency`.
3. Run the smallest relevant local plan or validate when feasible, for example:

## Working Style
```sh
just tg dev aws/lambda_api plan
just tg dev aws/service_api plan
```

- keep module READMEs short and operational
- prefer updating existing docs in the same PR rather than leaving follow-up documentation tasks
4. For workflow-managed environments, prefer saved-plan review before apply:
- `dev_infra_plan.yml`
- `dev_infra_apply_from_plan.yml`
- `prod_infra_plan.yml`
- `prod_infra_apply_from_plan.yml`
5. Use no-plan applies only when the change is low risk or already reviewed through another path:
- `dev_infra_apply_no_plan.yml`
- `prod_infra_apply_no_plan.yml`

Saved plans are apply-intent artifacts. Do not reuse a saved plan if upstream real outputs have changed, if it captured mock outputs, or if artifact retention may have expired.
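
The retention part of that rule can be checked mechanically before an apply. The following is a minimal sketch, not something this repo ships: `plan_is_fresh` and its default window are hypothetical, and it only covers file age. It cannot tell whether the plan captured mock outputs or whether upstream outputs have changed since.

```sh
# Hypothetical helper, not part of this repo: succeed only when a saved plan
# file exists and is not older than the given window.
plan_is_fresh() {
  plan_file="$1"
  max_days="${2:-1}"   # assumed window; align it with the real artifact retention
  [ -f "$plan_file" ] || return 1
  # find prints the path only when the file is older than max_days
  # (find rounds the age down to whole days before comparing).
  [ -z "$(find "$plan_file" -mtime +"$max_days" 2>/dev/null)" ]
}
```

A caller might guard an apply with `plan_is_fresh ./saved.tfplan 1 || echo "re-plan before applying"`, where `./saved.tfplan` is a placeholder path.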

## Deploying Code Without Changing Infra

Use code deploy workflows for Lambda zips, ECS task images, and frontend assets. These workflows publish artifacts and roll them into infrastructure that already exists.

- Dev code deploy: `dev_code_deploy.yml`
- Prod code deploy: `prod_code_deploy.yml`
- Release build and publish: `release.yml`

For an individual runtime, deploy only the relevant artifact/version where the workflow input supports it. Typical targets are:

- Lambda function code under `lambdas/<name>`
- ECS service images under `containers/<name>`
- frontend assets under `frontend`

Do not bundle unrelated infra changes into a code-only deploy. If a code change needs a new environment variable, IAM permission, route, queue, table, database object, or service shape, apply the infra change first, then deploy code.
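
The infra-first ordering can be sketched as a small lookup. This is an illustration only: the workflow file names are the ones listed in this guide, while the `rollout_order` function and its `infra`/`code`/`both` labels are hypothetical and do not exist in the repo.

```sh
# Hypothetical helper: print, in order, the workflows a change should pass through.
# Workflow file names come from this guide; the function itself is illustrative.
rollout_order() {
  env="$1"      # dev or prod
  change="$2"   # infra, code, or both
  case "$change" in
    infra) printf '%s\n' "${env}_infra_plan.yml" "${env}_infra_apply_from_plan.yml" ;;
    code)  printf '%s\n' "${env}_code_deploy.yml" ;;
    # A combined change applies infra first, then deploys code.
    both)  rollout_order "$env" infra && rollout_order "$env" code ;;
    *)     echo "unknown change type: $change" >&2; return 1 ;;
  esac
}
```

For example, `rollout_order prod both` lists the plan and apply-from-plan workflows before `prod_code_deploy.yml`, which mirrors the rule that a code change depending on new infra waits for the infra apply.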

## Runtime-Specific Checks

- Lambda changes: confirm the matching live stack exists and the Lambda deploy matrix will include the function.
- ECS changes: confirm the `containers/<name>` directory has matching `task_<name>` and `service_<name>` live stacks when it is a service runtime.
- Frontend changes: confirm the frontend artifact is published before deploying assets to the live bucket and invalidating CloudFront.
- Migration changes: keep migration invocation explicit; do not rely on unrelated runtime deploys to imply database changes.
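
The ECS check above could be scripted along these lines. This is a rough sketch under stated assumptions: `ecs_stacks_exist` is a hypothetical helper, and it assumes the `task_<name>` and `service_<name>` live stacks sit side by side as directories under a single root, which this guide does not confirm. The root is passed in as an argument for that reason.

```sh
# Hypothetical check: verify that both live stacks exist for a service runtime.
# live_root is an assumption about the layout; pass the directory that holds
# the environment's live stacks.
ecs_stacks_exist() {
  live_root="$1"
  name="$2"
  for stack in "task_${name}" "service_${name}"; do
    if [ ! -d "${live_root}/${stack}" ]; then
      echo "missing live stack: ${stack}" >&2
      return 1
    fi
  done
}
```

It returns non-zero and names the first missing stack, so it can gate a deploy step or a pre-PR check.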

## PR Expectations

PRs should make the rollout path obvious:

- state whether the change is infra, code deploy, docs-only, or a combination
- list the exact local commands or workflows used for validation
- call out any skipped plan, skipped deploy, missing AWS access, or manual follow-up
- include docs updates for changed operator behavior

The workflow docs own deeper CI contract detail:

- entrypoints: [.github/docs/workflow-entrypoints.md](.github/docs/workflow-entrypoints.md)
- saved plans: [.github/docs/artifacts-and-plans.md](.github/docs/artifacts-and-plans.md)
- discovery and matrices: [.github/docs/discovery-and-matrices.md](.github/docs/discovery-and-matrices.md)
- reusable workflow contracts: [.github/docs/reusable-workflows.md](.github/docs/reusable-workflows.md)
145 changes: 9 additions & 136 deletions README.md
@@ -3,43 +3,24 @@
**Terraform + GitHub Actions for AWS serverless deployments.**
Lambda + ECS with CodeDeploy rollouts, plus provisioned concurrency controls for Lambda — driven by clean module variables and `just` recipes.

## Sections

- [Overview](#overview)
- [Using This Template With An AI Agent](#using-this-template-with-an-ai-agent)
- [Bootstrap-Friendly Plans](#bootstrap-friendly-plans)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Common Tasks](#common-tasks)
- [Local Development](#local-development)
- [Infra Deployment Use Cases](#infra-deployment-use-cases)
- [Reference](#reference)
- [Read This Next](#read-this-next)

## Overview

- Terraform/Terragrunt stacks for a typical AWS application shape: APIs, workers, frontend, database, auth, and messaging
- GitHub Actions workflows for infrastructure apply, artifact build, code deploy, and destroy
- shared deployment patterns for Lambda and ECS, with repo-local `just` commands for local and CI operations
- runtime and infrastructure layouts designed to be extended without having to rediscover the whole repo each time

## Using This Template With An AI Agent
## Using This As A Reference Template

If you are using an AI coding agent, start with plain-language requests like:
To bootstrap another repo from this one, use the `repo-reference-scaffold` skill.

```text
add a new environment called qa
```
Placeholder prompt:

```text
Give me a site with a backend and a database
Use $repo-reference-scaffold with this repo as the reference.
```

```text
look at ../sandbox and tell me how to deploy it with this repo
```

The agent instructions live in [REPO_INSTRUCTIONS.md](REPO_INSTRUCTIONS.md); these examples are human-friendly starting prompts.
The local repo instructions live in [REPO_INSTRUCTIONS.md](REPO_INSTRUCTIONS.md).

## Bootstrap-Friendly Plans

@@ -48,118 +29,9 @@ See [infra/README.md](infra/README.md#dependency-notes) for the dependency strat

Use [CONTRIBUTING.md](CONTRIBUTING.md) for expectations when changing the repo itself.

## Prerequisites

The AWS account must already have the landing-zone or StackSet network in place before deploying this repo.

- the Terraform in this repo reads the VPC and subnets with `data` sources rather than creating them
- the expected VPC and subnets must therefore already exist
- the private subnets must be tagged so the module lookups can find them, for example with names matching `*private*`
- ECS service stacks can optionally place tasks in public subnets with public IPs; public subnets must be tagged so the lookups can find names matching `*public*`
- if you plan to deploy the frontend custom domain, the matching Route53 hosted zone must also already exist
- the S3 Terraform state bucket should have bucket versioning enabled, because the repo uses the [Terraform S3 backend](https://developer.hashicorp.com/terraform/language/backend/s3) lockfile path rather than DynamoDB state locking

If those shared network or DNS resources do not exist yet, the infra applies in this repo will fail during data lookup or certificate/DNS creation.

Required shared prerequisites before a full environment deploy:

- pre-existing VPC
- tagged private subnets that the data lookups can resolve
- Route53 hosted zone for the deployed frontend domain when using the frontend custom domain path

## Setup

### One-Time CI Role Bootstrap

Before GitHub Actions can plan, apply, or deploy, bootstrap the GitHub OIDC roles once per environment:

```sh
just tg ci aws/oidc apply
just tg dev aws/oidc apply
just tg prod aws/oidc apply
```

Run these with local AWS credentials that can create or update IAM roles and policies.

After the roles exist, normal CI/CD workflows assume them through GitHub OIDC, and CI can update the roles when the OIDC module, trust policy, or allowed AWS permissions change.

The `ci` OIDC role is intentionally narrower than the `dev` and `prod` roles.

Detailed scope:

- [infra/modules/aws/_shared/oidc/README.md](infra/modules/aws/_shared/oidc/README.md)

Routing and runtime feasibility contracts:

- [infra/modules/aws/network/README.md](infra/modules/aws/network/README.md)
- [infra/modules/aws/frontend/README.md](infra/modules/aws/frontend/README.md)
- [infra/modules/aws/_shared/service/README.md](infra/modules/aws/_shared/service/README.md)
- [infra/modules/aws/_shared/task/README.md](infra/modules/aws/_shared/task/README.md)

## Common Tasks

The root [`justfile`](justfile) keeps local developer commands.

Split recipe files:

- CI-only helpers: [`justfile.ci`](justfile.ci)
- CI build/deploy helpers: [`justfile.deploy`](justfile.deploy)

Run split files locally with `--justfile`:

```sh
just --justfile justfile.ci tf-lint-check
just --justfile justfile.deploy lambda-get-version
just --justfile justfile.deploy frontend-build
```

### Local Plan Infra

After changing HCL, Terraform modules, live stack dependencies, or infra workflow ordering, run the dev environment plan:

```sh
just tg-all dev plan
```

Use `just tg <env> <module> ...` only for targeted debugging or focused follow-up operations. Detailed Terragrunt graph and saved-plan helper commands live in [infra/README.md](infra/README.md#terragrunt-graph-helpers).

Placeholder app runtime tasks live with the code that owns them:

- Lambda API message publishing: [lambdas/lambda_api/README.md](lambdas/lambda_api/README.md)
- Lambda worker queue publishing: [lambdas/lambda_worker/README.md](lambdas/lambda_worker/README.md)
- ECS worker publishing, database verification, and debug shells: [containers/worker/README.md](containers/worker/README.md)
- Database migration runtime and invocation: [lambdas/migrations/README.md](lambdas/migrations/README.md)
- Frontend auth and API proxy behavior: [frontend/README.md](frontend/README.md)

## Local Development

Start the local stack:

```sh
just start
```

This starts local PostgreSQL, queue emulation, Lambda/ECS runtimes, migrations, the frontend dev server, and log tailing.

Stop the local stack and remove Compose volumes:

```sh
just stop
```

Run only the frontend dev server:

```sh
just frontend
```

Local service notes:
## Get Started Locally

- frontend dev server and local API proxy: [frontend/README.md](frontend/README.md)
- Lambda runtime layout and local watch behavior: [lambdas/README.md](lambdas/README.md)
- ECS runtime layout and local watch behavior: [containers/README.md](containers/README.md)
- Lambda worker local queue publishing: [lambdas/lambda_worker/README.md](lambdas/lambda_worker/README.md)
- ECS worker local queue publishing and database verification: [containers/worker/README.md](containers/worker/README.md)
Local stack commands, common `just` tasks, AWS prerequisites, and OIDC bootstrap commands live in [Get Started Locally](docs/get-started-locally.md).

## Infra Deployment Use Cases

@@ -184,7 +56,7 @@ For ECS scaling patterns and `scaling_strategy` examples, see:

For the deployment model, runtime rollout split, and strategy overview, see:

- [infra/README.md](infra/README.md#deployment-model)
- [infra/docs/deployment-model.md](infra/docs/deployment-model.md)

## Read This Next

@@ -200,3 +72,4 @@ For the deployment model, runtime rollout split, and strategy overview, see:
- Frontend auth contract: [infra/modules/aws/cognito/README.md](infra/modules/aws/cognito/README.md)
- Frontend hosting contract: [infra/modules/aws/frontend/README.md](infra/modules/aws/frontend/README.md)
- Runtime log dashboard: [infra/modules/aws/observability/README.md](infra/modules/aws/observability/README.md)
- Get started locally, prerequisites, and bootstrap commands: [docs/get-started-locally.md](docs/get-started-locally.md)