Merged
1 change: 1 addition & 0 deletions docs/astro.config.mjs
@@ -86,6 +86,7 @@ export default defineConfig({
{ label: 'Architecture', link: '/architecture/' },
{ label: 'How it is tested', link: '/testing/' },
{ label: 'Feature coverage matrix', link: '/coverage-matrix/' },
{ label: 'Release orchestration', link: '/release-orchestration/' },
{ label: 'Security & Hardening', link: '/security/hardening/' },
{ label: 'Versioning & Schema', link: '/versioning/' },
],
12 changes: 7 additions & 5 deletions docs/src/content/docs/coverage-matrix.md
@@ -17,10 +17,10 @@ covered at that layer by design, not by omission. The "why both layers" section
below explains those choices.

:::tip[Last validated]
This matrix was last validated against a fully green live-fleet run on `v0.5.0-rc.10`:
all ten example repos (primary, artifact-a, artifact-b, single-env, 2env, 3env, 4env,
release-only, no-env, callbacks) passed every probe, and the shared fail-closed reconcile
gate accounted for every run in each scenario window.
This matrix was last validated against the fully green live-fleet run behind `v0.5.1`:
all eleven example repos (primary, artifact-a, artifact-b, single-env, 2env, 3env, 4env,
release-only, no-env, callbacks, rollback-dispatch) passed every probe, and the shared
fail-closed reconcile gate accounted for every run in each scenario window.
:::

## Why two layers, restated for this matrix
@@ -62,13 +62,14 @@ only under real installation tokens on the fleet, never in the token-free harnes

| Feature | act plus gitea scenario | Live-fleet probe (repo) | Unit | What the layer proves |
|---|---|---|---|---|
| Orchestrate trunk build to release candidate | `01`, `02`, `03`, `04`, `34-extra-orchestrate-triggers` | every repo, orchestrate-on-merge (all 10) | `internal/orchestrate` | A trunk merge mints an RC draft and writes state, across every topology, on real Actions |
| Orchestrate trunk build to release candidate | `01`, `02`, `03`, `04`, `34-extra-orchestrate-triggers` | every repo, orchestrate-on-merge (all 11) | `internal/orchestrate` | A trunk merge mints an RC draft and writes state, across every topology, on real Actions |
| Default promotion (env to next env) | `04`, `promote/cascade-deploy-enabled` | `promote-staging` (2env, 3env, primary) | `internal/promote` | One promotion step copies source state into the target on a real release object |
| Cascade-mode promotion (atomic multi-step) | `04-cascade-promotion` | `lifecycle` dev to prod (4env) | `internal/promote` | The full ladder advances through intermediates and publishes at the top |
| Standalone release lane (draft, prerelease, publish) | `05-publish-callback`, `37`, `38` | dispatch prerelease then release (single-env); `release-only` | `internal/release` | A real release transitions draft to prerelease to published with RC reaping |
| Hotfix clean apply | `hotfix/hotfix-clean-apply`, `hotfix-multi-commit-clean`, `hotfix-multi-env-clean`, `hotfix-rejoin` | hotfix plan, apply, PR merge, finalize (3env) | `internal/hotfix` | A pinned-env fix lands, diverges state, and rejoins on real branches and PRs |
| Hotfix cherry-pick conflict and halt | `hotfix/hotfix-conflict-resolution`, `hotfix-multi-env-conflict-halt` | `probe_hotfix_conflict` (4env) | `internal/hotfix` | A guaranteed conflict raises the conflict label and halts the downstream lane |
| Rollback to prior version or SHA | `rollback/*` (8 scenarios) | `probe_rollback` (4env), `rollback-check` (2env) | `internal/rollback` | An env rewinds, is marked diverged, and the ring snapshot advances |
| External rollback via `repository_dispatch` | | repository_dispatch rollback, state revert asserted (rollback-dispatch) | `internal/rollback` | A real `repository_dispatch` payload drives the automated rollback entry point and the target env's state is read back reverted |
| Drift check and comment | `22-verify-drift`, `27-verify-orphan`, `28-drift-check` | `probe_drift` (4env) | `internal/verify`, `internal/generate` | Generated-vs-committed drift is detected and surfaced on a real run |
| Validate gate | `14-validate-check`, `17-validate-callback` | `probe_validate` (4env); pre-build validate gate (3env) | `internal/generate` | A validate callback gates the build before it proceeds |
| Merge queue | `15-merge-queue` | `probe_merge_queue` (4env) | `internal/generate` | The merge-queue lane is emitted and runs (harness covers the no-configured-queue case) |
@@ -136,6 +137,7 @@ from a general case.
| Release-only (no deploy environments) | `cascade-example-release-only` | none specific (release lane via `37`, `38`) |
| No-environment library shape | covered in harness only by design | `01-no-env-repo` |
| Primary plus artifact satellites (cross-repo graph) | `cascade-example-primary`, `cascade-example-artifact-a`, `cascade-example-artifact-b` | `multi-repo/*`, `21-cross-repo-callback` |
| External rollback entry point (`repository_dispatch`) | `cascade-example-rollback-dispatch` | `rollback/*` |

The no-environment library shape is covered in the act plus gitea harness; a live
`cascade-example-no-env` suite also asserts that orchestrate goes straight from a
129 changes: 129 additions & 0 deletions docs/src/content/docs/release-orchestration.md
@@ -0,0 +1,129 @@
---
title: Release orchestration
description: How cascade validates and ships its own releases. The live fleet fans out in sequenced lanes so peak concurrency on one shared token stays low, a repos selector runs a single lane for development, and a nightly gate cuts and promotes a release only when a fully green fleet agrees and main has accumulated release-worthy changes.
---

This page describes how cascade releases itself. It is maintainer CI: hand-written
tooling that lives in cascade's own repository, alongside `fleet-e2e.yaml`,
`auto-promote.yaml`, and `nightly-release.yaml`. None of it is part of cascade's
generated output. If you are adopting cascade for your own pipelines, this is
background on how the project proves and ships each version, not a feature you
configure.

The release chain is four workflows in sequence. Orchestrate cuts a release
candidate and pushes its tag. Release builds and publishes that tag's assets. The
fleet fans out across every example repository to validate the published binary.
Auto-promote publishes the final version, but only when the entire fleet is green.

## The staged fleet fan-out

The fleet ([`.github/workflows/fleet-e2e.yaml`](https://github.com/stablekernel/cascade/blob/main/.github/workflows/fleet-e2e.yaml))
revalidates the downstream `cascade-example-*` fleet on live GitHub. Every example
repository dispatches its own `scenario-suite.yaml` under one shared fleet token. A
green run means this cascade version validated across all eleven example repositories,
each running its own scenario suite in its own repository context.

Dispatching all eleven repositories at once tripped transient GitHub API failures
(401, 403, and 500 responses) on a rotating repository each run, because they all
draw on the same token. The fan-out is therefore split into sequenced lanes that
hold peak live concurrency near two repositories at a time. A `gh()` transient-retry
wrapper inside each suite remains the per-call backstop; staging the fan-out removes
the structural burst that the wrapper alone could not absorb.
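
The per-call backstop can be pictured with a minimal sketch. This is not the suites' actual `gh()` wrapper: `call_with_retry`, its parameters, and the exponential backoff schedule are assumptions made for illustration. Only the set of transient status codes comes from the paragraph above.

```python
import time
from typing import Callable, Tuple

# Status codes this page lists as transient under the shared fleet token.
TRANSIENT_STATUSES = {401, 403, 500}


def call_with_retry(
    call: Callable[[], Tuple[int, str]],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry a hypothetical API call while it returns a transient status.

    `call` returns (status, body). The delay doubles after each transient
    failure; the last response is returned once attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = call()
    for attempt in range(1, attempts):
        if status not in TRANSIENT_STATUSES:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        status, body = call()
    return status, body
```

A wrapper like this smooths over a single flaky call, but it cannot help when eleven suites all retry against the same token at once, which is the burst the sequenced lanes address.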

```mermaid
flowchart LR
plan[plan] --> resolve[resolve]
resolve --> repin[repin]
repin --> primary[primary]
primary --> dependents[dependents x2]
dependents --> heavy[4env alone]
heavy --> remainder[remainder, max 2]
remainder --> aggregate[Fleet gate]
```

| Stage | What it does |
|---|---|
| `plan` | Parses the `repos` selector once and emits the lane gates and matrices every fan-out job keys off. This is the single place the fleet roster lives. |
| `resolve` | Gates the run and resolves the cascade version under test, then writes `version-under-test.txt` and a `full-run.txt` completeness marker for auto-promote to read. |
| `repin` | Pins every example repository to the candidate, regenerates its workflows, and pushes the pin to each repository's main. It always covers the full roster regardless of the selector, because pinning is cheap, idempotent, and sequential, so it adds nothing to live fan-out concurrency. Every suite job gates on a green repin so none runs against a stale pin. |
| `primary` | Runs first and must pass before its dependents start. |
| `dependents` | `artifact-a` and `artifact-b` mutate the primary's shared external state, so they run only after the primary is green. The two run together, and this lane sets the fleet's peak concurrency of about two repositories. |
| `heavy` | `4env` is the heaviest and most fragile repository, so it runs alone in its own job, sequenced after the dependents lane so the two never stack. |
| `remainder` | The light repositories (`3env`, `2env`, `single-env`, `release-only`, `no-env`, `callbacks`, `rollback-dispatch`) run in a matrix capped at two in flight via `max-parallel`, sequenced after the heavy lane. |
| `aggregate` | The Fleet gate. It needs every lane, so a green gate means every selected repository passed. Auto-promote keys off this conclusion. |

The fleet triggers on completion of the Release workflow (the dependable signal that
a candidate tag's assets actually reached the releases page) and on manual dispatch.
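
The suite lanes in the table above can be modelled in a few lines to check the concurrency claim. This is a sketch under stated assumptions: the `LANES` layout and the `waves` and `schedule` helpers are hypothetical, and fixed-size waves are a simplification of the sliding window that `max-parallel` gives a matrix. The upper bound on repositories in flight is the same either way.

```python
from typing import List, Tuple

# (lane name, repositories, max in flight), in the order the lanes run.
# The roster and caps come from the table; the data layout is an assumption.
LANES: List[Tuple[str, List[str], int]] = [
    ("primary", ["primary"], 1),
    ("dependents", ["artifact-a", "artifact-b"], 2),
    ("heavy", ["4env"], 1),
    ("remainder", ["3env", "2env", "single-env", "release-only",
                   "no-env", "callbacks", "rollback-dispatch"], 2),
]


def waves(repos: List[str], max_parallel: int) -> List[List[str]]:
    """Split one lane into groups of at most max_parallel repositories."""
    return [repos[i:i + max_parallel] for i in range(0, len(repos), max_parallel)]


def schedule(lanes: List[Tuple[str, List[str], int]]) -> List[List[str]]:
    """Run lanes strictly one after another, each in capped waves."""
    plan: List[List[str]] = []
    for _name, repos, cap in lanes:
        plan.extend(waves(repos, cap))
    return plan


plan = schedule(LANES)
peak = max(len(wave) for wave in plan)      # most repositories live at once
total = sum(len(wave) for wave in plan)     # every repository runs exactly once
```

Under this model `peak` is 2 and `total` is 11, matching the "near two repositories at a time" target across the eleven-repository roster.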

## Running a single lane with the repos selector

A full fan-out is the right gate for a release, but it is heavy for developing one
example repository's suite. The `workflow_dispatch` path accepts a `repos` selector
that runs a subset of lanes:

```sh
gh workflow run fleet-e2e.yaml -f repos=4env
```

The selector accepts a single short name, or a comma- or space-separated list. The
default (no input, which is also the value on the Release-triggered path) is `all`,
which runs the full fleet. The `repin` stage always covers the full roster; only the
suite lanes honor the selector. A lane the selector skips reports `skipped` and the
gate treats it as satisfied, so a subset run still produces a meaningful verdict over
exactly the lanes that ran.

A selective run never auto-promotes. The `plan` stage sets `full_run=true` only when
the selector resolves to `all`, the `resolve` stage records that marker in the
`full-run.txt` artifact, and auto-promote refuses to publish from anything other than
a full run. Only a complete fleet validation is a safe release signal.
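
The selector rules above can be sketched as a small parser. The `ROSTER` order, the `parse_repos` name, and the error handling are assumptions made for illustration, not the `plan` stage's actual code. The sketch also assumes that an explicit list naming every repository does not count as a full run, since the page says `full_run` is set only when the selector resolves to `all`.

```python
import re
from typing import List, Tuple

# Short names from the stage table; the ordering here is an assumption.
ROSTER = ["primary", "artifact-a", "artifact-b", "4env", "3env", "2env",
          "single-env", "release-only", "no-env", "callbacks",
          "rollback-dispatch"]


def parse_repos(selector: str = "") -> Tuple[List[str], bool]:
    """Return (selected repositories, full_run) for a `repos` input.

    Empty input and `all` select the whole roster and mark a full run.
    Anything else is split on commas and whitespace and is never a full run.
    """
    names = [n for n in re.split(r"[,\s]+", selector.strip()) if n]
    if not names or names == ["all"]:
        return list(ROSTER), True
    unknown = [n for n in names if n not in ROSTER]
    if unknown:
        raise ValueError(f"unknown repos: {', '.join(unknown)}")
    # Keep roster order and drop duplicates.
    return [r for r in ROSTER if r in names], False
```

For example, `parse_repos("2env, 4env callbacks")` selects three lanes and returns `full_run` as `False`, so a run like that could validate those suites but never feed auto-promote.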

## The nightly-gated release

Cascade's orchestrate workflow is dispatch-only, set through `release_trigger: dispatch`
in `.github/manifest.yaml`. A trunk merge no longer cuts a release candidate on its
own, which removes the per-merge candidate churn. The single gate that decides whether
to release is [`nightly-release.yaml`](https://github.com/stablekernel/cascade/blob/main/.github/workflows/nightly-release.yaml).

It runs on a schedule (07:00 UTC daily, off-peak, after late-day merges settle) and
owns only two jobs, `decide` and `dispatch`. Everything from Release onward is the
existing chain, reused unchanged.

`decide` measures whether main has accumulated release-worthy changes since the last
published release:

- The diff base is the latest final release tag, matching `vX.Y.Z` exactly so that a
candidate (`-rc.`) or a leftover dry-run (`-dryrun.`) tag can never become the base.
With no final release yet, or an unresolvable ref, it fails open and proceeds rather
than silently skipping a real release.
- It diffs the base against `origin/main` and classifies each changed path. Code and
the shipped action surface (`cmd/**`, `internal/**`, `go.mod`, `go.sum`,
`.github/actions/**`) count as release-worthy. The manifest counts only when its
non-state subtree changed, so a routine state commit alone is not release-worthy.
Documentation, Markdown, and similar paths never trigger a release on their own.
- If nothing release-worthy changed, the run skips. A missed night just defers: the
diff is always measured against the last release, so accumulated changes still
release on the next run.
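
The rules in the list above can be sketched as follows. Function names are hypothetical, "latest" is assumed to mean the highest version number, and whether the manifest's non-state subtree changed is passed in as a flag because this page does not describe how `decide` computes it.

```python
import re
from typing import Iterable, Optional

FINAL_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")   # vX.Y.Z with no suffix
RELEASE_WORTHY_DIRS = ("cmd/", "internal/", ".github/actions/")
RELEASE_WORTHY_FILES = {"go.mod", "go.sum"}
MANIFEST = ".github/manifest.yaml"


def latest_final_tag(tags: Iterable[str]) -> Optional[str]:
    """Pick the highest vX.Y.Z tag, ignoring -rc. and -dryrun. tags."""
    finals = [t for t in tags if FINAL_TAG.fullmatch(t)]
    if not finals:
        return None
    return max(finals, key=lambda t: tuple(int(p) for p in t[1:].split(".")))


def is_release_worthy(path: str, manifest_non_state_changed: bool = False) -> bool:
    """Classify one changed path."""
    if path == MANIFEST:
        return manifest_non_state_changed
    return path in RELEASE_WORTHY_FILES or path.startswith(RELEASE_WORTHY_DIRS)


def decide(base: Optional[str], changed_paths: Iterable[str],
           manifest_non_state_changed: bool = False) -> bool:
    """Return True to release. A missing base fails open and proceeds."""
    if base is None:
        return True
    return any(is_release_worthy(p, manifest_non_state_changed)
               for p in changed_paths)
```

Given the tags `v0.5.0`, `v0.5.1-rc.3`, and `v0.5.1`, the base resolves to `v0.5.1`. A diff touching only `docs/` then skips, while one touching `internal/` proceeds.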

When `decide` says to proceed, `dispatch` triggers orchestrate using the
`CASCADE_STATE_TOKEN`, so the candidate tag push fires Release and the chain continues.
Orchestrate cuts the candidate, Release publishes its assets, the full fleet fans out,
and auto-promote publishes the final version only on a green full run.

### On-demand inputs: force and dry_run

`nightly-release.yaml` also runs on `workflow_dispatch` with two inputs for testing
the path on demand:

- `force` bypasses the change-since-last-release skip, so an unchanged main still
cuts a candidate. It lives entirely inside `decide` and changes nothing downstream.
- `dry_run` rehearses the whole path without publishing. The candidate is cut as a
`vX.Y.Z-dryrun.N` prerelease instead of an `-rc.` candidate. The fleet's `resolve`
gate accepts `-dryrun.` tags, so a dry run fans out across the full fleet and writes
its artifacts exactly like a real candidate. Auto-promote's publish gate stays
`-rc.`-only, so a dry-run tag can validate end to end yet is frozen out of
publication. The `full_run` guard is a second, independent backstop.

A `force` plus `dry_run` dispatch therefore exercises every component of the real
path (the change gate bypass, the candidate cut, Release, the full fleet, the artifact
handoff, and the auto-promote wiring) while proving, by tag identity alone, that
nothing publishes.
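
The two independent publish guards can be sketched as a single predicate. This is an illustration, not auto-promote's actual code: `may_publish` is a hypothetical name, and the tag patterns assume the `vX.Y.Z-rc.N` and `vX.Y.Z-dryrun.N` shapes quoted on this page.

```python
import re

RC_TAG = re.compile(r"v\d+\.\d+\.\d+-rc\.\d+")


def may_publish(tag: str, fleet_green: bool, full_run: bool) -> bool:
    """Publish only an -rc. candidate that a full, green fleet validated."""
    return fleet_green and full_run and RC_TAG.fullmatch(tag) is not None
```

With this predicate, `may_publish("v0.5.1-dryrun.1", True, True)` is `False` on tag identity alone, and `may_publish("v0.5.1-rc.2", True, False)` is `False` because a selective run is not a release signal.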
3 changes: 3 additions & 0 deletions docs/src/content/docs/testing.md
@@ -30,6 +30,8 @@ The fleet (`.github/workflows/fleet-e2e.yaml`) fans out to a set of purpose-buil

The fleet proves the things that only real GitHub can prove: a real release object transitioning from draft to prerelease to published, real release-candidate tags being reaped on publish, the Contents API state-write path, cross-repo dispatch between real repositories, and real branch protection being written through a scoped token.

The fleet fans out in sequenced lanes so peak live concurrency on its one shared token stays low, and it accepts a `repos` selector for running a single lane during development. Releases are cut and promoted under a nightly gate that publishes only on a fully green fleet. The [Release orchestration](/cascade/release-orchestration/) page documents that machinery in full.

### Unit tests (pure logic)

Underneath both layers, the Go packages carry conventional unit tests for the pure logic: version calculation, change detection, changelog assembly, manifest parsing and validation. These run on every build and are the fastest feedback loop.
@@ -57,6 +59,7 @@ The example repositories span the supported pipeline shapes, so each topology is
| `cascade-example-release-only` | Release-only repository (no deploy environments), changelog and contributor assembly |
| `cascade-example-primary` | Primary repository receiving external-update and notify handoffs from satellites |
| `cascade-example-artifact-a`, `cascade-example-artifact-b` | Satellite repositories in an artifact-dependency graph that notify the primary |
| `cascade-example-rollback-dispatch` | The automated rollback entry point, where a real `repository_dispatch` payload drives a rollback and the reverted state is read back |

The `no-environment` library shape is covered today in the act plus gitea harness; the other topologies above are validated in both layers.
