feat: pki rotation commands + operator runbooks #121

Merged
pedromvgomes merged 4 commits into main from feature/pki-rotation
Jun 15, 2026

@pedromvgomes

Final slice of the PKI epic (#104, ADR-0024): rotation/recovery tooling + operator runbooks. Closes the lifecycle started by #105–#109/#120.

CLI

  • inforge pki rotate <env> <name> — exactly one of:
    • --leaf — documents leaf renewal (short-TTL leaves rotate via inforge pki renew; expiry is revocation, no CRL/OCSP). Informational, no mutation.
    • --intermediate <scope> — re-mints one scope's intermediate from the cold root (offline, needs INFORGE_PKI_ROOT_KEY). Invisible to other regions via the mesh's regional boundary.
    • --root: dual-root overlap. Mints a new cold root, retains the old one, and re-signs every intermediate from the new root with keys preserved (live leaves keep verifying). --root --finalize drops the old root once consumers have converged.
  • inforge pki recover-intermediate <env> <name> <scope> — compromise recovery: same fresh-key re-mint, but a no-overlap, act-now posture (immediate renew + forced host re-projection).
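
The "exactly one of" rule for `inforge pki rotate` can be sketched as a small validation step. This is only an illustration: the `rotateFlags` struct and `validateRotateFlags` function are hypothetical names, and the check that `--finalize` requires `--root` is an assumption inferred from the description above, not confirmed behaviour of the command.

```go
package main

import (
	"errors"
	"fmt"
)

// rotateFlags mirrors the modes of `inforge pki rotate`.
// Hypothetical type: the real flag plumbing is not shown in this PR.
type rotateFlags struct {
	Leaf         bool
	Intermediate string // scope; empty means --intermediate was not passed
	Root         bool
	Finalize     bool
}

// validateRotateFlags enforces "exactly one of --leaf, --intermediate, --root".
func validateRotateFlags(f rotateFlags) error {
	modes := 0
	for _, set := range []bool{f.Leaf, f.Intermediate != "", f.Root} {
		if set {
			modes++
		}
	}
	if modes != 1 {
		return errors.New("exactly one of --leaf, --intermediate <scope>, --root is required")
	}
	// Assumption: --finalize only makes sense alongside --root.
	if f.Finalize && !f.Root {
		return errors.New("--finalize is only valid with --root")
	}
	return nil
}

func main() {
	fmt.Println(validateRotateFlags(rotateFlags{Root: true, Finalize: true}))
	fmt.Println(validateRotateFlags(rotateFlags{Leaf: true, Root: true}))
}
```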

Custody split preserved: only the holder of INFORGE_PKI_ROOT_KEY can sign with the root; --root verifies the caller holds the current root before rotating. CI never gains root-signing ability.

internal/pki

  • ReSignIntermediate — re-issues an intermediate cert over its existing public key, signed by a new root (key preserved → existing leaves stay valid).
  • PKI gains PreviousRoots + PreviousIntermediates (overlap state, cleared by --finalize) and RootCerts() (the anchor set for root-anchoring consumers during overlap).
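
To illustrate why preserving the intermediate's key keeps live leaves valid, here is a minimal, self-contained sketch using only the Go standard library. It is not the project's `ReSignIntermediate`: the helper names (`issue`, `verifies`, `overlapDemo`), the common names, and the validity periods are all assumptions made for the example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// issue signs a certificate for pub. A nil parent self-signs (a root).
func issue(cn string, serial int64, isCA bool, pub *ecdsa.PublicKey,
	parent *x509.Certificate, signer *ecdsa.PrivateKey) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage = x509.KeyUsageCertSign
	}
	if parent == nil {
		parent = tmpl
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifies reports whether leaf chains to root through inter.
func verifies(leaf, inter, root *x509.Certificate) bool {
	roots, inters := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root)
	inters.AddCert(inter)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err == nil
}

// overlapDemo issues a leaf under the old chain, re-signs the intermediate
// over the SAME public key from a new root, and checks both chains.
func overlapDemo() (underOld, underNew, crossed bool) {
	gen := func() *ecdsa.PrivateKey { return must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader)) }
	oldRootKey, newRootKey, interKey, leafKey := gen(), gen(), gen(), gen()

	oldRoot := must(issue("root-old", 1, true, &oldRootKey.PublicKey, nil, oldRootKey))
	oldInter := must(issue("scope-intermediate", 2, true, &interKey.PublicKey, oldRoot, oldRootKey))
	leaf := must(issue("host-leaf", 3, false, &leafKey.PublicKey, oldInter, interKey))

	newRoot := must(issue("root-new", 4, true, &newRootKey.PublicKey, nil, newRootKey))
	// Re-sign: same subject and public key, new issuer. The leaf is untouched.
	newInter := must(issue("scope-intermediate", 5, true, &interKey.PublicKey, newRoot, newRootKey))

	return verifies(leaf, oldInter, oldRoot),
		verifies(leaf, newInter, newRoot),
		verifies(leaf, oldInter, newRoot) // old intermediate does not chain to the new root
}

func main() {
	fmt.Println(overlapDemo()) // true true false
}
```

The leaf is signed by the intermediate's key, so any intermediate certificate over that same key and subject can complete the chain; only the issuer of the intermediate changes.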

Design note

The merged #107 amendment made the mesh anchor on per-scope intermediate bundles, not the root. So the runbooks state the mechanics accurately:

  • re-minting an intermediate is invisible to other regions because of the regional boundary (not root-anchoring);
  • root rotation needs the dual-root overlap for root-anchoring consumers (the daemon fleet, cross-repo #610), not for the mesh itself.
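
For root-anchoring consumers, the anchor set during an overlap is the active root plus any retained ones. A rough sketch of that shape follows; `PreviousRoots`, `PreviousIntermediates`, and `RootCerts()` are named in this PR, while the `Root` field, the map keyed by scope, and the `Finalize` method are assumptions made for illustration.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// PKI is a simplified stand-in for the internal/pki state.
type PKI struct {
	Root                  *x509.Certificate            // active root (hypothetical field name)
	PreviousRoots         []*x509.Certificate          // retained roots during an overlap
	PreviousIntermediates map[string]*x509.Certificate // keyed by scope (assumed shape)
}

// RootCerts returns the active root followed by any retained roots.
func (p *PKI) RootCerts() []*x509.Certificate {
	certs := make([]*x509.Certificate, 0, 1+len(p.PreviousRoots))
	certs = append(certs, p.Root)
	return append(certs, p.PreviousRoots...)
}

// Finalize clears the overlap state, leaving only the active root.
// Hypothetical method standing in for what --root --finalize does to state.
func (p *PKI) Finalize() {
	p.PreviousRoots = nil
	p.PreviousIntermediates = nil
}

// demoAnchors returns the anchor-set size during and after an overlap.
func demoAnchors() (during, after int) {
	oldRoot, newRoot := &x509.Certificate{}, &x509.Certificate{} // placeholders, not real certs
	p := &PKI{Root: newRoot, PreviousRoots: []*x509.Certificate{oldRoot}}
	during = len(p.RootCerts())
	p.Finalize()
	return during, len(p.RootCerts())
}

func main() {
	fmt.Println(demoAnchors()) // 2 1
}
```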

Runbooks

New pages under website/docs/runbooks/ (add a region, rotate a leaf, rotate an intermediate, rotate the root, recover a compromised intermediate), linked from cli/pki.md and ADR-0024. The root runbook calls out the cross-repo coordination caveat: mesh-root rotation requires a daemon-fleet update before the old root is retired.

Acceptance

  • go build ./..., go test -race ./..., golangci-lint run ./..., gofmt — all clean.
  • Tests prove old + new roots both verify during the overlap (same live leaf, both chains).

Refs #110

Add the final PKI lifecycle slice (epic #104, ADR-0024): rotation/recovery
tooling and operator runbooks.

CLI:
- inforge pki rotate <env> <name> --leaf|--intermediate <scope>|--root
  - --leaf documents leaf renewal (short-TTL; expiry is revocation)
  - --intermediate re-mints one scope's intermediate from the cold root
    (offline); invisible to other regions via the regional boundary
  - --root runs a dual-root overlap: mint a new cold root, retain the old,
    re-sign every intermediate from the new root with keys preserved so live
    leaves keep verifying; --finalize drops the old root
- inforge pki recover-intermediate <env> <name> <scope>: compromise recovery
  (fresh-key re-mint + forced immediate host re-projection)

The cold-root custody split is preserved: intermediate/root ops require the
offline INFORGE_PKI_ROOT_KEY; --root verifies the caller holds the current
root before rotating.

internal/pki:
- ReSignIntermediate re-issues an intermediate cert over its existing public
  key signed by a new root (key preserved)
- PKI gains PreviousRoots + PreviousIntermediates (dual-root overlap state)
  and RootCerts() (active + retained roots)

Tests prove the overlap property: the same live leaf verifies under both the
old and the new root during the window.

Runbooks under website/docs/runbooks/ (add a region, rotate a leaf, rotate an
intermediate, rotate the root, recover a compromised intermediate), linked
from cli/pki.md and ADR-0024. The root runbook calls out that mesh-root
rotation is cross-repo coordination — root-anchoring consumers (the daemon
fleet) must trust both roots before the old one is retired.

Refs #110
- Guard intermediate rotation/recovery against an active root overlap:
  reissueIntermediate refuses to run while previousRoots is populated, since
  rolling the intermediate key mid-overlap would orphan the old-key leaves the
  dual-root overlap is meant to keep verifying.
- Custody-gate the --finalize path like the begin-overlap path: dropping the
  old root now requires INFORGE_PKI_ROOT_KEY, so CI or a stale checkout cannot
  retire the old root before consumers have migrated.
- Factor the shared CA-template into pki.intermediateTemplate (used by both
  GenerateIntermediate and ReSignIntermediate) so the intermediate's
  constraints can't drift across a root rotation.
- Factor the root-decrypt -> GenerateIntermediate -> encrypt-to-CI custody
  sequence into mintScopeIntermediate, shared by the first-mint and re-mint
  paths so the offline-identity decrypt, CN convention, and CI recipient live
  in one place.
- Runbooks: note the finalize custody gate and the mid-overlap block on
  intermediate rotation/recovery.
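
The mid-overlap guard described above might look roughly like this. It is a sketch under assumptions: `pkiState`, the error text, and the scope name are hypothetical, and the real `reissueIntermediate` does the actual re-mint where this one elides it.

```go
package main

import (
	"errors"
	"fmt"
)

// pkiState is a hypothetical stand-in for the persisted PKI state.
type pkiState struct {
	previousRoots []string // retained old roots; non-empty while an overlap is active
}

var errOverlapActive = errors.New("root overlap active: finalize the root rotation first")

// reissueIntermediate refuses to roll a scope's key mid-overlap, because
// leaves signed by the old intermediate key would stop verifying.
func reissueIntermediate(s pkiState, scope string) error {
	if len(s.previousRoots) > 0 {
		return fmt.Errorf("reissue intermediate %q: %w", scope, errOverlapActive)
	}
	// ... fresh-key re-mint from the cold root (elided) ...
	return nil
}

func main() {
	fmt.Println(reissueIntermediate(pkiState{}, "scope-a"))
	fmt.Println(reissueIntermediate(pkiState{previousRoots: []string{"old-root"}}, "scope-a"))
}
```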

Refs #110
wrap-session: document slice #110's rotation/recovery surface in the Mesh PKI
section of AGENTS.md, and add two rules capturing the load-bearing invariants:
- cold-root custody gates both begin and finalize of a root rotation
- no intermediate re-mint while a root overlap is active

Refs #110
Second code-review round on the rotation slice:
- runPkiIntermediate (first-mint of a NEW scope) now refuses while a root
  overlap is active, matching the re-mint guard. Mid-overlap the cold root is
  the new root, so a first-mint would chain only to the new root — invisible to
  consumers still anchored on the old one — with no PreviousIntermediates
  counterpart. reissueIntermediate's own 'mint one with inforge pki
  intermediate' hint previously steered operators straight into this gap.
- Extract requireRootCustody: the begin-overlap and finalize paths shared a
  byte-identical offline-identity decrypt gate; factoring it prevents the two
  security checks from drifting apart.
- Broaden the no-remint-during-overlap rule to cover first-mint; note the block
  in the add-region runbook.
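
A shared custody gate in the spirit of requireRootCustody could be sketched as below. Only the function name and the INFORGE_PKI_ROOT_KEY variable come from this PR; the signature, the injected `getenv` and `decrypt` parameters, and the `beginOverlap`/`finalizeOverlap` wrappers are assumptions chosen to keep the example testable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

const rootKeyEnv = "INFORGE_PKI_ROOT_KEY"

var errNoCustody = errors.New("root custody required: " + rootKeyEnv + " is not set")

// decryptFunc stands in for the offline-identity decrypt of the cold root key.
type decryptFunc func(identity string) error

// requireRootCustody succeeds only if the caller can decrypt the current root.
func requireRootCustody(getenv func(string) string, decrypt decryptFunc) error {
	identity := getenv(rootKeyEnv)
	if identity == "" {
		return errNoCustody
	}
	if err := decrypt(identity); err != nil {
		return fmt.Errorf("root custody check failed: %w", err)
	}
	return nil
}

// beginOverlap and finalizeOverlap share one gate so the two checks cannot drift.
func beginOverlap(getenv func(string) string, decrypt decryptFunc) error {
	if err := requireRootCustody(getenv, decrypt); err != nil {
		return err
	}
	// ... mint the new root and re-sign intermediates (elided) ...
	return nil
}

func finalizeOverlap(getenv func(string) string, decrypt decryptFunc) error {
	if err := requireRootCustody(getenv, decrypt); err != nil {
		return err
	}
	// ... drop the retained root and overlap state (elided) ...
	return nil
}

func main() {
	alwaysOK := func(string) error { return nil } // placeholder decrypt
	fmt.Println(beginOverlap(os.Getenv, alwaysOK))
	fmt.Println(finalizeOverlap(os.Getenv, alwaysOK))
}
```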

Refs #110
@pedromvgomes merged commit fc5b8c4 into main on Jun 15, 2026
2 checks passed
@pedromvgomes deleted the feature/pki-rotation branch on June 15, 2026 07:43