fix(audit): install SequenceCounter before auth so auth_verify lands seq=1 (closes #174)#176
Merged
Merged
Conversation
…seq=1 (closes #174) Pre-fix: auth_verify / auth_fail audit events lacked the seq field because the SequenceCounter was installed downstream (in the runner's per-A2A-request setup) of the auth middleware. The auth chain's OnAuth callback used plain AuditLogger.Emit and couldn't reach a counter that didn't exist yet on req.Context(). Symptom (user-visible on v0.15.0): auth_verify (no seq) ← BUG session_start seq=1 guardrail_check seq=2 llm_call seq=3 ... The FWS-8 contract promises every event under a given correlation_id exposes a monotonically increasing seq for gap detection. auth_verify sitting outside the sequence breaks that. Fix: install the SequenceCounter on r.Context() BEFORE the auth chain runs, via a thin middleware wrapper around auth.Middleware in the runner. The auth callback now uses EmitFromContext(req.Context(), ...) and stamps seq=1 on auth_verify. The runner's per-A2A-request setup calls coreruntime.EnsureSequenceCounter (new helper) which detects the existing counter and reuses it — so session_start lands seq=2 and the sequence is gap-free. Three pieces: - forge-core/runtime/audit_schema.go gains EnsureSequenceCounter: returns ctx unchanged when a counter is present, installs a fresh one when absent. Used at every invocation-entry point that may run downstream of an upstream middleware. - forge-cli/runtime/sequence_counter_middleware.go (new) is the thin wrapper: composes (next) → authMW(next) → install counter → serve. Cost: ~24B counter allocation per request, even on skip paths — simpler than threading skip-path knowledge into the wrapper, and the alloc is in the same ballpark as the request struct. - forge-cli/runtime/runner.go: * AuthMiddleware: auth.Middleware(authCfg) → installSequenceCounterMiddleware(auth.Middleware(authCfg)) * makeAuthAuditCallback: Emit → EmitFromContext(req.Context()) * Three in-request WithSequenceCounter(ctx, new(...)) calls → EnsureSequenceCounter(ctx). Counter is reused from the auth wrapper (auth ON path) or installed fresh (--no-auth path). * Scheduler dispatcher's WithSequenceCounter call is left alone — it runs outside any HTTP request, no upstream counter. Three other plain-Emit sites stay as documented in PR #173: - source=proxy egress: subprocess HTTP CONNECT has no Go ctx - mcp_tool_conflict: startup-time, pre-invocation - schedule_fire / schedule_complete: scheduler tick, no A2A scope End-to-end smoke (--no-auth path): session_start seq=1 session_end seq=2 invocation_complete seq=3 End-to-end smoke (auth on, internal token): auth_verify seq=1 ← FIXED session_start seq=2 session_end seq=3 invocation_complete seq=4 Tests: - TestAuthAudit_SeqStampedWhenCounterInstalled — the #174 pin: with a counter on req.Context(), auth_verify lands seq=1 - TestAuthAudit_NoSeqWhenCounterAbsent — back-compat: no counter → omitempty drops the field (legacy embedders without the wrapper) - TestSequenceCounterMiddleware_InstallsCounterBeforeNext — the wrapper installs the counter before delegating - TestEnsureSequenceCounter_ReusesExisting — runner must not clobber the auth-installed counter - TestEnsureSequenceCounter_InstallsFresh — --no-auth path: install a fresh counter when none is present Schema impact: none. The events were already declared to carry seq.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #174. The `auth_verify` / `auth_fail` audit events were emitted without a `seq` field because the per-invocation `SequenceCounter` was installed downstream of the auth middleware. Operators saw `auth_verify` sitting outside the per-correlation_id sequence, breaking the FWS-8 gap-detection contract.
Symptom (user-visible, v0.15.0)
```
auth_verify (no seq) ← BUG
session_start seq=1
guardrail_check seq=2
llm_call seq=3
...
```
Fix
Install the `SequenceCounter` on `r.Context()` before the auth chain runs, via a thin middleware wrapper. The auth chain's `OnAuth` callback now uses `EmitFromContext(req.Context(), ...)` and stamps `seq=1` on `auth_verify`. The runner's per-A2A-request setup calls a new `coreruntime.EnsureSequenceCounter` helper that detects the existing counter and reuses it, so `session_start` lands `seq=2` and the per-correlation_id sequence is gap-free.
After
```
auth_verify seq=1 ← FIXED
session_start seq=2
guardrail_check seq=3
llm_call seq=4
...
```
Pieces
What stays as plain `Emit` (intentional, documented in PR #173)
After this PR, the only "in-scope-but-plain-Emit" pattern that motivated #175 (the lint) is gone. The three remaining sites are genuinely scope-less.
Cost
The wrapper allocates a `SequenceCounter` (~24 bytes) on every HTTP request, including auth-skipped paths (`/.well-known/agent-card.json`, `/healthz`). The counter is unused on those paths but allocation is in the same ballpark as the request struct itself; threading skip-path knowledge into the wrapper would be more complexity than the savings warrant.
End-to-end smoke
`--no-auth` path (wrapper still runs, but auth middleware is a passthrough):
```
session_start seq=1
session_end seq=2
invocation_complete seq=3
```
Auth on (internal-token loopback, like channel-adapter callbacks and CronJob trigger pods):
```
auth_verify seq=1
session_start seq=2
session_end seq=3
invocation_complete seq=4
```
Test plan
Schema impact
None. The events were already declared to carry `seq` — the field's `omitempty` JSON tag just dropped it when the value was zero. After the fix, consumers see the field they were already expecting on every per-invocation event; legacy consumers that ignored unknown keys see no behavioral change.
Follow-on
When this merges, the third row of the four-row "plain Emit is intentional" table in #175's lint allow-list disappears (auth callback becomes `EmitFromContext`). The three remaining sites — egress proxy `source: proxy`, `mcp_tool_conflict`, scheduler dispatcher — are genuinely scope-less.