Summary
After the provider and Gateway cache controls are wired, downstream orchestrators need a cheap way to prove the cache layer is actually being used in edge-native deployments.
For coding-agent workloads, the useful metrics differ by cache layer:
- Workers AI prompt/prefix cache: cached input tokens and lower time-to-first-token on repeated prefixes with the same x-session-affinity.
- AI Gateway response cache: cf-aig-cache-status (HIT / MISS) and lower end-to-end latency on exact repeated requests.
- Factory/local response cache: adapter hit/miss and zero provider spend on hits.
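As a rough illustration, the three signals above could be folded into one record that is easy to log. This is only a sketch: the type names, the `HeaderLookup` stand-in, and the `normalizeCacheMetadata` helper are hypothetical and not part of any existing API. Only the token field names and the `cf-aig-cache-status` header come from this issue.

```typescript
// Hypothetical normalized shape; names other than the token fields and the
// cf-aig-cache-status header are assumptions made for illustration.
type GatewayCacheStatus = "HIT" | "MISS" | "absent";

interface PromptCacheUsage {
  cachedInputTokens?: number;
  cacheReadInputTokens?: number;
  cacheWriteInputTokens?: number;
}

interface CacheMetadata {
  promptCache: PromptCacheUsage; // provider prompt/prefix cache
  gatewayCacheStatus: GatewayCacheStatus; // AI Gateway response cache
  localCacheHit: boolean; // factory/local response cache
}

// Minimal stand-in for a Headers object so the sketch has no runtime dependencies.
interface HeaderLookup {
  get(name: string): string | null;
}

// Maps the raw header value to HIT, MISS, or absent (missing or unrecognized).
function readGatewayCacheStatus(headers: HeaderLookup): GatewayCacheStatus {
  const raw = headers.get("cf-aig-cache-status");
  if (raw === null) return "absent";
  const value = raw.trim().toUpperCase();
  if (value === "HIT") return "HIT";
  if (value === "MISS") return "MISS";
  return "absent";
}

// Combines the per-layer signals into one loggable record.
function normalizeCacheMetadata(
  usage: PromptCacheUsage,
  headers: HeaderLookup,
  localCacheHit: boolean,
): CacheMetadata {
  return {
    promptCache: { ...usage },
    gatewayCacheStatus: readGatewayCacheStatus(headers),
    localCacheHit,
  };
}
```

Keeping "absent" as an explicit value lets a consumer tell "Gateway caching was not involved" apart from a genuine MISS.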
Docs:
Proposed work
- Extend observability hook payloads or response metadata so cache fields are easy to log without provider-specific parsing:
  - provider prompt cache: cachedInputTokens, cacheReadInputTokens, cacheWriteInputTokens
  - AI Gateway response cache: cache status (HIT, MISS, or absent)
  - factory response cache: local adapter hit/miss
- Add a small test/canary recipe that sends a cold request and a warm repeated request using:
  - stable system prompt and tool definitions at the front
  - dynamic user content at the end
  - a stable cache.sessionId
  - optional Gateway cacheKey/cacheTtl for exact-response-cache cases
- Update README guidance so downstream agent orchestrators know how to structure prompts for prefix caching:
  - static content first
  - avoid timestamps in system prompts
  - keep tool definitions stable
  - append dynamic/user-specific content later
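The canary recipe and the prompt-structure guidance above could be sketched roughly as follows. The request shape, the `buildCanaryRequest` and `stablePrefix` helpers, the literal prompt and tool, and the placement of `cacheKey`/`cacheTtl` next to `sessionId` are all assumptions made for illustration. The source only names `cache.sessionId`, `cacheKey`, and `cacheTtl`, and the TTL unit used here (seconds) is likewise assumed.

```typescript
// Hypothetical request shape for a cold/warm canary; not the library's API.
interface ToolDefinition {
  name: string;
  description: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface CanaryRequest {
  messages: ChatMessage[];
  tools: ToolDefinition[];
  cache: { sessionId: string; cacheKey?: string; cacheTtl?: number };
}

// Static content first: no timestamps or per-request values in the system prompt.
const SYSTEM_PROMPT = "You are a coding agent. Follow the repository conventions.";

// Tool definitions stay byte-identical between the cold and warm request.
const TOOLS: ToolDefinition[] = [
  { name: "read_file", description: "Read a file from the workspace." },
];

function buildCanaryRequest(userContent: string, exactResponseCache = false): CanaryRequest {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      // Dynamic, user-specific content goes last so it cannot break the shared prefix.
      { role: "user", content: userContent },
    ],
    tools: TOOLS,
    cache: {
      sessionId: "cache-canary-session", // stable across cold and warm requests
      // Assumed placement and unit (seconds) for the optional Gateway controls.
      ...(exactResponseCache ? { cacheKey: "cache-canary-v1", cacheTtl: 300 } : {}),
    },
  };
}

// The portion that must not change between requests for prefix caching to apply.
function stablePrefix(request: CanaryRequest): string {
  return JSON.stringify({ tools: request.tools, system: request.messages[0] });
}
```

For a prefix-cache canary, the warm request can change only the trailing user content while `stablePrefix` stays equal. For an exact-response-cache canary, the warm request would need to repeat the cold request unchanged.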
Acceptance criteria
- Consumers can log cache hit/miss behavior from the normalized response/hook surface without inspecting raw provider responses.
- A documented canary can verify cold/warm behavior for Workers AI prefix caching and AI Gateway response caching separately.
- README examples distinguish prefix-cache optimization from response-cache optimization.
- Tests cover metadata/hook behavior for cached responses and provider responses with cached token fields.
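One possible way to check the cold/warm criterion for each layer separately is sketched below. The `CanaryObservation` shape and the `evaluateCanary` helper are hypothetical. The sketch treats cached token counts and the Gateway cache status as the deciding signals and reports latency only as supporting evidence, since timing on a single pair of requests can be noisy.

```typescript
// Hypothetical observation captured for one canary request.
interface CanaryObservation {
  cachedInputTokens: number; // 0 when the provider reports none
  timeToFirstTokenMs: number;
  totalLatencyMs: number;
  gatewayCacheStatus: "HIT" | "MISS" | "absent";
}

interface LayerVerdict {
  cacheUsed: boolean; // deciding signal for the layer
  latencyImproved: boolean; // supporting signal only
}

interface CanaryVerdict {
  prefixCache: LayerVerdict;
  responseCache: LayerVerdict;
}

// Compares a cold and a warm observation and judges each cache layer on its own.
function evaluateCanary(cold: CanaryObservation, warm: CanaryObservation): CanaryVerdict {
  return {
    prefixCache: {
      // The warm request should report cached input tokens for the shared prefix.
      cacheUsed: warm.cachedInputTokens > 0,
      latencyImproved: warm.timeToFirstTokenMs < cold.timeToFirstTokenMs,
    },
    responseCache: {
      // An exact repeated request should come back as a Gateway HIT.
      cacheUsed: warm.gatewayCacheStatus === "HIT",
      latencyImproved: warm.totalLatencyMs < cold.totalLatencyMs,
    },
  };
}
```

Reporting the two layers as separate verdicts keeps a prefix-cache success from masking a response-cache miss, and the reverse.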
Notes
This should be a proof/observability layer, not a production deploy task. Live Workers AI / AI Gateway verification can happen downstream in aegis-daemon or another Cloudflare Worker once the library exposes the needed surfaces.