Add cache observability and cold/warm canary guidance

## Summary

Once the provider and Gateway cache controls are wired in, downstream orchestrators need a cheap way to prove that each cache layer is actually being hit in edge-native deployments.

For coding-agent workloads, the useful metrics differ by cache layer:

- Workers AI prompt/prefix cache: non-zero cached input tokens and lower time-to-first-token on repeated prefixes sent with the same `x-session-affinity` value.
- AI Gateway response cache: `cf-aig-cache-status` (`HIT` / `MISS`) and lower end-to-end latency on exact repeated requests.
- Factory/local response cache: adapter hit/miss and zero provider spend on hits.
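As a rough illustration of the Gateway signal above, the `cf-aig-cache-status` header can be classified with a few lines. This is only a sketch: `readGatewayCacheStatus`, `HeaderReader`, and the `"ABSENT"` value are hypothetical names, not part of the library's current surface. The parameter is typed structurally so that a fetch `Headers` object or a test double both fit.

```typescript
// Hypothetical helper (not an existing API): classify the AI Gateway
// response-cache outcome from the `cf-aig-cache-status` response header.
type GatewayCacheStatus = "HIT" | "MISS" | "ABSENT";

// Minimal structural view of a header bag; a fetch `Headers` object satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

function readGatewayCacheStatus(headers: HeaderReader): GatewayCacheStatus {
  const raw = headers.get("cf-aig-cache-status");
  if (raw === null) return "ABSENT";
  const value = raw.trim().toUpperCase();
  if (value === "HIT") return "HIT";
  if (value === "MISS") return "MISS";
  // Assumption: any other value is reported as ABSENT rather than guessed at.
  return "ABSENT";
}
```

A caller would typically pass `response.headers` from the Gateway fetch and log the result next to the request latency.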

Docs:

- https://developers.cloudflare.com/workers-ai/features/prompt-caching/
- https://developers.cloudflare.com/ai-gateway/features/caching/

## Proposed work

1. Extend observability hook payloads or response metadata so cache fields are easy to log without provider-specific parsing:
   - provider prompt cache: `cachedInputTokens`, `cacheReadInputTokens`, `cacheWriteInputTokens`
   - AI Gateway response cache: cache status (`HIT`, `MISS`, or absent)
   - factory response cache: local adapter hit/miss
2. Add a small test/canary recipe that sends a cold request and a warm repeated request using:
   - stable system prompt and tool definitions at the front
   - dynamic user content at the end
   - a stable `cache.sessionId`
   - optional Gateway `cacheKey`/`cacheTtl` for exact-response-cache cases
3. Update README guidance so downstream agent orchestrators know how to structure prompts for prefix caching:
   - static content first
   - avoid timestamps in system prompts
   - keep tool definitions stable
   - append dynamic or user-specific content last
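The canary and prompt-structure items above could be sketched roughly as follows. Everything here is an assumption for illustration: `CacheObservation`, `buildCanaryMessages`, `evaluateCanary`, and the sample system prompt are hypothetical, and the actual request/response plumbing is left out so the comparison logic stays testable without a live provider.

```typescript
// Hypothetical shapes for a cold/warm canary; none of these names exist in the library yet.
interface CacheObservation {
  cachedInputTokens: number; // provider prompt/prefix cache signal
  gatewayCacheStatus: "HIT" | "MISS" | "ABSENT"; // AI Gateway response cache signal
  latencyMs: number;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Static content first and free of timestamps, so the shared prefix stays identical across calls.
const CANARY_SYSTEM_PROMPT = "You are a coding agent. Follow the repository conventions.";

function buildCanaryMessages(dynamicUserContent: string): ChatMessage[] {
  return [
    { role: "system", content: CANARY_SYSTEM_PROMPT },
    // Dynamic content goes last so it does not invalidate the cached prefix.
    { role: "user", content: dynamicUserContent },
  ];
}

interface CanaryVerdict {
  prefixCacheWarm: boolean;
  responseCacheWarm: boolean;
  latencyDeltaMs: number; // positive when the warm request was faster
}

function evaluateCanary(cold: CacheObservation, warm: CacheObservation): CanaryVerdict {
  return {
    prefixCacheWarm: warm.cachedInputTokens > 0,
    responseCacheWarm: warm.gatewayCacheStatus === "HIT",
    latencyDeltaMs: cold.latencyMs - warm.latencyMs,
  };
}
```

Under this sketch the pair would be run twice: once with different dynamic user content and the same `cache.sessionId` to exercise prefix caching, and once with an exactly repeated request (optionally with `cacheKey`/`cacheTtl`) to exercise the Gateway response cache.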

## Acceptance criteria

- Consumers can log cache hit/miss behavior from the normalized response/hook surface without inspecting raw provider responses.
- A documented canary can verify cold/warm behavior for Workers AI prefix caching and AI Gateway response caching separately.
- README examples distinguish prefix-cache optimization from response-cache optimization.
- Tests cover metadata/hook behavior for cached responses and provider responses with cached token fields.
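To make the first and last criteria concrete, here is one possible shape for the normalized surface and a helper that flattens it into log fields. The token field names come from the proposed work above; the interface names, `toCacheLogFields`, and the snake_case log keys are hypothetical choices for this sketch only.

```typescript
// Hypothetical normalized cache metadata covering all three layers.
interface NormalizedCacheMetadata {
  promptCache?: {
    cachedInputTokens?: number;
    cacheReadInputTokens?: number;
    cacheWriteInputTokens?: number;
  };
  gatewayCacheStatus?: "HIT" | "MISS"; // undefined when the header is absent
  factoryCacheHit?: boolean; // undefined when no local adapter cache is configured
}

// Flat, provider-agnostic fields that a hook consumer could pass to any logger.
interface CacheLogFields {
  prompt_cached_tokens?: number;
  prompt_cache_read_tokens?: number;
  prompt_cache_write_tokens?: number;
  gateway_cache: "hit" | "miss" | "absent";
  factory_cache?: "hit" | "miss";
}

function toCacheLogFields(meta: NormalizedCacheMetadata): CacheLogFields {
  const gateway =
    meta.gatewayCacheStatus === "HIT" ? "hit" :
    meta.gatewayCacheStatus === "MISS" ? "miss" : "absent";
  const fields: CacheLogFields = { gateway_cache: gateway };

  // Only emit token fields the provider actually reported.
  const prompt = meta.promptCache;
  if (prompt?.cachedInputTokens !== undefined) fields.prompt_cached_tokens = prompt.cachedInputTokens;
  if (prompt?.cacheReadInputTokens !== undefined) fields.prompt_cache_read_tokens = prompt.cacheReadInputTokens;
  if (prompt?.cacheWriteInputTokens !== undefined) fields.prompt_cache_write_tokens = prompt.cacheWriteInputTokens;

  if (meta.factoryCacheHit !== undefined) fields.factory_cache = meta.factoryCacheHit ? "hit" : "miss";
  return fields;
}
```

A unit test for the hook surface could then assert on these flat fields for a cached response and for a provider response that carries cached token counts, without touching raw provider payloads.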

## Notes

This should be a proof/observability layer, not a production deploy task. Live Workers AI / AI Gateway verification can happen downstream in `aegis-daemon` or another Cloudflare Worker once the library exposes the needed surfaces.

