Add cache observability and cold/warm canary guidance #83

@stackbilt-admin

Summary

After the provider and Gateway cache controls are wired, downstream orchestrators need a cheap way to prove the cache layer is actually being used in edge-native deployments.

For coding-agent workloads, the useful metrics differ by cache layer:

  • Workers AI prompt/prefix cache: a nonzero cached input token count and lower time-to-first-token on repeated prefixes sent with the same x-session-affinity value.
  • AI Gateway response cache: cf-aig-cache-status (HIT / MISS) and lower end-to-end latency on exact repeated requests.
  • Factory/local response cache: adapter hit/miss and zero provider spend on hits.
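
As a rough illustration of the AI Gateway signal above, the sketch below maps the cf-aig-cache-status response header to a three-state value (HIT, MISS, or absent). The names `GatewayCacheStatus`, `HeaderReader`, and `readGatewayCacheStatus` are hypothetical and not part of the library; the only thing taken from this issue is the header name and its HIT / MISS values.

```typescript
// Hypothetical helper: names are illustrative, not existing library API.
type GatewayCacheStatus = "HIT" | "MISS" | "absent";

// Structural subset of the Fetch API `Headers` interface, so a real
// `response.headers` object can be passed in without any adapter.
interface HeaderReader {
  get(name: string): string | null;
}

function readGatewayCacheStatus(headers: HeaderReader): GatewayCacheStatus {
  const raw = headers.get("cf-aig-cache-status");
  if (raw === null) return "absent";
  const value = raw.trim().toUpperCase();
  if (value === "HIT") return "HIT";
  if (value === "MISS") return "MISS";
  // Assumption: any other value is reported as absent rather than guessed at.
  return "absent";
}
```

With a real fetch response this would be called as `readGatewayCacheStatus(response.headers)`. Treating a missing header as its own state keeps "request did not go through the Gateway" distinguishable from a genuine MISS.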

Docs:

Proposed work

  1. Extend observability hook payloads or response metadata so cache fields are easy to log without provider-specific parsing:
    • provider prompt cache: cachedInputTokens, cacheReadInputTokens, cacheWriteInputTokens
    • AI Gateway response cache: cache status (HIT, MISS, or absent)
    • factory response cache: local adapter hit/miss
  2. Add a small test/canary recipe that sends a cold request and a warm repeated request using:
    • stable system prompt and tool definitions at the front
    • dynamic user content at the end
    • a stable cache.sessionId
    • optional Gateway cacheKey/cacheTtl for exact-response-cache cases
  3. Update README guidance so downstream agent orchestrators know how to structure prompts for prefix caching:
    • static content first
    • avoid timestamps in system prompts
    • keep tool definitions stable
    • append dynamic/user-specific content later
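
The canary recipe and prompt-structure guidance above could be sketched roughly as follows. This is a minimal sketch under assumptions: the `CanaryRequest` shape, the `buildCanaryRequest` helper, and the placement of `cacheKey`/`cacheTtl` next to `sessionId` are hypothetical, and `cacheTtl` is assumed to be in seconds. Only the ordering rules (static system prompt and tools first, dynamic user content last, stable `cache.sessionId`) come from this issue.

```typescript
// Hypothetical request shape for illustration; not the library's actual type.
interface ToolDefinition { name: string; description: string }
interface CanaryMessage { role: "system" | "user"; content: string }
interface CanaryRequest {
  messages: CanaryMessage[];
  tools: ToolDefinition[];
  // Assumption: Gateway options live next to sessionId; cacheTtl in seconds.
  cache: { sessionId: string; cacheKey?: string; cacheTtl?: number };
}

// Static prefix: no timestamps, no per-request values.
const SYSTEM_PROMPT = "You are a coding agent. Follow the repository conventions.";
const TOOLS: ToolDefinition[] = [
  { name: "read_file", description: "Read a file from the workspace" },
];

function buildCanaryRequest(
  userContent: string,
  sessionId: string,
  gateway?: { cacheKey: string; cacheTtl: number },
): CanaryRequest {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // stable content first
      { role: "user", content: userContent },     // dynamic content last
    ],
    tools: TOOLS,
    cache: { sessionId, ...gateway },
  };
}

// Prefix-cache canary: same prefix and session, different user tail.
const prefixCold = buildCanaryRequest("Summarize src/index.ts", "canary-session-1");
const prefixWarm = buildCanaryRequest("Summarize src/cache.ts", "canary-session-1");

// Response-cache canary: the warm request is an exact repeat of the cold one.
const gatewayOptions = { cacheKey: "canary-exact-1", cacheTtl: 60 };
const responseCold = buildCanaryRequest("Summarize src/index.ts", "canary-session-1", gatewayOptions);
const responseWarm = buildCanaryRequest("Summarize src/index.ts", "canary-session-1", gatewayOptions);
```

Keeping the two pairs separate matters: a prefix-cache hit needs only the leading content to match, while a response-cache hit needs the whole request to match, so a single pair cannot prove both.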

Acceptance criteria

  • Consumers can log cache hit/miss behavior from the normalized response/hook surface without inspecting raw provider responses.
  • A documented canary can verify cold/warm behavior for Workers AI prefix caching and AI Gateway response caching separately.
  • README examples distinguish prefix-cache optimization from response-cache optimization.
  • Tests cover metadata/hook behavior for cached responses and provider responses with cached token fields.
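
One possible way to check the cold/warm criteria above is a small pure function over normalized observations. The `CacheObservation` shape, `evaluateCanary`, and the verdict rules are assumptions made for illustration; only the field name `cachedInputTokens` and the HIT / MISS status values are taken from this issue.

```typescript
// Hypothetical normalized observation; field layout is an assumption.
interface CacheObservation {
  cachedInputTokens?: number;            // provider prompt cache, if reported
  gatewayCacheStatus?: "HIT" | "MISS";   // undefined when the header is absent
  latencyMs: number;                     // end-to-end latency for the request
}

interface CanaryVerdict {
  prefixCacheWarm: boolean;
  responseCacheWarm: boolean;
  latencyDeltaMs: number;
}

function evaluateCanary(cold: CacheObservation, warm: CacheObservation): CanaryVerdict {
  const coldCached = cold.cachedInputTokens ?? 0;
  const warmCached = warm.cachedInputTokens ?? 0;
  return {
    // Prefix cache: the warm request should report more cached input tokens.
    prefixCacheWarm: warmCached > coldCached,
    // Response cache: the warm request should be served as a Gateway HIT.
    responseCacheWarm: warm.gatewayCacheStatus === "HIT",
    // Latency is reported but not used for the verdict, since it is noisy.
    latencyDeltaMs: cold.latencyMs - warm.latencyMs,
  };
}

const exampleVerdict = evaluateCanary(
  { latencyMs: 900 },
  { cachedInputTokens: 512, latencyMs: 400 },
);
```

The two booleans are evaluated independently, so a run can show the prefix cache working while the response cache is not (or the reverse), which matches the requirement to verify the two layers separately.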

Notes

This should be a proof/observability layer, not a production deploy task. Live Workers AI / AI Gateway verification can happen downstream in aegis-daemon or another Cloudflare Worker once the library exposes the needed surfaces.

Metadata

Labels: P2-medium, documentation, enhancement
