Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic) #1841

Merged: nmaguiar merged 7 commits into master from copilot/improve-prompt-caching on May 22, 2026

Copilot AI commented May 21, 2026

This PR extends GPT provider handling in js/owrap.ai.js to properly surface prompt-cache token accounting and enable Anthropic prompt caching controls without changing default behavior. Gemini/Ollama behavior remains unchanged aside from documentation clarity.

  • OpenAI: capture cache-aware usage stats

    • Extended _captureStats to include nested usage fields when present:
      • usage.prompt_tokens_details.cached_tokens → tokens.cached
      • usage.prompt_tokens_details.audio_tokens → tokens.audio
      • usage.completion_tokens_details.reasoning_tokens → tokens.reasoning
  • Anthropic: opt-in prompt caching support

    • Added aOptions.promptCaching (default false).
    • When enabled, both _request and _requestStream send:
      • anthropic-beta: prompt-caching-2024-07-31
    • Extended _captureStats with:
      • usage.cache_creation_input_tokens → tokens.cacheCreation
      • usage.cache_read_input_tokens → tokens.cacheRead
  • Anthropic: cache boundary hints in payload

    • When promptCaching is enabled:
      • system is emitted as content blocks with cache_control: { type: "ephemeral" }.
      • The last cacheable user message/block is marked with cache_control: { type: "ephemeral" } (while avoiding tool_result-only blocks).
  • ODoc updates (ow.ai.gpt + $gpt)

    • Documented promptCaching option for Anthropic.
    • Documented that getLastStats() now includes OpenAI cached-token accounting and Anthropic cache read/creation token counters.
    • Added note that Gemini may implicitly cache large system instructions.
  • Focused AI tests

    • Added tests for:
      • OpenAI cached/audio/reasoning token capture.
      • Anthropic prompt-caching beta header behavior.
      • Anthropic cache_control payload shaping and cache stats extraction.
if (isMap(aResponse.usage.prompt_tokens_details)) {
  if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
    tokens.cached = aResponse.usage.prompt_tokens_details.cached_tokens;
}
if (isMap(aResponse.usage.completion_tokens_details)) {
  if (isDef(aResponse.usage.completion_tokens_details.reasoning_tokens))
    tokens.reasoning = aResponse.usage.completion_tokens_details.reasoning_tokens;
}
Original prompt

Overview

Improve js/owrap.ai.js to handle prompt caching for each of the GPT provider implementations (openai, gemini, anthropic, ollama).


Changes Required

1. OpenAI — Capture cached token stats

OpenAI returns prompt_tokens_details.cached_tokens in the usage response for models that support prompt caching (e.g. gpt-4o). Update _captureStats in the openai implementation to capture this:

if (isMap(aResponse.usage.prompt_tokens_details)) {
    if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
        tokens.cached = aResponse.usage.prompt_tokens_details.cached_tokens
    if (isDef(aResponse.usage.prompt_tokens_details.audio_tokens))
        tokens.audio = aResponse.usage.prompt_tokens_details.audio_tokens
}

Also capture completion_tokens_details if present (e.g. reasoning_tokens for o1/o3 models):

if (isMap(aResponse.usage.completion_tokens_details)) {
    if (isDef(aResponse.usage.completion_tokens_details.reasoning_tokens))
        tokens.reasoning = aResponse.usage.completion_tokens_details.reasoning_tokens
}
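
The two snippets above can be combined into one standalone function for experimentation outside OpenAF. This is a minimal sketch, not the code merged in js/owrap.ai.js: captureOpenAITokens is a hypothetical name, and isDef / isMap are simplified stand-ins for the OpenAF helpers of the same name.

```javascript
// Simplified stand-ins for the OpenAF helpers of the same name (assumption).
const isDef = v => typeof v !== "undefined";
const isMap = v => v !== null && typeof v === "object" && !Array.isArray(v);

// Hypothetical helper: copies the nested OpenAI usage details into flat counters.
function captureOpenAITokens(aResponse) {
  const tokens = {};
  if (!isMap(aResponse) || !isMap(aResponse.usage)) return tokens;

  const promptDetails = aResponse.usage.prompt_tokens_details;
  if (isMap(promptDetails)) {
    if (isDef(promptDetails.cached_tokens)) tokens.cached = promptDetails.cached_tokens;
    if (isDef(promptDetails.audio_tokens)) tokens.audio = promptDetails.audio_tokens;
  }

  const completionDetails = aResponse.usage.completion_tokens_details;
  if (isMap(completionDetails)) {
    if (isDef(completionDetails.reasoning_tokens)) tokens.reasoning = completionDetails.reasoning_tokens;
  }
  return tokens;
}
```

Responses without the nested detail maps yield an empty object, so callers on older models see no new fields.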

2. Anthropic — Full prompt caching support

Anthropic's prompt caching requires opt-in via a beta header and explicit cache_control markers on content blocks.

a) New option: aOptions.promptCaching (boolean, default false)

When true:

  • Add the header "anthropic-beta": "prompt-caching-2024-07-31" to both _request and _requestStream.
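
One way to keep the header opt-in is to build it in a single place that both request paths share. The sketch below assumes a hypothetical buildAnthropicHeaders helper and a plain headers map; the actual _request and _requestStream code may assemble headers differently.

```javascript
// Hypothetical helper: returns a copy of baseHeaders, adding the beta header
// only when promptCaching is explicitly true.
function buildAnthropicHeaders(aOptions, baseHeaders) {
  const headers = Object.assign({}, baseHeaders);
  if (aOptions && aOptions.promptCaching === true) {
    headers["anthropic-beta"] = "prompt-caching-2024-07-31";
  }
  return headers;
}
```

Because the check is a strict comparison with true, an omitted or falsy promptCaching leaves the request headers unchanged.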

b) Capture cache stats in _captureStats

if (isMap(aResponse.usage)) {
    // existing...
    if (isDef(aResponse.usage.cache_creation_input_tokens))
        tokens.cacheCreation = aResponse.usage.cache_creation_input_tokens
    if (isDef(aResponse.usage.cache_read_input_tokens))
        tokens.cacheRead = aResponse.usage.cache_read_input_tokens
}
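
For reference, the same mapping as a standalone function. This is a sketch only: captureAnthropicCacheTokens is a hypothetical name, and the existing counters hidden behind "// existing..." are left out rather than guessed.

```javascript
// Hypothetical helper: extracts only the cache counters from an Anthropic response.
function captureAnthropicCacheTokens(aResponse) {
  const tokens = {};
  const usage = aResponse && aResponse.usage;
  if (usage === null || typeof usage !== "object") return tokens;

  if (typeof usage.cache_creation_input_tokens !== "undefined")
    tokens.cacheCreation = usage.cache_creation_input_tokens;
  if (typeof usage.cache_read_input_tokens !== "undefined")
    tokens.cacheRead = usage.cache_read_input_tokens;
  return tokens;
}
```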

c) Allow cache_control on system prompt

When promptCaching is enabled and the system field is set, wrap the system content as a content block array with cache_control on the last block:

// Instead of:
body.system = _systemText

// When promptCaching is enabled:
body.system = [{ type: "text", text: _systemText, cache_control: { type: "ephemeral" } }]
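
The branch described above could be isolated as follows. buildSystemField is a hypothetical helper name used for illustration; it assumes the system prompt is a single string.

```javascript
// Hypothetical helper: returns the value to assign to body.system.
function buildSystemField(systemText, promptCaching) {
  // Default path: keep the plain string so existing requests are unchanged.
  if (promptCaching !== true) return systemText;
  // Opt-in path: a one-element content block array carrying the cache marker.
  return [{ type: "text", text: systemText, cache_control: { type: "ephemeral" } }];
}
```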

d) Allow cache_control on messages

When promptCaching is enabled, mark the last user message (or the last large content block) with cache_control: { type: "ephemeral" } to hint to Anthropic where the cache boundary should be.
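
A possible reading of this rule, sketched as a pure function. markLastCacheableUserBlock is a hypothetical name; the sketch assumes messages follow the Anthropic shape ({ role, content } with content as a string or an array of typed blocks) and skips tool_result blocks, as the PR description says the implementation does.

```javascript
// Hypothetical helper: returns a copy of messages with one cache_control marker
// placed on the last user block that is not a tool_result.
function markLastCacheableUserBlock(messages) {
  const out = JSON.parse(JSON.stringify(messages)); // deep copy, input stays untouched
  for (let i = out.length - 1; i >= 0; i--) {
    const msg = out[i];
    if (msg.role !== "user") continue;

    if (typeof msg.content === "string") {
      if (msg.content.length === 0) continue;
      // A string cannot carry cache_control, so promote it to a text block.
      msg.content = [{ type: "text", text: msg.content, cache_control: { type: "ephemeral" } }];
      return out;
    }

    if (Array.isArray(msg.content)) {
      for (let j = msg.content.length - 1; j >= 0; j--) {
        const block = msg.content[j];
        if (block && block.type !== "tool_result") {
          block.cache_control = { type: "ephemeral" };
          return out;
        }
      }
      // Only tool_result blocks here: keep searching earlier messages.
    }
  }
  return out; // nothing cacheable found, copy returned unchanged
}
```

Walking backwards means a trailing tool_result-only user turn does not prevent an earlier user message from carrying the marker.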


3. Gemini — Already captures cachedContentTokenCount

No changes required for stats capture. Optionally, add a note in the addSystemPrompt ODoc that Gemini performs implicit caching on large system instructions.


4. Ollama — No changes needed

Ollama uses local KV cache implicitly. No API-level prompt caching changes are needed.


5. Update ODoc comments

Update the ow.ai.gpt(aType, aOptions) ODoc to document the new promptCaching option for Anthropic and note that OpenAI cached token stats are now captured in getLastStats().

Update the $gpt ODoc similarly.


Files to Modify

  • js/owrap.ai.js

Notes

  • All changes must be backward compatible — promptCaching defaults to false.
  • The tokens.cached field in getLastStats() for OpenAI and tokens.cacheRead / tokens.cacheCreation for Anthropic allow callers to track cache efficiency.
  • Anthropic's cache_control support should only be applied when aOptions.promptCaching === true to avoid sending unsupported fields to older API versions or incompatible models.
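
To illustrate what tracking cache efficiency could look like, here is a rough sketch that works on the raw provider usage objects rather than on the getLastStats() output, whose full shape is not shown here. Both function names are hypothetical. It assumes, as the providers' usage reporting is generally understood, that OpenAI counts cached_tokens as part of prompt_tokens while Anthropic reports input_tokens separately from cache reads and writes.

```javascript
// OpenAI (assumption): cached_tokens is a subset of prompt_tokens.
function openAICacheHitRatio(usage) {
  const total = usage.prompt_tokens || 0;
  const details = usage.prompt_tokens_details || {};
  const cached = details.cached_tokens || 0;
  return total > 0 ? cached / total : 0;
}

// Anthropic (assumption): input_tokens excludes cache reads and writes,
// so the full prompt size is the sum of all three counters.
function anthropicCacheHitRatio(usage) {
  const read = usage.cache_read_input_tokens || 0;
  const created = usage.cache_creation_input_tokens || 0;
  const total = (usage.input_tokens || 0) + read + created;
  return total > 0 ? read / total : 0;
}
```

A ratio near 0 on repeated calls with the same long prefix would suggest the cache boundary is not being hit.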

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Could owrap.ai.js be improved for each of the gpt implementations to handle prompt caching in the different providers?

Assistant: Let me start by finding and reading the owrap.ai.js file in the repository.

Assistant: Here's a detailed analysis of how prompt caching could be improved for each provider in js/owrap.ai.js:


Current State

✅ Gemini — Partially handled

Gemini already captures cachedContentTokenCount in _captureStats (line 875):

if (isDef(aResponse.usageMetadata.cachedContentTokenCount)) tokens.cached = aResponse.usageMetadata.cachedContentTokenCount

However, explicit Context Cache creation (Gemini's cachedContents API for large system prompts) is not supported. Users must rely on implicit caching only.


❌ OpenAI — Not captured

OpenAI returns prompt_tokens_details.cached_tokens when prompt caching kicks in (for models that support it, e.g. gpt-4o), but _captureStats only reads the top-level usage fields. The fix would be:

// In openai _captureStats
if (isMap(aResponse.usage.prompt_tokens_details)) {
    if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
        tokens.cached = aResponse.usage.prompt_...

</details>



*This pull request was created from Copilot chat.*

Copilot AI and others added 4 commits May 21, 2026 13:55
Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40

Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve prompt caching for GPT provider implementations" to "Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic)" May 21, 2026
Copilot AI requested a review from nmaguiar May 21, 2026 14:04
nmaguiar marked this pull request as ready for review May 22, 2026 01:08
nmaguiar requested a review from Copilot May 22, 2026 01:08

Copilot AI left a comment

Pull request overview

Extends owrap.ai.js GPT provider implementations to expose provider-specific prompt-caching token accounting (OpenAI + Anthropic) and adds opt-in Anthropic prompt caching controls, alongside new focused regression tests and small ODoc updates.

Changes:

  • OpenAI: _captureStats now captures cache/audio/reasoning token details from nested usage.*_tokens_details fields.
  • Anthropic: introduces promptCaching option to send the beta header and inject cache_control markers; _captureStats now captures cache read/creation token counters.
  • Adds automated tests covering OpenAI stats capture and Anthropic prompt-caching header/body/stat behavior; updates ODoc to document the new option and stats fields.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| js/owrap.ai.js | Adds OpenAI nested usage capture; Anthropic opt-in prompt caching header + payload shaping; documents promptCaching and expanded stats. |
| tests/autoTestAll.AI.js | Adds unit tests validating OpenAI cached/audio/reasoning token capture and Anthropic prompt-caching header/body/stat handling. |
| tests/autoTestAll.AI.yaml | Wires new AI tests into the test job list and TODO list. |

Comment thread tests/autoTestAll.AI.js Outdated
Comment thread js/owrap.ai.js Outdated
nmaguiar and others added 2 commits May 22, 2026 02:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
nmaguiar merged commit 9d61ffc into master May 22, 2026
1 check passed