Add provider-specific prompt caching support in `owrap.ai.js` (OpenAI + Anthropic) by Copilot · Pull Request #1841 · OpenAF/openaf

Copilot · 2026-05-21T13:42:50Z

This PR extends GPT provider handling in js/owrap.ai.js to properly surface prompt-cache token accounting and enable Anthropic prompt caching controls without changing default behavior. Gemini/Ollama behavior remains unchanged aside from documentation clarity.

OpenAI: capture cache-aware usage stats
- Extended _captureStats to include nested usage fields when present:
  - usage.prompt_tokens_details.cached_tokens → tokens.cached
  - usage.prompt_tokens_details.audio_tokens → tokens.audio
  - usage.completion_tokens_details.reasoning_tokens → tokens.reasoning

Anthropic: opt-in prompt caching support
- Added aOptions.promptCaching (default false).
- When enabled, both _request and _requestStream send:
  - anthropic-beta: prompt-caching-2024-07-31
- Extended _captureStats with:
  - usage.cache_creation_input_tokens → tokens.cacheCreation
  - usage.cache_read_input_tokens → tokens.cacheRead
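
The opt-in header behaviour described above can be sketched as follows. This is a minimal illustration, not the PR's code: `buildAnthropicHeaders` is a hypothetical helper, and the way `owrap.ai.js` actually assembles headers in `_request` and `_requestStream` is not shown in this description. Only the `anthropic-beta` value is taken from the text above.

```javascript
// Hypothetical helper (assumption): builds the headers for an Anthropic call.
// Only the "anthropic-beta" value comes from the PR description.
function buildAnthropicHeaders(apiKey, options) {
  const headers = {
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
  };
  // Strict comparison so truthy non-boolean values do not enable caching.
  if (options && options.promptCaching === true) {
    headers["anthropic-beta"] = "prompt-caching-2024-07-31";
  }
  return headers;
}

const off = buildAnthropicHeaders("sk-test", {});
const on = buildAnthropicHeaders("sk-test", { promptCaching: true });
console.log("anthropic-beta" in off);
console.log(on["anthropic-beta"]);
```

Sharing one helper between the streaming and non-streaming paths is one way to keep the two requests from drifting apart.
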

Anthropic: cache boundary hints in payload
- When promptCaching is enabled:
  - system is emitted as content blocks with cache_control: { type: "ephemeral" }.
  - The last cacheable user message/block is marked with cache_control: { type: "ephemeral" } (while avoiding tool_result-only blocks).
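
One way to pick the "last cacheable user block" is sketched below. `markLastUserBlock` is a hypothetical name and the walk order is an assumption; the only behaviour taken from the description is that the marker goes on the last user content that is not a `tool_result` block.

```javascript
// Hypothetical helper (assumption): returns a copy of the conversation with
// cache_control set on the last user block that is not a tool_result.
function markLastUserBlock(messages) {
  // Deep-copy so the caller's conversation is not mutated.
  const out = JSON.parse(JSON.stringify(messages));
  for (let i = out.length - 1; i >= 0; i--) {
    const msg = out[i];
    if (msg.role !== "user") continue;
    // Normalize plain-string content into a single text block.
    if (typeof msg.content === "string") {
      msg.content = [{ type: "text", text: msg.content }];
    }
    if (!Array.isArray(msg.content)) continue;
    // Walk blocks from the end, skipping tool_result blocks.
    for (let j = msg.content.length - 1; j >= 0; j--) {
      if (msg.content[j].type !== "tool_result") {
        msg.content[j].cache_control = { type: "ephemeral" };
        return out;
      }
    }
    // This user message held only tool_result blocks; try an earlier one.
  }
  return out;
}

const marked = markLastUserBlock([
  { role: "user", content: "Summarize the attached manual." },
  { role: "assistant", content: [{ type: "tool_use", id: "t1", name: "lookup", input: {} }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "t1", content: "ok" }] }
]);
console.log(JSON.stringify(marked, null, 2));
```

In this example the final user message contains only a `tool_result`, so the marker lands on the first user message instead.
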

ODoc updates (ow.ai.gpt + $gpt)
- Documented promptCaching option for Anthropic.
- Documented that getLastStats() now includes OpenAI cached-token accounting and Anthropic cache read/creation token counters.
- Added note that Gemini may implicitly cache large system instructions.

Focused AI tests
- Added tests for:
  - OpenAI cached/audio/reasoning token capture.
  - Anthropic prompt-caching beta header behavior.
  - Anthropic cache_control payload shaping and cache stats extraction.

if (isMap(aResponse.usage.prompt_tokens_details)) {
  if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
    tokens.cached = aResponse.usage.prompt_tokens_details.cached_tokens;
}
if (isMap(aResponse.usage.completion_tokens_details)) {
  if (isDef(aResponse.usage.completion_tokens_details.reasoning_tokens))
    tokens.reasoning = aResponse.usage.completion_tokens_details.reasoning_tokens;
}
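
The snippet above omits the `audio_tokens` check listed in the summary. A self-contained sketch covering all three nested fields might look like the following. `captureOpenAITokens` is a hypothetical standalone function, and the `isDef`/`isMap` definitions are simple stand-ins for the OpenAF globals of the same name, written here only so the example runs outside OpenAF.

```javascript
// Stand-ins for OpenAF's isDef/isMap globals (assumed semantics).
const isDef = v => typeof v !== "undefined";
const isMap = v => v !== null && typeof v === "object" && !Array.isArray(v);

// Hypothetical standalone version of the nested-usage capture.
function captureOpenAITokens(aResponse) {
  const tokens = {};
  if (!isMap(aResponse) || !isMap(aResponse.usage)) return tokens;
  const prompt = aResponse.usage.prompt_tokens_details;
  const completion = aResponse.usage.completion_tokens_details;
  if (isMap(prompt)) {
    if (isDef(prompt.cached_tokens)) tokens.cached = prompt.cached_tokens;
    if (isDef(prompt.audio_tokens)) tokens.audio = prompt.audio_tokens;
  }
  if (isMap(completion)) {
    if (isDef(completion.reasoning_tokens)) tokens.reasoning = completion.reasoning_tokens;
  }
  return tokens;
}

// Sample response shape; the numbers are made up for illustration.
const tokens = captureOpenAITokens({
  usage: {
    prompt_tokens: 2048,
    completion_tokens: 120,
    total_tokens: 2168,
    prompt_tokens_details: { cached_tokens: 1024, audio_tokens: 0 },
    completion_tokens_details: { reasoning_tokens: 64 }
  }
});
console.log(tokens);
```

Responses without the nested detail maps yield an empty object, which matches the "when present" wording in the summary.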

Original prompt

Overview

Improve js/owrap.ai.js to handle prompt caching for each of the GPT provider implementations (openai, gemini, anthropic, ollama).

Changes Required

1. OpenAI — Capture cached token stats

OpenAI returns prompt_tokens_details.cached_tokens in the usage response for models that support prompt caching (e.g. gpt-4o). Update _captureStats in the openai implementation to capture this:

if (isMap(aResponse.usage.prompt_tokens_details)) {
    if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
        tokens.cached = aResponse.usage.prompt_tokens_details.cached_tokens
    if (isDef(aResponse.usage.prompt_tokens_details.audio_tokens))
        tokens.audio = aResponse.usage.prompt_tokens_details.audio_tokens
}

Also capture completion_tokens_details if present (e.g. reasoning_tokens for o1/o3 models):

if (isMap(aResponse.usage.completion_tokens_details)) {
    if (isDef(aResponse.usage.completion_tokens_details.reasoning_tokens))
        tokens.reasoning = aResponse.usage.completion_tokens_details.reasoning_tokens
}

2. Anthropic — Full prompt caching support

Anthropic's prompt caching requires opt-in via a beta header and explicit cache_control markers on content blocks.

a) New option: `aOptions.promptCaching` (boolean, default `false`)

When true:

Add the header "anthropic-beta": "prompt-caching-2024-07-31" to both _request and _requestStream.

b) Capture cache stats in `_captureStats`

if (isMap(aResponse.usage)) {
    // existing...
    if (isDef(aResponse.usage.cache_creation_input_tokens))
        tokens.cacheCreation = aResponse.usage.cache_creation_input_tokens
    if (isDef(aResponse.usage.cache_read_input_tokens))
        tokens.cacheRead = aResponse.usage.cache_read_input_tokens
}
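
To illustrate how these two counters typically behave across calls, here is a small sketch. `captureAnthropicCacheTokens` is a hypothetical standalone function, `isDef`/`isMap` are stand-ins for the OpenAF globals, and the token numbers are invented. The first request writes the cache and the second reads from it.

```javascript
// Stand-ins for OpenAF's isDef/isMap globals (assumed semantics).
const isDef = v => typeof v !== "undefined";
const isMap = v => v !== null && typeof v === "object" && !Array.isArray(v);

// Hypothetical standalone version of the cache-counter capture.
function captureAnthropicCacheTokens(aResponse) {
  const tokens = {};
  if (isMap(aResponse) && isMap(aResponse.usage)) {
    if (isDef(aResponse.usage.cache_creation_input_tokens))
      tokens.cacheCreation = aResponse.usage.cache_creation_input_tokens;
    if (isDef(aResponse.usage.cache_read_input_tokens))
      tokens.cacheRead = aResponse.usage.cache_read_input_tokens;
  }
  return tokens;
}

// First call: the prefix is written to the cache.
const first = captureAnthropicCacheTokens({
  usage: { input_tokens: 12, output_tokens: 50,
           cache_creation_input_tokens: 1800, cache_read_input_tokens: 0 }
});
// Second call with the same prefix: the cache is read instead.
const second = captureAnthropicCacheTokens({
  usage: { input_tokens: 12, output_tokens: 48,
           cache_creation_input_tokens: 0, cache_read_input_tokens: 1800 }
});
console.log(first, second);
```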

c) Allow `cache_control` on system prompt

When promptCaching is enabled and the system field is set, wrap the system content as a content block array with cache_control on the last block:

// Instead of:
body.system = _systemText

// When promptCaching is enabled:
body.system = [{ type: "text", text: _systemText, cache_control: { type: "ephemeral" } }]
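
The switch between the two forms can be isolated in a small function. The sketch below is an assumption about structure, not the PR's code: `buildSystemField` is a hypothetical name, and skipping an empty system prompt is a design choice made here for illustration.

```javascript
// Hypothetical helper (assumption): returns the value for body.system,
// or undefined when there is no system prompt to send.
function buildSystemField(systemText, options) {
  if (typeof systemText !== "string" || systemText.length === 0) return undefined;
  if (options && options.promptCaching === true) {
    // Content-block form, with the cache marker on the only (last) block.
    return [{ type: "text", text: systemText, cache_control: { type: "ephemeral" } }];
  }
  // Default path: plain string, identical to the pre-existing behavior.
  return systemText;
}

const body = { model: "example-model", messages: [] };
const system = buildSystemField("You are a terse assistant.", { promptCaching: true });
if (system !== undefined) body.system = system;
console.log(JSON.stringify(body.system));
```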

d) Allow `cache_control` on messages

When promptCaching is enabled, mark the last user message (or the last large content block) with cache_control: { type: "ephemeral" } to hint to Anthropic where the cache boundary should be.

3. Gemini — Already captures `cachedContentTokenCount` ✅

No changes required for stats capture. Optionally, add a note in the addSystemPrompt ODoc that Gemini performs implicit caching on large system instructions.

4. Ollama — No changes needed

Ollama uses local KV cache implicitly. No API-level prompt caching changes are needed.

5. Update ODoc comments

Update the ow.ai.gpt(aType, aOptions) ODoc to document the new promptCaching option for Anthropic and note that OpenAI cached token stats are now captured in getLastStats().

Update the $gpt ODoc similarly.

Files to Modify

js/owrap.ai.js

Notes

- All changes must be backward compatible — promptCaching defaults to false.
- The tokens.cached field in getLastStats() for OpenAI and tokens.cacheRead / tokens.cacheCreation for Anthropic allow callers to track cache efficiency.
- Anthropic's cache_control support should only be applied when aOptions.promptCaching === true to avoid sending unsupported fields to older API versions or incompatible models.
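
As a rough illustration of tracking cache efficiency, the sketch below computes a hit ratio per provider. Both helpers are hypothetical, and the prompt and input token totals are passed in separately because the full shape of `getLastStats()` is not shown here. The formulas assume that OpenAI's cached tokens are a subset of the prompt tokens, while Anthropic reports cache reads and cache writes separately from `input_tokens`.

```javascript
// Hypothetical helpers (assumption): derive a cache hit ratio from the
// cache-related token fields described in the notes above.
function openAICacheRatio(promptTokens, tokens) {
  // cached is assumed to be counted inside promptTokens.
  if (!(promptTokens > 0)) return 0;
  return (tokens.cached || 0) / promptTokens;
}

function anthropicCacheRatio(inputTokens, tokens) {
  // cacheRead and cacheCreation are assumed to be reported outside inputTokens.
  const read = tokens.cacheRead || 0;
  const created = tokens.cacheCreation || 0;
  const total = (inputTokens || 0) + read + created;
  return total > 0 ? read / total : 0;
}

console.log(openAICacheRatio(2048, { cached: 1024 }));                        // 0.5
console.log(anthropicCacheRatio(200, { cacheRead: 1800, cacheCreation: 0 })); // 0.9
```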

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Could owrap.ai.js be improved for each of the gpt implementations to handle prompt caching in the different providers?

Assistant: Let me start by finding and reading the owrap.ai.js file in the repository.

Assistant: Here's a detailed analysis of how prompt caching could be improved for each provider in js/owrap.ai.js:

Current State

✅ Gemini — Partially handled

Gemini already captures cachedContentTokenCount in _captureStats (line 875):

if (isDef(aResponse.usageMetadata.cachedContentTokenCount)) tokens.cached = aResponse.usageMetadata.cachedContentTokenCount

However, explicit Context Cache creation (Gemini's cachedContents API for large system prompts) is not supported. Users must rely on implicit caching only.

❌ OpenAI — Not captured

OpenAI returns prompt_tokens_details.cached_tokens when prompt caching kicks in (for models that support it, e.g. gpt-4o), but _captureStats only reads the top-level usage fields. The fix would be:

// In openai _captureStats
if (isMap(aResponse.usage.prompt_tokens_details)) {
    if (isDef(aResponse.usage.prompt_tokens_details.cached_tokens))
        tokens.cached = aResponse.usage.prompt_...

</details>




*This pull request was created from Copilot chat.*

Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40 Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>

Copilot

Pull request overview

Extends owrap.ai.js GPT provider implementations to expose provider-specific prompt-caching token accounting (OpenAI + Anthropic) and adds opt-in Anthropic prompt caching controls, alongside new focused regression tests and small ODoc updates.

Changes:

- OpenAI: _captureStats now captures cache/audio/reasoning token details from nested usage.*_tokens_details fields.
- Anthropic: introduces promptCaching option to send the beta header and inject cache_control markers; _captureStats now captures cache read/creation token counters.
- Adds automated tests covering OpenAI stats capture and Anthropic prompt-caching header/body/stat behavior; updates ODoc to document the new option and stats fields.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `js/owrap.ai.js` | Adds OpenAI nested usage capture; Anthropic opt-in prompt caching header + payload shaping; documents `promptCaching` and expanded stats. |
| `tests/autoTestAll.AI.js` | Adds unit tests validating OpenAI cached/audio/reasoning token capture and Anthropic prompt-caching header/body/stat handling. |
| `tests/autoTestAll.AI.yaml` | Wires new AI tests into the test job list and TODO list. |


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Initial plan

c366a96

Copilot AI assigned Copilot and nmaguiar May 21, 2026

Copilot started work on behalf of nmaguiar May 21, 2026 13:42 View session

Copilot AI and others added 4 commits May 21, 2026 13:55

Add GPT prompt caching stats and Anthropic promptCaching support

4a2b1d1

Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40 Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>

Refine prompt caching docs and strict checks

2d402f5

Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40 Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>

Simplify Anthropic cache control conditions

25e89e9

Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40 Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>

Clarify Anthropic JSON steering text usage

0eaf142

Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40 Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve prompt caching for GPT provider implementations~~ Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic) May 21, 2026

Copilot finished work on behalf of nmaguiar May 21, 2026 14:04

Copilot AI requested a review from nmaguiar May 21, 2026 14:04

nmaguiar marked this pull request as ready for review May 22, 2026 01:08

nmaguiar requested a review from Copilot May 22, 2026 01:08

Copilot started reviewing on behalf of nmaguiar May 22, 2026 01:08 View session

Copilot AI reviewed May 22, 2026


Comment thread tests/autoTestAll.AI.js Outdated

Comment thread js/owrap.ai.js Outdated

nmaguiar and others added 2 commits May 22, 2026 02:18

Potential fix for pull request finding

13b9199

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

edf5ac5

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

nmaguiar merged commit 9d61ffc into master May 22, 2026
1 check passed

nmaguiar mentioned this pull request May 22, 2026

Revert "Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic)" #1843

Merged

nmaguiar merged 7 commits into master from copilot/improve-prompt-caching

3 participants
