DRAFT: feat(observability): cache hit/write rate + non-caching-model warning by juanmichelini · Pull Request #3566 · OpenHands/software-agent-sdk

juanmichelini · 2026-06-08T06:38:01Z

Why

On an eval run with a 550B Nemotron (no prompt-cache support), ~95% of the bill was repeated prompt tokens — the SDK already knew the model wasn't in PROMPT_CACHE_MODELS (via LLM.is_caching_prompt_active()), but nothing surfaced it. Operators only found out from the invoice.

What

Two complementary, additive improvements:

1. `TokenUsage.cache_hit_rate` / `cache_write_rate`

Derived properties returning float | None:

cache_hit_rate = cache_read_tokens / prompt_tokens (clamped to [0, 1])
cache_write_rate = cache_write_tokens / prompt_tokens (clamped to [0, 1])
Both return None when no prompt tokens have been recorded so callers can distinguish "no data yet" from "0% hit rate".

The clamp matters: some providers double-count cache reads outside prompt_tokens. Without the clamp, dashboards see hit rates >100%.

2. One-time non-caching-model warning at conversation construction

LocalConversation.__init__ now fires a logger.warning once when agent.llm.is_caching_prompt_active() == False, naming the model so log aggregation can group by it. Wrapped in try/except so a logging failure can never break conversation construction.

Tests

tests/sdk/llm/test_token_usage_cache_rates.py — 9 tests including the parametric hit-rate sweep and the >100% clamp.
tests/sdk/conversation/local/test_conversation_prompt_cache_warning.py — 4 tests covering the warning firing for a non-cached model, staying silent for a cached one, firing for a cached model with caching explicitly disabled, and the once-per-conversation contract.

All 14 new tests pass; existing test_llm_metrics.py (51 tests) regression-clean.

Risk / blast radius

Two new read-only properties on TokenUsage — no schema change, no existing call sites affected.
One additional WARN log per LocalConversation() construction (only when caching is inactive). No structured event yet; can be promoted to a ConversationErrorEvent separately if there's appetite.

This PR was created by an AI agent (OpenHands) on behalf of the user investigating low throughput in eval run swebench/litellm_proxy-nemotron-3-ultra-550b-a55b-or-paid/27052068762.

@juanmichelini can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:46a5bff-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-46a5bff-python \
  ghcr.io/openhands/agent-server:46a5bff-python

All tags pushed for this build

ghcr.io/openhands/agent-server:46a5bff-golang-amd64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-golang-amd64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-golang-amd64
ghcr.io/openhands/agent-server:46a5bff-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:46a5bff-golang-arm64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-golang-arm64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-golang-arm64
ghcr.io/openhands/agent-server:46a5bff-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:46a5bff-java-amd64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-java-amd64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-java-amd64
ghcr.io/openhands/agent-server:46a5bff-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:46a5bff-java-arm64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-java-arm64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-java-arm64
ghcr.io/openhands/agent-server:46a5bff-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:46a5bff-python-amd64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-python-amd64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-python-amd64
ghcr.io/openhands/agent-server:46a5bff-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:46a5bff-python-arm64
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-python-arm64
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-python-arm64
ghcr.io/openhands/agent-server:46a5bff-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:46a5bff-golang
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-golang
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-golang
ghcr.io/openhands/agent-server:46a5bff-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:46a5bff-java
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-java
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-java
ghcr.io/openhands/agent-server:46a5bff-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:46a5bff-python
ghcr.io/openhands/agent-server:46a5bffbb7ede0dc4acb3c8ed24ba62063e8ab28-python
ghcr.io/openhands/agent-server:openhands-sdk-3-prompt-cache-observability-python
ghcr.io/openhands/agent-server:46a5bff-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 46a5bff-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 46a5bff-python-amd64) are also available if needed

Surfaces 'this run will pay full prompt-token cost on every step' as a WARN at conversation construction, and exposes hit/write rate as first-class derived metrics so dashboards and eval harnesses can flag non-cached runs without per-provider arithmetic. Changes: * TokenUsage.cache_hit_rate / cache_write_rate properties returning float in [0, 1] or None when no prompt tokens were recorded. Defensive min(1.0, ...) clamp keeps dashboards sane when providers double-count cache reads (some bill them inside prompt_tokens, some don't). * LocalConversation._warn_if_prompt_cache_inactive() fires once at __init__ when llm.is_caching_prompt_active() is False, naming the model so log aggregation can group by it. Wrapped in try/except so a logging failure can never break conversation construction. Motivated by an eval run on a 550B model with no prompt-cache support where 95%+ of the bill was repeated prompt tokens — the SDK already knew the model didn't support caching but nothing surfaced it. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-08T06:38:31Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-08T06:38:39Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-08T06:40:39Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/conversation/impl
local_conversation.py	733	57	92%	92, 413–414, 435, 440, 585, 631, 700, 716, 792, 1069–1070, 1147–1148, 1151, 1271, 1274–1275, 1299, 1332–1333, 1336, 1342, 1423, 1426, 1430–1431, 1435–1436, 1439, 1446, 1471, 1475, 1478, 1497, 1549, 1552, 1591, 1599, 1603–1605, 1612, 1724, 1729, 1839, 1841, 1845–1846, 1857–1858, 1883, 2078, 2082, 2152, 2159–2160
openhands-sdk/openhands/sdk/llm/utils
metrics.py	122	3	97%	17, 149, 343
TOTAL	30203	8453	72%

all-hands-bot · 2026-06-08T14:42:40Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: PR #3566 — DRAFT: feat(observability): cache hit/write rate + non-caching-model warning

Taste Rating

🟢 Good taste — Elegant, minimal additions that solve a real operational problem.

Summary

Two clean, additive improvements to the SDK's observability surface:

TokenUsage.cache_hit_rate / cache_write_rate — Derived properties that expose cache efficiency as normalized floats [0, 1], clamping to 1.0 to handle provider quirks. The float | None return type correctly distinguishes "no data yet" from "0% hit rate".
LocalConversation non-caching warning — One-time logger.warning at conversation construction when is_caching_prompt_active() == False. The try/except guard is the right defensive posture — a logging failure should never break conversation construction.

The PR description is exemplary: it names the real incident (Nemotron 550B eval run, 95% repeated prompt tokens), explains the blast radius clearly, and documents the design rationale for the clamp. This is how PRs should be written.

Inline Comments

Priority	File	Line	Comment
🟡 Suggestion	`local_conversation.py`	~766-796	The 20-line docstring on `_warn_if_prompt_cache_inactive()` is more PR-description than implementation doc. Consider trimming to 3-4 lines explaining what the method does; the why belongs in the PR body or a single-line comment at the call site.

What Was Done Well

metrics.py: Clean property implementations. No branching beyond the guard clause. min(1.0, ...) clamp is the right call — it future-proofs against any provider that double-counts cache reads.
Warning fires for caching=False on a supported model — Correct behavior. If the user explicitly disabled caching, they're making a trade-off and should be reminded loudly.
Tests: Edge-case coverage is solid — zero prompt tokens, >100% clamp, parametric sweep, once-per-conversation contract.
No breaking changes: Read-only properties on an existing model, no schema changes.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW
No breaking changes. Read-only properties. No new dependencies. The one new log path is guarded by try/except. The PR author correctly notes it can be promoted to a ConversationErrorEvent later if desired.

VERDICT:
✅ Worth merging — Core logic is sound, tests are thorough, and the PR solves a real observability gap.

KEY INSIGHT:
The min(1.0, ...) clamp is a pragmatic guard against provider inconsistency that makes dashboards safe without adding complexity.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot · 2026-06-08T14:44:37Z

@@ -763,6 +764,38 @@ def _pin_prompt_cache_key(self) -> None:
        if self.agent.llm._prompt_cache_key is None:
            self.agent.llm._prompt_cache_key = str(self._state.id)



🟡 Suggestion: The 20-line docstring on _warn_if_prompt_cache_inactive() is more PR-description than implementation doc. Consider trimming to 3-4 lines explaining what the method does; the why belongs in the PR body or a single-line comment at the call site.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Functional verification passed: the SDK now exposes cache hit/write rates and emits the promised inactive prompt-cache warning, but the PR currently has a failing CI pre-commit check.

Does this PR achieve its stated goal?

Yes. Exercising the SDK as a user would showed that TokenUsage now reports None when there are no prompt tokens, computes 0.75 / 0.2 for normal cache read/write samples, and clamps provider over-counting to 1.0. Constructing real Conversation objects also now emits warnings only when prompt caching is inactive: once for the non-caching Nemotron model and once when caching is explicitly disabled on a supported Claude model, with no warning for the supported active-cache case.

Phase	Result
Environment Setup	✅ `make build` completed successfully with `uv sync --dev` and local packages installed
CI Status	⚠️ Current PR checks show `pre-commit` failing and `qa-changes` in progress; other SDK/package checks shown by `gh pr checks` were passing or skipped
Functional Verification	✅ Before/after SDK scripts confirmed both claimed observability behaviors

Functional Verification

Test 1: `TokenUsage.cache_hit_rate` / `cache_write_rate`

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_prompt_cache_observability.py against real SDK imports. Relevant output:

{
  "rates": {
    "no_prompt": {
      "cache_hit_rate": "MISSING: AttributeError",
      "cache_write_rate": "MISSING: AttributeError"
    },
    "overcount": {
      "cache_hit_rate": "MISSING: AttributeError",
      "cache_write_rate": "MISSING: AttributeError"
    },
    "partial": {
      "cache_hit_rate": "MISSING: AttributeError",
      "cache_write_rate": "MISSING: AttributeError"
    }
  },
  "warnings": []
}

This confirms the baseline SDK had no user-accessible cache-rate properties.

Step 2 — Apply the PR changes:
Checked out cddcca7e8e3ffbd5cfaefff363fb13b63de38b19.

Step 3 — Re-run with the fix in place:
Ran the same uv run python /tmp/qa_prompt_cache_observability.py. Relevant output:

{
  "rates": {
    "no_prompt": {
      "cache_hit_rate": null,
      "cache_write_rate": null
    },
    "overcount": {
      "cache_hit_rate": 1.0,
      "cache_write_rate": 1.0
    },
    "partial": {
      "cache_hit_rate": 0.75,
      "cache_write_rate": 0.2
    }
  }
}

This confirms the new properties behave as described, including the no-data None case and >100% clamp.

Test 2: Inactive prompt-cache warning at conversation construction

Step 1 — Establish baseline without the fix:
On origin/main, the same script constructed real Conversation objects for:

nemotron-3-ultra-550b
claude-sonnet-4-5
claude-sonnet-4-5 with caching_prompt=False

Relevant output:

{
  "warnings": []
}

This confirms the old behavior did not surface inactive prompt caching at construction time.

Step 2 — Apply the PR changes:
Checked out cddcca7e8e3ffbd5cfaefff363fb13b63de38b19.

Step 3 — Re-run with the fix in place:
Ran the same script. Relevant output:

{
  "warnings": [
    "Prompt caching is not active for model 'nemotron-3-ultra-550b' on this conversation. Long agent loops will pay full prompt-token cost on every step. Consider tightening `max_iteration_per_run` or switching to a model with prompt-cache support for long-running tasks.",
    "Prompt caching is not active for model 'claude-sonnet-4-5' on this conversation. Long agent loops will pay full prompt-token cost on every step. Consider tightening `max_iteration_per_run` or switching to a model with prompt-cache support for long-running tasks."
  ]
}

This confirms the warning appears for an unsupported model and for explicitly disabled caching, while the supported active-cache Claude construction did not add a warning.

Test 3: Warning logging failure does not block construction

On the PR commit, I attached a logging handler that raises on WARNING and constructed a non-caching Nemotron conversation via uv run python /tmp/qa_prompt_cache_logging_failure.py. Relevant output:

conversation_created_despite_logging_failure

This confirms conversation construction still succeeds even if the warning path encounters a logging backend failure.

Issues Found

🟠 Issue: Current GitHub checks show pre-commit failing and qa-changes still in progress. I did not rerun linters/pre-commit locally per the QA instructions, but this CI status should be resolved before merge.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-09T00:29:33Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: cache hit/write rate + non-caching-model warning

Taste Rating: 🟢 Good taste — elegant, simple solution

Summary

This PR adds observability tooling for prompt caching: cache hit/write rate derived properties on TokenUsage, and a one-time warning at Conversation construction when the primary LLM doesn't support prompt caching.

Both additions are clean, well-scoped, and come with good test coverage. No blocking issues found.

Evaluation Checklist

Category	Status	Notes
Data structures	✅	`TokenUsage` properties are stateless derived values — no mutation risk, no ownership complexity.
Breaking changes	✅	No public API changes. `TokenUsage.cache_hit_rate` and `cache_write_rate` are new properties, existing code is unaffected.
Complexity	✅	Single-purpose methods, clear control flow, no nesting > 2 levels.
Pragmatism	✅	Addresses a real cost visibility problem. The warning in particular solves something teams discover too late (on their invoice).
Testing	✅	Both test files cover happy paths, edge cases, and idempotency. The cache-rate tests use parametric cases which is good.
Defensive coding	✅	`_warn_if_prompt_cache_inactive` has a broad but appropriate `try/except` guard that never propagates.
Documentation	✅	Docstrings are thorough and explain the "why" (e.g., why clamp, when `None` is returned).

Minor Suggestions (non-blocking)

local_conversation.py line 789: The except Exception in _warn_if_prompt_cache_inactive catches everything. Consider documenting why the broad catch is intentional (e.g., "never fail init on a log"). The exc_info=True in the debug log is good. |
metrics.py line 94: The min(1.0, ...) clamping is correct. Consider a brief inline comment explaining the defensive clamp (some providers double-count cache reads inside prompt_tokens), though the docstring and tests already cover this. |

Risk Assessment

⚠️ Overall PR Risk: 🟢 LOW

No breaking changes, no new external dependencies, no security-sensitive code paths. The additions are purely additive observability features with their own tests.

Verdict

✅ Worth merging: Core logic is sound, tests are solid, no action required.

Key insight: The cache rate properties use a clean stateless derivation pattern that composes well — TokenUsage objects can be added together and the rates recompute correctly from accumulated totals, as the test suite verifies.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot · 2026-06-09T00:30:55Z

+    @property
+    def cache_write_rate(self) -> float | None:
+        """Fraction of prompt tokens that produced a new cache write.
+


🟡 Suggestion: The min(1.0, ...) clamping is correct but worth a brief inline comment explaining why it's defensive (some providers double-count cache reads inside prompt_tokens). The test covers it, so the docstring could also note this edge case briefly.

all-hands-bot · 2026-06-09T00:30:55Z

+                "Prompt caching is not active for model %r on this "
+                "conversation. Long agent loops will pay full prompt-token "
+                "cost on every step. Consider tightening `max_iteration_per_run` "
+                "or switching to a model with prompt-cache support for "


🟡 Suggestion: Consider documenting why the broad except Exception is intentional (e.g., "never fail init on a log"). The exc_info=True in the debug branch is good practice.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Functional SDK verification passed for the new cache-rate observability and prompt-cache inactive warning, but the PR is not currently CI-green.

Does this PR achieve its stated goal?

Yes. As a real SDK user, I imported and used TokenUsage and constructed Conversation objects with supported and unsupported LLM configurations. On the PR commit, cache hit/write rates returned the expected partial, None, and clamped values, and conversation construction emitted exactly the intended operator-facing warning when prompt caching was inactive while staying silent for an active cached model.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the SDK workspace dependencies.
CI Status	⚠️ GitHub reports `pre-commit` and `unresolved-review-threads` failing; several build/publish/coverage checks still in progress; `sdk-tests` completed successfully in CI.
Functional Verification	✅ Exercised the changed SDK APIs directly with real Python usage; behavior matched the PR description.

Functional Verification

Test 1: TokenUsage cache rate properties

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main and ran a short SDK script that constructs TokenUsage(prompt_tokens=1000, cache_read_tokens=750, cache_write_tokens=200) and reads cache_hit_rate / cache_write_rate.

Relevant output:

commit fd88a5ce
cache_hit_rate AttributeError: 'TokenUsage' object has no attribute 'cache_hit_rate'
cache_write_rate AttributeError: 'TokenUsage' object has no attribute 'cache_write_rate'

This establishes the prior state: operators had no cache hit/write rate properties available on TokenUsage.

Step 2 — Apply the PR's changes:
Checked out PR commit 1643f000d7df3b45ae8b0d2cd8b080722ed415fa.

Step 3 — Re-run with the fix in place:
Ran the equivalent SDK script with realistic and edge-case usage totals.

Relevant output:

commit 1643f000
partial_read hit= 0.75 write= 0.0
partial_write hit= 0.0 write= 0.2
no_prompt_tokens hit= None write= None
clamped_read_and_write hit= 1.0 write= 1.0

This confirms the new properties work for normal reads/writes, return None when no prompt tokens are recorded, and clamp provider over-counting to 1.0.

Test 2: Conversation construction warning for inactive prompt caching

Step 1 — Reproduce / establish baseline without the fix:
On origin/main, constructed a real Conversation with LLM(model='nemotron-3-ultra-550b', api_key=SecretStr('k')) and captured warning logs from openhands.sdk.conversation.impl.local_conversation.

Relevant output:

commit fd88a5ce
prompt_cache_warning_count 0
prompt_cache_warning_messages []

This confirms the operational visibility gap described in the PR: a non-caching model can be used without any construction-time warning.

Step 2 — Apply the PR's changes:
Checked out PR commit 1643f000d7df3b45ae8b0d2cd8b080722ed415fa.

Step 3 — Re-run with the fix in place:
Constructed conversations for a non-caching model, a supported caching model, a supported model with caching explicitly disabled, and two separate non-caching conversations.

Relevant output:

non_caching_model warning_count= 1
non_caching_model warning_message= Prompt caching is not active for model 'nemotron-3-ultra-550b' on this conversation. Long agent loops will pay full prompt-token cost on every step. Consider tightening `max_iteration_per_run` or switching to a model with prompt-cache support for long-running tasks.
supported_model warning_count= 0
supported_model_disabled warning_count= 1
two_new_non_caching_conversations warning_count= 2

This confirms the warning is visible to operators when caching is inactive, includes the model name for grouping, remains silent for a supported model with active caching, and fires once per new conversation construction.

Issues Found

🟠 Issue: CI is not currently green: GitHub reports pre-commit and unresolved-review-threads as failed, with several build/publish/coverage checks still in progress. I did not rerun tests, linters, or pre-commit locally per QA instructions.

This review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PASS WITH ISSUES

juanmichelini added the enhancement New feature or request label Jun 8, 2026 — with OpenHands AI

juanmichelini mentioned this pull request Jun 8, 2026

DRAFT: feat(stats): ConversationStats.cache_efficiency_summary() probe #3567

Open

juanmichelini requested a review from all-hands-bot June 8, 2026 14:41

all-hands-bot reviewed Jun 8, 2026

View reviewed changes

juanmichelini marked this pull request as ready for review June 9, 2026 00:28

Merge branch 'main' into openhands/sdk-3-prompt-cache-observability

1643f00

juanmichelini requested a review from all-hands-bot June 9, 2026 00:28

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Merge branch 'main' into openhands/sdk-3-prompt-cache-observability

46a5bff

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DRAFT: feat(observability): cache hit/write rate + non-caching-model warning#3566

DRAFT: feat(observability): cache hit/write rate + non-caching-model warning#3566
juanmichelini wants to merge 3 commits into
mainfrom
openhands/sdk-3-prompt-cache-observability

juanmichelini commented Jun 8, 2026 •

edited by github-actions Bot

Loading

Uh oh!

github-actions Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

all-hands-bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot Jun 8, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot Jun 9, 2026

Uh oh!

all-hands-bot Jun 9, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -763,6 +764,38 @@ def _pin_prompt_cache_key(self) -> None:
		if self.agent.llm._prompt_cache_key is None:
		self.agent.llm._prompt_cache_key = str(self._state.id)

Conversation

juanmichelini commented Jun 8, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

1. TokenUsage.cache_hit_rate / cache_write_rate

2. One-time non-caching-model warning at conversation construction

Tests

Risk / blast radius

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: PR #3566 — DRAFT: feat(observability): cache hit/write rate + non-caching-model warning

Taste Rating

Summary

Inline Comments

What Was Done Well

Uh oh!

all-hands-bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: TokenUsage.cache_hit_rate / cache_write_rate

Test 2: Inactive prompt-cache warning at conversation construction

Test 3: Warning logging failure does not block construction

Issues Found

Uh oh!

all-hands-bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: cache hit/write rate + non-caching-model warning

Taste Rating: 🟢 Good taste — elegant, simple solution

Summary

Evaluation Checklist

Minor Suggestions (non-blocking)

Risk Assessment

Verdict

Uh oh!

all-hands-bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: TokenUsage cache rate properties

Test 2: Conversation construction warning for inactive prompt caching

Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

juanmichelini commented Jun 8, 2026 •

edited by github-actions Bot

Loading

1. `TokenUsage.cache_hit_rate` / `cache_write_rate`

github-actions Bot commented Jun 8, 2026 •

edited

Loading

github-actions Bot commented Jun 8, 2026 •

edited

Loading

github-actions Bot commented Jun 8, 2026 •

edited

Loading

all-hands-bot commented Jun 8, 2026 •

edited

Loading

Test 1: `TokenUsage.cache_hit_rate` / `cache_write_rate`

all-hands-bot commented Jun 9, 2026 •

edited

Loading