feat(bedrock): add prompt caching via CachePoint markers by skashmeri · Pull Request #1940 · kagent-dev/kagent

skashmeri · 2026-05-27T19:28:22Z

feat(bedrock): add prompt caching via CachePoint markers

Closes #1871.

Why this is needed

Multi-turn tool-using kagent agents on Bedrock pay full input-token cost
on every Converse call, because the static prefix (system prompt + tool
definitions) is re-sent and re-billed each turn. Real measurement from a
production deployment using Claude Sonnet 4.5 via the us. inference
profile in us-east-1, running every 2 hours against a ~700-pod EKS
cluster:

Per sweep: ~4 Converse calls
Cumulative input tokens (CloudWatch InvokeModel metric): ~313k
Cumulative output tokens: ~3k
Per-sweep cost: ~$0.98 (input dominates ~95%)
Per cluster/year (5 sweeps/weekday): ~$1,300
Per cluster/year (24/7 every 2h): projected ~$4-9k

~30k of the per-call input is identical across every call — system
prompt and tool definitions don't change inside a single task. Bedrock
prompt caching is designed precisely for this case: a cachePoint block
in the Converse request marks where the cacheable prefix ends, and
subsequent calls within ~5 minutes (per region) hit the cache and bill
the prefix at a reduced rate.

The Bedrock provider builds Converse requests using
system: [textBlock] and toolConfig.tools: [...] but never appends a
cachePoint block to either array, so caching is never engaged.

Why this is not redundant with the existing `spec.declarative.compaction`

kagent already has an Agent-level context-compaction feature
(Compaction, CompactionInterval, Summarizer, TokenThreshold,
etc.) that summarizes old conversation turns when the conversation
exceeds a token threshold. That solves a different problem:

Compaction: shrinks the conversation prompt when it gets too long.
Helps with context-window pressure on long-running agents.
Prompt caching: keeps the prompt the same size but tells Bedrock
"the first N tokens are stable across calls, cache them and bill
cached portion at the reduced rate."

Neither replaces the other. For a tool-using agent whose conversation
stays under the context limit but whose static prefix (system prompt +
tool defs) is large, prompt caching is the right hammer; compaction does
nothing because there's nothing to compact in the static prefix.

What this PR does

Adds a promptCaching: bool field to BedrockConfig (defaulting to
false to preserve existing behavior). When set, the provider appends
a CachePoint block:

To the end of the system content array (after the system text block)
To the end of the toolConfig.tools array (after the last ToolSpec)

Markers use CachePointTypeDefault. Bedrock silently ignores cache
points on models that don't support prompt caching, so the field is
safe to enable on a heterogeneous model fleet without per-model
gating.

Tested against us.anthropic.claude-sonnet-4-5-20250929-v1:0: the
second and subsequent Converse calls within the cache window drop their
input-token billing by ~70-90% on cache hits, depending on which static
portion (system vs tools vs both) is being hit.

Implementation surface

Mirrors the change across both runtimes — Go (for runtime: go agents)
and Python (for runtime: python agents):

Go:

go/api/v1alpha2/modelconfig_types.go: add PromptCaching to
BedrockConfig CRD struct with full doc + kubebuilder default.
go/api/adk/types.go: add PromptCaching to the internal
adk.Bedrock serializable model so it flows through agent config JSON.
go/core/internal/controller/translator/agent/adk_api_translator.go:
populate the new field when translating ModelConfig CR -> adk.Bedrock.
go/adk/pkg/agent/agent.go: thread the value into models.BedrockConfig.
go/adk/pkg/models/bedrock.go: emit the cache point markers in the
Converse request builders.

Python:

python/.../adk/types.py: add prompt_caching: bool to the
Bedrock Pydantic model and pass through to KAgentBedrockLlm factory.
python/.../adk/models/_bedrock.py: append
{"cachePoint": {"type": "default"}} to kwargs["system"] and
kwargs["toolConfig"]["tools"] when enabled.

Regenerated CRDs via make controller-manifests so
helm/kagent-crds/templates/kagent.dev_modelconfigs.yaml reflects the
new schema field.

Tests

go/adk/pkg/models/bedrock_test.go: new
TestConvertGenaiToolsToBedrockPromptCaching covering three cases:
disabled = no marker, enabled = marker appended at END of tool list
with default type, enabled-but-no-tools = no marker (no point in a
standalone marker).

Existing convertGenaiToolsToBedrock callers updated to pass the new
promptCaching bool argument as false (no behavior change).

Backward compatibility

promptCaching defaults to false everywhere; existing
ModelConfigs pick up no behavior change.
Serialized adk.Bedrock JSON uses omitempty for the new field;
older agent pods deserializing newer config see an unknown field
they ignore (Pydantic + Go json decoders both lenient by default).
The Converse API tolerates and ignores CachePoint markers on
models that don't support caching, so enabling on a mixed-model
setup is safe.

Example usage

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: bedrock-claude
spec:
  provider: Bedrock
  model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
  bedrock:
    region: us-east-1
    promptCaching: true

Signed-off-by: Shamil Kashmeri shamil@viafoura.com

Closes kagent-dev#1871. ## Why this is needed Multi-turn tool-using kagent agents on Bedrock pay full input-token cost on every Converse call, because the static prefix (system prompt + tool definitions) is re-sent and re-billed each turn. Real measurement from a production deployment using Claude Sonnet 4.5 via the `us.` inference profile in us-east-1, running every 2 hours against a ~700-pod EKS cluster: - Per sweep: ~4 Converse calls - Cumulative input tokens (CloudWatch InvokeModel metric): ~313k - Cumulative output tokens: ~3k - Per-sweep cost: ~$0.98 (input dominates ~95%) - Per cluster/year (5 sweeps/weekday): ~$1,300 - Per cluster/year (24/7 every 2h): projected ~$4-9k ~30k of the per-call input is identical across every call — system prompt and tool definitions don't change inside a single task. Bedrock prompt caching is designed precisely for this case: a `cachePoint` block in the Converse request marks where the cacheable prefix ends, and subsequent calls within ~5 minutes (per region) hit the cache and bill the prefix at a reduced rate. The Bedrock provider builds Converse requests using `system: [textBlock]` and `toolConfig.tools: [...]` but never appends a `cachePoint` block to either array, so caching is never engaged. ## Why this is not redundant with the existing `spec.declarative.compaction` kagent already has an Agent-level context-compaction feature (`Compaction`, `CompactionInterval`, `Summarizer`, `TokenThreshold`, etc.) that summarizes old conversation turns when the conversation exceeds a token threshold. That solves a different problem: - Compaction: shrinks the conversation prompt when it gets too long. Helps with context-window pressure on long-running agents. - Prompt caching: keeps the prompt the same size but tells Bedrock "the first N tokens are stable across calls, cache them and bill cached portion at the reduced rate." Neither replaces the other. For a tool-using agent whose conversation stays under the context limit but whose static prefix (system prompt + tool defs) is large, prompt caching is the right hammer; compaction does nothing because there's nothing to compact in the static prefix. ## What this PR does Adds a `promptCaching: bool` field to `BedrockConfig` (defaulting to `false` to preserve existing behavior). When set, the provider appends a `CachePoint` block: 1. To the end of the `system` content array (after the system text block) 2. To the end of the `toolConfig.tools` array (after the last ToolSpec) Markers use `CachePointTypeDefault`. Bedrock silently ignores cache points on models that don't support prompt caching, so the field is safe to enable on a heterogeneous model fleet without per-model gating. Tested against `us.anthropic.claude-sonnet-4-5-20250929-v1:0`: the second and subsequent Converse calls within the cache window drop their input-token billing by ~70-90% on cache hits, depending on which static portion (system vs tools vs both) is being hit. ## Implementation surface Mirrors the change across both runtimes — Go (for `runtime: go` agents) and Python (for `runtime: python` agents): Go: - `go/api/v1alpha2/modelconfig_types.go`: add `PromptCaching` to `BedrockConfig` CRD struct with full doc + kubebuilder default. - `go/api/adk/types.go`: add `PromptCaching` to the internal `adk.Bedrock` serializable model so it flows through agent config JSON. - `go/core/internal/controller/translator/agent/adk_api_translator.go`: populate the new field when translating ModelConfig CR -> adk.Bedrock. - `go/adk/pkg/agent/agent.go`: thread the value into `models.BedrockConfig`. - `go/adk/pkg/models/bedrock.go`: emit the cache point markers in the Converse request builders. Python: - `python/.../adk/types.py`: add `prompt_caching: bool` to the `Bedrock` Pydantic model and pass through to `KAgentBedrockLlm` factory. - `python/.../adk/models/_bedrock.py`: append `{"cachePoint": {"type": "default"}}` to `kwargs["system"]` and `kwargs["toolConfig"]["tools"]` when enabled. Regenerated CRDs via `make controller-manifests` so `helm/kagent-crds/templates/kagent.dev_modelconfigs.yaml` reflects the new schema field. ## Tests `go/adk/pkg/models/bedrock_test.go`: new `TestConvertGenaiToolsToBedrockPromptCaching` covering three cases: disabled = no marker, enabled = marker appended at END of tool list with default type, enabled-but-no-tools = no marker (no point in a standalone marker). Existing `convertGenaiToolsToBedrock` callers updated to pass the new `promptCaching bool` argument as `false` (no behavior change). ## Backward compatibility - `promptCaching` defaults to `false` everywhere; existing ModelConfigs pick up no behavior change. - Serialized `adk.Bedrock` JSON uses `omitempty` for the new field; older agent pods deserializing newer config see an unknown field they ignore (Pydantic + Go json decoders both lenient by default). - The Converse API tolerates and ignores `CachePoint` markers on models that don't support caching, so enabling on a mixed-model setup is safe. ## Example usage ```yaml apiVersion: kagent.dev/v1alpha2 kind: ModelConfig metadata: name: bedrock-claude spec: provider: Bedrock model: us.anthropic.claude-sonnet-4-5-20250929-v1:0 bedrock: region: us-east-1 promptCaching: true ``` Signed-off-by: Shamil Kashmeri <shamil@viafoura.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an opt-in “prompt caching” capability for Amazon Bedrock (Converse API) across the CRD/API layers and both Go + Python Bedrock runtime implementations by inserting Bedrock CachePoint markers into requests.

Changes:

Introduces a promptCaching/prompt_caching flag in ModelConfig/ADK types and plumbs it through translators and model factories.
Updates Go Bedrock request building to append CachePoint markers to system and toolConfig.tools and adjusts tool conversion signature + tests.
Updates Python Bedrock (Converse) request building to append CachePoint markers to system and toolConfig.tools.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
python/packages/kagent-adk/src/kagent/adk/types.py	Adds `prompt_caching` to the Python Bedrock model config type and forwards it during LLM creation.
python/packages/kagent-adk/src/kagent/adk/models/_bedrock.py	Implements Bedrock Converse request mutation to append `cachePoint` markers when enabled.
helm/kagent-crds/templates/kagent.dev_modelconfigs.yaml	Extends CRD schema with `promptCaching` for Bedrock.
go/core/internal/controller/translator/agent/adk_api_translator.go	Wires CRD `PromptCaching` into the ADK Bedrock model payload.
go/api/v1alpha2/modelconfig_types.go	Adds `PromptCaching` to the BedrockConfig CRD Go type with docs/default.
go/api/config/crd/bases/kagent.dev_modelconfigs.yaml	Extends generated CRD schema with `promptCaching` for Bedrock.
go/api/adk/types.go	Adds `PromptCaching` to the Go ADK Bedrock type for JSON payloads.
go/adk/pkg/models/bedrock_test.go	Updates tool conversion calls + adds tests for cache-point insertion behavior.
go/adk/pkg/models/bedrock.go	Adds config flag, appends cache markers to system/tools, and threads flag into tool conversion.
go/adk/pkg/agent/agent.go	Passes `PromptCaching` from ADK model into Bedrock model config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+            # If prompt caching is on, mark the end of the system content as
+            # a cache breakpoint. Bedrock caches everything up to and including
+            # this point for ~5 minutes; subsequent requests with the same
+            # prefix hit the cache. No-op if we didn't produce any system text.
+            if self.prompt_caching and kwargs.get("system"):
+                kwargs["system"].append({"cachePoint": {"type": "default"}})


+                    # CachePoint at the END of the tool list: tool definitions
+                    # are usually the biggest static chunk of an agent request
+                    # and benefit most from caching.
+                    if self.prompt_caching:
+                        converse_tools.append({"cachePoint": {"type": "default"}})
                    kwargs["toolConfig"] = {"tools": converse_tools}


+	// the end of the `tools` array. Bedrock will cache the prefix up to and
+	// including those cache points across requests in the same region for
+	// roughly 5 minutes after first use, billing the cached portion at a
+	// reduced rate on cache hits.


Copilot AI review requested due to automatic review settings May 27, 2026 19:28

skashmeri requested review from EItanya, ilackarms, iplay88keys, jmhbh, peterj, supreme-gg-gg and yuval-k as code owners May 27, 2026 19:28

github-actions Bot added the enhancement New feature or request label May 27, 2026

Copilot AI reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(bedrock): add prompt caching via CachePoint markers#1940

feat(bedrock): add prompt caching via CachePoint markers#1940
skashmeri wants to merge 1 commit into
kagent-dev:mainfrom
skashmeri:feat/bedrock-prompt-caching

skashmeri commented May 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

skashmeri commented May 27, 2026

Why this is needed

Why this is not redundant with the existing spec.declarative.compaction

What this PR does

Implementation surface

Tests

Backward compatibility

Example usage

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why this is not redundant with the existing `spec.declarative.compaction`