[Prompt registry] Phase 2 — Port static-tier sections behind the snap… by VascoSch92 · Pull Request #3659 · OpenHands/software-agent-sdk

VascoSch92 · 2026-06-11T14:02:58Z

HUMAN:

This is phase 2 of #3606. In this PR, we just port the static part of the system prompt into the new structure.

A human has tested these changes.

AGENT:

Why

Part of the prompt-registry roadmap (#3606; proposal #2827), which replaces the monolithic system_prompt.j2 + post-render refine() with a typed section registry whose static/dynamic cache split is declared per-section and
unit-testable.

This PR is the static-tier half of Phase 2 (#3610): it ports the ~17 static blocks of system_prompt.j2 into typed PromptSection classes assembled by a default registry, so registry.build(ctx).static reproduces today's
prompt byte-for-byte. It is purely additive and lands behind the Phase 0 snapshot — nothing is wired into the runtime prompt yet (that is the Phase 3 cutover).

Summary

Add 17 pure-Python PromptSection classes (no Jinja) in context/prompts/sections/static.py, ported verbatim from system_prompt.j2, plus build_default_registry() (context/prompts/default_registry.py) registering them in
template order.
registry.build(ctx).static reproduces AgentBase.static_system_message byte-for-byte across the full Phase 0 matrix (model family × browser × security-analyzer × cli_mode) and the win32 cell; every section also has a
standalone unit test (no Jinja environment).
Additive only: no change to system_prompt.j2, the snapshot oracle, or any runtime path. refine() (win32 shell-term swap) is applied only in the two blocks that actually contain shell terms.
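
For readers unfamiliar with the design, the idea of a typed section registry with a per-section static/dynamic tier can be sketched roughly as follows. This is an illustration only: `SketchContext`, `SketchSection`, `SketchRegistry`, and `BuiltPrompt` are hypothetical stand-ins, not the SDK's real classes or signatures, and the tag names and bodies are placeholders rather than text from system_prompt.j2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SketchContext:
    """Hypothetical stand-in for the real prompt context."""
    enable_browser: bool = False


class SketchSection:
    """Hypothetical base class: render() returns text, or None to omit."""
    tier = "static"  # or "dynamic"; declared per section

    def render(self, ctx: SketchContext) -> Optional[str]:
        raise NotImplementedError


class AlwaysOn(SketchSection):
    def render(self, ctx):
        return "<ALWAYS_ON>\nplaceholder body\n</ALWAYS_ON>"


class BrowserOnly(SketchSection):
    def render(self, ctx):
        # Guarded section: omitted entirely when the flag is off.
        if not ctx.enable_browser:
            return None
        return "<BROWSER_ONLY>\nplaceholder body\n</BROWSER_ONLY>"


@dataclass
class BuiltPrompt:
    static: str
    dynamic: Optional[str]


@dataclass
class SketchRegistry:
    sections: List[SketchSection] = field(default_factory=list)

    def register(self, section):
        self.sections.append(section)
        return self

    def _join(self, ctx, tier):
        # Render in registration order, drop omitted sections,
        # and join the rest with exactly one blank line.
        parts = [s.render(ctx) for s in self.sections if s.tier == tier]
        parts = [p for p in parts if p is not None]
        return "\n\n".join(parts) if parts else None

    def build(self, ctx):
        return BuiltPrompt(
            static=self._join(ctx, "static") or "",
            dynamic=self._join(ctx, "dynamic"),
        )


registry = SketchRegistry().register(AlwaysOn()).register(BrowserOnly())
off = registry.build(SketchContext())
on = registry.build(SketchContext(enable_browser=True))
```

In this sketch each section can be tested on its own by calling `render()` with a context, which is the property the per-section unit tests above rely on; no template environment is involved.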

Issue Number

#3610 (roadmap #3606, proposal #2827)

How to Test

Library-only change with no runtime wiring yet (behind the snapshot), so the end-to-end evidence is the byte-for-byte equivalence test — it builds real Agent instances (not mocks), renders the real static_system_message,
and asserts the registry reproduces it:

uv run pytest tests/sdk/context/prompts/test_default_registry.py -q
# 34 passed — every Phase 0 matrix cell byte-for-byte + win32 + per-section unit tests
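
As a rough picture of what "every matrix cell" means, the axes named above can be expanded into cells with a Cartesian product. The sketch below is hypothetical: the two model-family names are placeholders and the real test module may enumerate its cells differently.

```python
from itertools import product

# Placeholder values; the real matrix uses the SDK's own model families.
AXES = {
    "model_family": ["family_a", "family_b"],
    "browser": [False, True],
    "security_analyzer": [False, True],
    "cli_mode": [False, True],
}


def matrix_cells(axes):
    """Return one dict per combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


cells = matrix_cells(AXES)
```

Each cell would then drive one equivalence assertion, for example through a parametrized test.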

Regression (legacy render path untouched) + Phase 1:

uv run pytest tests/sdk/context/prompts/ tests/sdk/context/test_agent_context.py tests/sdk/agent/test_build_prompt_context.py -q
# 167 passed — includes the 48-cell snapshot oracle, unchanged

Full lint/type gate:

uv run pre-commit run --files \
  openhands-sdk/openhands/sdk/context/prompts/sections/__init__.py \
  openhands-sdk/openhands/sdk/context/prompts/sections/static.py \
  openhands-sdk/openhands/sdk/context/prompts/default_registry.py \
  tests/sdk/context/prompts/test_default_registry.py
# ruff format, ruff lint, pycodestyle, pyright, import rules, tool-registration — all pass

Video/Screenshots

N/A — no UI or runtime surface (the registry is not wired into the prompt yet). Evidence is the test output above; the equivalence test exercises real Agent/LLM objects, not mocks.

Type

Notes

Behind the snapshot: static_system_message is not yet routed through the registry — that is the Phase 3 cutover. This PR only proves equivalence.
Inter-section spacing: the registry joins sections with one blank line; the legacy template leaves 2–5 around guarded {% if %} blocks. The equivalence test normalizes only those </TAG>…3+ blanks…<TAG> boundaries (one
tag-anchored regex), so every section body is asserted byte-for-byte. The literal whitespace shift lands at the Phase 3 cutover.
Lint: static.py carries a file-level # ruff: noqa: E501 for its verbatim long prompt lines; pyproject.toml is untouched.
Intentional edge-case divergences (outside the matrix): <SECURITY> is guarded on security_policy_filename (omitted when empty, vs. the template's empty tags); a custom security_policy_filename would resolve its content
into the context (follow-up).
Follow-up: point 2 of [Prompt registry] Phase 2 — Port sections behind the snapshot #3610 — the dynamic-tier sections (DateTime/RepoContext/AvailableSkills/CustomSuffix/CustomSecrets) from system_message_suffix.j2.
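
The inter-section normalization described in these notes could look roughly like the sketch below. It assumes that "3+ blanks" means a run of three or more newline characters between a closing tag and the next opening tag, and that tag names are upper-case letters and underscores; `normalize_gaps` and the exact pattern are illustrative, not the regex used in the test suite.

```python
import re

# Closing tag, then 3+ newlines, then an opening tag.
# Anchoring on both tags leaves blank lines inside section bodies untouched.
_TAG_GAP = re.compile(r"(</[A-Z_]+>)\n{3,}(<[A-Z_]+>)")


def normalize_gaps(text: str) -> str:
    """Collapse oversized gaps between sections to one blank line."""
    return _TAG_GAP.sub(r"\1\n\n\2", text)


legacy = "body\n\n</SECURITY>\n\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy"
registry = "body\n\n</SECURITY>\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy"
canonical_equal = normalize_gaps(legacy) == normalize_gaps(registry)
```

Because only tag-to-tag boundaries are rewritten, any difference inside a section body would still fail the comparison.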

github-actions · 2026-06-11T14:03:41Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-11T14:03:48Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-11T14:06:22Z

Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	30610	8435	72%

report-only-changed-files is enabled. No files were changed during this commit :)

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

The new static prompt registry works as an additive SDK API and matches legacy section content after the documented gap normalization, but literal byte-for-byte parity with Agent.static_system_message does not hold.

Does this PR achieve its stated goal?

Partially. I exercised the SDK as a user would by importing build_default_registry(), constructing real Agent/LLM objects, building prompt contexts, and comparing the registry output to the existing static prompt path across default, analyzer, sandbox-tier, browser-enabled, and Windows-simulated scenarios. Those scenarios matched after the PR's documented inter-section gap normalization and produced no dynamic block, so the static-tier port is functionally usable behind the snapshot. However, exact equality was False for the real default prompt (15187 vs 15191 bytes), so the PR does not literally satisfy the repeated byte-for-byte parity claim.

Phase	Result
Environment Setup	✅ `make build` completed successfully.
CI Status	🟡 21 checks passing, 8 pending, 2 skipped at review time; no checks were rerun.
Functional Verification	⚠️ Registry behavior verified with real SDK objects; exact byte parity issue found.

Functional Verification

Test 1: Establish base behavior without the PR registry

Step 1 — Reproduce / establish baseline (without the fix):
Ran git worktree add --detach /tmp/qa-pr3659-base origin/main and then executed the SDK from that base worktree:

base_has_default_registry= False
base_static_len= 15191
base_starts= You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.
base_has_role= True
base_has_browser= False
base_has_security_policy= True

This shows the base branch has the legacy Agent.static_system_message path but no openhands.sdk.context.prompts.default_registry entry point.

Step 2 — Apply the PR's changes:
Returned to PR head 3fd47bf95719a408dfc5132369a20d811c655da1 on vasco/static-part after bootstrapping with make build.

Step 3 — Re-run with the PR in place:
Executed real SDK agents and the new registry API:

CASE=default_gpt5
  registry_static_len= 15187
  legacy_static_len= 15191
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True
CASE=anthropic_with_analyzer
  registry_static_len= 14489
  legacy_static_len= 14493
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True
CASE=sandbox_risk_tiers
  registry_static_len= 15139
  legacy_static_len= 15143
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True

This shows the new API is usable and the static bodies match after gap normalization, but exact byte-for-byte parity with the live prompt path is not achieved.

Test 2: Browser, Windows, and edge-path behavior

Step 1 — Reproduce / establish baseline (without the fix):
On base, the new registry import was absent, so these sectionized paths could not be exercised there.

Step 2 — Apply the PR's changes:
Used the PR branch's new registry and section classes directly with SDK PromptContext/Agent objects.

Step 3 — Re-run with the PR in place:
Ran browser-enabled, Windows-simulated, and disabled-security-policy cases:

CASE=browser_enabled_agent
  ctx_enable_browser= True
  registry_has_browser= True
  legacy_has_browser= True
  canonical_equal= True
CASE=browser_guard_direct_context
  disabled= False
  enabled= True
CASE=windows_agent
  ctx_platform= windows
  registry_contains_powershell= True
  registry_contains_execute_powershell= False
  canonical_equal= True
CASE=security_policy_disabled
  registry_static_len= 13674
  legacy_static_len= 13703
  exact_equal= False
  canonical_equal= False
  dynamic_is_none= True
  has_security= False
  has_security_risk= True
  has_important= True

This confirms browser and Windows paths behave as expected after normalization. The disabled security policy divergence matches the PR notes and is not runtime-wired yet, but it reinforces that the new registry is not a literal byte-for-byte replacement for all legacy static prompt outputs.

Exact mismatch evidence

Ran a compact diff summary on the default PR prompt:

exact_equal= False
registry_len= 15187 legacy_len= 15191 delta= 4
first_diff_index= 9476
registry_excerpt= 'ser has explicitly requested and would expect\n\n</SECURITY>\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy\nWhen using tools that support '
legacy_excerpt= 'ser has explicitly requested and would expect\n\n</SECURITY>\n\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy\nWhen using tools that support'

This pinpoints the observed difference as inter-section blank-line spacing.
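
A diff summary of this kind can be reproduced with a few lines of Python. The helper below is a hypothetical sketch, not the script used for this report; it returns the first differing index plus a short excerpt from each string around that index.

```python
def first_diff(a: str, b: str, context: int = 20):
    """Return (index, excerpt_a, excerpt_b) or None if the strings are equal."""
    limit = min(len(a), len(b))
    idx = next((i for i in range(limit) if a[i] != b[i]), None)
    if idx is None:
        if len(a) == len(b):
            return None  # identical strings
        idx = limit  # one string is a strict prefix of the other
    start = max(0, idx - context)
    end = idx + context
    return idx, a[start:end], b[start:end]


registry_text = "x</SECURITY>\n\n<SECURITY_RISK_ASSESSMENT>"
legacy_text = "x</SECURITY>\n\n\n<SECURITY_RISK_ASSESSMENT>"
result = first_diff(registry_text, legacy_text)
```

Printing the excerpts with `repr()` makes the extra newline visible, as in the output above.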

Issues Found

🟡 Minor: The new registry is functionally usable and section bodies match after normalization, but registry.build(ctx).static is not literally byte-for-byte equal to Agent.static_system_message despite the docstring/PR wording implying that level of parity.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-11T14:10:03Z

+
+``build_default_registry()`` registers the static-tier sections in the exact order
+``agent/prompts/system_prompt.j2`` emits them, so ``registry.build(ctx).static``
+reproduces ``AgentBase.static_system_message``. The dynamic-tier sections


🟡 Minor: QA execution against a real Agent showed registry.build(ctx).static is not byte-for-byte equal to Agent.static_system_message (default gpt-5: 15187 vs 15191 bytes; first difference is an extra legacy blank line before <SECURITY_RISK_ASSESSMENT>). Since this docstring says the registry reproduces the legacy message and the PR description emphasizes byte-for-byte parity, please either preserve exact inter-section spacing here or clarify that this is canonicalized/inter-section-normalized parity until the Phase 3 cutover.

This comment was created by an AI agent (OpenHands) on behalf of the user.

[Prompt registry] Phase 2 — Port static-tier sections behind the snap…

3fd47bf

VascoSch92 requested a review from all-hands-bot June 11, 2026 14:03

Merge branch 'main' into vasco/static-part

949c285

all-hands-bot reviewed Jun 11, 2026

VascoSch92 mentioned this pull request Jun 11, 2026

[Prompt registry] Phase 2 — Port sections behind the snapshot #3610

Open

5 tasks

VascoSch92 wants to merge 2 commits into main from vasco/static-part

2 participants
