Summary
Strengthen firewall/redaction.py so that data nested below the configured
max_depth is not returned to the LLM unredacted. Today recursion stops at the depth
cap and returns the remaining subtree verbatim.
Why this matters
The Firewall is the I-01 boundary: raw tool output must never reach the LLM without
passing a redaction step. A fail-open at depth means PII, secrets, or connection
strings nested four-or-more levels deep in a tool result flow straight through. For a
security kernel this is the highest-severity class of bug because it is silent and
contradicts the documented guarantee in docs/security.md.
Current evidence
- firewall/redaction.py: redact() begins with if depth >= max_depth: return data, warnings, so the subtree is returned without scanning strings or sensitive field names.
- firewall/budgets.py: Budgets.max_depth defaults to 3.
- firewall/transform.py calls redact(data, max_depth=self._budgets.max_depth) for every response mode.
- docs/security.md lists "PII / PCI leakage → Redaction + allowed_fields enforcement in the firewall" as a mitigation with no depth caveat.
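The fail-open pattern described in the evidence above can be reproduced with a simplified stand-in. This is a sketch, not the actual code in firewall/redaction.py: the function name redact_fail_open, the email pattern, and the traversal details are assumptions made for illustration; only the depth check at the top mirrors the line quoted in this issue.

```python
import re
from typing import Any, List, Optional, Tuple

# Hypothetical pattern for illustration; the real module's patterns are not shown in this issue.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_fail_open(
    data: Any,
    max_depth: int = 3,
    depth: int = 0,
    warnings: Optional[List[str]] = None,
) -> Tuple[Any, List[str]]:
    """Simplified stand-in that mirrors the depth check quoted above."""
    warnings = [] if warnings is None else warnings
    if depth >= max_depth:
        # Fail-open: everything at or below the cap is returned unscanned.
        return data, warnings
    if isinstance(data, dict):
        redacted = {
            key: redact_fail_open(value, max_depth, depth + 1, warnings)[0]
            for key, value in data.items()
        }
        return redacted, warnings
    if isinstance(data, list):
        items = [redact_fail_open(v, max_depth, depth + 1, warnings)[0] for v in data]
        return items, warnings
    if isinstance(data, str):
        return EMAIL_RE.sub("[REDACTED]", data), warnings
    return data, warnings


shallow_out, _ = redact_fail_open({"user": {"email": "a@example.com"}})
deep_out, _ = redact_fail_open({"a": {"b": {"c": {"email": "a@example.com"}}}})
print(shallow_out)  # {'user': {'email': '[REDACTED]'}}
print(deep_out)     # {'a': {'b': {'c': {'email': 'a@example.com'}}}}
```

With the default cap of 3, the email in the shallow payload is redacted, while the same email one level deeper comes back verbatim and no warning is recorded.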
External context
Indirect prompt-injection and data-leakage research treats the tool-output boundary as
the control point; a redaction layer that stops at fixed depth is a known evasion path.
Proposed implementation
- Decide the safe-by-default behavior at depth limit: rather than returning the raw
subtree, either (a) drop it and append a warning, or (b) replace it with a
"[TRUNCATED: depth limit]" sentinel. Option (b) preserves shape; both are safe.
- Keep string-pattern redaction running at the boundary even when structural recursion
stops, so leaf strings are still scanned.
- Make the choice explicit in Budgets (e.g., on_max_depth: "drop" | "truncate").
- Update docs/security.md and docs/context_firewall.md to state the behavior.
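One way the proposal above could look is sketched below. It is a minimal illustration under stated assumptions, not the project's implementation: the on_max_depth parameter, the _DROPPED marker, the warning text, and the single email pattern are all hypothetical, and the real redact() also handles sensitive field names, which this sketch omits.

```python
import re
from typing import Any, List, Optional, Tuple

# Hypothetical pattern and names; only the sentinel text comes from the proposal.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SENTINEL = "[TRUNCATED: depth limit]"
_DROPPED = object()  # internal marker telling the parent to omit this child


def redact(
    data: Any,
    max_depth: int = 3,
    on_max_depth: str = "truncate",
    depth: int = 0,
    warnings: Optional[List[str]] = None,
) -> Tuple[Any, List[str]]:
    if on_max_depth not in ("drop", "truncate"):
        raise ValueError(f"unknown on_max_depth: {on_max_depth!r}")
    warnings = [] if warnings is None else warnings
    if isinstance(data, str):
        # Leaf strings are scanned at every depth, including at the boundary.
        return EMAIL_RE.sub("[REDACTED]", data), warnings
    if not isinstance(data, (dict, list)):
        return data, warnings  # numbers, booleans, None
    if depth >= max_depth:
        # Fail-closed: the container is replaced in O(1) without walking it,
        # so a very wide dict at the boundary adds no traversal cost.
        action = "truncated" if on_max_depth == "truncate" else "dropped"
        warnings.append(f"depth limit {max_depth} reached: subtree {action}")
        if on_max_depth == "truncate":
            return SENTINEL, warnings
        return (None if depth == 0 else _DROPPED), warnings
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            child, _ = redact(value, max_depth, on_max_depth, depth + 1, warnings)
            if child is not _DROPPED:
                out[key] = child
        return out, warnings
    children = (redact(v, max_depth, on_max_depth, depth + 1, warnings)[0] for v in data)
    return [c for c in children if c is not _DROPPED], warnings


deep = {"a": {"b": {"c": {"email": "a@example.com"}}}}
print(redact(deep, on_max_depth="truncate")[0])
# {'a': {'b': {'c': '[TRUNCATED: depth limit]'}}}
print(redact(deep, on_max_depth="drop")[0])
# {'a': {'b': {}}}
```

In this sketch "truncate" keeps the key so the output shape stays recognizable, while "drop" removes the key entirely; both record a warning, and neither returns the raw subtree.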
AI-agent execution notes
- Inspect first: firewall/redaction.py, firewall/budgets.py, firewall/transform.py, tests/test_redaction.py, tests/test_firewall_boundary.py.
- This file is security-grade; preserve all existing redaction paths, only close the gap.
- Edge cases: cyclic structures are not expected (JSON), but very wide dicts at the boundary should not explode cost; recursion must still terminate.
- Do not change max_depth's default value without separate discussion.
Acceptance criteria
- A nested payload with an email/secret below max_depth is returned as [REDACTED] or dropped, never as the raw value.
- A warning is emitted describing the depth-limit action.
- tests/test_firewall_boundary.py gains a negative assertion that a deep secret string never appears in any Frame.
- Existing redaction tests still pass.
Test plan
Add deep-nesting cases to tests/test_redaction.py and tests/test_firewall_boundary.py
(fake secrets at depth 4–6). Run full make ci.
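A deep-nesting check along these lines could back the new test cases. This is only a sketch: nest and leaking_depths are hypothetical helpers, the two stand-in redactors exist solely to show the helper catching a leak, and a real test would call the project's redact() and inspect its Frame objects instead.

```python
import json
from typing import Any, Callable, Iterable, List, Tuple

RedactFn = Callable[[Any], Tuple[Any, List[str]]]


def nest(value: Any, depth: int) -> Any:
    """Wrap value in `depth` single-key dicts (level1 is the outermost)."""
    for level in range(depth, 0, -1):
        value = {f"level{level}": value}
    return value


def leaking_depths(
    redact_fn: RedactFn, secret: str, depths: Iterable[int] = range(4, 7)
) -> List[int]:
    """Return every nesting depth at which the fake secret survives redaction."""
    leaks = []
    for depth in depths:
        out, _warnings = redact_fn(nest({"note": f"contact {secret}"}, depth))
        # Serialize the whole result so the check covers every nested value.
        # Use plain-ASCII fake secrets without quotes or backslashes, since
        # JSON escaping would otherwise hide a leak from this substring test.
        if secret in json.dumps(out, default=str):
            leaks.append(depth)
    return leaks


def leaky(data: Any) -> Tuple[Any, List[str]]:
    """Stand-in for the current fail-open behavior."""
    return data, []


def fail_closed(data: Any) -> Tuple[Any, List[str]]:
    """Stand-in for a redactor that truncates instead of leaking."""
    return "[TRUNCATED: depth limit]", ["depth limit reached"]


FAKE_SECRET = "fake-secret@example.com"
print(leaking_depths(leaky, FAKE_SECRET))        # [4, 5, 6]
print(leaking_depths(fail_closed, FAKE_SECRET))  # []
```

In a test file the negative assertion would reduce to something like assert leaking_depths(redact_under_test, FAKE_SECRET) == [], where redact_under_test is a placeholder for the project's own entry point.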
Documentation plan
Update docs/security.md redaction row and docs/context_firewall.md depth section;
CHANGELOG Fixed/Security entry.
Migration and compatibility notes
Behavior change for deeply nested outputs (previously leaked, now redacted/truncated).
Document under Security; not expected to require caller code changes.
Risks and tradeoffs
Truncating deep structure may drop data some callers relied on seeing; the handle
still references the full dataset for authorized expansion. Safety outweighs the
convenience cost.
Suggested labels
security, reliability, testing