fix(storage): reject path-traversal artifact ids at the path chokepoint by minion1227 · Pull Request #301 · vouchdev/vouch

minion1227 · 2026-07-01T03:49:12Z

What changed

Artifact ids are now rejected at the storage layer when they contain a path
separator, .., a nul byte, or are empty. A new _reject_unsafe_id guard
runs at the _yaml / _page_path chokepoint that every _*_path helper
funnels through, so all artifact read/write paths are covered at once.

Why

Source.id is locked to a hex sha256, but Claim, Page, Entity,
Relation, Evidence, Session, and Proposal all declare a bare
id: str with no validator, and _yaml turned that straight into
kb_dir/<sub>/<id>.yaml with no containment check. A poisoned id could then
escape kb_dir:

bundle smuggling → later write outside the tree. import_apply uses
_safe_member_path to keep each tar member's filename inside kb_dir,
but does not check the id field inside the YAML body. A claim landed
under claims/innocent.yaml whose id is "../../../tmp/pwned" reads
back fine (safe filename), but the next update_claim / supersede /
contradict resolves _claim_path(claim.id) to a path outside kb_dir.
arbitrary read on get. get_claim("../../../etc/hostname") resolved
straight through _yaml before this change.

This is the same class of hole the tar-member fix (bundle._safe_member_path,
the CVE-2007-4559 fix) already closed on the import path — _reject_unsafe_id
mirrors its _unsafe_name_reason checks at the storage chokepoint. closes #149.

What might break

nothing legitimate. artifact ids are flat slugs / hex / uuids — none contain
/, \, .., or nul — so every real read/write is unaffected (the full
suite, including bundle/sync/session round-trips, is green). the only calls
that now raise ValueError are ones that were already unsafe. no on-disk
format, kb.* method, or audit-log shape changes.

a complementary model-layer id validator (mirroring
Source._id_is_hex_sha256) is deliberately left as a follow-up: the storage
guard already closes every reach path, including raw-id reads a model
validator cannot see, and keeping it storage-only avoids overlapping the
model classes.

VEP

not a surface change — a storage-layer containment guard in the same shape
already merged for tar members. no VEP needed.

Tests

make check — ruff clean; mypy clean on touched files; pytest green
(only pre-existing, environment-only failures remain: fastapi/[web]
not installed and embeddings installed, both of which also fail on
main)
new tests in tests/test_storage.py: _yaml / _page_path reject
traversal / separator / absolute / empty ids (parametrised); put_*,
get_*, and put_page reject traversal ids; the disk-smuggled-id →
update_claim exploit chain is blocked before any write; and a
regression guard that legitimate slug / hex ids still resolve
CHANGELOG.md updated under ## [Unreleased]

Summary by CodeRabbit

Bug Fixes
- Strengthened storage handling to reject unsafe artifact and source IDs, preventing path traversal and accidental writes or reads outside the app’s data directory.
- IDs that are empty, contain path separators (/ or \), path traversal segments (./..), or NUL bytes are now blocked across claim and page operations, including updates.
- Added regression coverage (including a “disk smuggling” scenario) to ensure malicious on-disk content can’t be promoted through updates, while valid IDs still resolve normally.

`Source.id` is locked to a hex sha256, but `Claim`, `Page`, `Entity`, `Relation`, `Evidence`, `Session`, and `Proposal` all carry a bare `id: str` with no validator. `_yaml` / `_page_path` turned those ids straight into filesystem paths with no containment check, so a poisoned id — e.g. `"../../tmp/x"` smuggled through a bundle body that `import_apply` lands under a safe filename but whose in-memory id is traversal — could escape `kb_dir` on the next put / update / lifecycle write, or read an arbitrary file on get. add `_reject_unsafe_id`, mirroring `bundle._unsafe_name_reason`: an id must be non-empty and free of nul bytes, path separators, and `..` components. route it through `_yaml` and `_page_path` — the single chokepoint every `_*_path` helper funnels through — so all artifact read/write paths are guarded at once. a complementary model-layer id validator (mirroring `Source._id_is_hex_sha256`) is left as a follow-up; the storage guard already closes every reach path, including raw-id reads a model validator cannot see. closes vouchdev#149

coderabbitai · 2026-07-01T03:49:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 206f4b37-f58e-4863-9162-242fe03ec4a3

📥 Commits

Reviewing files that changed from the base of the PR and between 137b515 and a5db33c.

📒 Files selected for processing (2)

src/vouch/storage.py
tests/test_storage.py

🚧 Files skipped from review as they are similar to previous changes (2)

src/vouch/storage.py
tests/test_storage.py

📝 Walkthrough

Walkthrough

Adds a storage-layer unsafe-ID validator, wires it into YAML, page, and source path construction, expands regression coverage for rejected and valid IDs, and adds a changelog note describing the path-traversal hardening.

Changes

Artifact ID Path Traversal Guard

Layer / File(s)	Summary
Unsafe ID validator and path helper wiring `src/vouch/storage.py`	Adds `_reject_unsafe_id(obj_id)` and calls it from `_yaml(sub, obj_id)`, `_page_path(page_id)`, and `_source_dir(source_id)` before filesystem paths are built.
Regression tests and changelog `tests/test_storage.py`, `CHANGELOG.md`	Adds storage tests for unsafe-id rejection across source, YAML, page, claim, and update flows, includes a disk-smuggling regression and a safe-id path check, and documents the fix in the changelog.

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related issues

Path traversal: untrusted slug_hint / artifact id can write approved artifacts outside the KB #170: This PR hardens the same storage path-construction helpers for artifact IDs and rejects traversal-like values at the chokepoints.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the storage fix for rejecting unsafe artifact ids.
Linked Issues check	✅ Passed	The PR enforces storage-layer rejection for unsafe artifact ids across YAML, pages, and source paths, matching the issue's traversal-prevention goal.
Out of Scope Changes check	✅ Passed	No unrelated changes are evident; the changelog note and source-path guard support the same storage-hardening work.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/vouch/storage.py`:
- Around line 257-265: The storage path validation is incomplete because source
paths are still built from untrusted source IDs without rejection. Update the
source lookup flow in Storage, especially get_source(), read_source_content(),
and _source_dir(), to apply _reject_unsafe_id() to source_id before constructing
paths so traversal inputs cannot escape kb_dir. Keep the fix aligned with the
existing _yaml(), _page_path(), and _claim_path() validation pattern.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 868a3b3e-60b5-42f6-bee6-d17f55afff2f

📥 Commits

Reviewing files that changed from the base of the PR and between 94e84b1 and 137b515.

📒 Files selected for processing (3)

CHANGELOG.md
src/vouch/storage.py
tests/test_storage.py

address review on vouchdev#301: `_yaml` / `_page_path` were guarded, but `get_source` / `read_source_content` (and the evidence-ref existence checks) route raw `source_id` strings through `_source_dir` without it, so `get_source("../../outside")` could still escape kb_dir to `<outside>/meta.yaml` / `content`. `Source.id` is hex-locked on the model but these read paths take unvalidated strings. apply `_reject_unsafe_id` at the `_source_dir` chokepoint too, closing the last storage traversal path. add get_source / read_source_content rejection tests.

minion1227 · 2026-07-01T04:02:17Z

good catch — fixed in a5db33c. added _reject_unsafe_id at the _source_dir
chokepoint too, so get_source / read_source_content and the evidence-ref
existence checks reject traversal ids as well. Source.id is hex-locked on
the model, but those read paths take raw strings, so this closes the last
storage traversal path. added get_source / read_source_content rejection
tests; full suite still green.

github-actions Bot added docs documentation, specs, examples, and repo guidance storage kb storage, migrations, schemas, and proposals tests tests and fixtures size: S 50-199 changed non-doc lines labels Jul 1, 2026

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread src/vouch/storage.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(storage): reject path-traversal artifact ids at the path chokepoint#301

fix(storage): reject path-traversal artifact ids at the path chokepoint#301
minion1227 wants to merge 2 commits into
vouchdev:mainfrom
minion1227:minion_149

minion1227 commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

minion1227 commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

minion1227 commented Jul 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Why

What might break

VEP

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

minion1227 commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

minion1227 commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading