-
Notifications
You must be signed in to change notification settings - Fork 42
[FEAT]: pytest-xdist support for distributed test execution
#73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
nina-msft
wants to merge
4
commits into
microsoft:main
Choose a base branch
from
nina-msft:dev/nina-msft/8259
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,170 @@ | ||
| # Parallel Execution with pytest-xdist | ||
|
|
||
| RAMPART supports parallel test execution via `pytest-xdist`, producing a **single unified report** even when tests run across multiple worker processes. | ||
|
|
||
| --- | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ```bash | ||
| pip install pytest-xdist | ||
| pytest -n 4 | ||
| ``` | ||
|
|
||
| With `-n 4`, pytest spawns 4 worker processes that execute tests in parallel. RAMPART intercepts each worker's results, ships them to the controller process, and emits **one consolidated report** at the end of the session. | ||
|
|
||
| --- | ||
|
|
||
| ## How It Works | ||
|
|
||
| ``` | ||
| Worker 1 Worker 2 Controller | ||
| ───────── ───────── ────────── | ||
| collect results collect results | ||
| │ │ | ||
| pytest_sessionfinish pytest_sessionfinish | ||
| │ │ | ||
| serialize → workeroutput serialize → workeroutput | ||
| │ │ | ||
| └───────────┬───────────────┘ | ||
| ▼ | ||
| pytest_testnodedown (per worker) | ||
| deserialize + merge into | ||
| controller's RampartSession | ||
| │ | ||
| ▼ | ||
| pytest_sessionfinish (controller) | ||
| aggregate trials → evaluate gates → emit sinks | ||
| │ | ||
| ▼ | ||
| Single unified TestRunReport | ||
| ``` | ||
|
|
||
| - **Workers** collect [`Result`][rampart.core.result.Result] objects normally and serialize them into `config.workeroutput`. Workers do **not** emit reports. | ||
| - **Controller** receives each worker's payload via the `pytest_testnodedown` hook, merges results into its own [`RampartSession`][rampart.pytest_plugin._session.RampartSession], and emits sinks once at session end. | ||
|
|
||
| The result: **one** `JsonFileReportSink` output file, **one** call to `MyCustomSink.emit_async`, and accurate population statistics over the full result set. | ||
|
|
||
| --- | ||
|
|
||
| ## Trial Tests with xdist | ||
|
|
||
| `@pytest.mark.trial(n=, threshold=)` clones a test into N independent runs. Under xdist, clones may be distributed across workers depending on the `--dist` mode. | ||
|
|
||
| | `--dist` mode | Trial behavior | | ||
| |---------------|----------------| | ||
| | `loadgroup` | All trial clones for one test run on the same worker (recommended for locality) | | ||
| | `load` (default) | Trial clones distributed round-robin across workers | | ||
| | `loadscope` / `loadfile` | Grouped by class/module/file | | ||
|
|
||
| **Correctness is preserved regardless of mode** — the controller aggregates trial groups from the merged result set. You'll see a warning if you use `@trial` markers without `--dist=loadgroup`: | ||
|
|
||
| ```text | ||
| RAMPART @trial markers present with --dist=load. Trial clones may be | ||
| split across workers. Aggregation remains correct (controller merges | ||
| all results), but using --dist=loadgroup keeps trial clones co-located | ||
| on one worker for better locality. | ||
| ``` | ||
|
|
||
| To silence the warning and improve locality: | ||
|
|
||
| ```bash | ||
| pytest -n 4 --dist=loadgroup | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Constraints on `rampart_sinks` | ||
|
|
||
| When running under xdist, the controller process does not execute test fixtures. To discover your sinks, RAMPART scans registered conftest modules for a `rampart_sinks` attribute and calls it directly. | ||
|
|
||
| **Supported shapes:** | ||
|
|
||
| ```python | ||
| # Parameterless fixture — works on both single-process and xdist | ||
| @pytest.fixture(scope="session") | ||
| def rampart_sinks(): | ||
| return [JsonFileReportSink(output_dir=Path(".report"))] | ||
|
|
||
| # Plain list assigned at module level — works on both | ||
| rampart_sinks = [JsonFileReportSink(output_dir=Path(".report"))] | ||
| ``` | ||
|
|
||
| **Not supported under xdist** (the warning is logged and the sink is skipped): | ||
|
|
||
| ```python | ||
| # Fixture with dependencies — cannot be resolved on the controller | ||
| @pytest.fixture(scope="session") | ||
| def rampart_sinks(my_sink_config, db_connection): | ||
| return [DatabaseSink(connection=db_connection)] | ||
| ``` | ||
|
|
||
| If your sinks need dependencies, consider: | ||
|
|
||
| - Constructing them at module level with explicit configuration | ||
| - Reading configuration from environment variables inside a parameterless function | ||
| - Running without xdist (`pytest` instead of `pytest -n 4`) until a hook-based registration API is added | ||
|
|
||
| --- | ||
|
|
||
| ## Trust Boundary & Security | ||
|
|
||
| Worker payloads cross a process boundary via `execnet` and may contain attacker-controlled content (agent responses, payload text, evaluator rationale). RAMPART's serialization defends against: | ||
|
|
||
| - **Arbitrary code execution** — strict JSON-safe primitives only; no `pickle`, `marshal`, or custom `__reduce__`. | ||
| - **Schema drift** — payloads with missing or unknown schema versions are rejected fail-closed. | ||
| - **Memory exhaustion** — worker payloads are capped at 64 MB by default. | ||
| - **Terminal/log injection** — ANSI escape sequences are stripped from free-form text at the deserialization boundary. | ||
| - **Path traversal** — worker-local artifact paths are stored as opaque strings in metadata; the controller never accesses worker files. | ||
|
|
||
| ### Size cap | ||
|
|
||
| The default 64 MB cap can be overridden via the pytest CLI option or an ini setting: | ||
|
|
||
| ```bash | ||
| pytest -n 4 --rampart-xdist-max-bytes=134217728 | ||
| ``` | ||
|
|
||
| Or in `pytest.ini` / `pyproject.toml`: | ||
|
|
||
| ```ini | ||
| [pytest] | ||
| rampart_xdist_max_bytes = 134217728 | ||
| ``` | ||
|
|
||
| Workers that exceed the cap log a warning and emit a truncation marker. The controller records the affected worker as incomplete in `TestRunReport.metadata`. | ||
|
|
||
| --- | ||
|
|
||
| ## Incomplete Runs | ||
|
|
||
| If a worker crashes, runs out of time, or hits the size cap, the controller marks the run as incomplete: | ||
|
|
||
| ```python | ||
| report.metadata["incomplete"] # True if any worker failed | ||
| report.metadata["incomplete_reasons"] # list[str] — one per failure | ||
| ``` | ||
|
|
||
| Reports are still emitted with whatever data was collected. For safety-critical CI, sinks or post-processing should check the `incomplete` flag and fail the build accordingly. | ||
|
|
||
| --- | ||
|
|
||
| ## Run-Mode Metadata | ||
|
|
||
| Reports produced under xdist include: | ||
|
|
||
| ```python | ||
| report.metadata["xdist_active"] # True | ||
| report.metadata["worker_count"] # int | ||
| report.metadata["dist_mode"] # "load", "loadgroup", etc. | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Limitations | ||
|
|
||
| - Sinks discovered on the controller cannot depend on other pytest fixtures (see Constraints above). | ||
| - Mixed RAMPART versions across controller and workers are unsupported; install the same version everywhere. | ||
| - `pytest-xdist` itself does not support interactive debugging (`--pdb`, `--trace`); use single-process mode for debugging. | ||
|
|
||
| A hook-based sink registration API for complex sink configurations is a planned follow-up. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ai note:
ANSI stripping doesn't cover OSC, and the security notes claim it does.
_ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")matches CSI only. It does not match:\x1b]…BELor\x1b]…ST) — used for terminal hyperlinks (\x1b]8;;https://attacker/\x1b\\text\x1b]8;;\x1b\\) and window-title sets. Windows Terminal, iTerm2, modern GNOME Terminal, and Konsole all render hyperlinks. So an attacker-controlled response can produce a clickable link inpytest's terminal output even though "ANSI was stripped".\x1b[PXY^_]…ST)\x9b…) — less common but allowed by ECMA-48The PR explicitly markets ANSI stripping as a defense against terminal injection from compromised workers, so this gap directly undermines a stated security property. A stricter pattern that handles the C1 introducers, or just
re.sub(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]", "", text)to strip all C0/C1 controls, would close it. Whichever you pick, also worth adding a couple of OSC-flavored test cases toTestStripAnsi.