Harden ACP/LLM transport and sandbox experiment execution#290
Open
DermotOBrien-EC wants to merge 5 commits into
Open
Harden ACP/LLM transport and sandbox experiment execution#290DermotOBrien-EC wants to merge 5 commits into
DermotOBrien-EC wants to merge 5 commits into
Conversation
Add experiment.sandbox.dataset_cache_root (default /opt/datasets) and thread it through the network_disabled_guidance prompt block so generated experiment code is instructed to load torchvision datasets from the configured path with download=False. The default value matches the prior hardcoded constant, so existing configs that omit the field render identical prompts. Tighten missing-data semantics: the prompt now forbids silent synthetic-data fallback and requires FileNotFoundError + non-zero exit if a pre-cached dataset file is missing. This is an intentional behaviour change for every sandbox network_policy="none" codegen call, motivated by a focused-replay defect where missing MNIST raw files were papered over with synthetic tensors. Add an explicit DATASET_USED: <name> stdout-stamp requirement so downstream metric capture has a dataset-provenance signal independent of whatever JSON result schema CodeAgent invents. Focused replay (CODE_GENERATION..EXPERIMENT_RUN against MNIST raw files pre-staged at /tmp/arc_sandbox_trial/datasets) confirms all three checks: the stamp appears in stdout, run-1.json:metrics carries 105 per-condition namespaced keys, and the canonical runs/results.json contains the structured harness output rather than a stdout-parsed fallback.
Include both tail-capped stdout and stderr in the exit-N ACP failure message; acpx often writes the real error to stdout while stderr is empty, so the prior message discarded it. Preserve reconnect and command-too-long markers verbatim from the raw streams so tail truncation cannot disable the retry and stdin-fallback paths in _send_prompt.
Align the LLMConfig dataclass default with the from_rc_config() fallback of 600s. A direct LLMConfig construction path omitted timeout_sec and silently inherited the old 300s default.
Cold agent start-up through acpx can exceed the old hardcoded 30s session create/ensure budget (claude observed at ~31s), flaking the first stage of a run. Add session_init_timeout_sec (default 120) on AcpConfig/ACPConfig, thread it through the loader and from_rc_config, and fall back from 'ensure' to 'new' on TimeoutExpired instead of aborting session init with an unhandled exception.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Five focused reliability fixes for long autonomous runs, concentrated in the LLM/ACP transport and the sandbox experiment path.
acpxagent start was observed at ~31s, overrunning the old hardcoded 30s session create/ensure budget and flaking a run's first stage. AddsAcpConfig.session_init_timeout_secand falls back fromensuretonewonTimeoutExpired.LLMConfigconstruction path silently inherited the old 300s default; this matches it to thefrom_rc_configfallback.acpxstdout in ACP failure diagnostics.acpxoften writes the real error to stdout while stderr is empty; the failure message now includes both (tail-capped) and preserves reconnect / command-too-long markers so truncation cannot disable the retry and stdin-fallback paths./opt/datasets, so existing configs render identical prompts). Also tightens the prompt contract: forbid silent synthetic-data fallback, requireFileNotFoundError+ non-zero exit on missing cached data, and emit aDATASET_USED: <name>stamp. Note: in this PR the no-synthetic rule and theDATASET_USEDstamp are prompt-level guidance only — runtime enforcement of dataset provenance is intentionally deferred to a follow-up PR.results.jsonover stdout-parsed metrics, clear the sandbox before each run, and mtime-anchor the discovery glob so a prior failed run can't leak stale results.Tests: +678 lines across
test_rc_llm.py,test_rc_prompts.py,test_rc_config.py, andtest_rc_executor.py.Verify:
pytest tests/test_rc_llm.py tests/test_rc_prompts.py tests/test_rc_config.py tests/test_rc_executor.py