Add ATIF conversion pipeline#256
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands /iterate |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — the pipeline wires up correctly and all 596 tests pass. Three issues worth addressing before merge.
- is ATIF-in / ATIF-out (normalization only), but the name and the README pipeline diagram () imply it emits ADP std format. This naming mismatch will mislead future contributors. Rename it (or clarify in a comment and the diagram).
- does a wasteful round-trip through at the end; is sufficient since the copy is already a valid .
- The fallback in is a fragile heuristic — see inline comment.
[RISK ASSESSMENT]
- [Overall PR] Risk Assessment: 🟢 LOW — additive pipeline layer, backward-compatible, no existing behavior broken.
VERDICT: ✅ Worth merging with the heuristic fixed.
KEY INSIGHT: delegates to internally, so the ATIF stage currently inherits ADP normalization rather than preserving raw tool shapes — acceptable for now but worth a follow-up if per-dataset ATIF fidelity is ever needed.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26885141828
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — The three previously flagged issues are partially addressed: normalize_atif_trajectory no longer re-validates, and the schema_version fallback is now exception-based. Two new issues and one outstanding naming concern are below.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26885838005
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly and previous review items are resolved. One new correctness bug found.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26886419503
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, all 596 tests pass, and previous review items are resolved. Two new issues below.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26887170488
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly and all 596 tests pass. Two new issues found.
Was this automated review useful? React with 👍 or 👎 to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26888057010
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — previous review items are all resolved and all 596 tests pass. One new documentation inconsistency below.
AGENTS.md line 146 — SFT generation command doesn't use the ATIF pipeline
The pipeline diagram (line 30) shows the canonical path as sample_atif.json → scripts/atif_to_std.py → agents/*/std_to_sft.py → sample_sft/, but the "Generate sample files" command at line 146 still reads from sample_std.json directly:
cat datasets/$MY_DATASET/sample_std.json | ... | python agents/openhands_v0/std_to_sft.py ...Both paths produce identical output (since std_to_sft.py auto-detects format), but contributors following AGENTS.md will never exercise the new ATIF-based SFT generation path. Consider updating to:
cat datasets/$MY_DATASET/sample_atif.json | python scripts/json_to_jsonl.py | python scripts/atif_to_std.py | python agents/openhands_v0/std_to_sft.py --is_web=no --api_env=execute_bash | python scripts/jsonl_to_json.py > datasets/$MY_DATASET/sample_sft/openhands_v0.jsonWas this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26888896141
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, tests pass (596 passed, 97 skipped), and previous review items are resolved. One new issue found.
full_atif.jsonl is not gitignored — script_full.sh (line 9) generates datasets/$MY_DATASET/full_atif.jsonl, but .gitignore only covers full_raw.jsonl, full_std.jsonl, and full_sft.jsonl. A contributor who runs the full pipeline locally could accidentally commit large ATIF JSONL files. Add full_atif.jsonl to .gitignore beside the other full_*.jsonl patterns, and add it to the AGENTS.md "do not commit" list (line 44) alongside the existing full-corpus artifact names.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26889796205
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the ADP schema removal request in ff580a6. Summary:
Validation:
This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
@OpenHands Can raw_to_standardized.py be deleted now given that the method for creating standardized format is now raw_to_atif and atif_to_std? Also, check to make sure that the changes to the resulting sft data are minimal compared to when the sft data was generated from the legacy adp format. |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the
SFT comparison against the legacy ADP-generated data on
Changed SFT files observed in the existing ATIF PR output: Validation run:
Note: I first attempted This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
Since my last summary, there were no additional code changes or follow-up actions. Final status:
Conciseness:
The work is complete. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
I audited and regenerated the samples in Answer: no, the committed What changed:
SFT audit versus
So the regenerated output is not minimal compared to legacy ADP output. The earlier audit showing 76/86 unchanged was against the pre-regeneration artifacts; after forcing regeneration through the ATIF path, the deltas are much larger. The biggest count changes are in flattened web/AgentLab-style samples and datasets whose ATIF sample now preserves all raw sample rows (for example Validation run:
Dependency note: full This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
@OpenHands tests are failing, please fix |
|
I'm on it! neubig can track my progress at all-hands.dev |
32736a3 to
af91d08
Compare
1af4324 to
925cf7e
Compare
Summary
raw -> raw_to_atif.py -> sample_atif.json -> atif_to_std.py -> sample_std.json -> shared std_to_sft.py -> sample_sft/*.terminal,python,file_editor,finish) while keeping agent-native rendering inside each agent converter.agents/*.api.py,raw_to_standardized.py, andstd_to_sft.pypatterns in favor of metadata-declared tools and shared SFT converters.sample_std.json, plus tests that enforce sample/stage alignment, metadata coherence, built-in/custom tool schemas, metadata-backed sample expectations, shared-script layout rules, SFT prompt hygiene, standardized tool names, and no dataset branching in agent SFT converters.metadata.json.Fixes #243.
Tests
uv run --with-requirements requirements.txt pre-commit run --all-files(passed)uv run --with-requirements requirements.txt python -m pytest -q(1065 passed, 1 skipped)sys.path.insert/sys.path.appendpath hacks; dataset-owned scripts live underdatasets/*; agent-owned SFT converters live underagents/*.7d7ebae:check-pr-artifacts,check_dataset_metadata,pre-commit, andtest (3.12)all passed.Notes / design decisions
raw_to_atif.pyprojects raw transcripts/tool calls into ATIF while preserving source semantics as much as practical.atif_to_std.pyremains ATIF-in/ATIF-out. Common alias/type normalization lives in shared ATIF utilities, with dataset characteristics recorded inmetadata.jsonrather than dataset-specific SFT converters.std_to_sft.pyconverters consume standardized ATIF without dataset-local SFT scripts. Dataset-specific behavior is represented through ATIF normalization and metadata-declared capabilities.sample_sft/openhands_sdk.jsonis the primary SFT sample checked across datasets; legacy v0 samples remain where present.This PR description was updated by an AI agent on behalf of the user.