Add: dep_gen deps.json v2 — tensor-annotated edges + differential replay#769
Merged
ChaoWao merged 1 commit on May 13, 2026
Conversation
Code Review
This pull request upgrades the dependency graph generation to version 2, introducing tensor-level annotations such as shapes, offsets, and dtypes. The core replay logic now employs a dual-pass differential check between a canonical oracle and an annotated mirror to guarantee correctness. Corresponding updates were made to the visualization tools, including a new --show-tensor-info mode, and the test suite. Feedback suggests serializing 64-bit integers as strings in the JSON output to ensure interoperability with JavaScript parsers.
dep_gen deps.json v2 — tensor-annotated edges + differential replay
Replay always emits the v2 schema with task IDs, the underlying tensors
they touch, and the (offset, shape, dtype) slice each edge represents.
v1 (task-pair-only) is gone — the runtime never writes it, and
downstream tools (swimlane_converter, deps_to_graph, the scene_test
gate) reject any version other than 2.
Self-checking dual-pass replay
- Per record, the host runs two parallel PTO2TensorMap instances. The
  oracle pass drives the canonical compute_task_fanin template
  (unchanged runtime); the annotated pass mirrors STEP A + STEP B
  inline against a second map, with a wider callback that captures the
  matched PTO2TensorMapEntry + consumer Tensor + arg index per emit.
- After both passes finish the record, the producer-id sets are
  compared. Divergence -> LOG_ERROR with the symmetric difference + a
  non-zero return; deps.json is NOT written. This guards against
  silent shotgun modifications: anyone who changes compute_task_fanin
  semantics trips the gate immediately.
- INOUT+COVERED remove_entry is mirrored exactly so both maps stay
bit-equivalent for the next record.
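The dual-pass contract described above can be sketched in a few lines. This is an illustration of the set comparison only, not the real host-side code; the two pass callables and the emit signatures are hypothetical stand-ins for the oracle compute_task_fanin run and the annotated mirror:

```python
def differential_replay(record, oracle_pass, annotated_pass):
    """oracle_pass(record, emit) drives the canonical fanin computation;
    annotated_pass(record, emit) mirrors it with a wider callback."""
    # Pass 1 — oracle: collect only the producer ids the runtime would wire.
    oracle_producers = set()
    oracle_pass(record, lambda producer_id: oracle_producers.add(producer_id))

    # Pass 2 — annotated mirror: the wider callback also captures the
    # matched map entry, consumer tensor, and arg index per emit.
    annotated_producers, edges = set(), []
    def emit(producer_id, entry, consumer_tensor, arg_idx):
        annotated_producers.add(producer_id)
        edges.append((producer_id, entry, consumer_tensor, arg_idx))
    annotated_pass(record, emit)

    # Any divergence means the annotated pass no longer mirrors the
    # oracle: report the symmetric difference and refuse to write deps.json.
    diff = oracle_producers ^ annotated_producers
    if diff:
        raise RuntimeError(f"differential replay diverged on producers {sorted(diff)}")
    return edges
```

The symmetric difference names exactly the producer ids seen by one pass but not the other, which is what the LOG_ERROR message reports.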
v2 schema
- tasks[]: task_id, scope (auto/manual), args[] with per-slot
{idx, type, tensor_id, dtype, shape, offset}. OUTPUT slots omit
tensor info (zero blob at submit time); viewer backfills tensor_id
from downstream creator-source edges when unambiguous.
- tensors[]: stable FNV-1a 64-bit hash of (buffer_addr, version) as
tensor_id; raw_shapes describes the underlying buffer; per-slot
shape/offset describes the slice.
- edges[]: pred, succ, arg, source (explicit/creator/tensormap),
overlap (tensormap only), tensor_id + consumer slice + producer slice
(tensormap only). Distinct args/sources keep their own edges;
task-pair projection still satisfies fanout subset deps.
- All uint64 fields (task_id, tensor_id, pred, succ, buffer_addr) are
serialized as JSON STRINGS. tensor_id (FNV hash) and buffer_addr
routinely exceed Number.MAX_SAFE_INTEGER (2^53-1), which would
silently corrupt them in JavaScript-based JSON parsers. Python
consumers pass them through int(v) which handles either form.
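The tensor_id derivation and the string serialization can be sketched together. The FNV-1a constants below are the standard 64-bit parameters; the exact byte layout hashed from (buffer_addr, version) is an assumption for illustration:

```python
import json
import struct

FNV64_OFFSET = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a: xor each byte in, then multiply by the prime.
    h = FNV64_OFFSET
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def tensor_id(buffer_addr: int, version: int) -> str:
    # Hash (buffer_addr, version) into a stable 64-bit id, then serialize
    # as a string: values above Number.MAX_SAFE_INTEGER (2**53 - 1) would
    # be silently rounded by JavaScript JSON parsers.
    return str(fnv1a_64(struct.pack("<QQ", buffer_addr, version)))

tid = tensor_id(0x7F0000400000, 3)          # hypothetical buffer/version
blob = json.dumps({"tensor_id": tid})
assert json.loads(blob)["tensor_id"] == tid  # round-trips exactly as a string
assert int(tid) < 2**64                      # Python consumers just int(v) it
```

Python consumers are unaffected either way, since `int(v)` accepts both the string and the numeric form.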
Viewer: deps_to_graph.py
- --show-tensor-info renders each task as an HTML-table node:
input rows (top, blue) | identity header (core-type colored) |
output rows (bottom, orange). Each arg cell is a 4-line block:
"arg<i> <TYPE> <Tname>:<dtype>" + raw/shape/offset.
- Edges route producer:out_<idx>:e -> consumer:in_<arg>:w by matching
tensor_id; explicit edges render dashed grey; tensormap edges with
overlap != covered carry a small red label.
- Default mode (no flag) unchanged: bare shape nodes + bare arrows.
- uint64 fields (task_id, tensor_id) are coerced str -> int once at
  ingestion via _normalize_tensor_id, an alias of _normalize_task_id.
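The port-based routing relies on Graphviz HTML-like labels, where each table cell gets a PORT that edges can attach to. A minimal hand-rolled sketch of the mechanism (not the actual deps_to_graph code; node layout and field names are simplified):

```python
def task_node(task_id, inputs, outputs):
    # One PORT per arg slot so edges can target a specific row:
    # producer:out_<idx>:e -> consumer:in_<arg>:w.
    rows = []
    for i, name in enumerate(inputs):
        rows.append(f'<TR><TD PORT="in_{i}" BGCOLOR="lightblue">{name}</TD></TR>')
    rows.append(f'<TR><TD><B>task {task_id}</B></TD></TR>')
    for i, name in enumerate(outputs):
        rows.append(f'<TR><TD PORT="out_{i}" BGCOLOR="orange">{name}</TD></TR>')
    body = "".join(rows)
    return (f't{task_id} [shape=plain label=<'
            f'<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">{body}</TABLE>>];')

def edge(pred, out_idx, succ, arg, explicit=False):
    # Explicit edges render dashed grey; tensormap edges stay solid.
    style = " [style=dashed color=grey]" if explicit else ""
    return f"t{pred}:out_{out_idx}:e -> t{succ}:in_{arg}:w{style};"

dot = "digraph deps {\n" + task_node(0, [], ["T0:f16"]) + "\n" \
    + task_node(1, ["T0:f16"], []) + "\n" + edge(0, 0, 1, 0) + "\n}"
```

Feeding the resulting `dot` string to `dot -Tsvg` renders the producer's output row connected straight to the consumer's input row, which is how matching `tensor_id`s become row-to-row arrows.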
Other consumers
- swimlane_converter.py: reads v2 dict edges, projects to (pred, succ)
set for Perfetto flow events; warns + falls back to fanout[] on any
non-v2 file. normalize_pto2_task_id_int already handles string-encoded
uint64.
- test_dep_gen.py: asserts v2 schema, projects edges to task-pair set
for fanout subset deps check, validates tasks[] / tensors[] /
per-edge annotation completeness.
perf_to_mermaid removed
- The Mermaid timing-graph tool is superseded by deps_to_graph (the
structural counterpart, with --show-tensor-info for per-edge slice
info) and swimlane_converter (Perfetto flow events sourced from
deps.json since hw-native-sys#737). simpler_setup/tools/perf_to_mermaid.py and
all docs/__init__/README references are dropped in this commit.
Docs updated (docs/dfx/dep_gen.md, simpler_setup/tools/README.md,
docs/profiling-name-map.md): the schema (with the JS precision
rationale), the differential-validation contract, the new viewer flag,
and the deps_to_graph replacement for perf_to_mermaid.
Issue: hw-native-sys#666
de31792 to 54f88d1
ChaoWao added a commit to ChaoWao/simpler-fork that referenced this pull request on May 13, 2026
Each DFX capture pipeline (dep_gen / l2_swimlane / tensor_dump) ships
with a consumer script under simpler_setup/tools/. The scene test for
that pipeline now invokes the consumer against the artifact it just
produced, asserting exit code 0. If a future schema change breaks the
tool, the failure is attributed to the same CI step that captured the
artifact rather than surfacing later as silent tooling rot.
Smoke is exit-code-only — HTML / PDF / diagram content is NOT validated.
The contract is "does the tool still parse this schema", not "is the
rendered output correct".
Wiring
- simpler_setup/tools/_smoke.py: run_tool + has_binary helpers shared
by all DFX tests.
- tests/.../dfx/dep_gen/test_dep_gen.py: deps_to_graph smoked in both
default and --show-tensor-info modes (guarded by has_binary("dot")
so dev machines without graphviz skip cleanly).
- tests/.../dfx/l2_swimlane/test_l2_swimlane.py: swimlane_converter
smoked against l2_perf_records.json.
- tests/.../dfx/tensor_dump/test_tensor_dump.py: dump_viewer smoked
against the captured tensor_dump/ directory.
- pmu has no consumer tool; no smoke added (raw csv is the artifact).
sched_overhead_analysis is intentionally NOT smoked — it requires a
real device log and would false-positive on sim. Reserved for a future
hardware DFX smoke.
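The shared _smoke.py helpers above might look roughly like this (a sketch; only the names run_tool and has_binary come from the commit message, the bodies are assumptions):

```python
import shutil
import subprocess
import sys

def has_binary(name: str) -> bool:
    # e.g. has_binary("dot") gates the graphviz-dependent smokes so dev
    # machines without graphviz skip cleanly instead of failing.
    return shutil.which(name) is not None

def run_tool(script, *args):
    # Exit-code-only smoke: the contract is "the tool still parses the
    # schema"; rendered HTML/PDF/diagram content is deliberately not checked.
    proc = subprocess.run([sys.executable, str(script), *args],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise AssertionError(f"{script} exited {proc.returncode}:\n{proc.stderr}")
    return proc
```

A test then just calls `run_tool(tools_dir / "deps_to_graph.py", deps_path)` against the artifact it captured and lets the exit code decide.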
CI: graphviz installed on the github-hosted sim runners (both Linux
and macOS) so deps_to_graph can render. The self-hosted onboard a2a3
runner warns if graphviz is missing instead of failing — the runner
admin should install it for full coverage, otherwise the has_binary
guard skips that one smoke.
Issue: follow-up to hw-native-sys#769
ChaoWao added a commit that referenced this pull request on May 13, 2026
Summary
Resolves #666 — deps.json now carries per-edge tensor metadata
(offset/shape/dtype) and per-task input/output slot info, instead of
just task→task IDs. Zero runtime changes; all work lives in the
host-side replay + downstream tools.

v2 schema (replaces v1, no fallback):
- tasks[] — per task: task_id, scope, args[] with {idx, type,
  tensor_id, dtype, shape, offset} per slot. OUTPUT slots omit tensor
  info (zero blob at submit time).
- tensors[] — one entry per unique (buffer_addr, version); stable
  FNV-1a tensor_id; raw_shapes for the underlying buffer.
- edges[] — {pred, succ, arg, source} plus tensor_id / consumer_shape /
  consumer_offset (non-explicit) and producer_shape / producer_offset
  (source=tensormap).

Self-checking replay — no shotgun-modification risk on
compute_task_fanin:
- Two PTO2TensorMap instances run in lockstep.
- The oracle pass drives the canonical compute_task_fanin template
  (unchanged, zero runtime touch) and collects the producer-id set the
  runtime would have wired.
- The annotated pass mirrors it with a wider callback capturing the
  matched PTO2TensorMapEntry& + consumer Tensor* + arg index, i.e. the
  per-edge metadata.
- Divergence -> LOG_ERROR + non-zero return, deps.json not written.
  Anyone who later changes compute_task_fanin semantics will trip the
  gate immediately and know to mirror the change in the annotated pass.
- INOUT+COVERED remove_entry is mirrored so both maps stay
  bit-equivalent for the next record.

Viewer (deps_to_graph.py --show-tensor-info):
Replaces per-edge labels with HTML-table task nodes — input rows
(blue) on top, identity header in the middle (core-type colored),
output rows (orange) on the bottom. Each arg cell is a 4-line block:
arg<i> <TYPE> <Tname>:<dtype> + raw:/shape:/offset:. Edges route
pred:out_<idx>:e → succ:in_<arg>:w by matching tensor_id. OUTPUT slots
backfill their tensor_id from downstream creator edges when
unambiguous (marked ? in the row).

Test plan
- pip install --no-build-isolation -e . builds clean on macOS arm64
- python test_dep_gen.py -p a2a3sim --enable-dep-gen (which auto-adds
  --enable-l2-swimlane) PASSED on a2a3sim — 5 tasks, 7 tensors,
  6 annotated edges, fanout ⊆ deps gate green
- deps_to_graph.py smoke-tested in both plain and --show-tensor-info
  modes
- swimlane_converter.load_deps_json correctly projects v2 →
  {pred → [succ]}

Issue: #666