Refactor: Arg dep API — primitive set_dependencies + ArgWithDeps<N> convenience layer #761
Code Review
This pull request replaces the Arg.add_dep method with Arg.set_dependencies, moving from a variadic addition model to a pointer-and-count array model for explicit task dependencies. This change removes the previous hard runtime limit on the number of dependencies per task and shifts storage ownership to the caller, requiring the dependency array to remain valid until the task is submitted. The update includes comprehensive changes to documentation, orchestration examples, and the runtime implementation for both a2a3 and a5 platforms. I have no feedback to provide.
|
@jvjhfhg What is the purpose of this API change? Is it intended to remove the |
@uv-xiao The pypto team reported that the dependency count limit is too low when managing dependencies manually. Basically it IS to remove that limit. In terms of flexibility the change is completely positive, but I do admit it could hurt convenience when hand-writing orchestration code. I'm considering providing the following porting struct to revive the previous hand-writing ergonomics:

template <size_t MAX_DEP_COUNT = 16>
struct ArgWithDeps {
    PTO2TaskId deps[MAX_DEP_COUNT];
    int count;
    Arg arg;
};
|
I see. I agree that if the orch code will mainly be generated by pypto, the handwriting convenience doesn't matter much actually. And the proposed Thanks! |
- Take a caller-owned dependency array (ptr + count) instead of variadic PTO2TaskIds; lifts the hard PTO2_MAX_EXPLICIT_DEPS=16 runtime cap
- Arg stores (ptr, count) without copying, matching add_input/add_output lifetime semantics; the caller's array must outlive the submit
- count == 0 explicitly clears any stored deps, so conditionally-built dep arrays can pass through unguarded; count > 0 is single-shot to preserve the no-accumulation invariant
- Drop ExplicitDepStorage struct, PTO2_MAX_EXPLICIT_DEPS macro, and the a2a3 runtime/dep_gen static_assert (DEP_GEN_MAX_EXPLICIT_DEPS=16 is now a diagnostic-only truncation cap, unchanged)
- Migrate the four paged_attention orchestration examples to build the dep set on the stack and call set_dependencies once
- Update docs/manual-scope.md API, rules, and examples
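The semantics listed in this commit message can be sketched roughly as follows. This is a minimal mock, not the runtime's code: `TaskId`, `MockArg`, `dep_ptr`, `dep_count`, and `bind_deps` are hypothetical names introduced for illustration. Two things are assumptions: that a rejected second `count > 0` call surfaces through an error flag, and that clearing with `count == 0` re-arms the slot.

```cpp
#include <cstdint>

// Hypothetical stand-in for PTO2TaskId; the real type lives in the runtime headers.
using TaskId = std::uint64_t;

// Hypothetical, reduced stand-in for Arg that models only the dependency slot.
class MockArg {
public:
    // Stores (ptr, count) without copying, so the caller's array must stay
    // alive until the task is submitted.
    void set_dependencies(const TaskId* deps, std::uint32_t count) {
        if (count == 0) {        // explicit clear, always allowed
            deps_ = nullptr;
            count_ = 0;
            return;
        }
        if (count_ > 0) {        // single-shot: a second non-empty bind is rejected
            error_ = true;       // assumption: reported through an error flag
            return;
        }
        deps_ = deps;
        count_ = count;
    }

    bool has_error() const { return error_; }
    const TaskId* dep_ptr() const { return deps_; }
    std::uint32_t dep_count() const { return count_; }

private:
    const TaskId* deps_ = nullptr;
    std::uint32_t count_ = 0;
    bool error_ = false;
};

// Builds the dependency set conditionally in caller-owned storage, then binds
// it with one call. `storage` needs room for two ids and must outlive the
// submit (not shown). The task ids are illustrative values.
inline void bind_deps(MockArg& arg, TaskId* storage, bool need_a, bool need_b) {
    std::uint32_t n = 0;
    if (need_a) storage[n++] = 101;
    if (need_b) storage[n++] = 202;
    arg.set_dependencies(storage, n);  // n == 0 just clears, so no guard is needed
}
```

Because the mock keeps only the pointer, a buffer that goes out of scope before the submit would leave it dangling; that is the lifetime obligation the commit message describes.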
- New header pto_arg_with_deps.h defines ArgWithDeps<N> (default N=16): private inheritance from Arg so set_dependencies/explicit_dep* stay hidden, with selective using-declarations exposing the Arg setter surface plus a variadic add_dep(...) that accumulates into a stack buffer; reset() clears both layers; finalize_for_submit() binds the buffer back via set_dependencies(ptr, count) and is idempotent so a wrapper can be re-submitted without tripping the single-shot check
- pto_orchestration_api.h auto-includes the wrapper header at the bottom so orchestration sources keep a single include
- rt_submit_task / rt_submit_aic_task / rt_submit_aiv_task gain overloads that accept ArgWithDeps<N>& and call finalize_for_submit() transparently, no caller-visible finalize step
- Demonstrate both layers side-by-side in paged_attention_manual_scope (a2a3 and a5): params_sf keeps the primitive Arg+set_dependencies form, params_up switches to ArgWithDeps+add_dep, with comments marking each pattern's intended use case
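The wrapper pattern described in this commit message might look roughly like the sketch below. It is not the contents of pto_arg_with_deps.h: `TaskId`, `MockArg`, `MockArgWithDeps`, and `mock_submit` are hypothetical stand-ins. Implementing idempotence by clearing before rebinding is an assumption about one possible approach. The fold expression requires C++17.

```cpp
#include <cstddef>
#include <cstdint>

using TaskId = std::uint64_t;  // hypothetical stand-in for PTO2TaskId

// Hypothetical, reduced stand-in for Arg (primitive layer).
class MockArg {
public:
    void add_scalar(std::uint64_t v) { scalar_ = v; }
    void set_dependencies(const TaskId* deps, std::uint32_t count) {
        if (count == 0) { deps_ = nullptr; count_ = 0; return; }  // clear
        if (count_ > 0) { error_ = true; return; }                // single-shot
        deps_ = deps;
        count_ = count;
    }
    bool has_error() const { return error_; }
    std::uint32_t explicit_dep_count() const { return count_; }

protected:
    void set_error() { error_ = true; }

private:
    const TaskId* deps_ = nullptr;
    std::uint32_t count_ = 0;
    std::uint64_t scalar_ = 0;
    bool error_ = false;
};

// Convenience layer: private inheritance hides set_dependencies and the
// explicit-dep accessors; using-declarations re-expose the chosen setters.
template <std::size_t N = 16>
class MockArgWithDeps : private MockArg {
public:
    using MockArg::add_scalar;
    using MockArg::has_error;

    // Variadic add_dep accumulates ids into the fixed-capacity buffer.
    template <typename... Ids>
    void add_dep(Ids... ids) { (push(static_cast<TaskId>(ids)), ...); }

    // Clears the wrapper's buffer and the pointer stored in the base.
    void reset() {
        n_ = 0;
        MockArg::set_dependencies(nullptr, 0);
    }

    // Binds the buffer to the base. Clearing first keeps repeated calls from
    // tripping the single-shot check (assumed mechanism for idempotence).
    MockArg& finalize_for_submit() {
        MockArg::set_dependencies(nullptr, 0);
        MockArg::set_dependencies(buf_, n_);
        return *this;
    }

private:
    void push(TaskId id) {
        if (n_ == N) { set_error(); return; }  // overflow: bump the template arg
        buf_[n_++] = id;
    }
    TaskId buf_[N] = {};
    std::uint32_t n_ = 0;
};

// Hypothetical submit entry points; they return the dep count the runtime would see.
inline std::uint32_t mock_submit(const MockArg& arg) { return arg.explicit_dep_count(); }

template <std::size_t N>
std::uint32_t mock_submit(MockArgWithDeps<N>& w) {
    return mock_submit(w.finalize_for_submit());  // finalize happens transparently
}
```

With private inheritance, calling `w.set_dependencies(...)` on a wrapper instance fails to compile, which is how the two dependency APIs are kept from being mixed on one object.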
1cda8c8 to 8ac1c83
Reworks the explicit-dependency API into two layers and lifts the previous hard cap on dependency count.
Primitive layer: Arg::set_dependencies(const PTO2TaskId*, uint32_t)
- Take a caller-owned dependency array (ptr + count) instead of variadic PTO2TaskIds, lifting the hard PTO2_MAX_EXPLICIT_DEPS = 16 runtime cap
- Arg stores (ptr, count) without copying, matching add_input / add_output lifetime semantics; the caller's array must outlive the submit
- count == 0 explicitly clears any stored deps, so conditionally-built dep arrays can pass through unguarded; count > 0 is single-shot to preserve the no-accumulation invariant
- Drop the ExplicitDepStorage struct, the PTO2_MAX_EXPLICIT_DEPS macro, and the a2a3 runtime/dep_gen static_assert; DEP_GEN_MAX_EXPLICIT_DEPS = 16 is now a diagnostic-only truncation cap, unchanged
- Update docs/manual-scope.md (API, rules, examples)

Convenience layer: ArgWithDeps<N> (default N=16)
A thin wrapper on top of the primitive layer that revives the previous add_dep(...) ergonomics for hand-written orchestration.
- New header pto_arg_with_deps.h, auto-included at the bottom of pto_orchestration_api.h so orchestration sources still need only one #include
- Private inheritance from Arg with selective using-declarations exposes the Arg setter surface (add_input / add_output / add_inout / add_no_dep / add_scalar* / has_error / error_msg / launch_spec) while keeping set_dependencies and the explicit_dep* accessors unreachable on a wrapper instance, so users cannot accidentally mix the two dep APIs on the same object
- add_dep(...) accumulates into a stack-sized buffer of capacity N; overflow reports an error on the underlying Arg ("bump the template arg")
- reset() clears both layers; finalize_for_submit() is idempotent so a wrapper can be re-submitted without tripping the primitive layer's single-shot check
- rt_submit_task / rt_submit_aic_task / rt_submit_aiv_task overloads accept ArgWithDeps<N>& and call finalize_for_submit() transparently, with no caller-visible finalize step
- […] set_dependencies(ptr, count) directly

Example migration
- paged_attention_manual_scope (both a2a3 and a5) further demonstrates both layers side-by-side: params_sf keeps Arg + set_dependencies, params_up switches to ArgWithDeps + add_dep, each with a comment marking its intended use case
- tests/st/{a2a3,a5}/.../dummy_task (introduced by Feat: dummy_task — dep-only task that bypasses AICore dispatch #754) migrated to set_dependencies as part of the refactor

Verification