fix(runtime): unify task state lifecycle with idempotent READY CAS and RUNNING promotion#748
Open
Crystal-wzy wants to merge 2 commits into
Open
fix(runtime): unify task state lifecycle with idempotent READY CAS and RUNNING promotion#748Crystal-wzy wants to merge 2 commits into
Crystal-wzy wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces task cookie tracking in simulation and enhances the PTO2 scheduler's handling of MIX resource shapes and task state transitions. Key updates include the addition of sim_set_current_task_cookie, new queue management methods in PTO2SchedulerState, and improved synchronization logic for MIX tasks during completion and dispatch. Additionally, the non-profiling ready path now explicitly transitions tasks to the READY state. Feedback was provided regarding a potential performance regression in the push_ready_queue_batch implementation, which uses multiple single-item pushes instead of a single batch operation.
8975412 to
6dd079f
Compare
…tly-once enqueue ## Summary - Extract `enqueue_ready_once()` in `PTO2SchedulerState` that atomically CAS task_state PENDING→READY before pushing to ready queue, guaranteeing exactly-once enqueue and eliminating potential duplicate enqueue races (both profiling and non-profiling overloads) - Replace all inline ready-queue push sites in `wire_slot()` and `release_fanin_and_check_ready()` with `enqueue_ready_once()` calls - Add READY→RUNNING state transition via `mark_task_running_on_first_dispatch()` helper, called on first block dispatch in both `dispatch_shape()` and `drain_worker_dispatch()`, completing the PENDING→READY→RUNNING state machine - Add global `completed_tasks_` progress check in `resolve_and_dispatch()` idle loop to reset idle iterations when other threads make progress, preventing premature idle timeout - Update `NonProfilingReadyPathMarksReady` unit test to verify task_state transitions to READY after `release_fanin_and_check_ready()` (previously expected PENDING by design) - All changes applied symmetrically to both a2a3 and a5 scheduler paths ## Testing - [x] All unit tests pass - [x] Pre-commit hooks pass
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
enqueue_ready_once()to centralizePENDING → READYCAS andready-queue push into a single idempotent operation. Previously the
non-profiling path never transitioned
task_statetoREADY, leavingqueue state and task state inconsistent; multiple fanin observers could
also push duplicate READY notifications. The new helper guarantees
exactly-once enqueue via CAS, with both profiling and non-profiling
paths sharing the same logic.
push_ready_queue()/push_ready_queue_batch()helpers that wrapthe spin-retry loop, replacing raw
ready_queues[…].push()calls acrossorchestration, completion, dispatch, and local-buffer overflow paths for
consistent retry behavior.
mark_task_running_on_first_dispatch(): CASREADY → RUNNINGonthe first block dispatch (including the drain-dispatch path), completing
the
PENDING → READY → RUNNING → donestate machine that was previouslymissing the RUNNING step.
running_reg_task_idisAICPU_TASK_INVALID; when a running FIN is observed while a pendingdispatch exists, wait for the pending ACK/FIN before promoting, ensuring
the payload slot is hardware-latched.
resolve_and_dispatch: reset idlecounter when global
completed_tasks_advances or when another schedulerthread still has running cores, preventing premature stall timeouts in
multi-thread scheduling.
redundant with per-core logging; move
CoreTrackervariable after thering scan for narrower scope.
SPIN_WAIT_HINT()to ready-queueenqueue_posCAS retry loop forreduced contention on hardware.
test_task_state.cpp:NonProfilingReadyPathnow assertsPTO2_TASK_READYinstead ofPTO2_TASK_PENDING, matching the unifiedstate machine.
Testing
models/qwen3/14b/qwen3_14b_decode.pypasses after theREADY-state fix.
git diff --checkpasses for the runtime worktree.