XTaskQueueSubmitDelayedCallback can lose its wake and strand delayed work until unrelated activity occurs #973

@jhugard

Summary

Callbacks submitted through XTaskQueueSubmitDelayedCallback can remain
pending well past their requested delay, or indefinitely until unrelated
queue activity happens to wake the delayed port.

The failure mode is a lost wake: a same-port interleaving leaves the delayed
queue with pending future work but no active wake source, and the timer is
never re-armed for that work.

This affects any code that relies on delayed callbacks for timeouts, retry
back-off, debouncing, or deferred continuations. In consumer code the symptom is
that a timeout or deferred action never fires when it should, even though no
cancellation occurred.

Public API affected

STDAPI XTaskQueueSubmitDelayedCallback(
    _In_ XTaskQueueHandle queue,
    _In_ XTaskQueuePort port,
    _In_ uint32_t delayMs,
    _In_opt_ void* callbackContext,
    _In_ XTaskQueueCallback* callback);

No signature change is required. The issue is a runtime violation of the
existing guarantee that delayed work will become dispatchable once its deadline
has elapsed.

Expected behavior

A callback submitted with delayMs = N should eventually become dispatchable
once at least N milliseconds have elapsed, without requiring any unrelated
queue submission, timer retarget, or termination event to wake the delayed port.

That should hold even when:

  • another delayed callback on the same port is being promoted concurrently
  • the pending delayed list becomes temporarily empty during promotion
  • a new future delayed callback is queued immediately after that empty sweep
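
One way to state the expectation above as a checkable condition, offered as a sketch rather than a documented contract: whenever the port is quiescent (no sweep in progress), a non-empty pending list should be covered by an armed due time no later than the earliest pending entry. The helper name WakeSourceCoversPending and the use of plain due-time integers are assumptions made for illustration; only m_timerDue and the UINT64_MAX "no timer" value come from this report.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical invariant check, not part of the XTaskQueue API.
// timerDue:   the value the port has published in m_timerDue
//             (UINT64_MAX meaning "no timer armed", as described in this issue).
// pendingDue: due times of entries still on the pending delayed list.
// Returns true when every pending entry is covered by an armed wake.
bool WakeSourceCoversPending(uint64_t timerDue, const std::vector<uint64_t>& pendingDue)
{
    if (pendingDue.empty())
    {
        return true; // nothing to wake for, so any timer state is acceptable
    }

    const uint64_t earliest = *std::min_element(pendingDue.begin(), pendingDue.end());

    // The armed due time must not be later than the earliest pending entry.
    // With timerDue == UINT64_MAX this fails for any real due time, which is
    // exactly the stranded state described under "Actual behavior".
    return timerDue <= earliest;
}
```

A stress test could evaluate a check like this after each sweep completes; it is only meaningful at quiescent points, because during a sweep the sweep itself is the wake source.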

Actual behavior

A same-port lost-wake race can strand delayed work:

  1. The delayed-callback promotion path sweeps the pending list and concludes
    there is no next future item.
  2. Before it publishes the empty state by clearing m_timerDue, another thread
    queues a future delayed callback on the same port.
  3. The new submitter observes the stale armed due time, decides it does not need
    to retarget the timer, and returns.
  4. The sweep then clears m_timerDue to UINT64_MAX, leaving the new pending
    entry with no timer armed.

At that point the delayed callback remains pending until some unrelated later
activity happens to wake the queue, or forever if no such wake occurs.
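
The four steps above can be replayed deterministically in a small single-threaded model. This is a sketch under stated assumptions, not the code in TaskQueue.cpp: DelayedPortModel, SweepPromote, Submit, PublishSweepResult, IsStranded, and ReplayLostWake are hypothetical names, and the rule "retarget only when the new due time is earlier than the published one" is inferred from step 3. Only m_timerDue and UINT64_MAX are taken from this report.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of one delayed port. Time is an abstract tick count.
struct DelayedPortModel
{
    static constexpr uint64_t kNoTimer = UINT64_MAX;

    uint64_t m_timerDue = kNoTimer;   // due time the port believes is armed
    std::vector<uint64_t> pending;    // due times of pending delayed entries

    // Step 1: promote everything due at `now` and compute the next future
    // due time. The result is NOT published yet.
    uint64_t SweepPromote(uint64_t now)
    {
        uint64_t next = kNoTimer;
        std::vector<uint64_t> remaining;
        for (uint64_t due : pending)
        {
            if (due <= now) { continue; }      // promoted to the ready queue
            remaining.push_back(due);
            if (due < next) { next = due; }
        }
        pending.swap(remaining);
        return next;
    }

    // Steps 2-3: queue a future entry; retarget the timer only if the new
    // due time is earlier than the currently published one (assumed rule).
    bool Submit(uint64_t due)
    {
        pending.push_back(due);
        if (due < m_timerDue)
        {
            m_timerDue = due;
            return true;   // timer retargeted
        }
        return false;      // submitter trusts the already-armed timer
    }

    // Step 4: the sweep publishes the result it computed earlier.
    void PublishSweepResult(uint64_t next) { m_timerDue = next; }

    bool IsStranded() const { return !pending.empty() && m_timerDue == kNoTimer; }
};

// Replays the interleaving from the numbered list above.
bool ReplayLostWake()
{
    DelayedPortModel port;
    port.Submit(100);                              // timer armed for t=100
    assert(port.m_timerDue == 100);

    uint64_t next = port.SweepPromote(100);        // 1. sweep finds no future item
    bool retargeted = port.Submit(150);            // 2-3. sees stale 100, skips retarget
    port.PublishSweepResult(next);                 // 4. m_timerDue cleared

    return !retargeted && port.IsStranded();
}
```

In this model ReplayLostWake() returns true: the entry due at 150 is still pending while m_timerDue holds the "no timer" value. The real race needs two threads to hit the same window, which is why it is timing-sensitive.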

Reproduction conditions

The race is timing-sensitive and becomes easier to trigger under:

  • same-port delayed callback traffic where one callback is being promoted while
    another future callback is queued concurrently
  • workloads that repeatedly queue short delayed continuations on a manual or
    serialized port
  • test or production environments where no unrelated later delayed submission is
    guaranteed to repair the missed wake

Impact

  • Violates the practical scheduling contract of
    XTaskQueueSubmitDelayedCallback: the callback may not become dispatchable
    after its delay elapses.
  • Can cause hung timeouts, stalled retry loops, and deferred state transitions
    that never occur.
  • Is difficult to diagnose in production because any unrelated later delayed
    work can make the stranded callback appear to recover nondeterministically.

Affected area

The root cause is in the delayed-callback promotion logic in
Source/Task/TaskQueue.cpp, specifically the path that clears m_timerDue
after an empty sweep of the pending delayed list.
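
One possible direction for a repair, shown only as a single-threaded sketch and not as the maintainers' fix: after the sweep publishes its result to m_timerDue, it re-reads the pending list and re-arms if an entry arrived during the window. Every name below other than m_timerDue is hypothetical. In real concurrent code the publish and the re-check would also need either a shared lock with the submitter or suitably strong atomic ordering, which this model does not attempt to show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kNoTimerArmed = UINT64_MAX;

// Hypothetical stand-in for the per-port delayed state.
struct RecheckPortModel
{
    uint64_t m_timerDue = kNoTimerArmed;
    std::vector<uint64_t> pending;   // due times of pending delayed entries
};

// Submitter side, same assumed rule as in the issue: retarget only when the
// new due time is earlier than the published one.
bool SubmitDelayed(RecheckPortModel& port, uint64_t due)
{
    port.pending.push_back(due);
    if (due < port.m_timerDue)
    {
        port.m_timerDue = due;
        return true;
    }
    return false;
}

uint64_t EarliestPending(const RecheckPortModel& port)
{
    auto it = std::min_element(port.pending.begin(), port.pending.end());
    return it == port.pending.end() ? kNoTimerArmed : *it;
}

// Sweep side: publish the computed result, then optionally re-check the
// list so an entry queued during the window is not left without a timer.
void PublishSweepResult(RecheckPortModel& port, uint64_t sweepResult, bool recheck)
{
    port.m_timerDue = sweepResult;
    if (recheck)
    {
        uint64_t earliest = EarliestPending(port);
        if (earliest < port.m_timerDue)
        {
            port.m_timerDue = earliest;   // re-arm for the late arrival
        }
    }
}

// Replays the racy interleaving and returns the final published due time.
uint64_t ReplayInterleaving(bool recheck)
{
    RecheckPortModel port;
    SubmitDelayed(port, 100);                 // timer armed for t=100
    assert(port.m_timerDue == 100);

    port.pending.clear();                     // sweep at t=100 promotes the only entry
    uint64_t sweepResult = kNoTimerArmed;     // sweep concludes: no future item

    SubmitDelayed(port, 150);                 // sees stale 100, does not retarget
    PublishSweepResult(port, sweepResult, recheck);
    return port.m_timerDue;
}
```

In this model ReplayInterleaving(false) ends with the "no timer" value while an entry is pending, and ReplayInterleaving(true) ends with the timer due at 150. Whether a re-check, a lock around the sweep and publish, or a different scheme fits the actual code is for the maintainers to judge.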