Skip to content

fix(app): make ctx.NewSession wait for agent idle (#63)#64

Open
ezynda3 wants to merge 1 commit into
masterfrom
fix/63-newsession-busy-race
Open

fix(app): make ctx.NewSession wait for agent idle (#63)#64
ezynda3 wants to merge 1 commit into
masterfrom
fix/63-newsession-busy-race

Conversation

@ezynda3

@ezynda3 ezynda3 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Description

Fix the v0.79.0 phase-handoff race: ctx.NewSession invoked from an
OnAgentEnd handler always failed with "cannot start new session while agent is busy" because AgentEnd is emitted from inside the agent loop —
before drainQueue clears app.busy — and any post-turn tooling
(formatters, on-save linters, hidden tool calls) keeps the busy window open
for seconds to minutes after that.

Instead of failing fast, RequestNewSessionFromExtension now blocks until
the agent is genuinely idle (up to a generous 10-minute internal timeout)
and then sends the NewSessionRequestEvent. The wait is built on a new
App.WaitForIdle(timeout) primitive backed by a per-busy-cycle idleCh
channel that is closed on every busy → idle transition. All existing busy
mutations are funnelled through a single setBusyLocked helper so the
channel always tracks the flag.

A sentinel ErrAgentBusy is shared between internal/app and
internal/extensions (and exposed to Yaegi as ext.ErrAgentBusy), so
extension authors can detect timeouts with errors.Is(err, ext.ErrAgentBusy)
instead of substring-matching the error message.

Before:

api.OnAgentEnd(func(_ ext.AgentEndEvent, ctx ext.Context) {
    go func() {
        // Always fails: "cannot start new session while agent is busy"
        _ = ctx.NewSession("next phase")
    }()
})

After: the same code just works — NewSession waits for the agent to
settle (including any post-turn hooks) and then switches sessions.

Fixes #63

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / cleanup

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (godoc on
    ctx.NewSession, sentinel error doc, example comment update)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes
    (go test -race ./... clean, golangci-lint run clean)
  • Any dependent changes have been merged and published

Additional Information

Files changed

  • internal/app/app.goErrAgentBusy sentinel,
    DefaultNewSessionIdleWait, idleCh field, setBusyLocked chokepoint,
    WaitForIdle, idleSnapshot; RequestNewSessionFromExtension now waits
    for idle instead of failing fast.
  • internal/extensions/api.go — declare shared ErrAgentBusy; expand the
    NewSession godoc to describe the wait-then-send semantics and remind
    authors to call it from a goroutine.
  • internal/extensions/symbols.go — expose ext.ErrAgentBusy to Yaegi.
  • examples/extensions/phase-handoff.go — update the comment around the
    go func() { ctx.NewSession(...) }() call to reflect the new behaviour.
  • internal/app/app_test.go — 7 new regression tests covering the
    already-idle fast path, blocks-until-drain, timeout → ErrAgentBusy,
    zero-timeout-waits-indefinitely, app-close release, headless guard, and
    idleCh channel transitions.

Backward compatibility

  • The public/extension API surface is unchanged. ctx.NewSession now
    succeeds in cases that previously errored; it still returns the same
    error shapes for non-busy failure modes (no TUI attached, before-hook
    cancellation, session-file failure).
  • Old extensions that substring-matched "agent is busy" will continue
    to match because the returned error still contains that phrase
    (fmt.Errorf("cannot start new session: %w", ErrAgentBusy)), but new
    code should migrate to errors.Is(err, ext.ErrAgentBusy).

Summary by CodeRabbit

  • Bug Fixes

    • Fixed race conditions in automatic agent phase transitions for more reliable session handoffs.
    • Enhanced idle/busy state synchronization across concurrent operations.
  • Documentation

    • Clarified session switching behavior and agent busy state handling.
  • Tests

    • Added comprehensive test coverage for idle/busy signaling and phase transitions.

- Add ErrAgentBusy sentinel (shared between internal/app and
  internal/extensions) so callers can detect the busy condition with
  errors.Is instead of substring-matching the error message.
- Add App.WaitForIdle(timeout) backed by a per-busy-cycle idleCh closed
  by a new setBusyLocked chokepoint; all busy transitions now route
  through it to keep the channel in sync with the busy flag.
- Have RequestNewSessionFromExtension wait for idle (up to
  DefaultNewSessionIdleWait = 10m) instead of failing fast on IsBusy.
  This fixes the v0.79.0 phase-handoff race where OnAgentEnd fires from
  inside the agent loop, before drainQueue clears busy, so
  ctx.NewSession reliably failed with 'agent is busy'.
- Expose ext.ErrAgentBusy to Yaegi via symbols.go.
- Update NewSession godoc and phase-handoff example to document the new
  wait-then-send behavior.
- Add regression tests covering already-idle, blocks-until-drain,
  timeout, zero-timeout, app-close, headless guard, and idleCh
  transitions.

Fixes #63
@mark-iii-labs-huly

Copy link
Copy Markdown

Connected to Huly®: KIT-65

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Fixes a race in RequestNewSessionFromExtension (issue #63) by adding an idleCh channel to App that is closed on every busy→idle transition. A new setBusyLocked helper keeps busy and idleCh in sync across all call sites. WaitForIdle blocks on that channel with deadline/shutdown support. RequestNewSessionFromExtension now waits up to DefaultNewSessionIdleWait (10 min) instead of failing immediately. ErrAgentBusy is exported as a sentinel from both extensions and app packages and registered in the Yaegi symbol table.

Changes

Idle/busy signaling and NewSession race fix

Layer / File(s) Summary
ErrAgentBusy sentinel and public constants
internal/extensions/api.go, internal/extensions/symbols.go, internal/app/app.go
Defines ErrAgentBusy via errors.New, exports it in the Yaegi table, and re-exports it from app with DefaultNewSessionIdleWait = 10 * time.Minute.
idleCh field, setBusyLocked, idleSnapshot, and WaitForIdle
internal/app/app.go
Adds idleCh chan struct{} to App, initializes it closed in New, and implements setBusyLocked (re-arms channel on busy), idleSnapshot (atomic read), and a revised WaitForIdle with timeout/shutdown looping.
setBusyLocked propagation across all busy transitions
internal/app/app.go
Replaces every direct busy = true/false in RunWithFiles, SteerWithFiles, InterruptAndSend, CompactConversation, CompactAsync, releaseBusyAfterCompact, and drainQueue with setBusyLocked.
RequestNewSessionFromExtension wait-for-idle, updated docs, and example
internal/app/app.go, internal/extensions/api.go, examples/extensions/phase-handoff.go
RequestNewSessionFromExtension calls WaitForIdle and maps ErrAgentBusy into a descriptive error; Context.NewSession docs and the phase-handoff example are updated to reflect blocking semantics and goroutine usage.
Updated and new tests
internal/app/app_test.go
Migrates five compact/drain tests to setBusyLocked, and adds TestWaitForIdle_* suite (already-idle, blocks-until-drain, timeout, zero-timeout, close-while-waiting), TestRequestNewSessionFromExtension_NoTUI, and TestBusyTransitionsSignalIdleCh.

Sequence Diagram(s)

sequenceDiagram
  participant Extension as Extension (OnAgentEnd)
  participant NewSession as ctx.NewSession
  participant RequestNewSession as RequestNewSessionFromExtension
  participant WaitForIdle
  participant drainQueue
  participant idleCh

  Extension->>NewSession: go ctx.NewSession(prompt)
  NewSession->>RequestNewSession: RPC call
  RequestNewSession->>WaitForIdle: WaitForIdle(10 min)
  WaitForIdle->>idleCh: select on idleCh / timer / rootCtx.Done
  drainQueue->>idleCh: setBusyLocked(false) → close(idleCh)
  idleCh-->>WaitForIdle: channel closed, idle
  WaitForIdle-->>RequestNewSession: nil
  RequestNewSession-->>NewSession: start new session
  NewSession-->>Extension: nil (success)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 A channel once closed, a race put to rest,
setBusyLocked guards each busy-state nest.
No more spinning loops with string-match and sleep —
WaitForIdle watches, the handoff runs deep.
From OnAgentEnd we goroutine away,
and ErrAgentBusy keeps false-starts at bay! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing ctx.NewSession to wait for agent idle, directly addressing the race condition described in issue #63.
Linked Issues check ✅ Passed The PR implements solutions 2 and 3 from issue #63: RequestNewSessionFromExtension waits internally with a 10-minute timeout and exports ErrAgentBusy sentinel error for proper detection.
Out of Scope Changes check ✅ Passed All changes directly support the linked objective: core idle-wait mechanism in app.go, channel synchronization, test coverage, API documentation, and Yaegi export.
Docstring Coverage ✅ Passed Docstring coverage is 93.75% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/63-newsession-busy-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/app/app.go (1)

219-224: ⚡ Quick win

Wrap shutdown error with call-site context in WaitForIdle.

Line 223 returns a.rootCtx.Err() directly; wrapping preserves errors.Is while making upstream failures actionable.

Suggested patch
 		case <-a.rootCtx.Done():
 			if timer != nil {
 				timer.Stop()
 			}
-			return a.rootCtx.Err()
+			return fmt.Errorf("wait for idle interrupted by app shutdown: %w", a.rootCtx.Err())
 		}

As per coding guidelines, "Always check errors and wrap them with fmt.Errorf("context: %w", err)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/app.go` around lines 219 - 224, The error returned from
a.rootCtx.Err() in the context cancellation case of the WaitForIdle method is
being returned directly without wrapping. Wrap this error using fmt.Errorf with
the %w verb to add contextual information about the shutdown, which makes the
error actionable for upstream callers while preserving errors.Is compatibility
and following the coding guidelines that require all errors to be wrapped with
context.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/app/app.go`:
- Around line 219-224: The error returned from a.rootCtx.Err() in the context
cancellation case of the WaitForIdle method is being returned directly without
wrapping. Wrap this error using fmt.Errorf with the %w verb to add contextual
information about the shutdown, which makes the error actionable for upstream
callers while preserving errors.Is compatibility and following the coding
guidelines that require all errors to be wrapped with context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9986381-5ed6-4d80-a55c-3b501840bc59

📥 Commits

Reviewing files that changed from the base of the PR and between d2e2e5e and adefd55.

📒 Files selected for processing (5)
  • examples/extensions/phase-handoff.go
  • internal/app/app.go
  • internal/app/app_test.go
  • internal/extensions/api.go
  • internal/extensions/symbols.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ctx.NewSession from OnAgentEnd always races against app.busy (v0.79.0 phase-handoff pattern is unreliable)

1 participant