fix(cloud-agent-sdk): upgrade read-only CLI sessions to live when the CLI reports them active #4390

Open
iscekic wants to merge 1 commit into main from fix/cli-session-readonly-upgrade

@iscekic (Contributor) commented Jul 3, 2026

Problem

Reported: start a CLI session with /local-review-uncommitted, enable remote, open the session in the mobile app → the view is stuck on the first message and never updates.

Reproduced locally (CLI 7.3.54 against the local stack, mock app client speaking the real tRPC + WS protocol). The CLI↔backend plumbing is healthy in every variant — the bug is a race in session resolution:

  • resolveSession decides remote vs read-only from activeSessions.list, which reflects CLI heartbeats and is eventually consistent: after enabling remote it takes a WS connect + heartbeat round-trip (~12s measured locally) before the session shows up as active.
  • Opening the session inside that window resolves it read-only → snapshot-only historical transport, no live socket, and resolveSession only re-runs on a full remount. The screen freezes on whatever the snapshot contained — for a just-started review, exactly one message (the slash command).
  • All resolveSession failure paths silently map to read-only too (bare catch → read-only in the mobile manager, empty-list fallbacks in the router), producing the same permanent freeze on transient errors.
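
The three paths above all end in the same state, which is why the caller cannot tell "not active yet" from "lookup failed". A minimal sketch of that collapse, assuming a synchronous stand-in for activeSessions.list (the real lookup is an async tRPC call; `resolveSessionSketch` and `ListActive` are hypothetical names, not the SDK's API):

```typescript
type Resolution = "remote" | "read-only";

// Hypothetical stand-in for activeSessions.list: returns the ids of
// sessions the backend currently considers active.
type ListActive = () => string[];

function resolveSessionSketch(sessionId: string, listActive: ListActive): Resolution {
  try {
    // Eventually consistent: a session whose first heartbeat has not been
    // registered yet is simply missing from the list.
    return listActive().includes(sessionId) ? "remote" : "read-only";
  } catch {
    // Bare catch: a transient error is indistinguishable from "not active".
    return "read-only";
  }
}
```

With this shape, an empty list, a list that does not yet contain the session, and a thrown error all yield "read-only", and nothing schedules a second attempt.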

It presents as a /local-review-uncommitted bug because that's the fire-and-forget flow: start it, enable remote, immediately watch on the phone (inside the window), and keep the screen open for the multi-minute review. Interactive sessions mask the bug via remount → re-resolve.

Measured timeline from the repro (remote toggled mid-review, poller replaying the app's resolve logic every 2s):

12:15:48  remote toggled in TUI
12:15:48 → 12:15:58  resolve = READ-ONLY  (activeSessions.list empty)
12:16:00  resolve = REMOTE                (first CLI heartbeat registered)
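
The timeline can be replayed as a small simulation. This is only a sketch: `replayPolls` is a hypothetical helper, and it assumes the heartbeat is registered exactly 12 s after the toggle, whereas the measurement only shows that it happened after the 12:15:58 poll and no later than the 12:16:00 poll.

```typescript
type PollResult = { t: number; resolve: "READ-ONLY" | "REMOTE" };

// Replays a fixed-interval poller against a session that becomes active
// `heartbeatAtSec` seconds after remote is toggled (t = 0).
function replayPolls(heartbeatAtSec: number, pollEverySec: number, untilSec: number): PollResult[] {
  const results: PollResult[] = [];
  for (let t = 0; t <= untilSec; t += pollEverySec) {
    results.push({ t, resolve: t >= heartbeatAtSec ? "REMOTE" : "READ-ONLY" });
  }
  return results;
}

// Assumed values: heartbeat at 12 s, 2 s poll interval, stop at 12 s.
const polls = replayPolls(12, 2, 12);
const firstRemote = polls.find((p) => p.resolve === "REMOTE");
```

Any resolve that happens to run at one of the READ-ONLY ticks is the one the app keeps, because the app resolves once per mount rather than polling.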

Fix

When a session resolves read-only and a userWebConnection is available, arm a watcher: retain the connection (nothing else keeps it alive for a read-only session, and without it no heartbeats arrive) and listen for sessions.list / sessions.heartbeat naming the session. When the CLI reports it active, re-resolve and swap in the live transport. The watcher is disarmed on upgrade, reconnect, disconnect, and destroy.
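
The arm/disarm lifecycle described above might look roughly like the following. This is a sketch under assumptions, not the code in session.ts: the `retain`/`subscribe` interface, the event payload shape (`sessionIds`), and the names `armUpgradeWatcher` and `FakeConnection` are all hypothetical.

```typescript
// Assumed event shape: both message types carry the ids of active sessions.
type ConnectionEvent = {
  type: "sessions.list" | "sessions.heartbeat";
  sessionIds: string[];
};
type Listener = (event: ConnectionEvent) => void;

// Hypothetical view of the user-web connection.
interface UserWebConnectionSketch {
  retain(): () => void; // keeps the socket alive; returns a release function
  subscribe(listener: Listener): () => void; // returns an unsubscribe function
}

// Arms a watcher for one session and returns an idempotent disarm function.
function armUpgradeWatcher(
  conn: UserWebConnectionSketch,
  sessionId: string,
  onActive: () => void,
): () => void {
  let armed = true;
  let unsubscribe: () => void = () => {};
  const release = conn.retain();

  function disarm(): void {
    if (!armed) return;
    armed = false;
    unsubscribe();
    release();
  }

  unsubscribe = conn.subscribe((event) => {
    // Ignore events that do not name this session.
    if (!armed || !event.sessionIds.includes(sessionId)) return;
    disarm(); // disarm on upgrade, then hand off to the re-resolve
    onActive();
  });

  return disarm;
}

// In-memory fake used only to exercise the sketch.
class FakeConnection implements UserWebConnectionSketch {
  retainCount = 0;
  private listeners = new Set<Listener>();

  retain(): () => void {
    this.retainCount++;
    let released = false;
    return () => {
      if (released) return;
      released = true;
      this.retainCount--;
    };
  }

  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(event: ConnectionEvent): void {
    for (const listener of [...this.listeners]) listener(event);
  }

  get listenerCount(): number {
    return this.listeners.size;
  }
}
```

The caller would invoke the returned disarm function on reconnect, disconnect, and destroy; on upgrade the watcher disarms itself before calling `onActive`, so a second heartbeat cannot trigger a second swap.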

This is in the shared cloud-agent-sdk session orchestrator, so web and mobile both get the fix, and it also covers the silent-failure→read-only paths.

Testing

  • Two new tests in session-routing.test.ts: heartbeat-driven upgrade (including ignoring other sessions' heartbeats and watcher cleanup on upgrade), and disarm-on-destroy.
  • Full cloud-agent-sdk suite: 21 suites, 740 tests pass.
  • tsgo typecheck (web + mobile), oxlint, oxfmt clean.

… CLI reports them active

resolveSession decides remote vs read-only from activeSessions.list, which
reflects CLI heartbeats and is eventually consistent: after enabling remote
on the CLI it takes a websocket connect plus a heartbeat round-trip (~12s
measured locally) before the session appears active. Opening the session in
the app inside that window resolved it read-only, which mounted the
snapshot-only historical transport and never re-evaluated - the view stayed
frozen on whatever the snapshot contained (for a just-started
/local-review-uncommitted run, only the first message).

When a session resolves read-only and a user-web connection is available,
keep the connection retained and watch sessions.list/sessions.heartbeat;
as soon as the CLI reports the session active, re-resolve and swap in the
live transport. This also covers resolveSession's silent failure paths
(transient tRPC errors map to read-only), which previously froze the screen
the same way.
@iscekic self-assigned this Jul 3, 2026
@iscekic requested a review from eshurakov July 3, 2026 15:55
Comment thread apps/web/src/lib/cloud-agent-sdk/session.ts
@kilo-code-bot (Bot, Contributor) commented Jul 3, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Executive Summary

The new heartbeat-driven upgrade watcher in session.ts destroys the working read-only transport before confirming the re-resolve succeeds, with no fallback if resolveSession transiently fails.

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| apps/web/src/lib/cloud-agent-sdk/session.ts | 337 | Watcher-triggered re-resolve destroys the working read-only transport unconditionally; a transient resolveSession failure leaves the session in a hard error state with no fallback or re-armed watcher |
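
One possible way to address this warning is to resolve first and destroy the old transport only once a replacement exists. The sketch below is an illustration of that ordering, not the change made in the PR: `TransportSketch`, `upgradeKeepingFallback`, and `FakeTransport` are hypothetical names, and the resolve step is synchronous here whereas the real resolveSession is async.

```typescript
interface TransportSketch {
  readonly kind: "read-only" | "live";
  destroy(): void;
}

// Attempts the upgrade; on failure keeps the current transport and re-arms
// the watcher so a later heartbeat can retry.
function upgradeKeepingFallback(
  current: TransportSketch,
  resolveLive: () => TransportSketch,
  rearmWatcher: () => void,
): TransportSketch {
  let next: TransportSketch | undefined;
  try {
    next = resolveLive();
  } catch {
    next = undefined; // transient failure: fall through to the fallback path
  }
  if (next === undefined) {
    rearmWatcher();
    return current; // the read-only view keeps working
  }
  current.destroy(); // safe: the replacement already exists
  return next;
}

// Minimal fake used only to exercise the sketch.
class FakeTransport implements TransportSketch {
  destroyed = false;
  constructor(readonly kind: "read-only" | "live") {}
  destroy(): void {
    this.destroyed = true;
  }
}
```

The trade-off is that the read-only transport and the resolve attempt coexist briefly, in exchange for never leaving the screen without a working transport.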
Files Reviewed (2 files)
  • apps/web/src/lib/cloud-agent-sdk/session.ts - 1 issue
  • apps/web/src/lib/cloud-agent-sdk/session-routing.test.ts - 0 issues

Reviewed by claude-sonnet-5-20260630 · Input: 36 · Output: 21.5K · Cached: 1.3M

Review guidance: REVIEW.md from base branch main

@iscekic enabled auto-merge (squash) July 3, 2026 18:45