Skip to content

fix: improve daemon diagnostics and remove compact snapshots#786

Merged
thymikee merged 6 commits into
mainfrom
codex/daemon-startup-diagnostics
Jun 12, 2026
Merged

fix: improve daemon diagnostics and remove compact snapshots#786
thymikee merged 6 commits into
mainfrom
codex/daemon-startup-diagnostics

Conversation

@thymikee

@thymikee thymikee commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Improves daemon startup failure evidence for isolated replay/test and normal daemon startup paths.

Before: a daemon that exited before publishing metadata collapsed into a generic metadata wait failure.
After: startup failures include the state dir, daemon metadata paths, daemon log path, child process pid/exit evidence, cleanup results, and a bounded daemon log tail when available.

Also removes compact snapshot mode end to end across CLI/client contracts, daemon flags, iOS runner contracts/plans, replay serialization, and Android snapshot filtering. The legacy -c token is still accepted only for snapshot/diff as a no-op passthrough so old commands do not fail, but no snapshot request forwards compact behavior. ADR 0004 is restored as an amended record for the surviving iOS snapshot backend strategy.

The iOS query-sweep recovery tier budget is relaxed from 1s to 3s so degraded-but-enumerable simulator trees can complete in the regular visible capture plan instead of prematurely falling through to lower-fidelity recovery.

Closes #687

Touched files: 75

Validation

After rebasing onto latest origin/main, passed direct oxfmt --write, focused Vitest coverage for parser/replay/capture/selector/snapshot paths (14 files, 394 tests), and pnpm check:quick. After reverting unrelated fixture-app copy/formatting noise, src/platforms/ios/__tests__/runner-client.test.ts passed in isolation and pnpm check:quick passed again; the full focused rerun had only runner-client cache/timing failures that passed on immediate isolated rerun.

Before the rebase, also passed pnpm build, pnpm build:xcuitest, and live iOS simulator checks on 2AF9D7DF-516A-47F5-8D4B-4EA0600E611E: prepare ios-runner, Settings snapshot -i and legacy snapshot -i -c returned equivalent healthy tree-backed output, Settings replay passed, and sampled Settings click/back interactions completed successfully.

Test-app replay evidence: gesture-lab.ad passed; checkout-form.ad still failed at the existing iOS keyboard dismiss unsupported-operation step after successful app launch and fills, so it remains residual fixture/runtime risk rather than validation for this PR.

SkillGym remains blocked in this sandbox because the runner environment requires external Codex/Claude runners with network access.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Size Report

Metric Base Current Diff
JS raw 1.3 MB 1.3 MB +459 B
JS gzip 409.8 kB 410.0 kB +211 B
npm tarball 543.2 kB 543.1 kB -102 B
npm unpacked 1.8 MB 1.8 MB -1.1 kB

Startup median (7 runs, lower is better):

Scenario Base Current Diff
CLI --version 26.6 ms 26.6 ms +0.0 ms
CLI --help 51.4 ms 51.5 ms +0.1 ms

Top changed chunks:

Chunk Raw diff Gzip diff
dist/src/9542.js +1.2 kB +376 B
dist/src/6133.js -287 B -72 B
dist/src/2415.js -179 B -46 B
dist/src/123.js -106 B -34 B
dist/src/args.js +55 B +28 B

@thymikee thymikee force-pushed the codex/daemon-startup-diagnostics branch from cfb8a78 to 943ee74 Compare June 12, 2026 13:56
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-12 15:33 UTC

@thymikee

Copy link
Copy Markdown
Member Author

Review — one P1 (docs governance), two P2s; behavior verified good live

Reviewed both halves, with live validation of the -c removal on a simulator (worktree build of 943ee74d0; typecheck, lint, fallow, full unit suite 261 files / 2525 tests all green).

What I verified works

  • Healthy path: Settings -ihealthy/tree, 19 nodes, 1.6s.
  • Modal-overlay class (the production-login shape): warm -i → recovers via the query tier in ~1.5s, 6–7 nodes with honest sparse verdict + warning — equivalent to what -i -c delivered before, so this app class loses nothing. (The 90s cold first-snapshot stall I hit exists on main too — tree-backend first-AX-settle on this app, pre-existing, not this PR.)
  • Compat shim: -c parses and behaves exactly as -i for snapshot/diff; unknown flags still rejected (-z errors); the shim is narrowly scoped in isLegacyIgnoredSnapshotShortFlag. No residual -i -c guidance anywhere on the branch (grep-verified across skills/docs/hints).
  • Perf rationale checks out: the timing evidence matches what we measured during the verdict work — the compact flat sweep was chronically deadline-bound (3–7s with recovery) while the regular tree path is sub-second on healthy apps. The query sweep correctly survives as a recovery tier in regularVisiblePlan.
  • Diagnostics half: runCmdDetachedMonitored + ready/early-exit/timeout race + bounded 64KB log tail is clean, uses exec.ts per convention, and the early-exit test writes through the real stderr fd. Nice.

Findings

  1. P1 — ADR 0004 is deleted, not superseded. The ADR documents the entire snapshot backend strategy: the regular/raw strategy split, the Bluesky-class failure context, the capture-plan implementation note, and the future simulator AX-service direction — all of which remain implemented and load-bearing after this PR. Deleting the whole record because one of three strategies retired leaves the surviving architecture with no decision history, and future architecture reviews will re-litigate it. Restore it amended (mark the compact strategy retired with this PR's perf rationale) or supersede it with a new ADR — either is a 10-minute fix.

  2. P2 — Android's compact semantics removed without mention. ui-hierarchy.ts had real, working -c behavior on Android (meaningful-text/id/hittable filter). The slowness rationale is iOS-only (flat-sweep deadline); Android -c users silently get larger snapshots. Dropping it for one-flag-one-meaning consistency is defensible — but the PR body should say so, and the squash message should mention the compact removal at all (the title only covers the diagnostics half).

  3. P2 — the swallow is silent. An agent or script passing -c gets no signal that semantics changed. Suggest a one-line notice on use ("-c is deprecated and ignored; snapshot -i is the default surface") — cheap, and saves the next person a confused bisect. Related nit: replay scripts parse -c but no longer serialize it, so a parse→serialize round-trip silently rewrites old scripts; fine, but worth one sentence in the body.

Verdict: good to merge once the ADR is restored-as-amended; 2 and 3 can ride along or follow up.

@thymikee thymikee force-pushed the codex/daemon-startup-diagnostics branch from 943ee74 to e4c7910 Compare June 12, 2026 14:20
@thymikee thymikee changed the title fix: improve daemon startup diagnostics fix: improve daemon diagnostics and remove compact snapshots Jun 12, 2026
@thymikee thymikee force-pushed the codex/daemon-startup-diagnostics branch from e4c7910 to 924a33d Compare June 12, 2026 14:30
@thymikee

Copy link
Copy Markdown
Member Author

Re-review of the update — P1 resolved well; one scope flag on the new budget commit

Resolved:

  • P1 (ADR): better than asked — ADR 0004 restored as amended (two durable strategies, Bluesky context and the future AX-service direction preserved), plus new Regression Notes, a strategy-ownership "Consequences" section, and a README cross-link. This is exactly the record the surviving architecture needed.
  • P2 (Android disclosure): now explicit in the body ("…and Android snapshot filtering"). Remaining nit: the PR title still only covers the diagnostics half — worth fixing in the squash message at merge.
  • P2 (deprecation notice): not taken — fine, it was a suggestion.

New scope flag — perf: relax compact iOS snapshot budget (1.0s → 3.0s): this commit wasn't part of any review ask, has no rationale in the PR body, and its message is stale within its own stack ("compact budget" for a constant that, two commits later, only bounds the query-sweep recovery tier). That said, I measured it rather than just objecting: on the modal-overlay app class, warm -i with the 3s budget runs 0.55–1.2s, and the sweep now sometimes completes within budget (recovered/queries) instead of deadline-truncating into private-AX — so the change empirically reduces spurious private-AX recoveries, which I suspect was the motivation. The unmeasured residual is Bluesky-class trees, where queries genuinely crawl: those now pay up to +2s per degraded snapshot before the private tier. Still safely inside the 20s plan budget and 30s watchdog.

Ask: keep it if you add one sentence of rationale to the body and rename the commit (e.g. perf: relax query-sweep recovery budget) so the squash history doesn't reference a mode this same PR deletes — or split it out with its own measurement. Either way it shouldn't ship under a name that contradicts the stack.

With the ADR restored, nothing else blocks merge.

@thymikee thymikee force-pushed the codex/daemon-startup-diagnostics branch from 924a33d to 18bb5e4 Compare June 12, 2026 15:00
@thymikee thymikee merged commit 5c83fe4 into main Jun 12, 2026
20 checks passed
@thymikee thymikee deleted the codex/daemon-startup-diagnostics branch June 12, 2026 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve isolated replay/test daemon startup diagnostics

1 participant