Codex Lab MVP dogfood plan #28

## Finish Line

Codex Lab is dogfoodable as the active Codex-based coding CLI. The restored Every Code history serves as source material, and regression gates protect skills behavior, prompt caching, thread continuity, and agent/review workflows before upstream merges or feature ports land.

## Current Status

State: Recovery pivot after discovering that recent Every-Code-parity work may have used the wrong restored source tree. PR #125, now merged, updates the repo instructions: `../code-prealign-new-skills/code-rs` is the authoritative restored Every Code product/UX source; `../code/code-rs` and `../code/codex-rs` are lineage/reference material only.

New active plan:
- #126 - Audit recent Codex Lab PRs against authoritative Every Code source. This is the Now-focus recovery slice, ahead of any further broad feature work.
- #115 - Codex Lab multi-auth MVP remains important, but is blocked by #126 until the auth/login/account-switching cluster has an audit verdict.

Current MVP sequencing:
1. #126 - Audit recent PRs/features that were intended to port Every Code behavior or fixtures, starting with the PR #63 auto-review persistence miss, then resuming the low-to-high PR audit.
2. #115 - Resume multi-auth only after the `/login`, add-account, account picker, and profile-switching behavior is reconciled against Every Code's actual flow.
3. #36 - Continue Code Bridge activation after bridge/browser behavior is classified as Every Code parity, Codex Lab-original, or needs rework.
4. #49/#50 - Keep Launchplane/app-server integration behind concrete product use and parity classification.
5. #93 - Exact/minimal prompt profiles remain valuable, especially for auth priming and non-code agents, but should follow the account/profile audit.

Current implementation stance:
- Codex Lab is a new fork with its own home/config/state folder, so accidental intermediate Codex Lab formats are not compatibility contracts.
- Prefer clean canonical implementations over legacy bridges, dual-read formats, feature flags, or retained dead code.
- When replacing a wrong or provisional local state shape, do not design indefinite migration compatibility unless the user explicitly decides the data is worth preserving.
- For local experimental state, quarantine/delete/ignore old shapes rather than carrying them in hot paths.
- Tests should protect the canonical target shape, not every transient shape written during the recovery period.
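The quarantine stance above can be sketched as a loader that accepts only the canonical shape and moves anything else out of the hot path. This is a minimal illustration, not the Codex Lab implementation: `CANONICAL_KEYS`, `load_state`, and the JSON file layout are hypothetical names and assumptions chosen for the example.

```python
import json
from pathlib import Path

# Hypothetical canonical shape: exactly these top-level keys.
CANONICAL_KEYS = {"version", "accounts"}


def load_state(path: Path, quarantine_dir: Path):
    """Return canonical state, or None after quarantining anything else."""
    if not path.exists():
        return None
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and set(data) == CANONICAL_KEYS:
        return data
    # No dual-read or migration: move the old or corrupt file aside and
    # let the caller start from a clean canonical state.
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    path.replace(quarantine_dir / path.name)
    return None
```

A test written against this sketch would assert only the canonical shape and the quarantine side effect, not any of the transient shapes written during the recovery period.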

Recommended next action: implement the #126 PR #63 correction as a clean cutover to Every Code-shaped compact auto-review run metadata plus sidecar detail storage, without permanent support for old inline-finding Codex Lab formats.
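To make the "compact run metadata plus sidecar detail storage" idea concrete, here is a rough sketch of the split. Every field name, file name, and function here (`AutoReviewRun`, `write_run`, `load_findings`, the `.detail.json` suffix) is an assumption for illustration; the real shape should be taken from `../code-prealign-new-skills/code-rs`, not from this example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class AutoReviewRun:
    """Compact run record (hypothetical fields); findings live in a sidecar."""
    run_id: str
    status: str
    finding_count: int
    detail_path: str  # sidecar file name, relative to the store directory


def write_run(store_dir: Path, run_id: str, status: str,
              findings: list[dict]) -> AutoReviewRun:
    """Write the bulky findings to a sidecar and keep the run record small."""
    store_dir.mkdir(parents=True, exist_ok=True)
    detail_name = f"{run_id}.detail.json"
    (store_dir / detail_name).write_text(
        json.dumps({"findings": findings}, indent=2))
    run = AutoReviewRun(run_id, status, len(findings), detail_name)
    (store_dir / f"{run_id}.json").write_text(json.dumps(asdict(run)))
    return run


def load_findings(store_dir: Path, run: AutoReviewRun) -> list[dict]:
    """Read finding details only when a caller actually needs them."""
    return json.loads((store_dir / run.detail_path).read_text())["findings"]
```

The point of the split is that listing or resuming runs only touches the small metadata files, while inline findings never appear in the run record itself.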

Important audit stance:
- Do not assume passing CI proves product parity.
- Treat recent Every-Code-parity PRs as provisional until compared to `../code-prealign-new-skills/code-rs`.
- Use read-only agents for source inventory and PR-matrix review before corrective implementation.
- Keep corrective PRs small and evidence-backed.

## Source Material

Use restored `cbusillo/code` issues and commits as historical source material, not as automatic carry-forward status. Important restored planning issues include:

- cbusillo/code#397 - Stabilize Every Code parity on Codex CLI fork baseline.
- cbusillo/code#404 - Dogfood Parity 1: Daily Driver Baseline.
- cbusillo/code#386 - Inventory and port Every Code features onto Codex base.
- cbusillo/code#387 - Validate Codex-base Every Code in copied Codex Desktop app.
- cbusillo/code#388 - Decide remote inbox and Discord UI for Codex-base session continuity.
- cbusillo/code#400 - Auto Review proof metrics.
- cbusillo/code#399 - Code Bridge/browser parity.
- cbusillo/code#398 - Auto Drive parity.
- cbusillo/code#401 - Token/prompt-cache diagnostics.
- cbusillo/code#92 - Context source ledger and prompt observability.
- cbusillo/code#43 - Token efficiency and local LLM sandboxing.
- cbusillo/code#46 - De-duplicate prompt and skill injection overhead.
- cbusillo/code#49 - LM Studio readiness.
- cbusillo/code#212 - Natural agent delegation policy.

Recent restored commits worth mining include the Codex substrate migration, Codex Desktop startup compatibility, Every Code identity/config alignment, feature inventory, parity ledger, prompt-cache prefix preservation, auto-review proof metrics, auto-review lifecycle/store/ledger work, agent context file preloading, worktree decision gate hardening, and release/build runner fixes.

## Base Platform Decision

Codex Lab remains the product base. Build on the Codex CLI/Desktop/app-server substrate because the hard-to-recreate value is Codex Desktop compatibility, Codex iOS/mobile control of Desktop, Codex subscription auth, upstream `openai/codex` compatibility, and continuity with the Every Code Codex-base port path.

`opencode` and other coding harnesses are reference implementations, not the base. Steal ideas when they improve Codex Lab without breaking Codex compatibility and can be validated through the exec harness or focused tests.

## Guardrails

- Where feasible, start from fixtures, tests, and concepts before layering on broad implementation overlays.
- Treat missing Every Code behavior as unclassified until this plan or a child issue records Port, Rewrite, Covered, Defer, or Retire.
- Keep `openai` as fetch-only upstream.
- Avoid direct work on protected/default branches; use focused task branches and PRs for implementation.
- Every code-bearing port slice should include scoped validation evidence.
- The exec harness is mandatory for Codex Lab-specific regressions: skills prompt strength, cached-token stability, thread continuity, Desktop/app-server compatibility, and Every Code workflow expectations.
- Cache-sensitive scenarios should compare input tokens, cached input tokens, cache ratio, and normalized prompt-prefix stability against explicit baselines.
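The cache-sensitive comparison in the last guardrail could look roughly like the following. This is a sketch under stated assumptions: the `TokenUsage` fields, the tolerance defaults, and the volatile-text patterns are illustrative choices, not values or APIs defined by the exec harness.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one harness run (illustrative field names)."""
    input_tokens: int
    cached_input_tokens: int

    @property
    def cache_ratio(self) -> float:
        # Share of input tokens that were served from the prompt cache.
        if self.input_tokens == 0:
            return 0.0
        return self.cached_input_tokens / self.input_tokens


def check_against_baseline(run: TokenUsage, baseline: TokenUsage,
                           max_ratio_drop: float = 0.05,
                           max_input_growth: float = 0.10) -> list[str]:
    """Return regression messages; an empty list means the run passes."""
    problems = []
    if run.cache_ratio < baseline.cache_ratio - max_ratio_drop:
        problems.append(
            f"cache ratio fell from {baseline.cache_ratio:.2f} "
            f"to {run.cache_ratio:.2f}")
    if run.input_tokens > baseline.input_tokens * (1 + max_input_growth):
        problems.append(
            f"input tokens grew from {baseline.input_tokens} "
            f"to {run.input_tokens}")
    return problems


# Assumed examples of volatile text: ISO-like timestamps and UUIDs.
_VOLATILE = [
    re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"),
    re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),
]


def prefix_fingerprint(prompt_prefix: str) -> str:
    """Hash the prompt prefix after masking text expected to vary per run."""
    normalized = prompt_prefix
    for pattern in _VOLATILE:
        normalized = pattern.sub("<volatile>", normalized)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A scenario would store the baseline `TokenUsage` and prefix fingerprint explicitly, then fail when `check_against_baseline` returns any message or when the fingerprint changes without an intentional prompt-stack update.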

## MVP Slices

1. Repo and runner recovery.
   - Verify `cbusillo/codex-lab` runners after repo move.
   - Keep `cbusillo/code` runner-free unless a future archival workflow explicitly needs one.

2. Planning source of truth.
   - Make this issue the durable parent plan.
   - Create focused child issues for independent MVP slices.
   - Realign repo `AGENTS.md` so future agents find this issue instead of relying on local plan files.

3. Exec harness hardening and regression gates.
   - Fix known automation findings: cleanup `just` argument forwarding and exec-harness workflow path triggers.
   - Add `skills-cache-continuity` as the first cache-sensitive scenario.
   - Add deterministic fake-Responses coverage first, then local-LLM dogfood and frontier release variants.

4. Skills and prompt-cache protection.
   - Protect against skills becoming less directive or being treated as optional advice.
   - Protect against prompt-stack churn reducing cached-token reuse.

5. Auto Review proof loop.
   - Restore actionable review evidence without broad-review noise.
   - Mine restored Every Code auto-review lifecycle/store/ledger commits and issues before implementation.

6. Agents and third-party agent orchestration.
   - Rebuild configurable agent roles, third-party agents, local LLM roles, and review/validation loops on Codex Lab primitives.

7. Codex Desktop/app-server compatibility.
   - Validate Codex Lab in Desktop and keep app-server behavior upstream-shaped unless an additive extension is explicitly validated.

8. Code Bridge/browser and remote-control workflows.
   - Preserve or replace Code Bridge/browser/control capability with clear Desktop/app-server boundaries.

9. Auto Drive.
   - Rebuild Auto Drive on validated Codex thread/session/worktree/token primitives rather than copying old implementation wholesale.

10. Local LLM dogfood.
    - Make LM Studio/OpenAI-compatible local endpoints first-class for bounded dogfood basics, then graduate roles only after repeated proof.

## Opencode And Other References

`opencode` is the primary near-term reference for plugin/hooks ergonomics, named agents/subagents, LM Studio flow, provider/auth boundaries, permissions UX, run artifacts, GitHub automation presentation, prompt/context controls, and MCP/tool discovery.

Keep a lightweight watch on OpenHands, SWE-agent/mini-SWE-agent, Aider, Cline/Roo Code, Goose, and Continue for product architecture ideas. Separately, keep Terminal-Bench, promptfoo, Inspect AI, SWE-bench, Aider benchmarks, and BrowserGym/WebArena/OSWorld in mind for eval and grading ideas. These are references only.

## Next Actions

- Wait for the manually triggered `exec-harness` runner proof on `cbusillo/codex-lab` and record the result here.
- Create child issues for the first few MVP slices after confirming this parent issue shape.
- Update repo `AGENTS.md` to point future agents at this issue and labels such as `plan` and `codex-lab-mvp`.
- Start implementation with known exec-harness automation fixes, then `skills-cache-continuity`.

## Relationships

Sub-issues created from the 2026-06-12 plan review:

- #30 - MVP classification ledger for restored Every Code evidence.
- #31 - Fix exec-harness automation reliability gaps.
- #32 - Add exec-harness cache assertion primitives.
- #33 - Implement `skills-cache-continuity` exec-harness scenario.
- #34 - Validate Codex Desktop and app-server compatibility.
- #35 - Define Auto Review proof loop MVP.
- #36 - Scope minimal Code Bridge vertical slice.
- #37 - Rescope Auto Drive for Codex Lab MVP.

Related existing planning issues:

- #1 - Codex fork overlay capability map.
- #2 - Provider-agnostic worker decision spike.
- #4 - Codex Lab identity and channel naming.
- #22 - Fork CI posture and full-CI runner assumptions.

