Skip to content

docs(spec-check): lumo-spec design doc#1

Closed
viktor-savchik-idf wants to merge 3 commits into
mainfrom
docs/spec-check-design
Closed

docs(spec-check): lumo-spec design doc#1
viktor-savchik-idf wants to merge 3 commits into
mainfrom
docs/spec-check-design

Conversation

@viktor-savchik-idf

@viktor-savchik-idf viktor-savchik-idf commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Drafts the design for lumo-spec (target 0.3.0): the third missing piece of the Lumo design audit — "does the design match the product requirements?".

Updated 2026-05-26: split from a 643-line monolithic doc into a tree of 9 files (README index + 8 sub-docs), matching the docs/design/multi-file-resolution/ precedent. Each sub-doc is under ~170 lines and linkable independently.

Structure

File Purpose
README.md TL;DR, non-goals, goals, invariants, phase plan, sub-doc index
01-inputs.md CLI shape, env vars, layout schema, spec format guarantees
02-algorithm.md Pipeline, LLM call shape, cost
03-outputs.md Finding envelope, id enum, severity, exit codes
04-sources.md Plugin contract, Confluence / Jira / Markdown specifics, ADF flattener
05-honesty.md Seven non-negotiable honesty contracts
06-testing.md Unit, replay, golden cases, dogfood, CI
07-risks.md Risks, mitigations, four open questions
08-phasing.md Per-PR scope, deliverable definition for 0.3.0

Key decisions

  • Auth: env vars (LUMO_*), not a config file. Matches OSS pattern users recognise from gh / aws / terraform.
  • LLM: Anthropic SDK by default (Haiku 4.5, temperature=0), with LUMO_ANTHROPIC_BASE_URL for any Anthropic-compatible proxy.
  • Sources in v1: Confluence (v2 API, native ADF), Jira (description + comments), Markdown (offline / OSS). Notion / Linear deferred.
  • Honesty contract: source field always llm-derived, every finding carries confidence + evidence (verbatim spec quote, validated as substring of the fetched spec).
  • Structured output via tool-use — eliminates JSON parse failure mode.
  • Phasing: 5 PRs (Markdown → Confluence → Jira → MCP → dogfood), each shipping as a 0.2.x patch toward 0.3.0.

What's deliberately out of scope

Read-only, no spec authoring, no multi-page consolidation, no Notion / Linear in v1, no design generation, no spec ↔ code diff (transitive via existing tools).

Open questions

Called out in 07-risks.md:

  1. Confluence v1 vs v2 API fallback.
  2. Image placeholder strategy.
  3. Spec caching on disk.
  4. Jira sub-task aggregation.

Review focus

Doc-only PR. Review the decisions (auth model, LLM provider story, honesty contract, source-plugin shape, phasing). Once merged, implementation PRs reference this doc and each phase ships independently.

Roadmap alignment

Matches the 2026-05-25 Phase 2 restructure (804cefe): lumo-spec is now the first new Phase 2 tool, ahead of lumo-tier (0.4.0), lumo-component (0.4.0), and multi-file AST resolution (0.5.0).

Drafts the design for lumo-spec (target 0.3.0): design vs.
requirements semantic check via Confluence / Jira / Markdown.

Decisions captured:
- Auth via env vars (LUMO_*), not config file — matches gh / aws
  pattern OSS users expect.
- LLM provider: Anthropic SDK by default (Haiku 4.5, temperature=0),
  with optional LUMO_ANTHROPIC_BASE_URL for proxies (LiteLLM /
  Bedrock).
- Source plugins in v1: Confluence (v2 API, ADF native), Jira
  (description + comments), Markdown (offline / OSS).
- Honesty contract: source field always 'llm-derived', every finding
  carries confidence + evidence (verbatim spec quote, validated as
  substring of fetched spec to guard against fabrication).
- Structured output via tool-use — eliminates JSON parse failure.
- Five-phase rollout: Markdown -> Confluence -> Jira -> MCP ->
  dogfood, one PR each, ships as 0.2.x patches toward 0.3.0.

Open questions called out (Confluence v1/v2 fallback, image
placeholder strategy, spec caching, sub-task aggregation) so they
get resolved before code, not during.
@viktor-savchik-idf viktor-savchik-idf self-assigned this May 26, 2026
@viktor-savchik-idf viktor-savchik-idf added the enhancement New feature or request label May 26, 2026
Replaces docs/design/spec-check.md (643 lines, one file) with
docs/design/spec-check/ — README index plus eight sub-docs by
concern, matching the docs/design/multi-file-resolution/ precedent.

Structure:
  README.md       — TL;DR, non-goals, goals, invariants, phase plan, sub-doc index
  01-inputs.md    — CLI, env vars, layout schema, spec format guarantees
  02-algorithm.md — pipeline, LLM call shape, cost
  03-outputs.md   — finding envelope, id enum, severity, exit codes
  04-sources.md   — plugin contract, Confluence/Jira/Markdown specifics, ADF flattener
  05-honesty.md   — seven non-negotiable contracts
  06-testing.md   — unit, replay, golden cases, dogfood, CI
  07-risks.md     — risks, mitigations, four open questions
  08-phasing.md   — per-PR scope, deliverable definition

Why: monolithic design docs become unreadable, unreviewable, and
stale fast. Each sub-doc here is under ~170 lines and linkable
independently. Index links to every sub-doc with a one-line purpose.

ROADMAP.md updated to point at the directory instead of the deleted
single file.
Audit pass removing invented quantities and unverified claims from
the design doc. Decisions stay firm; quantities become explicit TBDs
resolved during implementation / dogfood:

- Model choice (Haiku vs Sonnet): now an open question resolved in
  Phase 1 against golden cases, not assumed. Haiku tool-use capability
  at this complexity was never verified.
- Cost / token counts: removed fabricated $/check and token ranges.
  Measure during Phase 5 dogfood from real spec+layout sizes.
- Prompt-cache saving: dropped the '~80%' figure (Anthropic marketing,
  not our measurement).
- Spec length cap: was a round 32k guess; now derived from model
  context + real Plazo spec sizes.
- Fixture / golden-case counts: dropped '30 fixtures' / '12 cases';
  counts follow from the node-type set and finding-id enum.
- ADF port: stopped asserting the Plazo source is 'well-tested';
  re-test independently.
- LLM replay: stopped calling it a 'borrowed vcr.py pattern'; it's a
  design decision (custom shim or adapter) made in Phase 1.
- HTTP client: noted as deferred choice, not implied.
- Severity table: clarified product specs rarely use RFC 2119 wording,
  so LLM inference is the primary path, the keyword table the exception.

Added 'What this doc does NOT claim' to the README index so reviewers
see the TBDs up front.
@OneXeor

OneXeor commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Closing in favor of #4, which combines and rewrites this work as one OSS-ready PR with project-specific language removed.

@OneXeor OneXeor closed this Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants