
feat: query planning + multi-hop decomposition (Phase 2.1 + 2.2) #17

Merged: hallelx2 merged 3 commits into main from feat/plan-and-decompose on May 27, 2026


@hallelx2 hallelx2 commented May 27, 2026

Summary

  • Adds a Phase 2.1 query planner (pkg/retrieval/plan.go): one short LLM call before retrieval that returns a structured Plan (intent, entities, expected document areas, multi-hop flag, sub-questions). Cached in a per-process LRU keyed on (query, model) so repeat questions don't burn budget; default capacity 128.
  • Adds a Phase 2.2 multi-hop decomposer (pkg/retrieval/decompose.go): when the plan is multi-hop, runs the wrapped Strategy once per sub-question and unions the per-sub-question selections in stable first-seen order. Strategy-agnostic — composes on top of single-pass, chunked-tree, agentic, or the cached wrapper. Falls through transparently when the plan is missing, non-multi-hop, or has no sub-questions.
  • Wires both into /v1/query and /v1/answer via a new enable_planning request-body field (per-request override) and a retrieval.planning config block (server-side default). The synthesis prompt grows a "Planner notes" section when a plan is present so synthesis sees the same structured understanding retrieval used. Responses surface the sanitised plan under a top-level plan key (omitempty).
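The plan cache described in the first bullet can be pictured with a minimal sketch, assuming a hand-rolled LRU built on container/list. The type and function names here (planCache, cacheKey, clone) are illustrative and are not taken from plan.go; only the (query, model) key, the default capacity of 128, and the defensive-copy behaviour come from the description. The sketch does not attempt the concurrent same-query dedup.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// Plan is a trimmed, hypothetical version of the planner output.
type Plan struct {
	Intent       string
	Entities     []string
	SubQuestions []string
}

// clone deep-copies the slices so callers cannot mutate a cached entry.
func (p *Plan) clone() *Plan {
	if p == nil {
		return nil
	}
	c := *p
	c.Entities = append([]string(nil), p.Entities...)
	c.SubQuestions = append([]string(nil), p.SubQuestions...)
	return &c
}

type cacheKey struct{ query, model string }

type entry struct {
	key  cacheKey
	plan *Plan
}

// planCache is a small mutex-guarded LRU keyed on (query, model).
type planCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[cacheKey]*list.Element
}

func newPlanCache(capacity int) *planCache {
	return &planCache{capacity: capacity, order: list.New(), items: make(map[cacheKey]*list.Element)}
}

// get returns a defensive copy of the cached plan, if any.
func (c *planCache) get(query, model string) (*Plan, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[cacheKey{query, model}]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).plan.clone(), true
}

// put stores a copy of the plan and evicts the least recently used entry
// once the capacity is exceeded. A zero-capacity cache stores nothing.
func (c *planCache) put(query, model string, p *Plan) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.capacity <= 0 {
		return
	}
	k := cacheKey{query, model}
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).plan = p.clone()
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(&entry{key: k, plan: p.clone()})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newPlanCache(128) // 128 matches the default capacity in this PR
	cache.put("what changed in Q3?", "gemini-2.0-flash", &Plan{Intent: "summary", Entities: []string{"Q3"}})
	p, _ := cache.get("what changed in Q3?", "gemini-2.0-flash")
	p.Entities[0] = "mutated by caller"
	again, _ := cache.get("what changed in Q3?", "gemini-2.0-flash")
	fmt.Println(again.Entities[0]) // prints "Q3": the cached entry is untouched
}
```

Copying on both put and get costs a few allocations per lookup, which is small next to the LLM call the cache avoids.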

Design rationale

  • Planning is its own short LLM call. Folding planner intent into the selection prompt is left for Phase 2.5; today the selection model still sees the original query unchanged. This keeps the change additive — a regression in the planner cannot degrade selection quality.
  • is_multi_hop is conservative by design. The prompt biases toward false: a single question that mentions two things is not multi-hop; a compound question that requires combining two distinct retrievals is. Over-firing here would double LLM cost without quality wins. The parser also self-corrects is_multi_hop=true with empty sub_questions back to false.
  • Cache miss must never block on a writer race. A sync.Mutex serialises writes for the same key so concurrent identical queries fold into one LLM call, and the underlying cache.LRU is itself mutex-guarded for atomicity. Cache failures (e.g. a zero-capacity LRU) are silent: the next call simply re-issues the LLM call. Plans are returned as defensive copies so a caller mutating Entities or SubQuestions can't corrupt the cached entry.
  • Planner failures degrade gracefully. Persistent JSON-parse failures → nil plan, retrieval continues with the original query (same pattern as runSelectionWithRetry in single_pass.go). Transport errors are logged but not propagated to the HTTP layer — a planner blip should not 500 an otherwise-working retrieval request.
  • Per-request opt-in beats config. The body field is a pointer-bool so we can distinguish "absent" (fall back to config) from "explicit false" (force off). The Planner is instantiated whenever an LLM is configured, so opt-in callers work even with planning.enabled: false.
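The pointer-bool override in the last bullet can be sketched as follows. This is a minimal illustration, not the code in internal/api/server.go: the queryRequest struct and the planningEnabled signature are assumptions, and only the enable_planning field name and the absent / explicit-false distinction come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryRequest is a hypothetical stand-in for the request body.
type queryRequest struct {
	Query          string `json:"query"`
	EnablePlanning *bool  `json:"enable_planning,omitempty"`
}

// planningEnabled resolves the per-request override against the server
// default. A nil pointer means the field was absent, so the config applies.
func planningEnabled(override *bool, configDefault bool) bool {
	if override != nil {
		return *override
	}
	return configDefault
}

func main() {
	bodies := []string{
		`{"query":"q"}`,                         // absent: fall back to config
		`{"query":"q","enable_planning":false}`, // explicit false: force off
		`{"query":"q","enable_planning":true}`,  // explicit true: force on
	}
	for _, body := range bodies {
		var req queryRequest
		if err := json.Unmarshal([]byte(body), &req); err != nil {
			panic(err)
		}
		// Resolve against a server default of true, then of false.
		fmt.Println(planningEnabled(req.EnablePlanning, true), planningEnabled(req.EnablePlanning, false))
	}
}
```

A plain bool could not tell the first two bodies apart, because both would decode to false.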

Risk envelope

  • Default disabled at both config (retrieval.planning.enabled: false) and per-request (enable_planning absent) levels. Existing callers see no behaviour change, no extra LLM calls, no extra latency.
  • Planner errors do not surface. Transport / parse failures → continue with original query.
  • Decomposer short-circuits on sub-question errors so retrieval bugs aren't silently masked by the multi-hop loop.
  • No changes to existing strategies. Planner and decomposer are pure additions composing on top of the existing Strategy / CostStrategy interface.
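The "planner errors do not surface" guarantee amounts to a small wrapper at the call site. The sketch below assumes a hypothetical planFunc type and planOrNil helper (the PR names its helper runPlanner, whose real signature is not shown here), and it leaves out context plumbing for brevity.

```go
package main

import (
	"errors"
	"log"
)

// Plan is a trimmed, hypothetical version of the planner output.
type Plan struct{ Intent string }

// planFunc is a hypothetical stand-in for the planner call.
type planFunc func(query string) (*Plan, error)

// errPlannerDown simulates a transport failure in this example.
var errPlannerDown = errors.New("planner transport error")

// planOrNil runs the planner and converts any failure into a nil plan, so
// the caller continues retrieval with the original query.
func planOrNil(plan planFunc, query string) *Plan {
	if plan == nil {
		return nil // no planner configured
	}
	p, err := plan(query)
	if err != nil {
		log.Printf("planner failed, continuing without a plan: %v", err)
		return nil
	}
	return p
}

func main() {
	failing := func(string) (*Plan, error) { return nil, errPlannerDown }
	if planOrNil(failing, "q") == nil {
		log.Println("retrieval proceeds with the original query")
	}
}
```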

Test plan

  • go build ./... clean
  • go vet ./... clean
  • go test ./... all green (planner: 11 tests, decomposer: 9 tests, config: planning defaults + env override)
  • Planner: cache hit/miss, concurrent same-query dedup, retry-on-bad-JSON degrades to nil plan, retry-then-success, transport-error propagation, empty-query no-op, nil-planner safety, defensive cache copy
  • Decomposer: nil/non-multi-hop/empty-subs fall-through, per-sub-question dispatch order, union dedup with overlap, error short-circuit with partial usage, non-CostStrategy compatibility, end-to-end Planner+Decomposer over real SinglePass
  • Config: Retrieval.Planning defaults (Enabled=false, CacheSize=128, Decompose=true), env overrides (VLE_RETRIEVAL_PLANNING_*)
  • OpenAPI: Plan schema added; both Query/Answer request schemas grow enable_planning; both response schemas grow plan ($ref, omitempty)
  • Manual end-to-end with a live LLM (deferred to staging — opt-in flag means risk of regression is bounded)

Opt-in instructions

Per-request:

POST /v1/answer
{"document_id":"...", "query":"...", "enable_planning": true}

Server-wide (config.yaml):

retrieval:
  planning:
    enabled: true
    model: "gemini-2.0-flash"   # cheap/fast model for the short planning call
    cache_size: 128
    decompose: true

Or env:

VLE_RETRIEVAL_PLANNING_ENABLED=true
VLE_RETRIEVAL_PLANNING_MODEL=gemini-2.0-flash
VLE_RETRIEVAL_PLANNING_DECOMPOSE=true
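
For orientation, here is a rough sketch of how such overrides could be layered on top of the defaults listed in the test plan (Enabled=false, CacheSize=128, Decompose=true). The Go field names, defaultPlanning, and applyEnv are assumptions for illustration; the actual loader in pkg/config/config.go may work differently. Only the three variables shown above are handled.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// PlanningBlock mirrors the retrieval.planning keys; the Go field names are assumed.
type PlanningBlock struct {
	Enabled   bool
	Model     string
	CacheSize int
	Decompose bool
}

// defaultPlanning returns the defaults stated in this PR's test plan.
func defaultPlanning() PlanningBlock {
	return PlanningBlock{Enabled: false, CacheSize: 128, Decompose: true}
}

// applyEnv overlays environment values on cfg. Unset or unparsable values
// leave the existing setting untouched.
func applyEnv(cfg PlanningBlock, getenv func(string) string) PlanningBlock {
	if v := getenv("VLE_RETRIEVAL_PLANNING_ENABLED"); v != "" {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v := getenv("VLE_RETRIEVAL_PLANNING_MODEL"); v != "" {
		cfg.Model = v
	}
	if v := getenv("VLE_RETRIEVAL_PLANNING_DECOMPOSE"); v != "" {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Decompose = b
		}
	}
	return cfg
}

func main() {
	cfg := applyEnv(defaultPlanning(), os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the overlay easy to test without touching the process environment.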

DO NOT MERGE — review only.

Summary by CodeRabbit

Release Notes

  • New Features

    • Optional query planning: Enable per-request structured plan generation before retrieval via new enable_planning option
    • Multi-hop query decomposition for complex queries that break down into sub-questions
    • Plan details now included in API responses when planning is enabled
  • Configuration

    • New retrieval.planning configuration block with settings to enable planning, customize cache behavior, and control multi-hop decomposition


hallelx2 added 3 commits May 27, 2026 01:52
Adds pkg/retrieval/plan.go: one LLM call before retrieval that returns
a structured Plan (intent, entities, expected_doc_areas, is_multi_hop,
sub_questions). Cached on a per-(query, model) basis in an in-process
LRU (default 128 entries) so repeat questions don't burn budget.

Reuses the runSelectionWithRetry pattern from single_pass.go: persistent
JSON-parse failures degrade gracefully to a nil plan + nil error so the
caller continues with the original query. Transport errors still bubble.

The planning prompt biases conservatively on is_multi_hop — only flags
queries that genuinely need decomposition into distinct sub-retrieval
passes. The parser further self-corrects an is_multi_hop=true with
empty sub_questions back to false at parse time.
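
The parse-time self-correction described above could look roughly like this. It is a sketch under assumptions: the JSON field names follow the list in this commit message, but parsePlan and its brace-slicing approach to tolerating code fences and surrounding prose are illustrative and not the actual ParsePlan.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Plan uses the field names listed in the commit message; the Go names are assumed.
type Plan struct {
	Intent           string   `json:"intent"`
	Entities         []string `json:"entities"`
	ExpectedDocAreas []string `json:"expected_doc_areas"`
	IsMultiHop       bool     `json:"is_multi_hop"`
	SubQuestions     []string `json:"sub_questions"`
}

// parsePlan extracts the outermost JSON object from raw model output, which
// lets it skip code fences or prose around the object, then normalises it.
func parsePlan(raw string) (*Plan, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return nil, errors.New("no JSON object in planner output")
	}
	var p Plan
	if err := json.Unmarshal([]byte(raw[start:end+1]), &p); err != nil {
		return nil, fmt.Errorf("decode plan: %w", err)
	}
	// A multi-hop flag without sub-questions is not actionable, so clear it.
	if p.IsMultiHop && len(p.SubQuestions) == 0 {
		p.IsMultiHop = false
	}
	return &p, nil
}

func main() {
	fence := strings.Repeat("`", 3)
	raw := fence + "json\n" + `{"intent":"compare","is_multi_hop":true,"sub_questions":[]}` + "\n" + fence
	p, err := parsePlan(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Intent, p.IsMultiHop) // prints "compare false"
}
```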
Adds pkg/retrieval/decompose.go: when a Plan has IsMultiHop=true and
non-empty SubQuestions, runs the wrapped Strategy once per sub-question
and returns the union of selected IDs in stable first-seen order. Each
sub-question is a tighter prompt than the compound original — the
selection LLM gets one thing to reason about instead of a multi-part
question.

Fall-through is transparent: nil plan, IsMultiHop=false, or empty
SubQuestions → delegate to Strategy.Select with the original query
unchanged. Callers can wire the decomposer unconditionally.

Aggregates Usage across sub-questions when the wrapped Strategy
implements CostStrategy. Non-CostStrategy fall-back works too (Usage
is zero in that case; selection behaviour is identical). Error on any
sub-question short-circuits and returns the partial Usage so retrieval
bugs aren't silently swallowed by the multi-hop loop.
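
As a rough illustration of the loop described above, the sketch below models the wrapped Strategy as a plain function. selectFunc, Usage, decomposeSelect, and demoSelect are hypothetical names chosen for this example; the real Decomposer wraps the Strategy / CostStrategy interfaces, whose signatures are not shown here.

```go
package main

import "fmt"

// Usage is a hypothetical token-usage record.
type Usage struct{ InputTokens, OutputTokens int }

// Plan is a trimmed, hypothetical version of the planner output.
type Plan struct {
	IsMultiHop   bool
	SubQuestions []string
}

// selectFunc is a hypothetical stand-in for the wrapped Strategy.
type selectFunc func(query string) ([]string, Usage, error)

// decomposeSelect runs sel once per sub-question and unions the selected IDs
// in stable first-seen order. Without an actionable multi-hop plan it falls
// through to a single call with the original query.
func decomposeSelect(sel selectFunc, plan *Plan, query string) ([]string, Usage, error) {
	if plan == nil || !plan.IsMultiHop || len(plan.SubQuestions) == 0 {
		return sel(query)
	}
	var total Usage
	seen := make(map[string]struct{})
	var union []string
	for _, sub := range plan.SubQuestions {
		ids, u, err := sel(sub)
		total.InputTokens += u.InputTokens
		total.OutputTokens += u.OutputTokens
		if err != nil {
			// Short-circuit, but still report the usage spent so far.
			return nil, total, fmt.Errorf("sub-question %q: %w", sub, err)
		}
		for _, id := range ids {
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			union = append(union, id)
		}
	}
	return union, total, nil
}

// demoSelect is a canned strategy used only by this example.
func demoSelect(q string) ([]string, Usage, error) {
	canned := map[string][]string{
		"revenue 2023?": {"s3", "s1"},
		"revenue 2024?": {"s1", "s7"},
	}
	ids, ok := canned[q]
	if !ok {
		return nil, Usage{InputTokens: 5}, fmt.Errorf("no selection for %q", q)
	}
	return ids, Usage{InputTokens: 10, OutputTokens: 2}, nil
}

func main() {
	plan := &Plan{IsMultiHop: true, SubQuestions: []string{"revenue 2023?", "revenue 2024?"}}
	ids, usage, err := decomposeSelect(demoSelect, plan, "compare revenue 2023 vs 2024")
	fmt.Println(ids, usage.InputTokens, usage.OutputTokens, err) // prints "[s3 s1 s7] 20 4 <nil>"
}
```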
Server-side opt-in for Phase 2.1 + 2.2. New PlanningBlock under
retrieval (enabled, model, cache_size, decompose; env: VLE_RETRIEVAL_PLANNING_*).
Default disabled at both config and per-request levels, so existing
callers see no behaviour change.

Wiring:
- api.Deps gains Planner + Planning fields. The body-level
  enable_planning field (pointer-bool to disambiguate absent from
  explicit-false) overrides the config block.
- handleQuery / handleAnswer route through a small set of helpers
  (runPlanner, runSelection, runSelectionWithUsage) that fold the
  Planner output into selection. Multi-hop plans go through a
  Decomposer wrapping the active Strategy when planning.decompose
  is true.
- /v1/answer's synthesis prompt grows a short "Planner notes" block
  (intent, entities, expected doc areas, sub-questions) when a plan
  is present, so the model reasons with the same understanding the
  retrieval pipeline used.
- Both endpoints surface the sanitised Plan in the response under
  "plan" (omitempty) when planning ran.
- cmd/engine instantiates a Planner whenever LLM is configured, so
  per-request opt-in still works even with planning.enabled=false.
- Planner transport errors are LOGGED but not propagated — a planner
  blip cannot 500 an otherwise-working retrieval request.

OpenAPI: Plan schema added; QueryRequest/AnswerRequest get
enable_planning; QueryResponse/AnswerResponse get plan ($ref Plan,
omitempty). config.example.yaml gets a documented retrieval.planning
block.

Tests: planning defaults + env-override coverage added to
pkg/config/config_test.go. All existing tests still pass.
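
To make the "Planner notes" block concrete, here is one possible rendering. The exact wording and layout of the real prompt section are not shown in this PR, so the format below, the plannerNotes function, and the Go field names are all assumptions; only the four kinds of content (intent, entities, expected doc areas, sub-questions) come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Plan is a hypothetical version of the planner output.
type Plan struct {
	Intent           string
	Entities         []string
	ExpectedDocAreas []string
	SubQuestions     []string
}

// plannerNotes renders a short prompt section from the plan. A nil plan
// yields an empty string, so the synthesis prompt is unchanged.
func plannerNotes(p *Plan) string {
	if p == nil {
		return ""
	}
	var b strings.Builder
	b.WriteString("Planner notes:\n")
	if p.Intent != "" {
		fmt.Fprintf(&b, "- Intent: %s\n", p.Intent)
	}
	if len(p.Entities) > 0 {
		fmt.Fprintf(&b, "- Entities: %s\n", strings.Join(p.Entities, ", "))
	}
	if len(p.ExpectedDocAreas) > 0 {
		fmt.Fprintf(&b, "- Expected document areas: %s\n", strings.Join(p.ExpectedDocAreas, ", "))
	}
	if len(p.SubQuestions) > 0 {
		b.WriteString("- Sub-questions:\n")
		for i, q := range p.SubQuestions {
			fmt.Fprintf(&b, "  %d. %s\n", i+1, q)
		}
	}
	return b.String()
}

func main() {
	fmt.Print(plannerNotes(&Plan{
		Intent:       "compare",
		Entities:     []string{"revenue", "FY2023", "FY2024"},
		SubQuestions: []string{"What was revenue in FY2023?", "What was revenue in FY2024?"},
	}))
}
```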
Copilot AI review requested due to automatic review settings May 27, 2026 01:01
coderabbitai Bot commented May 27, 2026

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between d92db83 and bb32d4e.

📒 Files selected for processing (10)
  • cmd/engine/main.go
  • config.example.yaml
  • internal/api/server.go
  • openapi.yaml
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/retrieval/decompose.go
  • pkg/retrieval/decompose_test.go
  • pkg/retrieval/plan.go
  • pkg/retrieval/plan_test.go

Walkthrough

This PR adds opt-in, LLM-based query planning that runs before retrieval. Clients can request a structured plan (intent, entities, document areas, multi-hop sub-questions), which drives selection via optional multi-hop decomposition and informs answer synthesis. The planner caches plans per (query, model) and deduplicates concurrent identical queries; the decomposer aggregates retrieval results across sub-questions in stable first-seen order.

Changes

Query Planning Feature

| Layer / File(s) | Summary |
| --- | --- |
| Configuration, data models, and API contracts (pkg/config/config.go, config.example.yaml, openapi.yaml) | Introduces PlanningBlock with enabled/model/cache_size/decompose fields, Plan struct with intent/entities/document areas/multi-hop sub-questions, and extends QueryRequest/QueryResponse/AnswerRequest/AnswerResponse with enable_planning boolean and plan field in responses. |
| Planner LLM integration and caching (pkg/retrieval/plan.go, pkg/retrieval/plan_test.go) | Implements Planner with LRU cache per (query, model), per-planner mutex for concurrent deduplication, JSON-mode LLM requests with retry on parse failures, robust ParsePlan that tolerates code fences/prose, and defensive clones on cache hit. Tests validate happy path, cache hits/misses, concurrency deduplication, retry behavior, transport errors, empty queries, and cache immutability. |
| Multi-hop decomposition for plan sub-questions (pkg/retrieval/decompose.go, pkg/retrieval/decompose_test.go) | Introduces Decomposer wrapping a Strategy, executing it per Plan.SubQuestions with stable SectionID deduplication and Usage aggregation; falls through to single strategy call when plan is nil/non-multihop/empty. Tests verify fallthrough, per-sub-question dispatch, union deduplication, error short-circuiting, non-cost-strategy behavior, nil-strategy defense, and end-to-end planner+decomposer wiring. |
| API endpoint wiring for planning and selection (internal/api/server.go) | Updates Deps with Planner and Planning config, adds enable_planning?: *bool to request parsing, replaces direct Strategy.Select calls with plan-aware runSelection and runSelectionWithUsage that optionally decompose multi-hop plans, includes plan in responses, passes plan to synthesiseAnswer for prompt augmentation, and provides helper functions (planningEnabled, runPlanner, runSelection, runSelectionWithUsage, shouldDecompose, writePlanHints). |
| Application initialization and dependency injection (cmd/engine/main.go) | Initializes Planner in run() with configured/default planning model, sets cache size from config, logs planning enablement, and wires Planner and Planning config into api.Deps. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A planner arrives with a hop and a gleam,
Breaking queries to sub-questions, a clever retrieval scheme,
Cached plans speed the way, LLM calls unified,
Multi-hop decomposition—the search dream's amplified!


@hallelx2 hallelx2 merged commit 54ae0d5 into main May 27, 2026
4 of 8 checks passed
@hallelx2 hallelx2 deleted the feat/plan-and-decompose branch May 27, 2026 01:02
@sourcery-ai sourcery-ai Bot left a comment

Sorry @hallelx2, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@hallelx2 hallelx2 review requested due to automatic review settings May 27, 2026 01:24