feat: content-aware re-rank pass (Phase 2.3) #18
…dates
Adds pkg/retrieval/rerank.go: a ReRanker that takes the strategy's
selected section IDs plus the first ~2000 chars of each section's body
and issues one short LLM call to score relevance on a 0-100 scale.
Sorted descending by score; ties preserve the strategy's original
order via a stable insertion sort.
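The content cap mentioned above could look roughly like the sketch below. It is an illustration only, not the code in pkg/retrieval/rerank.go: `truncateChars` and `defaultMaxContentChars` are hypothetical names, and counting runes rather than bytes is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// defaultMaxContentChars mirrors the ~2000-char cap from the commit
// message; the constant name is hypothetical.
const defaultMaxContentChars = 2000

// truncateChars returns at most limit runes of body. A non-positive
// limit falls back to the default. Counting runes (not bytes) is an
// assumption, chosen so multi-byte characters are never split.
func truncateChars(body string, limit int) string {
	if limit <= 0 {
		limit = defaultMaxContentChars
	}
	if utf8.RuneCountInString(body) <= limit {
		return body
	}
	return string([]rune(body)[:limit])
}

func main() {
	fmt.Println(truncateChars("héllo wörld", 5))                  // prints "héllo"
	fmt.Println(len(truncateChars(strings.Repeat("x", 5000), 0))) // prints 2000
}
```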
Failure semantics deliberately tolerant — every input ID surfaces in
the output exactly once:
- Empty candidate list short-circuits without an LLM call.
- Transport failure returns input order + non-nil error.
- All retries returning bad JSON degrade to input order + nil error
(mirrors runSelectionWithRetry).
- Hallucinated IDs are dropped silently.
- Missing IDs receive score=0 at the bottom of the output.
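The "every input ID surfaces exactly once" guarantee can be sketched as a merge over the input list. This is a hedged illustration, not the PR's implementation: `Scored` and `mergeScores` are hypothetical names, `sort.SliceStable` stands in for the stable insertion sort the commit describes, and collapsing duplicate input IDs is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored pairs a section ID with its relevance score (0-100).
// The type and field names are hypothetical.
type Scored struct {
	ID    string
	Score int
}

// mergeScores reconciles the model's scores with the candidate list.
// Only input IDs are visited, so IDs the model invented are dropped.
// IDs the model omitted get score 0, and negative scores clamp to 0.
// The stable sort keeps ties in the strategy's original order.
func mergeScores(inputIDs []string, model map[string]int) []Scored {
	out := make([]Scored, 0, len(inputIDs))
	seen := make(map[string]bool, len(inputIDs))
	for _, id := range inputIDs {
		if seen[id] {
			continue // assumption: a duplicated candidate surfaces once
		}
		seen[id] = true
		score := model[id] // zero value when the model omitted the ID
		if score < 0 {
			score = 0
		}
		out = append(out, Scored{ID: id, Score: score})
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Score > out[j].Score
	})
	return out
}

func main() {
	ids := []string{"s1", "s2", "s3", "s4"}
	model := map[string]int{"s2": 90, "s3": 90, "s1": 40, "ghost": 99}
	for _, s := range mergeScores(ids, model) {
		fmt.Println(s.ID, s.Score) // s2 90, s3 90, s1 40, s4 0
	}
}
```

Iterating over the input list rather than the model's response is what makes the guarantee structural: the output can never contain an ID the strategy did not select, and can never lose one it did.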
Prompt scoring rubric is anchored to four bands so the model returns
coarse-but-meaningful integers rather than a uniform 50 across the
board: 90-100 directly answers, 60-89 contributes evidence, 30-59
tangential, 0-29 not useful. ParseReRank reuses the tolerant
code-fence + prose-stripping pattern from ParseSelection / ParsePlan.
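A tolerant parser of this kind might be sketched as below. The response schema (`{"scores":[{"id":...,"score":...}]}`) and the names `scoreEntry`, `rerankResponse`, and `parseReRankSketch` are assumptions for illustration; the real ParseReRank may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// scoreEntry and rerankResponse describe an assumed response shape.
type scoreEntry struct {
	ID    string `json:"id"`
	Score int    `json:"score"`
}

type rerankResponse struct {
	Scores []scoreEntry `json:"scores"`
}

// parseReRankSketch ignores anything outside the outermost {...} pair,
// which discards code fences and leading or trailing prose in one step.
func parseReRankSketch(raw string) ([]scoreEntry, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return nil, errors.New("no JSON object found in response")
	}
	var resp rerankResponse
	if err := json.Unmarshal([]byte(raw[start:end+1]), &resp); err != nil {
		return nil, fmt.Errorf("decode rerank response: %w", err)
	}
	return resp.Scores, nil
}

func main() {
	fence := strings.Repeat("`", 3)
	raw := "Sure, here are the scores:\n" + fence + "json\n" +
		`{"scores":[{"id":"s1","score":92},{"id":"s2","score":35}]}` +
		"\n" + fence + "\n"
	scores, err := parseReRankSketch(raw)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, s := range scores {
		fmt.Println(s.ID, s.Score)
	}
}
```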
16 table-driven tests cover happy path, empty input, transport
failure, parse-retry exhaustion, parse-then-succeed, hallucinated
IDs, missing IDs, duplicate IDs, negative-score clamp, prompt
content+truncation, nil LLM no-op, nil receiver no-op, and the parser
contract (code fence, leading prose, empty input).
…rrides
Adds ReRankBlock under RetrievalConfig with Enabled / Model /
MaxContentChars (default 2000) / TopK (default 0 = keep all) fields,
plus matching VLE_RETRIEVAL_RERANK_ENABLED, _MODEL, _MAX_CONTENT_CHARS,
and _TOP_K env overrides following the same shape as the existing
planning + agentic + answer_span env handling.
Re-rank is opt-in: defaults are Enabled=false / TopK=0 so existing
deployments behave identically. The per-request `enable_rerank` body
field (wired in the next commit) overrides this block when set.
Validate rejects negative MaxContentChars and TopK so a stray env or
YAML typo can't silently flip the model into "send no content" mode.
Zero is valid for both: TopK=0 means keep all candidates, and
MaxContentChars=0 falls through to the ReRanker's compiled default.
New tests cover default values, the env override happy path, env
disable of a YAML-set true, env override rejection of bad numeric
values, and validation of negative inputs.
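The env override and validation rules described in this commit can be sketched as follows. Field and env names come from the commit message; the field types, the `applyEnv` helper, the injected `getenv` function, and the error wording are assumptions, and the real pkg/config code may be organised differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// ReRankBlock mirrors the fields named in the commit message; the Go
// types are assumed.
type ReRankBlock struct {
	Enabled         bool
	Model           string
	MaxContentChars int
	TopK            int
}

// applyEnv overlays VLE_RETRIEVAL_RERANK_* values onto b. getenv is
// injected so tests do not have to mutate the process environment.
func applyEnv(b *ReRankBlock, getenv func(string) string) error {
	const prefix = "VLE_RETRIEVAL_RERANK_"
	if v := getenv(prefix + "ENABLED"); v != "" {
		on, err := strconv.ParseBool(v)
		if err != nil {
			return fmt.Errorf("%sENABLED: %w", prefix, err)
		}
		b.Enabled = on
	}
	if v := getenv(prefix + "MODEL"); v != "" {
		b.Model = v
	}
	ints := []struct {
		key string
		dst *int
	}{
		{"MAX_CONTENT_CHARS", &b.MaxContentChars},
		{"TOP_K", &b.TopK},
	}
	for _, f := range ints {
		if v := getenv(prefix + f.key); v != "" {
			n, err := strconv.Atoi(v)
			if err != nil {
				return fmt.Errorf("%s%s: %w", prefix, f.key, err)
			}
			*f.dst = n
		}
	}
	return nil
}

// Validate rejects negative values; zero is valid for both fields.
func (b ReRankBlock) Validate() error {
	if b.MaxContentChars < 0 {
		return errors.New("retrieval.rerank.max_content_chars must be >= 0")
	}
	if b.TopK < 0 {
		return errors.New("retrieval.rerank.top_k must be >= 0")
	}
	return nil
}

func main() {
	cfg := ReRankBlock{MaxContentChars: 2000} // documented defaults
	if err := applyEnv(&cfg, os.Getenv); err != nil {
		fmt.Println("env error:", err)
		return
	}
	fmt.Printf("%+v valid=%v\n", cfg, cfg.Validate() == nil)
}
```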
Adds the per-request `enable_rerank` body field (pointer, so absent vs.
explicit false are distinguishable) on both QueryRequest and
AnswerRequest. When true, or when retrieval.rerank.enabled is on at
config level, the engine runs one extra LLM call after candidate
content is loaded but before answer-span extraction / synthesis.
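The pointer-typed field makes three request states distinguishable: absent, explicit false, explicit true. A minimal sketch of one way to resolve them against the config default follows. `rerankEnabled` is a hypothetical helper, the struct is trimmed to the fields relevant here, and the assumption that an explicit false overrides an enabled config follows the "overrides this block when set" wording rather than confirmed engine code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueryRequest is reduced to the fields needed for the illustration.
type QueryRequest struct {
	DocumentID   string `json:"document_id"`
	Query        string `json:"query"`
	EnableRerank *bool  `json:"enable_rerank,omitempty"`
}

// rerankEnabled lets a per-request value win when present and falls
// back to the config-level setting when the field was omitted.
func rerankEnabled(perRequest *bool, configEnabled bool) bool {
	if perRequest != nil {
		return *perRequest
	}
	return configEnabled
}

func main() {
	bodies := []string{
		`{"query":"q"}`,                       // absent: config decides
		`{"query":"q","enable_rerank":false}`, // explicit false
		`{"query":"q","enable_rerank":true}`,  // explicit true
	}
	for _, body := range bodies {
		var req QueryRequest
		if err := json.Unmarshal([]byte(body), &req); err != nil {
			panic(err)
		}
		// With config-level enabled=true this prints true, false, true.
		fmt.Println(rerankEnabled(req.EnableRerank, true))
	}
}
```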
The pass scores each candidate on 0-100 and reorders the slice
descending by score. retrieval.rerank.top_k (when non-zero) truncates
the survivors. Sections that ran through re-rank surface a `score`
field in /v1/query.sections[] and /v1/answer.citations[].
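The truncation step and the optional `score` field might be sketched like this. `Section`, its JSON field names, and `truncateTopK` are hypothetical; a pointer with `omitempty` is one way to leave the field out for sections that never went through re-rank.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Section is a trimmed, hypothetical response element. A nil Score is
// omitted from the JSON output.
type Section struct {
	ID    string `json:"id"`
	Score *int   `json:"score,omitempty"`
}

// truncateTopK keeps the first topK entries of an already-ranked slice.
// topK == 0 means keep all, matching the documented default.
func truncateTopK(ranked []Section, topK int) []Section {
	if topK <= 0 || topK >= len(ranked) {
		return ranked
	}
	return ranked[:topK]
}

func main() {
	score := func(n int) *int { return &n }
	ranked := []Section{
		{ID: "s2", Score: score(91)},
		{ID: "s1", Score: score(47)},
		{ID: "s3"},
	}
	out, err := json.Marshal(truncateTopK(ranked, 2))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```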
Failure semantics carried through from pkg/retrieval/rerank.go:
- No LLM client or no ReRanker wired → re-rank silently skipped.
- Empty candidate slice → no LLM call, no scores.
- Transport / parse failure → warn-log, strategy order preserved.
- Hallucinated IDs dropped; missing IDs surface unscored at the
bottom. No candidate is ever lost.
For /v1/answer the ReRanker's Usage is accumulated into totalUsage so
the response's usage block reflects the real cost; the synthesis
prompt then sees the post-rerank top-k.
cmd/engine/main.go instantiates a ReRanker against the configured LLM
whenever a client is wired (mirrors the planner pattern) — opt-in
behaviour is purely flag-driven from there. Boot logs the model +
content cap + top-k when re-rank is enabled at config level.
openapi.yaml: enable_rerank field on QueryRequest + AnswerRequest;
score field on QuerySection + AnswerCitation. Schema descriptions
document the 0-100 scale and the "never drops sections" envelope.
config.example.yaml: documented retrieval.rerank block with cost +
opt-in commentary so operators understand why this is off by default.
Summary
Adds a content-aware re-rank pass that runs after the retrieval
Strategy.Select. Takes the strategy's selected section IDs + their loaded content and issues one LLM call to score each section (0-100) against the query. Sections are reordered descending by score; `retrieval.rerank.top_k` optionally truncates the survivors.
This rescues the failure mode where the strategy reasoned over title + summary + HyDE candidate questions alone and got fooled by surface-level lexical matches; reading the actual content closes the gap. Cost: one extra LLM call per query, ~3-5k input tokens, ~$0.0003 on gemini-2.5-flash for a 5-candidate × 2k-char call.
Design
- `pkg/retrieval/rerank.go`: `ReRanker`, with the same defensive shape as the planner (tolerant JSON parsing, retry on parse failure, graceful degradation to input order when retries are exhausted). Every input ID surfaces in the output exactly once; re-rank can reorder but never drops candidates.
- `pkg/config`: new `retrieval.rerank` block (`enabled`, `model`, `max_content_chars`, `top_k`) plus `VLE_RETRIEVAL_RERANK_*` env overrides following the same pattern as the existing planning + agentic + answer_span env handling.
- `enable_rerank` body field on `QueryRequest` and `AnswerRequest`. When the pass runs, sections (in `/v1/query`) and citations (in `/v1/answer`) carry a `score` field. For `/v1/answer` the re-rank usage is added to the response's `usage` block.
Failure envelope
Default is `enabled=false`, so existing callers see zero behaviour change. `enable_rerank` overrides the config so callers can experiment without a server restart.
Opt-in instructions
Or per-request:
    { "document_id": "...", "query": "What was Apple's fiscal 2023 revenue?", "enable_rerank": true }
Test plan
- `go build ./...` clean.
- `go vet ./...` clean.
- `go test ./...`: all green (existing tests still pass).