feat: content-aware re-rank pass (Phase 2.3) by hallelx2 · Pull Request #18 · hallelx2/vectorless-engine

hallelx2 · 2026-05-27T01:19:24Z

Summary

Adds a content-aware re-rank pass that runs after the retrieval Strategy.Select. Takes the strategy's selected section IDs + their loaded content and issues one LLM call to score each section (0-100) against the query. Sections are reordered descending by score; retrieval.rerank.top_k optionally truncates the survivors.

This rescues the failure mode where the strategy reasoned over title + summary + HyDE candidate questions alone and got fooled by surface-level lexical matches — reading the actual content closes the gap. One extra LLM call per query, ~3-5k input tokens, ~$0.0003 on gemini-2.5-flash for a 5-candidate × 2k-char call.

Design

pkg/retrieval/rerank.go — ReRanker with the same defensive shape as the planner: tolerant JSON parsing, retry on parse failure, graceful degradation to input order when retries are exhausted. Every input ID surfaces in the output exactly once — re-rank can reorder but never drops candidates.
pkg/config — new retrieval.rerank block (enabled, model, max_content_chars, top_k) plus VLE_RETRIEVAL_RERANK_* env overrides following the same pattern as the existing planning + agentic + answer_span env handling.
Server wiring — per-request enable_rerank body field on QueryRequest and AnswerRequest. When the pass runs, sections (in /v1/query) and citations (in /v1/answer) carry a score field. For /v1/answer the re-rank usage is added to the response's usage block.
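
The ordering contract described above (every input ID appears exactly once, scores reorder but never drop, optional top_k truncation) can be sketched roughly as follows. This is an illustration, not the code in pkg/retrieval/rerank.go: the `scored` type and `mergeScores` helper are hypothetical names, the model output is assumed to arrive as an ID-to-score map, and `sort.SliceStable` stands in for the stable insertion sort the commit message mentions.

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a section ID with its relevance score on the 0-100 scale.
// Hypothetical type for illustration only.
type scored struct {
	ID    string
	Score int
}

// mergeScores (hypothetical helper) walks the strategy's input order so that
// every input ID appears exactly once in the result. IDs the model invented
// are never looked up, so they are dropped; IDs the model omitted get 0.
func mergeScores(inputIDs []string, modelScores map[string]int, topK int) []scored {
	out := make([]scored, 0, len(inputIDs))
	for _, id := range inputIDs {
		s := modelScores[id] // a missing key yields the zero value, 0
		if s < 0 {
			s = 0 // clamp negative scores
		}
		if s > 100 {
			s = 100 // clamp scores above the scale
		}
		out = append(out, scored{ID: id, Score: s})
	}
	// Stable sort: ties keep the strategy's original relative order.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	// topK == 0 means "keep all candidates".
	if topK > 0 && topK < len(out) {
		out = out[:topK]
	}
	return out
}

func main() {
	ids := []string{"a", "b", "c", "d"}
	// "zzz" is a hallucinated ID, "c" is missing, "d" has a negative score.
	scores := map[string]int{"b": 95, "a": 40, "d": -5, "zzz": 99}
	for _, s := range mergeScores(ids, scores, 0) {
		fmt.Printf("%s=%d\n", s.ID, s.Score)
	}
}
```

With these inputs the order comes out as b, a, c, d: the hallucinated ID never appears, and the two zero-score sections keep their original relative order.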

Failure envelope

Re-rank is opt-in: defaults to enabled=false. Existing callers see zero behaviour change.
Transport / parse / hallucination failures NEVER drop sections. At worst the strategy's original order is preserved.
Per-request enable_rerank overrides the config so callers can experiment without a server restart.

Opt-in instructions

retrieval:
  rerank:
    enabled: true
    model: "gemini-2.5-flash"      # or your cheap/fast model
    max_content_chars: 2000
    top_k: 5                       # 0 = keep all candidates

Or per-request:

{
  "document_id": "...",
  "query": "What was Apple's fiscal 2023 revenue?",
  "enable_rerank": true
}

Test plan

go build ./... clean.
go vet ./... clean.
go test ./... — all green (existing tests still pass).
16 new ReRanker table-driven tests: happy path / empty input / LLM transport failure / parse-retry exhaustion / bad-JSON-then-success / hallucinated IDs / missing IDs / duplicate IDs / negative-score clamp / prompt content + truncation / nil LLM no-op / nil receiver no-op / parser code-fence + leading-prose + empty input.
4 new config tests: defaults, env override happy path, env disable of a YAML-set true, env rejects garbage numbers, validate rejects negatives.

…dates

Adds pkg/retrieval/rerank.go: a ReRanker that takes the strategy's selected section IDs plus the first ~2000 chars of each section's body and issues one short LLM call to score relevance on a 0-100 scale. Sorted descending by score; ties preserve the strategy's original order via a stable insertion sort.

Failure semantics are deliberately tolerant; every input ID surfaces in the output exactly once:

- Empty candidate list short-circuits without an LLM call.
- Transport failure returns input order + non-nil error.
- All retries returning bad JSON degrade to input order + nil error (mirrors runSelectionWithRetry).
- Hallucinated IDs are dropped silently.
- Missing IDs receive score=0 at the bottom of the output.

Prompt scoring rubric is anchored to four bands so the model returns coarse-but-meaningful integers rather than a uniform 50 across the board: 90-100 directly answers, 60-89 contributes evidence, 30-59 tangential, 0-29 not useful.

ParseReRank reuses the tolerant code-fence + prose-stripping pattern from ParseSelection / ParsePlan.

16 table-driven tests cover happy path, empty input, transport failure, parse-retry exhaustion, parse-then-succeed, hallucinated IDs, missing IDs, duplicate IDs, negative-score clamp, prompt content+truncation, nil LLM no-op, nil receiver no-op, and the parser contract (code fence, leading prose, empty input).
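
The tolerant parsing idea (accept a JSON payload even when the model wraps it in a code fence or prefixes it with prose) might look roughly like the sketch below. The response schema (`{"scores":[{"id":...,"score":...}]}`) and the name `parseReRankSketch` are assumptions made for illustration; the actual ParseReRank and its wire format are not shown in this PR description.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// rerankItem and rerankResponse describe an assumed response schema.
type rerankItem struct {
	ID    string `json:"id"`
	Score int    `json:"score"`
}

type rerankResponse struct {
	Scores []rerankItem `json:"scores"`
}

// parseReRankSketch keeps only the text between the first '{' and the last
// '}', which discards a surrounding code fence and any leading or trailing
// prose, then decodes the remainder as JSON.
func parseReRankSketch(raw string) ([]rerankItem, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return nil, errors.New("no JSON object found in model output")
	}
	var resp rerankResponse
	if err := json.Unmarshal([]byte(raw[start:end+1]), &resp); err != nil {
		return nil, fmt.Errorf("decode rerank response: %w", err)
	}
	return resp.Scores, nil
}

func main() {
	fence := strings.Repeat("`", 3) // a markdown code fence
	raw := "Sure, here are the scores:\n" + fence + "json\n" +
		`{"scores":[{"id":"s1","score":92},{"id":"s2","score":15}]}` +
		"\n" + fence
	items, err := parseReRankSketch(raw)
	fmt.Println(items, err)
}
```

A caller following the retry policy described above would treat a non-nil error as a parse failure, retry, and fall back to the input order once retries are exhausted.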

…rrides

Adds ReRankBlock under RetrievalConfig with Enabled / Model / MaxContentChars (default 2000) / TopK (default 0 = keep all) fields, plus matching VLE_RETRIEVAL_RERANK_ENABLED, _MODEL, _MAX_CONTENT_CHARS, and _TOP_K env overrides following the same shape as the existing planning + agentic + answer_span env handling.

Re-rank is opt-in: defaults are Enabled=false / TopK=0 so existing deployments behave identically. Per-request `enable_rerank` body field (wired in the next commit) overrides this block when set.

Validate rejects negative MaxContentChars and TopK so a stray env or YAML typo can't silently flip the model into "send no content" mode. Zero is valid for both: TopK=0 means keep all candidates, MaxContentChars=0 falls through to the ReRanker's compiled default.

Four new tests: default values, env override happy path, env disable of a YAML-set true, env override rejects bad numeric values, and validation of negative inputs.
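
A minimal sketch of the env-override and validation behaviour described above. The field names and environment variable names come from this commit message; the `reRankBlock` type, the `applyEnv` and `validate` methods, and the injected `lookup` function are hypothetical and only illustrate the shape, not the code in pkg/config.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// reRankBlock is a stand-in for the config block described above.
type reRankBlock struct {
	Enabled         bool
	Model           string
	MaxContentChars int
	TopK            int
}

// applyEnv overlays VLE_RETRIEVAL_RERANK_* values onto the block. The lookup
// function is injected (os.LookupEnv in main) so the logic stays testable.
func (b *reRankBlock) applyEnv(lookup func(string) (string, bool)) error {
	const prefix = "VLE_RETRIEVAL_RERANK_"
	if v, ok := lookup(prefix + "ENABLED"); ok {
		on, err := strconv.ParseBool(v)
		if err != nil {
			return fmt.Errorf("%sENABLED: %w", prefix, err)
		}
		b.Enabled = on // can also disable a YAML-set true
	}
	if v, ok := lookup(prefix + "MODEL"); ok {
		b.Model = v
	}
	if v, ok := lookup(prefix + "MAX_CONTENT_CHARS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf("%sMAX_CONTENT_CHARS: %w", prefix, err)
		}
		b.MaxContentChars = n
	}
	if v, ok := lookup(prefix + "TOP_K"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf("%sTOP_K: %w", prefix, err)
		}
		b.TopK = n
	}
	return nil
}

// validate rejects negatives; zero is valid for both numeric fields.
func (b reRankBlock) validate() error {
	if b.MaxContentChars < 0 {
		return fmt.Errorf("max_content_chars must be >= 0, got %d", b.MaxContentChars)
	}
	if b.TopK < 0 {
		return fmt.Errorf("top_k must be >= 0, got %d", b.TopK)
	}
	return nil
}

func main() {
	cfg := reRankBlock{Enabled: true, MaxContentChars: 2000} // as if loaded from YAML
	if err := cfg.applyEnv(os.LookupEnv); err != nil {
		fmt.Println("env error:", err)
		return
	}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```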

Adds the per-request `enable_rerank` body field (pointer, so absent vs. explicit false are distinguishable) on both QueryRequest and AnswerRequest. When true, or when retrieval.rerank.enabled is on at config level, the engine runs one extra LLM call after candidate content is loaded but before answer-span extraction / synthesis. The pass scores each candidate on 0-100 and reorders the slice descending by score. retrieval.rerank.top_k (when non-zero) truncates the survivors.

Sections that ran through re-rank surface a `score` field in /v1/query.sections[] and /v1/answer.citations[].

Failure semantics carried through from pkg/retrieval/rerank.go:

- No LLM client or no ReRanker wired → re-rank silently skipped.
- Empty candidate slice → no LLM call, no scores.
- Transport / parse failure → warn-log, strategy order preserved.
- Hallucinated IDs dropped; missing IDs surface unscored at the bottom. No candidate is ever lost.

For /v1/answer the ReRanker's Usage is accumulated into totalUsage so the response's usage block reflects the real cost; the synthesis prompt then sees the post-rerank top-k.

cmd/engine/main.go instantiates a ReRanker against the configured LLM whenever a client is wired (mirrors the planner pattern); opt-in behaviour is purely flag-driven from there. Boot logs the model + content cap + top-k when re-rank is enabled at config level.

openapi.yaml: enable_rerank field on QueryRequest + AnswerRequest; score field on QuerySection + AnswerCitation. Schema descriptions document the 0-100 scale and the "never drops sections" envelope.

config.example.yaml: documented retrieval.rerank block with cost + opt-in commentary so operators understand why this is off by default.

coderabbitai · 2026-05-27T01:19:30Z

📥 Commits

Reviewing files that changed from the base of the PR and between 54ae0d5 and 478e578.

📒 Files selected for processing (8)

cmd/engine/main.go
config.example.yaml
internal/api/server.go
openapi.yaml
pkg/config/config.go
pkg/config/config_test.go
pkg/retrieval/rerank.go
pkg/retrieval/rerank_test.go

hallelx2 added 3 commits May 27, 2026 02:09

Copilot AI review requested due to automatic review settings May 27, 2026 01:19

sourcery-ai Bot reviewed May 27, 2026

Copilot started reviewing on behalf of hallelx2 May 27, 2026 01:19 View session

hallelx2 merged commit 55dd9c1 into main May 27, 2026
6 of 9 checks passed

hallelx2 deleted the feat/rerank-with-content branch May 27, 2026 01:20

hallelx2 review requested due to automatic review settings May 27, 2026 01:39
