
feat: content-aware re-rank pass (Phase 2.3)#18

Merged
hallelx2 merged 3 commits into main from feat/rerank-with-content on May 27, 2026

Conversation

@hallelx2
Owner

Summary

Adds a content-aware re-rank pass that runs after the retrieval Strategy.Select. Takes the strategy's selected section IDs + their loaded content and issues one LLM call to score each section (0-100) against the query. Sections are reordered descending by score; retrieval.rerank.top_k optionally truncates the survivors.

This rescues the failure mode where the strategy reasoned over title + summary + HyDE candidate questions alone and got fooled by surface-level lexical matches — reading the actual content closes the gap. One extra LLM call per query, ~3-5k input tokens, ~$0.0003 on gemini-2.5-flash for a 5-candidate × 2k-char call.
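As a rough check on that cost figure, the sketch below redoes the token arithmetic. It assumes about 4 characters per token (a common rule of thumb; real tokenizers vary by model and language) and a hypothetical 500-token overhead for instructions and the scoring rubric. The price is a parameter: the $0.10 per million input tokens used in the example is illustrative, not a quoted gemini-2.5-flash rate.

```go
package main

import "fmt"

// estimateInputTokens approximates the prompt size of one re-rank call.
// ASSUMPTION: ~4 characters per token; real tokenizers differ.
func estimateInputTokens(candidates, charsPerCandidate, overheadTokens int) int {
	const charsPerToken = 4
	return candidates*charsPerCandidate/charsPerToken + overheadTokens
}

// estimateCostUSD multiplies input tokens by a per-million-token price.
// The price is supplied by the caller; check your provider's current pricing.
func estimateCostUSD(inputTokens int, usdPerMillionTokens float64) float64 {
	return float64(inputTokens) * usdPerMillionTokens / 1_000_000
}

func main() {
	// 5 candidates x 2,000 chars, plus a hypothetical 500-token overhead.
	tokens := estimateInputTokens(5, 2000, 500)
	fmt.Println(tokens) // 3000

	// Illustrative price only.
	fmt.Printf("%.6f\n", estimateCostUSD(tokens, 0.10)) // 0.000300
}
```

With these assumptions the content alone is about 2,500 tokens, which is consistent with the ~3-5k input-token range once the query and instructions are added.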

Design

  • pkg/retrieval/rerank.go adds ReRanker, which follows the planner's defensive shape: tolerant JSON parsing, a retry on parse failure, and graceful degradation to input order once retries are exhausted. Every input ID appears in the output exactly once, so re-rank can reorder candidates but never drops them (only the optional top_k cut shortens the list).
  • pkg/config — new retrieval.rerank block (enabled, model, max_content_chars, top_k) plus VLE_RETRIEVAL_RERANK_* env overrides following the same pattern as the existing planning + agentic + answer_span env handling.
  • Server wiring — per-request enable_rerank body field on QueryRequest and AnswerRequest. When the pass runs, sections (in /v1/query) and citations (in /v1/answer) carry a score field. For /v1/answer the re-rank usage is added to the response's usage block.

Failure envelope

  • Re-rank is opt-in: defaults to enabled=false. Existing callers see zero behaviour change.
  • Transport / parse / hallucination failures NEVER drop sections. At worst the strategy's original order is preserved.
  • Per-request enable_rerank overrides the config so callers can experiment without a server restart.
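The precedence rule fits in a small helper. This is a sketch: the function name is hypothetical, and it assumes an explicit per-request false disables the pass even when the config enables it, which is what "overrides" suggests.

```go
package main

import "fmt"

// effectiveRerank decides whether the pass runs for one request:
// an explicit per-request value wins; otherwise the config default applies.
func effectiveRerank(configEnabled bool, requestOverride *bool) bool {
	if requestOverride != nil {
		return *requestOverride
	}
	return configEnabled
}

func main() {
	on, off := true, false
	fmt.Println(effectiveRerank(false, nil)) // false: opt-in default
	fmt.Println(effectiveRerank(false, &on)) // true: caller opts in
	fmt.Println(effectiveRerank(true, &off)) // false: caller opts out
	fmt.Println(effectiveRerank(true, nil))  // true: config enabled
}
```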

Opt-in instructions

retrieval:
  rerank:
    enabled: true
    model: "gemini-2.5-flash"      # or your cheap/fast model
    max_content_chars: 2000
    top_k: 5                       # 0 = keep all candidates

Or per-request:

{
  "document_id": "...",
  "query": "What was Apple's fiscal 2023 revenue?",
  "enable_rerank": true
}

Test plan

  • go build ./... clean.
  • go vet ./... clean.
  • go test ./... — all green (existing tests still pass).
  • 16 new ReRanker table-driven tests: happy path / empty input / LLM transport failure / parse-retry exhaustion / bad-JSON-then-success / hallucinated IDs / missing IDs / duplicate IDs / negative-score clamp / prompt content + truncation / nil LLM no-op / nil receiver no-op / parser code-fence + leading-prose + empty input.
  • 4 new config tests: defaults, env override happy path, env disable of a YAML-set true, env rejects garbage numbers, validate rejects negatives.

hallelx2 added 3 commits May 27, 2026 02:09
…dates

Adds pkg/retrieval/rerank.go: a ReRanker that takes the strategy's
selected section IDs plus the first ~2000 chars of each section's body
and issues one short LLM call to score relevance on a 0-100 scale.
Sorted descending by score; ties preserve the strategy's original
order via a stable insertion sort.

Failure semantics deliberately tolerant — every input ID surfaces in
the output exactly once:

  - Empty candidate list short-circuits without an LLM call.
  - Transport failure returns input order + non-nil error.
  - All retries returning bad JSON degrade to input order + nil error
    (mirrors runSelectionWithRetry).
  - Hallucinated IDs are dropped silently.
  - Missing IDs receive score=0 at the bottom of the output.

Prompt scoring rubric is anchored to four bands so the model returns
coarse-but-meaningful integers rather than a uniform 50 across the
board: 90-100 directly answers, 60-89 contributes evidence, 30-59
tangential, 0-29 not useful. ParseReRank reuses the tolerant
code-fence + prose-stripping pattern from ParseSelection / ParsePlan.
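A tolerant parser can skip code fences and preamble by decoding only the outermost JSON array. This sketch assumes the model replies with an array of {id, score} objects; parseReRank is a hypothetical stand-in, not the real ParseReRank, whose exact stripping rules are not shown in this PR. A non-nil error is the signal that would trigger the parse retry.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type Scored struct {
	ID    string `json:"id"`
	Score int    `json:"score"`
}

// parseReRank ignores leading prose and any surrounding code fence by
// slicing from the first '[' to the last ']' before decoding.
func parseReRank(raw string) ([]Scored, error) {
	start := strings.Index(raw, "[")
	end := strings.LastIndex(raw, "]")
	if start < 0 || end < start {
		return nil, errors.New("rerank: no JSON array in reply")
	}
	var out []Scored
	if err := json.Unmarshal([]byte(raw[start:end+1]), &out); err != nil {
		return nil, fmt.Errorf("rerank: %w", err)
	}
	return out, nil
}

func main() {
	fence := "```"
	reply := "Here are the scores:\n" + fence + "json\n[{\"id\":\"s1\",\"score\":91},{\"id\":\"s2\",\"score\":40}]\n" + fence
	got, err := parseReRank(reply)
	fmt.Println(got, err) // [{s1 91} {s2 40}] <nil>

	_, err = parseReRank("")
	fmt.Println(err != nil) // true
}
```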

16 table-driven tests cover happy path, empty input, transport
failure, parse-retry exhaustion, parse-then-succeed, hallucinated
IDs, missing IDs, duplicate IDs, negative-score clamp, prompt
content+truncation, nil LLM no-op, nil receiver no-op, and the parser
contract (code fence, leading prose, empty input).
…rrides

Adds ReRankBlock under RetrievalConfig with Enabled / Model /
MaxContentChars (default 2000) / TopK (default 0 = keep all) fields,
plus matching VLE_RETRIEVAL_RERANK_ENABLED, _MODEL,
_MAX_CONTENT_CHARS, and _TOP_K env overrides following the same shape
as the existing planning + agentic + answer_span env handling.

Re-rank is opt-in: defaults are Enabled=false / TopK=0 so existing
deployments behave identically. Per-request `enable_rerank` body field
(wired in the next commit) overrides this block when set.

Validate rejects negative MaxContentChars and TopK so a stray env or
YAML typo can't silently flip the model into "send no content" mode.
Zero is valid for both: TopK=0 means keep all candidates,
MaxContentChars=0 falls through to the ReRanker's compiled default.
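The validation and zero-value rules can be sketched as follows. The four fields match the documented ReRankBlock (yaml tags omitted); the contentCap helper and the error wording are hypothetical, and the 2000 default is the one stated above.

```go
package main

import (
	"errors"
	"fmt"
)

// ReRankBlock mirrors the documented fields.
type ReRankBlock struct {
	Enabled         bool
	Model           string
	MaxContentChars int
	TopK            int
}

const defaultMaxContentChars = 2000

// Validate rejects negatives; zero is legal for both numeric fields.
func (b ReRankBlock) Validate() error {
	if b.MaxContentChars < 0 {
		return errors.New("retrieval.rerank.max_content_chars must be >= 0")
	}
	if b.TopK < 0 {
		return errors.New("retrieval.rerank.top_k must be >= 0 (0 keeps all)")
	}
	return nil
}

// contentCap resolves the effective cap: 0 falls through to the default.
func (b ReRankBlock) contentCap() int {
	if b.MaxContentChars == 0 {
		return defaultMaxContentChars
	}
	return b.MaxContentChars
}

func main() {
	fmt.Println(ReRankBlock{TopK: -1}.Validate() != nil) // true
	fmt.Println(ReRankBlock{}.contentCap())               // 2000
}
```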

Four new tests: default values, env override happy path, env disable
of a YAML-set true, env override rejects bad numeric values, and
validation of negative inputs.

Adds the per-request `enable_rerank` body field (pointer, so absent vs.
explicit false are distinguishable) on both QueryRequest and
AnswerRequest. If the field is set it takes precedence; if it is absent,
retrieval.rerank.enabled decides. When the pass is on, the engine runs
one extra LLM call after candidate content is loaded but before
answer-span extraction / synthesis.

The pass scores each candidate on 0-100 and reorders the slice
descending by score. retrieval.rerank.top_k (when non-zero) truncates
the survivors. Sections that ran through re-rank surface a `score`
field in /v1/query.sections[] and /v1/answer.citations[].
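The top_k cut is applied to the already-sorted list. A minimal sketch; the generic helper name is hypothetical, and it treats k <= 0 as "keep everything", matching the documented meaning of top_k = 0 (negative values are rejected earlier by validation).

```go
package main

import "fmt"

// applyTopK truncates an already-sorted slice to its first k entries.
func applyTopK[T any](sorted []T, k int) []T {
	if k <= 0 || k >= len(sorted) {
		return sorted
	}
	return sorted[:k]
}

func main() {
	ids := []string{"s3", "s1", "s4", "s2"}
	fmt.Println(applyTopK(ids, 2)) // [s3 s1]
	fmt.Println(applyTopK(ids, 0)) // [s3 s1 s4 s2]
}
```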

Failure semantics carried through from pkg/retrieval/rerank.go:

  - No LLM client or no ReRanker wired → re-rank silently skipped.
  - Empty candidate slice → no LLM call, no scores.
  - Transport / parse failure → warn-log, strategy order preserved.
  - Hallucinated IDs dropped; missing IDs surface unscored at the
    bottom. No candidate is ever lost.

For /v1/answer the ReRanker's Usage is accumulated into totalUsage so
the response's usage block reflects the real cost; the synthesis
prompt then sees the post-rerank top-k.
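Accumulating per-call usage is a simple fold. The Usage type below is hypothetical (the engine's real type may carry more fields), and the token counts in main are made-up illustrative numbers, not measurements.

```go
package main

import "fmt"

// Usage is a hypothetical token-count record.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// Add folds another call's usage into the running total.
func (u *Usage) Add(o Usage) {
	u.InputTokens += o.InputTokens
	u.OutputTokens += o.OutputTokens
}

func main() {
	var total Usage
	total.Add(Usage{InputTokens: 1200, OutputTokens: 80})  // earlier retrieval calls
	total.Add(Usage{InputTokens: 3100, OutputTokens: 60})  // re-rank call
	total.Add(Usage{InputTokens: 2500, OutputTokens: 300}) // synthesis
	fmt.Println(total) // {6800 440}
}
```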

cmd/engine/main.go instantiates a ReRanker against the configured LLM
whenever a client is wired (mirrors the planner pattern) — opt-in
behaviour is purely flag-driven from there. Boot logs the model +
content cap + top-k when re-rank is enabled at config level.

openapi.yaml: enable_rerank field on QueryRequest + AnswerRequest;
score field on QuerySection + AnswerCitation. Schema descriptions
document the 0-100 scale and the "never drops sections" envelope.

config.example.yaml: documented retrieval.rerank block with cost +
opt-in commentary so operators understand why this is off by default.
Copilot AI review requested due to automatic review settings May 27, 2026 01:19

@sourcery-ai sourcery-ai Bot left a comment


Sorry @hallelx2, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai

coderabbitai Bot commented May 27, 2026

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 44 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9842ec9b-de4d-4b41-b05e-9367990b64b3

📥 Commits

Reviewing files that changed from the base of the PR and between 54ae0d5 and 478e578.

📒 Files selected for processing (8)
  • cmd/engine/main.go
  • config.example.yaml
  • internal/api/server.go
  • openapi.yaml
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/retrieval/rerank.go
  • pkg/retrieval/rerank_test.go

hallelx2 merged commit 55dd9c1 into main on May 27, 2026
6 of 9 checks passed
hallelx2 deleted the feat/rerank-with-content branch on May 27, 2026 01:20
hallelx2: review requested due to automatic review settings, May 27, 2026 01:39

Labels

None yet

Projects

None yet

1 participant