fix(server): wire PageIndex strategy + answer endpoints into the DEPLOYED cmd/server binary #26

Merged
hallelx2 merged 4 commits into main from feat/consolidate-server-redesign on May 28, 2026

@hallelx2 (Owner) commented May 28, 2026

The gap

Production deploys cmd/server (built from Dockerfile.server), which serves its API from internal/handler/router.go. But this cycle's redesign work — the pageindex retrieval strategy and the /v1/answer + /v1/answer/pageindex endpoints — landed only on the OTHER binary, cmd/engine (internal/api), which does not deploy. The redesign was therefore unreachable in production:

  • cmd/server::buildStrategy knew only single-pass | chunked-tree | agentic, so retrieval.strategy=pageindex silently fell through to chunked-tree.
  • internal/handler/router.go mounted no /v1/answer or /v1/answer/pageindex routes at all.

This PR makes all of it reachable on the deployed binary, adapting the standalone (single-tenant, nil-UUID org) implementations to cmd/server's multi-tenant model (org + store resolved from X-Vectorless-Org / X-Vectorless-Store).

What's now reachable in production

  • pageindex as a selection strategy: buildStrategy gains the pageindex case. buildPageIndexStrategy is ported from cmd/engine, wiring the storage-backed PageLoader plus a DB-backed TOC provider that reads documents.toc_tree (degrading to the synthesised view when the column is NULL). The DB pool is threaded through buildStrategy.
  • POST /v1/answer — retrieval + per-section answer-span extraction + a synthesis LLM call → a quote-grounded answer with citations.
  • POST /v1/answer/pageindex — the PageIndex agentic loop end-to-end (structure → get_pages → done), with an opt-in reasoning trace and an SSE stream=true variant. Its per-request strategy copy gets an org-scoped TOC provider so get_document_structure reads only the requesting tenant's documents.toc_tree.

Per-request strategy override on /v1/query

/v1/query accepts an optional "strategy" field. When present it selects from a name-keyed map of strategies pre-built once at boot (chunked-tree, pageindex, agentic, single-pass), so selection is a map lookup, not a hot-path rebuild. Absent/empty → the configured default (every existing caller unchanged); unknown name → 400. This lets the benchmark A/B chunked-tree vs pageindex against the same running engine with no redeploy.
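A rough sketch of the override semantics, using only the standard library. The strategy interface, the named type, and the post helper are hypothetical stand-ins; the real handler runs retrieval rather than echoing the strategy name. The status codes follow the description above (400 for an unknown name) and the walkthrough (503 when no strategy is available).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// strategy is a hypothetical stand-in for the engine's retrieval strategy.
type strategy interface{ Name() string }

type named string

func (n named) Name() string { return string(n) }

// queryHandler selects a strategy per request from a map built once at boot.
func queryHandler(def strategy, overrides map[string]strategy) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var body struct {
			Strategy string `json:"strategy"`
		}
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		chosen := def // absent or empty field: configured default
		if body.Strategy != "" {
			s, ok := overrides[body.Strategy] // plain map lookup, no rebuild
			if !ok {
				http.Error(w, "unknown strategy", http.StatusBadRequest)
				return
			}
			chosen = s
		}
		if chosen == nil {
			http.Error(w, "no strategy available", http.StatusServiceUnavailable)
			return
		}
		// The sketch only echoes which strategy would run.
		fmt.Fprint(w, chosen.Name())
	}
}

// post sends a JSON body to the handler and returns status and trimmed body.
func post(h http.Handler, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/v1/query", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	h := queryHandler(named("chunked-tree"), map[string]strategy{
		"chunked-tree": named("chunked-tree"),
		"pageindex":    named("pageindex"),
	})
	for _, body := range []string{`{}`, `{"strategy":"pageindex"}`, `{"strategy":"nope"}`} {
		code, out := post(h, body)
		fmt.Println(code, out)
	}
}
```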

Divergence guard

router_parity_test.go walks the mounted chi router and fails if /v1/answer, /v1/answer/pageindex, or the /v1/query mount goes missing — so the two binaries can't silently diverge like this again.
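The core of such a guard is a set difference between required and mounted routes. The sketch below keeps it dependency-free by taking the mounted routes as a plain slice; in the real test that list would come from walking the chi router. The missingRoutes helper and the "METHOD path" string format are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRoutes returns the required routes that are not mounted, sorted so
// a failure message is stable across runs.
func missingRoutes(mounted, required []string) []string {
	have := make(map[string]struct{}, len(mounted))
	for _, r := range mounted {
		have[r] = struct{}{}
	}
	var missing []string
	for _, r := range required {
		if _, ok := have[r]; !ok {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	required := []string{
		"POST /v1/query",
		"POST /v1/answer/pageindex",
		"POST /v1/answer",
	}
	// A router in the pre-PR state: only the query route is mounted.
	mounted := []string{"GET /healthz", "POST /v1/query"}
	fmt.Println(missingRoutes(mounted, required))
}
```

A test built on this would fail whenever the returned slice is non-empty and print it as the list of dropped routes.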

handler.Deps additions

Strategies (override map), LLM, LLMModel, AnswerSpan, Answer, Replay, PageIndexStrategy, PageIndex — all wired at boot in cmd/server. Replay capture is shared with the existing /v1/replay endpoint via byte-identical response storage.

Test plan

  • go build ./... clean (both cmd/server and cmd/engine)
  • go vet ./... clean
  • go test ./... all green
  • Handler test proves /v1/query with {"strategy":"pageindex"} routes to the page-based strategy (and the default/unknown/nil-set paths)
  • Parity test asserts the answer endpoints are mounted

Default retrieval.strategy is left at chunked-tree; the bench drives pageindex per-request via the override. Does not touch pkg/ingest/ or pkg/parser/.

Summary by CodeRabbit

  • New Features
    • Added answer generation endpoints that synthesize responses based on document content
    • Answer endpoints support configurable retrieval strategies with per-request overrides
    • Streaming answer variant with optional reasoning traces available
    • Answers include citations with extracted quotes and page ranges for transparency
    • Configurable LLM defaults and answer tuning parameters
    • Replay storage for persisting answer responses

hallelx2 added 4 commits May 28, 2026 15:08
The deployed cmd/server binary's buildStrategy only knew single-pass,
chunked-tree, and agentic — so retrieval.strategy=pageindex silently
fell through to chunked-tree, leaving the page-based strategy
unreachable in production even though cmd/engine already supported it.

Thread the DB pool through buildStrategy and port buildPageIndexStrategy
from cmd/engine, wiring the storage-backed PageLoader plus a DB-backed
TOC provider that reads documents.toc_tree (degrading to the synthesised
view when the column is NULL).

Add an optional "strategy" field to the /v1/query body so a caller can
A/B retrieval strategies (e.g. chunked-tree vs pageindex) against the
SAME running engine without a redeploy — exactly what the benchmark
needs. Strategies are pre-built once at boot into a name-keyed map, so
per-request selection is a map lookup, not a rebuild on the hot path.

An absent or empty field uses the configured default, keeping every
existing caller unchanged. An unknown name returns 400 rather than
silently falling back. A test seam (treeLoader) lets the handler run
end-to-end without a live Postgres backend.
…d router

Both answer endpoints existed only on the standalone cmd/engine binary
(internal/api), so they were unreachable on the deployed cmd/server
binary (internal/handler) that production runs. Port them, adapting the
standalone single-tenant org to the deployed server's multi-tenant
model (org + store resolved from the X-Vectorless-Org / X-Vectorless-Store
headers).

- /v1/answer runs retrieval + per-section answer-span extraction + a
  synthesis LLM call, returning a quote-grounded answer with citations.
- /v1/answer/pageindex runs the PageIndex agentic loop end-to-end
  (structure -> get_pages -> done), with an opt-in reasoning trace and
  an SSE streaming variant. Its per-request strategy copy gets an
  org-scoped TOC provider so get_document_structure reads only the
  requesting tenant's documents.toc_tree.

Extend handler.Deps with the LLM client, default model, answer-span /
answer config, the replay store, and the dedicated PageIndex strategy,
and wire them at boot in cmd/server. Replay capture is shared with the
existing /v1/replay endpoint via byte-identical response storage.

Add a parity test that walks the mounted chi router and fails if
/v1/answer or /v1/answer/pageindex (or the /v1/query mount) is missing.
The deployed cmd/server and standalone cmd/engine binaries serve
overlapping APIs from two separate routers and have silently diverged
before — this is the regression that left the PageIndex answer
endpoints unreachable in production. The guard makes any future drop of
a required route fail loudly instead of shipping a half-wired binary.
Copilot AI review requested due to automatic review settings May 28, 2026 14:32

coderabbitai Bot commented May 28, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The PR introduces POST /v1/answer and POST /v1/answer/pageindex endpoints that synthesize LLM-grounded answers from document retrieval, enhances the existing query handler for per-request strategy selection, and adds server-level infrastructure for strategy pre-building, replay recording, and LLM integration.

Changes

Answer Generation with Strategy Overrides

| Layer / File(s) | Summary |
| --- | --- |
| Server foundation and dependency setup (cmd/server/main.go, internal/handler/router.go, internal/handler/helpers.go) | Server initializes pre-built retrieval strategies (buildStrategySet), optional replay storage (LRU cache), dedicated pageindex strategy with DB-backed TOC provider, and expands handler.Deps with LLM client, models, answer configuration, and replay store. Helpers add replay-aware JSON marshaling (marshalJSONForReplay, writeJSONWithReplay) for capturing exact response bytes. |
| Query handler strategy override mechanism (internal/handler/query.go) | QueryHandler accepts a pre-built strategy map and injectable treeLoader seam. Per-request strategy field selects from override map or falls back to default. Returns 400 for unknown strategies, 503 when no strategy available. Both cost-aware and standard retrieval paths route through resolved per-request strategy. |
| Answer synthesis endpoint (internal/handler/answer.go) | AnswerHandler for POST /v1/answer validates org headers, loads document tree, runs cost-aware section selection, loads section content from storage, extracts grounding quotes concurrently per section, synthesizes final answer via single LLM completion with section titles and extracted quotes in prompt, builds citations with optional quote spans, aggregates retrieval + synthesis usage, and records response to replay store. |
| PageIndex answer endpoint with streaming (internal/handler/answer_pageindex.go) | AnswerPageIndexHandler for POST /v1/answer/pageindex uses specialized pageindex strategy with tenant-scoped TOC provider, supports per-request hops/page limits and reasoning capture via OnEvent. Implements optional SSE streaming (stream=true) emitting tool events as reasoning trace and final answer event. Deduplicates page citations by range, materializes cited content, optionally extracts quotes, and deterministically sorts for stable output. |
| Route registration for answer endpoints (internal/handler/router.go) | Registers POST /v1/answer and /v1/answer/pageindex routes wired to newly constructed answer handlers with LLM client, configuration blocks, replay store, and pageindex strategy dependencies. |
| Query strategy routing tests (internal/handler/query_strategy_test.go) | Test doubles (labeledStrategy, memStorage) and end-to-end HTTP tests verify strategy override selection, default strategy fallback when override omitted, HTTP 400 rejection of unknown strategies, and nil override set behavior. |
| Router parity and mount validation (internal/handler/router_parity_test.go) | Tests assert required POST endpoints (/v1/query, /v1/answer, /v1/answer/pageindex) are mounted via Chi route walking, with deterministic sorted route listing in error messages. |
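The "deduplicates page citations by range" and "deterministically sorts" steps in the PageIndex row can be sketched as follows. The citation struct, its fields, and the sort order (document, then start page, then end page) are assumptions for illustration and may not match the handler's actual types or ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// citation is a hypothetical page-range citation.
type citation struct {
	DocID     string
	StartPage int
	EndPage   int
}

// dedupeAndSort drops citations with an identical document and page range,
// then orders the rest by document, start page, and end page. Because the
// remaining entries are unique on all three fields, the order is total and
// the output is stable regardless of input order.
func dedupeAndSort(in []citation) []citation {
	seen := make(map[citation]struct{}, len(in))
	out := make([]citation, 0, len(in))
	for _, c := range in {
		if _, dup := seen[c]; dup {
			continue
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.DocID != b.DocID {
			return a.DocID < b.DocID
		}
		if a.StartPage != b.StartPage {
			return a.StartPage < b.StartPage
		}
		return a.EndPage < b.EndPage
	})
	return out
}

func main() {
	in := []citation{
		{DocID: "b", StartPage: 3, EndPage: 4},
		{DocID: "a", StartPage: 5, EndPage: 6},
		{DocID: "a", StartPage: 1, EndPage: 2},
		{DocID: "a", StartPage: 5, EndPage: 6}, // duplicate range
	}
	fmt.Println(dedupeAndSort(in)) // prints "[{a 1 2} {a 5 6} {b 3 4}]"
}
```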

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hallelx2/vectorless-engine#13: Modifies server retrieval strategy construction in cmd/server/main.go and the shared buildStrategy helper, affecting how this PR's strategy pre-building integrates with existing strategy wiring.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main objective: wiring the PageIndex strategy and answer endpoints into the deployed server binary for production use. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@hallelx2 hallelx2 merged commit 94fc6a3 into main May 28, 2026
3 of 9 checks passed
@hallelx2 hallelx2 deleted the feat/consolidate-server-redesign branch May 28, 2026 14:34
@hallelx2 hallelx2 review requested due to automatic review settings May 28, 2026 14:54