
feat(ai): hybrid (semantic) catalog search — semantic=true (AI-057) #361

Open
mrviduus wants to merge 2 commits into main from ai-057-hybrid-search


@mrviduus

Owner

AI-057 — hybrid (semantic) catalog search (Phase 9)

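GET /search?q=&semantic=true blends the keyword (FTS) edition ranking with a vector ranking (cosine distance between the query embedding and editions.embedding) via Reciprocal Rank Fusion (RRF), so a book that is semantically related to the query but contains none of its keywords can still surface. The response keeps the same PaginatedResult<SearchResultDto> shape, so the change is transparent to the frontend.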
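As a rough illustration of the fusion step described above, here is a minimal Python sketch of Reciprocal Rank Fusion over two edition rankings, followed by pagination of the fused order. It is not the PR's RrfFusion.Fuse implementation: the function names, the constant k=60 (a commonly used RRF default) and the tie-break rule are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank).

    k=60 is an assumed default; the PR does not state the constant it uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Keep only the first (best) position of each edition in a list,
        # so fusion happens at edition granularity.
        unique_ids = list(dict.fromkeys(ranking))
        for rank, edition_id in enumerate(unique_ids, start=1):
            scores[edition_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


def paginate(ids: Sequence[str], page: int, page_size: int) -> List[str]:
    """Slice the fused order (1-based page number)."""
    start = (page - 1) * page_size
    return list(ids[start:start + page_size])


# Hypothetical rankings: "e4" has no keyword match and appears only
# in the vector ranking, yet it still lands in the fused result.
fts_ranking = ["e1", "e2", "e3"]
vector_ranking = ["e4", "e2", "e1"]
fused = rrf_fuse([fts_ranking, vector_ranking])
first_page = paginate(fused, page=1, page_size=2)
```

Paginating after fusion, rather than paginating each source list first, is what keeps page boundaries consistent with the blended order.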

  • HybridCatalogSearch (Application layer) orchestrates the hybrid path: it fetches a wide FTS pool, embeds the query with EmbedAsync, ranks editions by cosine distance (with the AI-055 visibility filters; the query vector is passed as a parameter so the HNSW index is used), fuses both rankings with RrfFusion.Fuse keyed on edition_id, and paginates the fused order. Vector-only hits get card fields and empty highlights.
  • When semantic is absent or false, the pure-FTS path is byte-for-byte unchanged (no embedding call, no cost). The search-semantic rate limit (20/min) applies only when semantic=true.
  • No visibility leak: the vector ranking and the vector-only fetch both apply the catalog filters. Integration tests assert this for all 4 exclusion classes.
  • Graceful FTS fallback (QA P2): an embedding or vector-rank failure falls back to FTS-only results and never hard-fails the search; cancellation still propagates.
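The toggle and fallback behaviour in the list above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PR's code: hybrid_search, fts_search and semantic_rank are hypothetical names, and the two callables stand in for the FTS provider and the embed + vector-rank + fuse pipeline.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("hybrid_search")

# Hypothetical signature shared by both search callables.
SearchFn = Callable[[str], Awaitable[List[str]]]


async def hybrid_search(query: str, semantic: bool,
                        fts_search: SearchFn,
                        semantic_rank: SearchFn) -> List[str]:
    """Return hybrid results, degrading to FTS-only if the semantic path fails."""
    if not semantic:
        # Toggle off: plain FTS passthrough, no embedding call at all.
        return await fts_search(query)
    try:
        return await semantic_rank(query)
    except asyncio.CancelledError:
        # Cancellation is not a failure; let it propagate to the caller.
        raise
    except Exception:
        # Any embedding or vector-rank error: log it and serve FTS results.
        logger.warning("semantic path failed, falling back to FTS", exc_info=True)
        return await fts_search(query)


# Stub backends, used only to exercise the control flow.
async def stub_fts(query: str) -> List[str]:
    return ["fts:" + query]


async def stub_semantic_ok(query: str) -> List[str]:
    return ["hybrid:" + query]


async def stub_semantic_down(query: str) -> List[str]:
    raise RuntimeError("embedding backend unavailable")


fallback_result = asyncio.run(hybrid_search("dune", True, stub_fts, stub_semantic_down))
```

Re-raising cancellation before the broad except clause mirrors the stated rule that a cancelled request must not be converted into a silent FTS fallback.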

654 unit tests plus integration tests (pgvector, gated) pass. StudyBuddy set-equality is green; docker-compose is clean. status=1 is Published. The frontend toggle comes later.

🤖 Generated with Claude Code

mrviduus and others added 2 commits June 17, 2026 18:28
Phase 9. GET /search?q=&semantic=true blends keyword (FTS) + vector
(query-embedding vs editions.embedding cosine) edition rankings via RRF,
same PaginatedResult<SearchResultDto> shape (frontend-transparent). A
semantically-related-but-keyword-absent book surfaces — the payoff.

- Orchestrator HybridCatalogSearch (Application, not the FTS provider which
  has no AI deps): wide FTS pool (offset 0) + IEmbeddingService.EmbedAsync
  + editions-cosine rank (AI-055 visibility: site, status=1, embedding NOT
  NULL, lang, EXISTS(chapters); cosine <=> via HNSW, param vector) +
  RrfFusion.Fuse on edition_id + paginate the FUSED order. Vector-only hits
  get title/author/cover + first-chapter fallback + empty highlights.
- semantic absent/false → today's pure-FTS path byte-for-byte unchanged,
  no embed, no cost; search-semantic rate limit (20/min) applies ONLY when
  semantic=true (pure-FTS unthrottled).
- Graceful FTS fallback (QA P2): embed/vector-rank failure → log + verbatim
  searchProvider.SearchAsync (semantic search never hard-fails catalog
  search); OperationCanceledException propagates.

654 unit (fusion granularity, toggle-off passthrough, embed-guard, empty-
vector degradation, embed-failure fallback, cancellation) + integration
(pgvector, gated): keyword-absent semantic hit surfaces, draft/hidden/
other-site/other-lang never appear, pure-FTS control no drift. StudyBuddy
set-equality green; docker-compose clean. Frontend toggle UI = later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md