mrviduus · Jun 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,18 @@
 
 ## [Unreleased]
 
+### Phase 9 — hybrid (semantic) catalog search (AI-057) (2026-06-17)
+
+Third Phase 9 slice: `GET /search?q=&semantic=true` blends the existing keyword (FTS) edition ranking with a **vector** ranking (query embedding vs the AI-054 `editions.embedding`) via **RRF**, returning the SAME `PaginatedResult<SearchResultDto>` shape (frontend-transparent). **Default OFF**: `semantic` absent/false → today's pure-FTS path, **byte-for-byte unchanged, zero new cost/latency**. Backend only (toggle UI is out of scope). Eval (precision@k) is a later step.
+
+- **Orchestrator** (`backend/src/Application/Search/HybridCatalogSearch.cs`) — lives in **Application**, NOT the FTS provider (which has no AI deps). `SearchAsync(request, language, ct)`: (a) pulls a WIDER FTS candidate pool from the provider (offset 0, `limit ≈ clamp(offset+limit, 30, 200)`, highlights ON) keyed by `edition_id`; (b) `IEmbeddingService.EmbedAsync(q)` for the query vector (one OpenAI embedding call per semantic search); (c) runs the editions-cosine SQL for the vector edition-id pool; (d) `RrfFusion.Fuse([ftsIds, vectorIds])` at **edition granularity** (the shared fusion key — both retrievers already collapse to one row per edition); (e) **paginates the FUSED order** with the request's offset/limit (fixing the FTS-internal-pagination-vs-fusion skew); (f) materializes page DTOs — reuses the FTS hit's `SearchResultDto` (with its best-chapter snippet) where present, and for **vector-only editions** (no keyword match) fetches title/author/cover + a first-chapter (`chapter_number`-ordered) `ChapterId/Slug/Title` fallback with **empty `Highlights`** (no snippet exists). Application already references `Ai.Core`/`Ai.Rag`, so `RrfFusion`, `IEmbeddingService`, and `RagService.FormatVector` are **reused directly** (no new project dep, no copied RRF).
+- **Vector SQL** (mirrors AI-055 visibility exactly): `SELECT e.id FROM editions e WHERE e.site_id = @siteId AND e.status = 1 AND e.embedding IS NOT NULL AND (@lang IS NULL OR e.language = @lang) AND EXISTS (SELECT 1 FROM chapters c WHERE c.edition_id = e.id) ORDER BY e.embedding <=> CAST(@qvec AS vector) LIMIT @pool;` — `@qvec` = `FormatVector(queryVector)`, parameterized so the HNSW `vector_cosine_ops` index serves the ORDER BY; `<=>` cosine (the stored mean is un-normalized → cosine mandatory, NOT L2); 5s command timeout; `status = 1` = `EditionStatus.Published` ordinal.
+- **Toggle** (`backend/src/Api/Endpoints/SearchEndpoints.cs`). Added `[FromQuery] bool? semantic`. The existing ≥2-char / non-empty / ≤200-char `q` guards run FIRST, so a short/empty query with `semantic=true` returns today's 400 **with no embed call**. `semantic == true` + valid `q` → `HybridCatalogSearch.SearchAsync`; otherwise → the verbatim `searchProvider.SearchAsync` path. `TotalCount` is **approximate** (distinct fused-candidate-pool size) — no extra exact-count scan in v1.
+- **Rate limit.** New `search-semantic` policy (`backend/src/Api/Program.cs`, 20/min per IP, cloned from `explain`/`translate`) applied to `GET /search`. CRITICAL: the policy is a **NO-OP (`GetNoLimiter`) unless `?semantic` is truthy** — the pure-FTS path consumes no partition and stays completely unthrottled (zero new cost/latency).
+- **Graceful FTS fallback (P2 fix)** (`backend/src/Application/Search/HybridCatalogSearch.cs`). The semantic step (the external `IEmbeddingService.EmbedAsync` call + the vector-rank SQL) is wrapped in a single `try/catch (Exception ex) when (ex is not OperationCanceledException)`: on ANY failure (OpenAI down/throttled/timeout, vector-query error) the orchestrator logs a warning (`ILogger<HybridCatalogSearch>`) and returns the **verbatim pure-FTS** result by re-issuing `searchProvider.SearchAsync(request, ct)` — so a semantic search that can't reach the embedder degrades to a keyword search (byte-identical shape: correct `TotalCount` + pagination) instead of hard-500ing the whole catalog. **Semantic search never takes down catalog search.** `OperationCanceledException` is explicitly NOT swallowed — genuine request cancellation propagates. (The empty-vector "no editions embedded yet" case was already handled by RRF; this guards only the embed/vector-query THROW path.)
+- **No migration** — the `editions.embedding` column + HNSW index already exist from AI-054.
+- **Tests.** Integration (`tests/TextStack.IntegrationTests/HybridCatalogSearchTests.cs`; real Postgres+pgvector, `TEST_DB_CONNECTION`-gated, self-contained seed + cleanup, mirrors the AI-055 harness; **mocks `IEmbeddingService`** to return a fixed query vector, so no real OpenAI call): seeds A (keyword-matches `q`, orthogonal embedding), B (keyword-ABSENT, embedding collinear with the fixed query vector), and draft/hidden/other-site/other-lang near-editions. The `semantic=true` case asserts that **B surfaces** (the keyword-absent semantic payoff), that A is present, that the invisible editions **never** appear, and that B's hit carries **empty highlights + a first-chapter fallback**; a control asserts that the pure-FTS path returns A unaffected (no drift). Unit (`tests/TextStack.UnitTests/HybridCatalogSearchTests.cs`): edition-id-granularity RRF (an edition in BOTH lists outranks a single-list edition; a vector-only edition still ranks), the ≥2-char guard predicate (no embed), and the **P2 fallback**: a fake `IEmbeddingService` that THROWS makes `SearchAsync` return the stub `ISearchProvider`'s FTS result (no exception propagates, DB never touched), while a fake embedder throwing `OperationCanceledException` PROPAGATES (cancellation not swallowed). `dotnet build` + UnitTests (654, StudyBuddy set-equality green) + the 2 integration tests (run against a disposable `pgvector/pgvector:pg16` instance migrated via `dotnet ef database update`, then removed; AI-055's 5 integration tests were re-run green as a regression check; docker-compose left untouched) + `dotnet format --verify-no-changes` all green; no new `ITool`.
+
 ### Phase 9 — "Similar books" rail on BookDetailPage (AI-056) (2026-06-17)
 
 The first user-visible Phase 9 surface. A `SimilarBooksRail` on the web `BookDetailPage` renders books most similar to the one being viewed, via the AI-055 endpoint `GET /books/{slug}/similar?limit=8` (cosine over `editions.embedding`). `getSimilarBooks(slug, limit)` added to the api client (mirrors the language-prefixed `/books/{slug}/...` pattern), wired through `useApi()`. The rail reuses the existing "more by author" book-card markup/CSS (cover + `stringToColor` first-letter fallback, `LocalizedLink` to `/books/{slug}`) — no new design. **Renders nothing (returns null) on an empty list OR a fetch error** — a book with no embedding (or no neighbors) simply shows no rail, never an error/skeleton; client-side fetch, SSG-safe. 3 Vitest cases (renders cards, hides on empty, hides on error); web suite 520 green; tsc + build clean. Note: existing prod editions have NULL embedding until the owner runs the AI-054 `backfill-edition-embeddings` CLI — the rail hides gracefully until then.

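Reviewer note: the changelog's fuse-then-paginate steps (d) and (e) can be sketched as below. This is a minimal illustration, not the project's `RrfFusion`: the class name `RrfSketch`, the generic id type, the constant `k = 60` (a common RRF default), and the first-seen tie-break are all assumptions. Only the pool clamp `clamp(offset+limit, 30, 200)` and the "paginate the fused order" rule come from the changelog text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real RrfFusion; names and k = 60 are assumptions.
public static class RrfSketch
{
    // Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), rank starting at 1.
    // Assumes each input list holds an id at most once (one row per edition).
    public static List<T> Fuse<T>(IEnumerable<IReadOnlyList<T>> rankedLists, int k = 60)
        where T : notnull
    {
        var scores = new Dictionary<T, double>();
        var firstSeen = new List<T>(); // insertion order, used as a stable tie-break

        foreach (var list in rankedLists)
        {
            for (var i = 0; i < list.Count; i++)
            {
                var id = list[i];
                if (!scores.TryGetValue(id, out var current))
                    firstSeen.Add(id);
                scores[id] = current + 1.0 / (k + i + 1);
            }
        }

        // LINQ's OrderByDescending is stable, so equal scores keep first-seen order.
        return firstSeen.OrderByDescending(id => scores[id]).ToList();
    }

    // Candidate pool size as described in the changelog: clamp(offset + limit, 30, 200).
    public static int PoolSize(int offset, int limit) => Math.Clamp(offset + limit, 30, 200);

    // Pagination is applied to the fused order, not to either retriever's own order.
    public static List<T> Page<T>(IReadOnlyList<T> fused, int offset, int limit) =>
        fused.Skip(offset).Take(limit).ToList();
}
```

With an FTS list `[A, C]` and a vector list `[B, A]`, `Fuse` orders the ids `A, B, C`: `A` appears in both lists and accumulates two reciprocal-rank terms, while the vector-only `B` still ranks ahead of `C` because it sits at rank 1 in its list.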
diff --git a/backend/src/Api/Endpoints/SearchEndpoints.cs b/backend/src/Api/Endpoints/SearchEndpoints.cs
@@ -10,6 +10,7 @@
 
 using Api.Language;
 using Api.Sites;
+using Application.Search;
 using Contracts.Common;
 using Microsoft.AspNetCore.Mvc;
 using TextStack.Search.Abstractions;
@@ -29,8 +30,9 @@ public static void MapSearchEndpoints(this WebApplication app)
         // Group endpoints under /search prefix with OpenAPI tag
         var group = app.MapGroup("/search").WithTags("Search");
 
-        // Two endpoints: full-text search and autocomplete suggestions
-        group.MapGet("", Search).WithName("Search");
+        // Two endpoints: full-text search and autocomplete suggestions.
+        // search-semantic limiter is a NO-OP unless ?semantic is "true" or "1" (AI-057); the pure-FTS path stays unthrottled.
+        group.MapGet("", Search).WithName("Search").RequireRateLimiting("search-semantic");
         group.MapGet("/suggest", Suggest).WithName("SearchSuggest");
     }
 
@@ -41,10 +43,12 @@ public static void MapSearchEndpoints(this WebApplication app)
     private static async Task<IResult> Search(
         HttpContext httpContext,
         ISearchProvider searchProvider,  // Injected via DI
+        HybridCatalogSearch hybridSearch, // AI-057: resolved always, invoked only when semantic=true
         [FromQuery] string q,             // Search query
         [FromQuery] int? limit,           // Page size (default 20, max 100)
         [FromQuery] int? offset,          // Skip N results
         [FromQuery] bool? highlight,      // Include text snippets?
+        [FromQuery] bool? semantic,       // AI-057: blend FTS + vector via RRF? (default OFF)
         CancellationToken ct)
     {
         // ─── Input Validation ───────────────────────────────────
@@ -77,7 +81,12 @@ private static async Task<IResult> Search(
             highlight ?? false);
 
         // ─── Execute Search ─────────────────────────────────────
-        var result = await searchProvider.SearchAsync(request, ct);
+        // AI-057: semantic=true blends FTS + editions.embedding cosine via RRF (same DTO shape).
+        // The ≥2-char/non-empty guard above already ran, so the embed call is never wasted on a
+        // short query. semantic absent/false → today's pure-FTS path, byte-for-byte unchanged.
+        var result = semantic == true
+            ? await hybridSearch.SearchAsync(request, language, ct)
+            : await searchProvider.SearchAsync(request, ct);
 
         // ─── Map to Response ────────────────────────────────────
         // Transform internal SearchHit to API DTO

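Reviewer note: `HybridCatalogSearch.cs` itself is not part of the hunks shown here, so the graceful FTS fallback described in the changelog is sketched below in isolation. The helper name `WithFtsFallbackAsync` and its delegate parameters are hypothetical; the only parts taken from the changelog are the exception filter `when (ex is not OperationCanceledException)`, the warning log, and the re-issue of the keyword search.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FallbackSketch
{
    // Hypothetical helper: run the semantic path and degrade to the keyword path on failure.
    public static async Task<TResult> WithFtsFallbackAsync<TResult>(
        Func<CancellationToken, Task<TResult>> semanticSearch, // embed + vector query + fusion
        Func<CancellationToken, Task<TResult>> ftsSearch,      // plain keyword search
        Action<Exception> logWarning,
        CancellationToken ct)
    {
        try
        {
            return await semanticSearch(ct);
        }
        // The filter lets cancellation propagate; every other failure degrades to FTS.
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            logWarning(ex);
            return await ftsSearch(ct);
        }
    }
}
```

Because `TaskCanceledException` derives from `OperationCanceledException`, the same filter also lets cancelled awaits propagate instead of being treated as an embedder outage.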
diff --git a/backend/src/Api/Program.cs b/backend/src/Api/Program.cs
@@ -151,6 +151,15 @@
 builder.Services.AddScoped(_ =>
     new Application.Recommendations.SimilarBooksService(() => new NpgsqlConnection(connectionString)));
 
+// Hybrid catalog search (AI-057): blends the FTS edition ranking with cosine NN over
+// editions.embedding via RRF. Only invoked on `semantic=true`; the pure-FTS path never touches it.
+builder.Services.AddScoped(sp =>
+    new Application.Search.HybridCatalogSearch(
+        sp.GetRequiredService<TextStack.Search.Abstractions.ISearchProvider>(),
+        sp.GetRequiredService<global::TextStack.Ai.Core.IEmbeddingService>(),
+        () => new NpgsqlConnection(connectionString),
+        sp.GetRequiredService<ILogger<Application.Search.HybridCatalogSearch>>()));
+
 // Reindex service (used by CLI)
 builder.Services.AddScoped<SearchReindexService>();
 
@@ -352,6 +361,26 @@
             QueueLimit = 0,
         });
     });
+    // Hybrid catalog search (AI-057): semantic=true embeds the query (one paid OpenAI embedding
+    // call per request) before the $0 pgvector scan, so it gets its own per-IP throttle. CRITICAL:
+    // this policy is a NO-OP unless `semantic` is truthy — the pure-FTS path (semantic absent/false)
+    // consumes no partition and stays completely unthrottled (zero new cost/latency).
+    options.AddPolicy("search-semantic", httpContext =>
+    {
+        var semantic = httpContext.Request.Query["semantic"].ToString();
+        var isSemantic = semantic.Equals("true", StringComparison.OrdinalIgnoreCase)
+                         || semantic == "1";
+        if (!isSemantic)
+            return RateLimitPartition.GetNoLimiter("search-fts");
+
+        var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
+        return RateLimitPartition.GetFixedWindowLimiter("semantic:" + ip, _ => new FixedWindowRateLimiterOptions
+        {
+            Window = TimeSpan.FromMinutes(1),
+            PermitLimit = 20,
+            QueueLimit = 0,
+        });
+    });
     // "Ask this book" (RAG) — one LLM call per request, per-user reading. 30/min per IP is
     // generous for genuine use and caps scripted abuse.
     options.AddPolicy("rag.ask", httpContext =>