fix(search): reject automatic-search results whose title is irrelevant to the audiobook #591

Draft
kevinheneveld wants to merge 3 commits into Listenarrs:canary from kevinheneveld:fix/automatic-search-relevance

kevinheneveld commented May 14, 2026

Summary

Adds a context-aware relevance filter to the automatic-search pipeline so it stops queuing downloads that share a single common token with the audiobook but are otherwise unrelated. For example, a Patterson Hood concert recording could be matched to a James Patterson audiobook on the shared surname alone. The filter is conservative: it rejects only on an explicit signal and fails open otherwise. Manual search keeps the old behavior, so users browsing results still see everything.

How it works

  1. ISearchResultFilter gains a default-interface-method overload that accepts an optional Audiobook? context. Existing filters need no edits.
  2. SearchResultFilterPipeline.ApplyFilters / WouldFilter accept the same optional context and pass it to each filter.
  3. New RelevanceFilter (under Search/Filters/) implements the contextual overload: stop-word-filtered token overlap between the result title and audiobook.Title + audiobook.Authors, rejecting below 30%. Without an audiobook context the filter fails open.
  4. AutomaticSearchService, DownloadService.SearchAndDownloadAsync, and LibraryController.ProcessAudiobookForSearchAsync (the three auto-pick flows) invoke the pipeline with the audiobook context after raw search and before scoring.
  5. The stop-word-filtered tokenizer is lifted into a shared SignificantTokens helper so other call sites can reuse it.
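
One way to read step 3 is as a ratio: the share of the audiobook's significant tokens that also appear in the result title. The sketch below is an illustration under assumptions, not the PR's code. The class name RelevanceSketch, the stop-word list, the choice of denominator, and the concert title in the usage note are all hypothetical; only the 30% threshold and the two fail-open rules come from the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: names and the stop-word list are assumptions, not the PR's code.
public static class RelevanceSketch
{
    // Threshold taken from the PR description: reject below 30% overlap.
    public const double MinOverlap = 0.30;

    // Assumed stop-word list; the real list is not shown in the PR text.
    private static readonly HashSet<string> StopWords = new(StringComparer.OrdinalIgnoreCase)
    {
        "a", "an", "and", "by", "in", "of", "the", "to"
    };

    // Lower-case the text, split on anything that is not a letter or digit,
    // and drop empty entries and stop words.
    public static HashSet<string> SignificantTokens(string? text)
    {
        var tokens = Regex.Split((text ?? string.Empty).ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
            .Where(t => t.Length > 0 && !StopWords.Contains(t));
        return new HashSet<string>(tokens);
    }

    // Returns true when the result should be kept.
    public static bool IsRelevant(string resultTitle, string? bookTitle, IEnumerable<string>? authors)
    {
        // No audiobook context at all: fail open.
        if (bookTitle is null && authors is null) return true;

        var expected = SignificantTokens(bookTitle);
        foreach (var author in authors ?? Enumerable.Empty<string>())
        {
            expected.UnionWith(SignificantTokens(author));
        }

        // The audiobook has no significant tokens to compare against: fail open.
        if (expected.Count == 0) return true;

        // Assumed definition of overlap: fraction of the audiobook's tokens
        // that are present in the result title.
        var found = SignificantTokens(resultTitle);
        int matched = expected.Count(t => found.Contains(t));
        double overlap = (double)matched / expected.Count;
        return overlap >= MinOverlap;
    }
}
```

Under these assumptions, a title such as "Patterson Hood 2019-03-02 Athens GA FLAC" shares one of the four tokens from "Dog Diaries" plus "James Patterson" (25%) and is rejected, while "Charles Dickens - David Copperfield" matches all four tokens for that book and is kept.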

What's not in this PR

The original draft also restricted automatic searches to the Newznab Books > Audiobook category (3030) at query time and added a language filter that rejected foreign-language editions. Both have been dropped from this PR:

  • Category restriction: per-indexer category config is the right place for that — query-time imposition can mask oddly-configured indexers. The relevance filter in this PR already eliminates the music-album false matches the category restriction was meant to defend against.
  • Language filter: upstream's SearchResultScorer already penalises language mismatches via LanguageMismatchPenalty / LanguageMissingPenalty, so a hard reject in this PR would be redundant.

Test plan

  • SearchRelevanceFilterTests — bug-report scenario (Patterson Hood vs Dog Diaries), legitimate matches (David Copperfield by Charles Dickens with author-first and title-first naming), case- and punctuation-insensitivity, fail-open when audiobook has no significant tokens, and no-audiobook-context backwards-compat
  • Full backend suite passes (740/740)
  • Verified in a live instance: automatic search shows 0 spurious matches across 25+ downloads, relevance rejections firing as expected

🤖 Generated with Claude Code

kevinheneveld requested a review from a team May 14, 2026 01:38
kevinheneveld marked this pull request as draft May 14, 2026 01:43
kevinheneveld changed the title from "Reject low-relevance search results; restrict automatic search to audiobook category" to "Search-result filtering: relevance, audiobook category, and language" May 14, 2026
T4g1 (Contributor) left a comment

Solid work!

// Search for results. Restrict to the Newznab "Books > Audiobook" category
// (3030) so music-only indexers configured in Prowlarr without a category
// restriction don't return concert/album torrents for audiobook queries.
var searchResults = await searchService.SearchAsync(searchQuery, category: "3030", isAutomaticSearch: true);
Contributor

I don't believe the category is intended to work that way. Ideally, we'd want the search to be performed on a specific indexer (later in the process), and that method should take the specific indexer it runs on as an input, then use the categories configured there.

At this point in the process, "3030" is only valid for specific indexers. Maybe we could fix the process (which does not seem to be using the configured categories right now) and modify the targeted indexer(s) by making sure 3030 is set as a default when users configure them?

Author

Agreed — the 3030 hardcode is a stopgap for the music-album bug, not the right long-term shape. The proper fix needs category config threaded per-indexer, which this PR doesn't touch.

Two options:

  1. Scope the per-indexer category work into this PR, or
  2. Drop the category layer from here entirely and leave this PR as the relevance + language filters — those catch the same music albums anyway (the relevance filter alone eliminated the music grabs in my testing) — then do indexer categories as a separate PR after the foundational changes land.

I'd lean toward option 2 to keep this PR focused, but happy to go either way — your call.

Contributor

I'm a very bad example of keeping PRs focused, ha ha, but from my point of view both options work as long as we stay agnostic of indexer specifics. If you go for option 2, I would create a ticket to keep track of that category issue on indexers. There's much to be done on the indexer side, so it could be a starting point for that.

Author

Going with option 2 — dropped the category-3030 piece in af5168e (force-pushed). PR is now relevance + language only. Filed #594 to track the proper per-indexer categories work for when the indexer-side foundations are ready.

Comment thread listenarr.api/Services/QualityProfileService.cs Outdated
kevinheneveld force-pushed the fix/automatic-search-relevance branch from af5168e to c4792f8 May 17, 2026 22:45
kevinheneveld changed the title from "Search-result filtering: relevance, audiobook category, and language" to "fix(search): reject automatic-search results whose title is irrelevant to the audiobook" May 17, 2026
Kevin Heneveld and others added 3 commits May 19, 2026 08:12
…t to the audiobook, and restrict automatic search to the audiobook category

Two compounding gaps let automatic search match unrelated content to
audiobook queries:

1. The search-result filter pipeline was not given the audiobook being
   searched for, so no filter could compare a result title against the
   audiobook's title or authors. A Patterson Hood concert recording could
   be matched to a James Patterson audiobook on the single shared
   "Patterson" surname.
2. AutomaticSearchService called searchService.SearchAsync with no
   category, so music-only Prowlarr indexers like BT.etree (configured
   with cat=3000 only) returned concert torrents for audiobook queries.

Fix layers:
- ISearchResultFilter gains a default-interface-method overload that
  accepts an optional Audiobook? context. Existing filters need no
  edits; their context-free behaviour is preserved.
- SearchResultFilterPipeline.ApplyFilters / WouldFilter accept the same
  optional Audiobook? context and pass it through to filters.
- New RelevanceFilter (Search/Filters/RelevanceFilter.cs) implements
  the contextual overload: stop-word-filtered token overlap between the
  result title and audiobook.Title + audiobook.Authors, rejecting below
  30%. With no context (manual search, AsinEnricher per-result calls)
  the filter fails open, so manual users keep seeing every result.
- AutomaticSearchService now invokes the filter pipeline with the
  audiobook context after raw search and before scoring.
- AutomaticSearchService passes category: "3030" to SearchAsync.
- The stop-word tokenizer is lifted into a shared SignificantTokens
  helper so the upcoming backfill TitleMatcher can reuse it.
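
The first two fix layers rely on C# default interface methods, so a rough sketch may help show why existing filters need no edits. Everything below is hypothetical: the method name ShouldFilter, the record shapes, and the two example filters are assumptions made for illustration. Only the type names ISearchResultFilter, SearchResultFilterPipeline, and Audiobook, and the ApplyFilters call shape, appear in this PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the real types are not shown in the PR text.
public record Audiobook(string Title, IReadOnlyList<string> Authors);
public record SearchResult(string Title);

public interface ISearchResultFilter
{
    // Assumed original, context-free contract. Returns true to reject a result.
    bool ShouldFilter(SearchResult result);

    // Default interface method: filters that do not override it ignore the context.
    bool ShouldFilter(SearchResult result, Audiobook? audiobook) => ShouldFilter(result);
}

// A filter written before the overload existed compiles unchanged.
public sealed class EmptyTitleFilter : ISearchResultFilter
{
    public bool ShouldFilter(SearchResult result) => string.IsNullOrWhiteSpace(result.Title);
}

// A context-aware filter overrides the overload and fails open without context.
public sealed class TitleContainsFilter : ISearchResultFilter
{
    public bool ShouldFilter(SearchResult result) => false; // no context: never reject

    public bool ShouldFilter(SearchResult result, Audiobook? audiobook) =>
        audiobook is not null &&
        !result.Title.Contains(audiobook.Title, StringComparison.OrdinalIgnoreCase);
}

public sealed class SearchResultFilterPipeline
{
    private readonly IReadOnlyList<ISearchResultFilter> _filters;

    public SearchResultFilterPipeline(IEnumerable<ISearchResultFilter> filters) =>
        _filters = filters.ToList();

    // The optional context is passed through to every filter; null means "no context".
    public List<SearchResult> ApplyFilters(IEnumerable<SearchResult> results, Audiobook? audiobook = null) =>
        results.Where(r => !_filters.Any(f => f.ShouldFilter(r, audiobook))).ToList();
}
```

In this sketch, an automatic-search caller would pass the context with pipeline.ApplyFilters(searchResults, audiobook: audiobook), while a manual-search caller would omit the argument and get the context-free behaviour of every filter.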

Tests: nine new tests covering the bug-report scenario directly
(Patterson Hood vs Dog Diaries), legitimate matches (David Copperfield
by Charles Dickens, author-first vs title-first naming), case- and
punctuation-insensitivity, the fail-open behaviour when audiobook has
no significant tokens, and the no-audiobook-context backwards-compat
case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…roller auto-pick flows too

The previous commit wired the context-aware RelevanceFilter into
AutomaticSearchService only. The two other auto-pick callers
(DownloadService.SearchAndDownloadAsync via the manual "search and
download" endpoint, and LibraryController.ProcessAudiobookForSearchAsync
via the bulk-search-all endpoint) bypassed the filter and went straight
from search to scoring — so they could still pick an irrelevant result
when the indexer returned only weak matches.

DownloadService injects SearchResultFilterPipeline via its primary
constructor; LibraryController resolves it from the existing scope
factory to avoid widening its constructor. Both call
pipeline.ApplyFilters(searchResults, audiobook: audiobook) before
ScoreSearchResults, matching the AutomaticSearchService pattern.

Manual search (no audiobook context) keeps the old behavior — the
filter fails open without context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The audiobook-category restriction is the wrong place for this; proper
behavior is to use each indexer's configured categories, not impose a
specific Newznab category at query time. The RelevanceFilter in this
PR already eliminates the music-album results the hardcoded category
was meant to defend against, so dropping it here keeps this PR focused
on result filtering. Per-indexer category configuration belongs in the
indexer admin UI, not in the auto-search query path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
kevinheneveld force-pushed the fix/automatic-search-relevance branch from 938f2f0 to 60b14cb May 19, 2026 16:13