fix(search): reject automatic-search results whose title is irrelevant to the audiobook #591
// Search for results. Restrict to the Newznab "Books > Audiobook" category
// (3030) so music-only indexers configured in Prowlarr without a category
// restriction don't return concert/album torrents for audiobook queries.
var searchResults = await searchService.SearchAsync(searchQuery, category: "3030", isAutomaticSearch: true);
I don't believe the category is intended to work that way. Ideally, the search would be performed against a specific indexer (later in the process), and that method would take the indexer it runs on as an input, then use the categories configured for it.
At this point in the process, "3030" is only valid for specific indexers. Maybe we could fix the process (which does not seem to use the configured categories right now) and update the targeted indexer(s) so that 3030 is set as a default when users configure them?
Agreed — the 3030 hardcode is a stopgap for the music-album bug, not the right long-term shape. The proper fix needs category config threaded per-indexer, which this PR doesn't touch.
Two options:
- Scope the per-indexer category work into this PR, or
- Drop the category layer from here entirely and leave this PR as the relevance + language filters — those catch the same music albums anyway (the relevance filter alone eliminated the music grabs in my testing) — then do indexer categories as a separate PR after the foundational changes land.
I'd lean toward option 2 to keep this PR focused, but happy to go either way — your call.
I'm a very bad example of keeping PRs focused, ha ha, but from my point of view both options work as long as we stay agnostic of indexer specifics. If you go for option 2, I would create a ticket to keep track of that category issue on indexers. There's much to be done on the indexer side, so it could be a starting point for that.
af5168e to c4792f8
…t to the audiobook, and restrict automatic search to the audiobook category

Two compounding gaps let automatic search match unrelated content to audiobook queries:

1. The search-result filter pipeline was not given the audiobook being searched for, so no filter could compare a result title against the audiobook's title or authors. A Patterson Hood concert recording could be matched to a James Patterson audiobook on the single shared "Patterson" surname.
2. AutomaticSearchService called searchService.SearchAsync with no category, so music-only Prowlarr indexers like BT.etree (configured with cat=3000 only) returned concert torrents for audiobook queries.

Fix layers:

- ISearchResultFilter gains a default-interface-method overload that accepts an optional Audiobook? context. Existing filters need no edits; their context-free behaviour is preserved.
- SearchResultFilterPipeline.ApplyFilters / WouldFilter accept the same optional Audiobook? context and pass it through to filters.
- New RelevanceFilter (Search/Filters/RelevanceFilter.cs) implements the contextual overload: stop-word-filtered token overlap between the result title and audiobook.Title + audiobook.Authors, rejecting below 30%. With no context (manual search, AsinEnricher per-result calls) the filter fails open, so manual users keep seeing every result.
- AutomaticSearchService now invokes the filter pipeline with the audiobook context after raw search and before scoring.
- AutomaticSearchService passes category: "3030" to SearchAsync.
- The stop-word tokenizer is lifted into a shared SignificantTokens helper so the upcoming backfill TitleMatcher can reuse it.

Tests: nine new tests covering the bug-report scenario directly (Patterson Hood vs Dog Diaries), legitimate matches (David Copperfield by Charles Dickens, author-first vs title-first naming), case- and punctuation-insensitivity, the fail-open behaviour when audiobook has no significant tokens, and the no-audiobook-context backwards-compat case.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
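For readers following along, the interface change described in the commit above could look roughly like the sketch below. This is an illustration, not the project's code: the method name ShouldFilter, the shapes of SearchResult and Audiobook, and the EmptyTitleFilter example are all hypothetical stand-ins. Only the type names ISearchResultFilter, SearchResultFilterPipeline, and ApplyFilters, plus the optional Audiobook? parameter, come from the commit message.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the real project types are not shown in this PR text.
public record SearchResult(string Title);
public record Audiobook(string Title, IReadOnlyList<string> Authors);

public interface ISearchResultFilter
{
    // Context-free contract (method name assumed): true means "reject this result".
    bool ShouldFilter(SearchResult result);

    // Default interface method: filters that ignore the audiobook inherit this
    // overload and keep their context-free behaviour without any edits.
    bool ShouldFilter(SearchResult result, Audiobook? audiobook) => ShouldFilter(result);
}

// An existing-style filter (hypothetical) that only implements the context-free method.
public sealed class EmptyTitleFilter : ISearchResultFilter
{
    public bool ShouldFilter(SearchResult result) => string.IsNullOrWhiteSpace(result.Title);
}

public sealed class SearchResultFilterPipeline
{
    private readonly IReadOnlyList<ISearchResultFilter> _filters;

    public SearchResultFilterPipeline(IEnumerable<ISearchResultFilter> filters) =>
        _filters = filters.ToList();

    // The optional audiobook is passed through to every filter; a null value
    // corresponds to the "no context" (manual search) path.
    public List<SearchResult> ApplyFilters(IEnumerable<SearchResult> results, Audiobook? audiobook = null) =>
        results.Where(r => !_filters.Any(f => f.ShouldFilter(r, audiobook))).ToList();
}
```

Because the pipeline calls the two-argument overload through the interface, a filter such as EmptyTitleFilter keeps working unchanged, while a context-aware filter can implement the second overload explicitly.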
…roller auto-pick flows too

The previous commit wired the context-aware RelevanceFilter into AutomaticSearchService only. The two other auto-pick callers (DownloadService.SearchAndDownloadAsync via the manual "search and download" endpoint, and LibraryController.ProcessAudiobookForSearchAsync via the bulk-search-all endpoint) bypassed the filter and went straight from search to scoring, so they could still pick an irrelevant result when the indexer returned only weak matches.

DownloadService injects SearchResultFilterPipeline via its primary constructor; LibraryController resolves it from the existing scope factory to avoid widening its constructor. Both call pipeline.ApplyFilters(searchResults, audiobook: audiobook) before ScoreSearchResults, matching the AutomaticSearchService pattern.

Manual search (no audiobook context) keeps the old behavior: the filter fails open without context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The audiobook-category restriction is the wrong place for this: proper behavior is to use each indexer's configured categories, not impose a specific Newznab category at query time. The RelevanceFilter in this PR already eliminates the music-album results the hardcoded category was meant to defend against, so dropping it here keeps this PR focused on result filtering. Per-indexer category configuration belongs in the indexer admin UI, not in the auto-search query path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
938f2f0 to 60b14cb
Summary
Adds a context-aware relevance filter to the automatic-search pipeline so it stops queuing downloads that share a single common token with the audiobook but are otherwise unrelated. For example, a Patterson Hood concert recording could be matched to a James Patterson audiobook on the single shared surname. The filter is conservative: it rejects only on explicit signal and fails open otherwise, and manual search keeps the old behavior so users browsing results still see everything.
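A minimal sketch of the token-overlap check this PR describes, to make the Patterson example concrete. It is not the actual RelevanceFilter: the stop-word list, the tokenizer regex, the ShouldReject signature, and the choice to divide by the number of audiobook tokens are all assumptions made for illustration. Only the 30% threshold, the stop-word filtering, and the fail-open behaviour come from the PR text.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RelevanceSketch
{
    // Illustrative stop-word list (assumption); the project's real list is not shown here.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "by", "in", "of", "the", "to"
    };

    // Lower-case, split on anything that is not a letter or digit, drop stop words.
    public static HashSet<string> SignificantTokens(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{N}]+")
             .Where(t => t.Length > 0 && !StopWords.Contains(t))
             .ToHashSet();

    // Returns true when the result should be rejected as irrelevant.
    public static bool ShouldReject(
        string resultTitle,
        string? bookTitle,
        IEnumerable<string>? authors,
        double minOverlap = 0.30)
    {
        // No audiobook context (for example manual search): fail open.
        if (bookTitle is null) return false;

        var expected = SignificantTokens(
            bookTitle + " " + string.Join(" ", authors ?? Array.Empty<string>()));

        // Nothing significant to compare against: fail open.
        if (expected.Count == 0) return false;

        var actual = SignificantTokens(resultTitle);

        // Assumed definition: share of the audiobook's tokens found in the result title.
        var overlap = (double)expected.Count(t => actual.Contains(t)) / expected.Count;
        return overlap < minOverlap;
    }
}
```

Under these assumptions, a result titled "Patterson Hood 2019-05-04 Live" checked against "Dog Diaries" by James Patterson shares one of four significant tokens (25%) and is rejected, while "Charles Dickens - David Copperfield" matches all four tokens of the Dickens audiobook and is kept regardless of author-first or title-first ordering.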
How it works
- ISearchResultFilter gains a default-interface-method overload that accepts an optional Audiobook? context. Existing filters need no edits.
- SearchResultFilterPipeline.ApplyFilters / WouldFilter accept the same optional context and pass it to each filter.
- RelevanceFilter (under Search/Filters/) implements the contextual overload: stop-word-filtered token overlap between the result title and audiobook.Title + audiobook.Authors, rejecting below 30%. Without an audiobook context the filter fails open.
- AutomaticSearchService, DownloadService.SearchAndDownloadAsync, and LibraryController.ProcessAudiobookForSearchAsync (the three auto-pick flows) invoke the pipeline with the audiobook context after raw search and before scoring.
- The stop-word tokenizer is lifted into a shared SignificantTokens helper so other call sites can reuse it.

What's not in this PR
The original draft also restricted automatic searches to the Newznab Books > Audiobook category (3030) at query time and added a language filter that rejected foreign-language editions. Both have been dropped from this PR:
- Category restriction: the proper behavior is to use each indexer's configured categories rather than impose a specific Newznab category at query time, and the RelevanceFilter already eliminates the music-album results the hardcoded category was meant to defend against.
- Language filter: SearchResultScorer already penalises language mismatches via LanguageMismatchPenalty / LanguageMissingPenalty, so a hard reject in this PR would be redundant.

Test plan
- SearchRelevanceFilterTests: bug-report scenario (Patterson Hood vs Dog Diaries), legitimate matches (David Copperfield by Charles Dickens with author-first and title-first naming), case- and punctuation-insensitivity, fail-open when audiobook has no significant tokens, and no-audiobook-context backwards-compat

🤖 Generated with Claude Code