fix(scraper): tighten gamelist metadata scraping#858
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds MIME inference for path-backed media properties, extends media.meta with available image types, prefers thumbnail in image selection, changes stale-image handling to in-memory logging, refactors GamelistXML scraper for folded-path matching, ROM-level media tags, artwork fallback (including external roots), companion atomic writes, and updates tests/docs. ChangesContent Type & Media API
Gamelist XML Scraper Path-Based Refactor
Sequence Diagram (high-level scraper write flow): sequenceDiagram
participant API as media.scrape (API)
participant Scraper as GamelistXMLScraper
participant Loader as LoadRecords
participant Mapper as MapToDB
participant DB as database.ApplyScrapeResult
API->>Scraper: start scrape request
Scraper->>Loader: loadRecordIndexes (media/title indexes)
Loader->>Mapper: Map matched game -> MapResult (TitleProps/MediaProps, MediaTags)
Mapper->>DB: ApplyScrapeResult(scrape write with sentinel + tags/props)
DB-->>Scraper: write result (ok/error)
Scraper-->>API: progress/status updates
🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/database/scraper/gamelistxml/scraper_test.go (1)
1381-1389: ⚡ Quick winStrengthen
companionWriteMatcherto validate parent title metadata too.The matcher currently ignores
TitleTagsandTitleProps, so atomic-write regressions on companion parent metadata would still pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/scraper/gamelistxml/scraper_test.go` around lines 1381 - 1389, The matcher companionWriteMatcher currently only checks Sentinel and MediaTags; modify it to also validate the parent title metadata on the captured *database.ScrapeWrite*. Change the signature of companionWriteMatcher to accept expected title metadata (e.g., expectedTitleTags []database.TagInfo and expectedTitleProps map[string]string), and inside the mock.MatchedBy closure verify that w.TitleTags equals expectedTitleTags and w.TitleProps equals expectedTitleProps (use assert.ObjectsAreEqual or reflect.DeepEqual for the comparisons). Ensure you still check w != nil and the existing Sentinel and MediaTags checks so the matcher rejects writes missing the parent title metadata.pkg/database/scraper/gamelistxml/scraper.go (1)
934-939: ⚡ Quick winMove
companionStatsto the top-level type/const section.This new type is declared mid-file after many functions. Please move it near other type/const declarations for guideline compliance.
As per coding guidelines,
**/*.go: "Define Go types and consts near the top of the file, before functions and methods".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/scraper/gamelistxml/scraper.go` around lines 934 - 939, Move the companionStats type declaration out of the middle of the file and place it with the other top-level types/consts near the top of the file (before any functions or methods); specifically locate the companionStats type and cut/paste it into the file’s existing type/const block so it sits alongside other package-level type definitions, ensure no references need adjusting, then run gofmt/govet to verify formatting and imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/methods/media_content.go`:
- Around line 49-52: The function mediaContentType currently checks
strings.TrimSpace(contentType) but returns the original untrimmed contentType;
update mediaContentType to return the trimmed value (use
strings.TrimSpace(contentType)) when the check passes, and similarly ensure any
fallback return (the text parameter) is also strings.TrimSpace(text) so no
whitespace-padded MIME values are exposed.
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 1223-1231: The companionChildTags function currently appends raw
Region and Lang values which can create inconsistent tags (e.g., "USA, EUR");
update companionChildTags to normalize these fields the same way MapToDB does:
split CSV values, trim whitespace, lowercase (or apply the same normalization
function MapToDB uses), and append each resulting token as a separate
database.TagInfo with Type string(tags.TagTypeRegion) or
string(tags.TagTypeLang) respectively; locate companionChildTags and reuse or
mirror the normalization logic used elsewhere (e.g., the MapToDB helper or
tokenization utility) so region/lang produce consistent individual tags.
---
Nitpick comments:
In `@pkg/database/scraper/gamelistxml/scraper_test.go`:
- Around line 1381-1389: The matcher companionWriteMatcher currently only checks
Sentinel and MediaTags; modify it to also validate the parent title metadata on
the captured *database.ScrapeWrite*. Change the signature of
companionWriteMatcher to accept expected title metadata (e.g., expectedTitleTags
[]database.TagInfo and expectedTitleProps map[string]string), and inside the
mock.MatchedBy closure verify that w.TitleTags equals expectedTitleTags and
w.TitleProps equals expectedTitleProps (use assert.ObjectsAreEqual or
reflect.DeepEqual for the comparisons). Ensure you still check w != nil and the
existing Sentinel and MediaTags checks so the matcher rejects writes missing the
parent title metadata.
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 934-939: Move the companionStats type declaration out of the
middle of the file and place it with the other top-level types/consts near the
top of the file (before any functions or methods); specifically locate the
companionStats type and cut/paste it into the file’s existing type/const block
so it sits alongside other package-level type definitions, ensure no references
need adjusting, then run gofmt/govet to verify formatting and imports.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 024e2c7a-f4ed-475e-ab56-937c7ab997f2
📒 Files selected for processing (8)
docs/scraper.mdpkg/api/methods/media_content.gopkg/api/methods/media_image.gopkg/api/methods/media_image_test.gopkg/api/methods/media_meta.gopkg/api/methods/media_meta_test.gopkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.go
|
Added core-only regression coverage for frontend issue ZaparooProject/zaparoo-frontend#161:
Validation rerun:
|
|
Actionable comments posted: 0 |
|
Actionable comments posted: 0 |
|
Added a follow-up fix for #794: regular gamelist.xml matching still uses concrete media paths, but now handles MiSTer zip-as-dir indexing by falling back from an unmatched .zip entry to exactly one indexed child path under that archive. Ambiguous multi-child archives are skipped rather than falling back to slug matching. |
|
Actionable comments posted: 0 |
|
Added a follow-up for #808. Coverage now distinguishes nested explicit artwork (already supported), nested filesystem fallback artwork, ignored nested gamelist.xml files, and thumbnail-only artwork. Fixes include mirrored subfolder media fallback (e.g. media/images/Japan/Game.png), box2dfront aliases, and media.image thumbnail support/default fallback. |
|
Actionable comments posted: 0 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 502-517: When building loadRecordIndexes in the loop over
allMedia, you must apply the same non-force scraped-media gate to
MediaByTitleDBID as you do for MediaByPathFold: check scrapedIDs for m.DBID and
only add to indexes.MediaByTitleDBID[m.MediaTitleDBID] when the media is not
already scraped (same condition used for indexes.MediaByPathFold using
pathFoldKey). Update the loop that constructs loadRecordIndexes (the indexes
variable handling MediaByPathFold and MediaByTitleDBID) so both maps skip
appending already-scraped media when scrapedIDs indicates it was scraped
(preserving behavior when the force-mode branch is active).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f2a41b7-900b-4136-9726-4fc041e492b8
📒 Files selected for processing (10)
docs/scraper.mdpkg/api/methods/media_content.gopkg/api/methods/media_content_test.gopkg/api/methods/media_image.gopkg/api/methods/media_image_test.gopkg/api/methods/media_meta.gopkg/api/methods/media_meta_test.gopkg/api/models/responses.gopkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/api/methods/media_content_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/api/methods/media_content.go
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/scraper.md (1)
210-210:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocumentation inconsistency with PR changes regarding stale property handling.
Line 210 states "Stale file path properties are removed automatically," but the AI summary indicates this PR changes stale-image handling to "in-memory logging" with "no DB deletion." The documentation should reflect that stale properties are now logged rather than deleted.
📝 Suggested correction
-`media.image` accepts one media ref plus image type preferences such as `image`, `boxart`, `boxart3d`, `screenshot`, `wheel`, `titleshot`, `map`, `marquee`, and `fanart`. These resolve to canonical image property tags; for example `boxart` becomes `property:image-boxart` and `image` becomes `property:image-image`. Media-level properties are preferred over title-level properties for the same type. Stale file path properties are removed automatically and lookup falls through to the next available source. +`media.image` accepts one media ref plus image type preferences such as `image`, `boxart`, `boxart3d`, `screenshot`, `wheel`, `titleshot`, `map`, `marquee`, and `fanart`. These resolve to canonical image property tags; for example `boxart` becomes `property:image-boxart` and `image` becomes `property:image-image`. Media-level properties are preferred over title-level properties for the same type. Stale file path properties are logged and lookup falls through to the next available source.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/scraper.md` at line 210, Update the docs text that describes stale-file handling for media.image to reflect the PR change: instead of saying "Stale file path properties are removed automatically" explain that stale image properties are now recorded via in-memory logging (no DB deletions) and that lookup still falls through to the next available source; reference the media.image field and the canonical image property tags (e.g., property:image-boxart, property:image-image) so readers know which properties this new logging behavior applies to.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@docs/scraper.md`:
- Line 210: Update the docs text that describes stale-file handling for
media.image to reflect the PR change: instead of saying "Stale file path
properties are removed automatically" explain that stale image properties are
now recorded via in-memory logging (no DB deletions) and that lookup still falls
through to the next available source; reference the media.image field and the
canonical image property tags (e.g., property:image-boxart,
property:image-image) so readers know which properties this new logging behavior
applies to.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 33eb049d-32fe-45d8-b8e1-4c7fca358f3f
📒 Files selected for processing (3)
docs/scraper.mdpkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/database/scraper/gamelistxml/scraper.go
- pkg/database/scraper/gamelistxml/scraper_test.go
Summary
Validation
Summary by CodeRabbit
Documentation
New Features
Improvements
Tests