## Summary
The research worker (`ApproachNote Worker` on Render, 2GB instance) plateaus at ~80% memory (~1.6GB RSS) while processing popular jazz standards, with one prior OOM-kill observed. The DuckDB cap shipped in `eef620b` brought the peak from ~95% to ~80% — but the plateau is uncomfortably close to the limit on a single heavy song, and one concurrent allocation away from OOM.
This is not a leak. It's the steady-state working set of three handlers chewing on the same song at once.
## Evidence
### Observation 1 — Ain't Misbehavin' (yesterday, pre-DuckDB cap)
- ~10-hour climb to 100% memory, then OOM at ~03:32 UTC.
- 1438 releases linked to the song.
- All three workers (`spotify.match_song`, `apple.match_song`, `youtube.match_recording`) running concurrently for the song.
### Observation 2 — Summertime (today, post-DuckDB cap)
Worker logs around 2026-05-14T19:24:09:
```
INFO research_worker.loop.spotify.match_song.job4391: claimed target=song/872d7739-… (Summertime)
INFO research_worker.loop.spotify: Found 3642 releases to process
INFO research_worker.loop.apple.match_song.job4392: claimed target=song/872d7739-… (Summertime)
INFO research_worker.loop.apple: Found 3642 releases to process
INFO research_worker.loop.youtube.match_recording.job4393+: claimed target=recording/…
```
Memory chart for the same instance: baseline ~5% → climb starting ~19:23 → plateau ~80% by ~19:28, sustained while both `match_song` jobs iterate.
## Where the memory goes
The worker (`research_worker/run.py`) spawns one thread per registered `(source, job_type)`, so `spotify.match_song`, `apple.match_song`, and `youtube.match_recording` run in parallel inside the same process (`run.py:83-91`). On a heavy song, three working sets stack (see the fan-out sketch after the table):
| Tenant | Source of working set | Approx. size |
| --- | --- | --- |
| Apple `match_song` | DuckDB buffer pool over Apple Music catalog parquets | capped at 512MB (post-`eef620b`) |
| Spotify `match_song` | `cur.fetchall()` of an N-row release×recording join with JSON-aggregated performers, plus parsed per-release Spotify API responses held live for the duration of the loop | scales with releases — Summertime had 3642 rows |
| YouTube `match_recording` | per-recording client + matcher state across many concurrent jobs | tens of MB |
| Python heap | arenas don't return to the OS after GC | sticky overhead |
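For intuition, the fan-out in `run.py` looks roughly like this — a minimal sketch only; the handler names and registry shape are assumptions, not the actual code:

```python
import threading

# Hypothetical handlers standing in for the real (source, job_type) loops.
def spotify_match_song():  # claims jobs, fetches releases, matches
    ...

def apple_match_song():  # queries the DuckDB-backed Apple catalog
    ...

def youtube_match_recording():  # per-recording client + matcher state
    ...

HANDLERS = {
    ("spotify", "match_song"): spotify_match_song,
    ("apple", "match_song"): apple_match_song,
    ("youtube", "match_recording"): youtube_match_recording,
}

# One daemon thread per registered (source, job_type). All three run in the
# same process, so on a heavy song their working sets stack into a single
# RSS figure — there is no per-handler memory isolation.
for (source, job_type), handler in HANDLERS.items():
    threading.Thread(target=handler, name=f"{source}.{job_type}", daemon=True).start()
```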
The Spotify `get_releases_for_song` query has the stretchiest working set of the three — it materialises every row of a `releases × recording_releases × recordings × release_streaming_links` join with a JSON-aggregated `performers` subquery per row, then keeps the entire list reachable during the per-release loop in `SpotifyMatcher.match_releases`.
## What's already shipped
`eef620b` — Cap DuckDB resource usage on the Apple Music catalog connection. Adds `PRAGMA memory_limit='512MB'` and `PRAGMA threads=2` after every `duckdb.connect()`, with `APPLE_DUCKDB_MEMORY_LIMIT` / `APPLE_DUCKDB_THREADS` env overrides for ops tuning without a redeploy. Reduced the observed peak from ~95% to ~80%.
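The shape of that change, as a minimal sketch reconstructed from the description above (the helper name and the `read_only` flag are assumptions, not the actual code):

```python
import os

import duckdb

# Env overrides let ops retune the caps without a redeploy (eef620b).
APPLE_DUCKDB_MEMORY_LIMIT = os.getenv("APPLE_DUCKDB_MEMORY_LIMIT", "512MB")
APPLE_DUCKDB_THREADS = os.getenv("APPLE_DUCKDB_THREADS", "2")

def connect_apple_catalog(path: str) -> duckdb.DuckDBPyConnection:
    """Open the Apple Music catalog with bounded DuckDB resource usage."""
    conn = duckdb.connect(path, read_only=True)
    # Applied after every duckdb.connect(), per eef620b.
    conn.execute(f"PRAGMA memory_limit='{APPLE_DUCKDB_MEMORY_LIMIT}'")
    conn.execute(f"PRAGMA threads={APPLE_DUCKDB_THREADS}")
    return conn
```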
## Proposed fixes (cheapest first)
1. **Tighten the DuckDB cap further via env.** Set `APPLE_DUCKDB_MEMORY_LIMIT=256MB` on the Render worker. Frees ~250MB of headroom; Apple queries may spill to temp disk on heavy songs, but the cap is already implemented. Zero code change, instantly reversible.
2. **Stream Spotify's release iteration.** Replace `cur.fetchall()` in `integrations/spotify/db.py::get_releases_for_song` with a server-side cursor that yields batches of ~200 rows, and process per batch inside `SpotifyMatcher.match_releases` — see the sketch after this list. Caps Spotify's working set regardless of song popularity. ~30-50 LOC.
3. **Upgrade the worker instance to 4GB on Render.** No code change. Buys runway but doesn't address the underlying scaling problem — the next high-coverage standard with double the releases will reproduce it.
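For fix 2, a minimal sketch of the batched iteration, assuming the worker talks to Postgres via psycopg2 (passing a cursor name is what makes it server-side; the function name and SQL constant are illustrative):

```python
from typing import Iterator

BATCH_SIZE = 200

# Placeholder for the existing releases × recording_releases × recordings
# × release_streaming_links join, elided here.
RELEASES_FOR_SONG_SQL = "..."

def iter_release_batches(conn, song_id: str) -> Iterator[list]:
    """Yield release rows in fixed-size batches instead of fetchall()."""
    # A named cursor makes psycopg2 use a server-side cursor, so only one
    # batch of rows is resident in the worker at a time.
    with conn.cursor(name="releases_for_song") as cur:
        cur.execute(RELEASES_FOR_SONG_SQL, (song_id,))
        while True:
            batch = cur.fetchmany(BATCH_SIZE)
            if not batch:
                break
            yield batch

# Inside SpotifyMatcher.match_releases, process per batch so the working
# set stays bounded regardless of song popularity:
#   for batch in iter_release_batches(conn, song_id):
#       for release in batch:
#           self._match_release(release)  # hypothetical per-release step
```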
A complementary improvement worth tracking separately: a wall-clock watchdog on `match_song` handlers (abort + reschedule after, say, 30 min) so a stuck job can't pin DuckDB indefinitely the way the Ain't Misbehavin' run did.
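A sketch of one way to implement that watchdog — a deadline check inside the per-release loop rather than killing the thread, since Python threads can't be safely aborted from outside (all names hypothetical):

```python
import time

class JobDeadlineExceeded(Exception):
    """Raised when a match_song job outlives its wall-clock budget."""

MATCH_SONG_DEADLINE_S = 30 * 60  # 30 min, per the proposal above

def match_releases_with_deadline(releases, match_one):
    started = time.monotonic()
    for release in releases:
        if time.monotonic() - started > MATCH_SONG_DEADLINE_S:
            # Let the worker loop catch this, release the claim, and
            # reschedule the job instead of pinning DuckDB indefinitely.
            raise JobDeadlineExceeded()
        match_one(release)
```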
## Repro
- Queue a deep refresh on a song with > ~2000 releases (Summertime, Ain't Misbehavin', Body and Soul, Stardust, etc.).
- Watch `/admin/research/` (filter `source=spotify` or `apple`, `job_type=match_song`) — both jobs claim within seconds of each other.
- Watch Render's memory chart for the worker; expect rapid climb to 70–85% during the in-flight window.
## Acceptance criteria
- Worker RSS stays below 70% (~1.4GB) during concurrent `match_song` runs on the heaviest songs in the catalog.
- Two heavy songs queued back-to-back do not OOM the worker.
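One way to verify the first criterion during a soak run, assuming `psutil` is available where the worker runs (illustrative tooling, not something that exists today):

```python
import time

import psutil

# 70% of the 2GB instance (~1.4GB), per the acceptance criteria.
LIMIT_BYTES = int(0.70 * 2 * 1024**3)

def watch_rss(pid: int, interval_s: float = 5.0) -> None:
    """Poll the worker's RSS and flag any sample above the ceiling."""
    proc = psutil.Process(pid)
    while True:
        rss = proc.memory_info().rss
        status = "OK" if rss < LIMIT_BYTES else "OVER LIMIT"
        print(f"rss={rss / 1024**2:.0f}MB {status}")
        time.sleep(interval_s)
```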