
Schedule periodic Apple Music catalog refresh #178

@dprodger

Background

Spun out of #162 (Apple Music database service). The infrastructure is already in place: a Render worker with a 100 GB persistent disk at /data, a chained worker job (apple/refresh_catalog × 3 + apple/rebuild_index), an admin trigger button at /admin/apple-music-catalog/refresh, and dedup so concurrent triggers collapse into one run. However, the chain runs only when someone clicks the button. The original #162 success criterion called for a refresh "manually or on a schedule." Apple's Feed publishes roughly weekly, so a stale catalog is a real risk if nobody remembers to click.

Goal

Re-fire enqueue_refresh_chain() on a regular cadence so the catalog stays current without manual intervention. Honor the existing in-flight dedup (sentinel UUIDs + unique-while-in-flight index on research_jobs) so a scheduled run during an already-running refresh is a no-op.
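The "unique-while-in-flight" dedup can be illustrated with a minimal sketch, assuming a partial unique index over a sentinel key that covers only queued or running rows. It uses in-memory SQLite so it runs standalone; the column names (job_type, dedup_key, status), the status values, the sentinel value, and try_enqueue itself are all hypothetical and will not match the real research_jobs schema exactly.

```python
import sqlite3

# Hypothetical sentinel: every refresh-chain job carries the same key, so the
# partial unique index allows at most one in-flight chain at a time.
REFRESH_CHAIN_SENTINEL = "00000000-0000-0000-0000-000000000178"

SCHEMA = """
CREATE TABLE research_jobs (
    id        INTEGER PRIMARY KEY,
    job_type  TEXT NOT NULL,
    dedup_key TEXT,
    status    TEXT NOT NULL
);
-- Unique only while in flight: finished rows fall out of the index,
-- so a new chain can be enqueued once the previous one completes.
CREATE UNIQUE INDEX research_jobs_in_flight
    ON research_jobs (dedup_key)
    WHERE status IN ('queued', 'running');
"""


def try_enqueue(conn: sqlite3.Connection, job_type: str, dedup_key: str) -> bool:
    """Insert a queued job; return False (a no-op) if one is already in flight."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO research_jobs (job_type, dedup_key, status) "
        "VALUES (?, ?, 'queued')",
        (job_type, dedup_key),
    )
    conn.commit()
    # rowcount is 0 when the unique index made SQLite ignore the insert.
    return cur.rowcount == 1
```

A scheduled call that lands while a manual chain is still queued or running simply gets False back, which is the no-op behavior the Goal asks for. In Postgres the equivalent insert would use `ON CONFLICT DO NOTHING`.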

Options

Three reasonable places to put the trigger; pick one:

  1. Cron job inside the worker process: a small APScheduler job (or a hand-rolled threading.Timer) inside research_worker/run.py that calls enqueue_refresh_chain() weekly. Fewest moving parts; the worker is already running and has DB access. Risk: the scheduler lives and dies with the worker process, so a crash between fires can silently skip a run.

  2. Render Cron Job service: a separate Render service (type: cron) that runs python -c "from integrations.apple_music.refresh import enqueue_refresh_chain; enqueue_refresh_chain()" on a schedule. Adds another service to operate, but is completely decoupled from worker uptime.

  3. GitHub Actions schedule: a workflow on cron: '0 5 * * MON' that POSTs to a small admin endpoint. Free, with the same dedup behavior. Risk: managing the auth token as a secret.
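Option 1 can be sketched as follows, assuming the worker can start a daemon thread at boot. It uses a plain threading loop rather than APScheduler; run_scheduler and the next_fire callable are hypothetical names, and in practice job would be enqueue_refresh_chain.

```python
import logging
import threading
from datetime import datetime, timezone
from typing import Callable

log = logging.getLogger("apple_music_scheduler")


def run_scheduler(
    job: Callable[[], None],
    next_fire: Callable[[datetime], datetime],
    stop: threading.Event,
) -> threading.Thread:
    """Start a daemon thread that calls job() at each time next_fire() returns.

    next_fire(now) must return a UTC datetime strictly after now.
    """

    def loop() -> None:
        while not stop.is_set():
            now = datetime.now(timezone.utc)
            delay = max(0.0, (next_fire(now) - now).total_seconds())
            # Event.wait returns True if stop was set during the wait.
            if stop.wait(delay):
                break
            try:
                job()
            except Exception:
                # Keep the loop alive; a failed fire just waits for the next slot.
                log.exception("scheduled catalog refresh failed")

    thread = threading.Thread(target=loop, name="apple-music-refresh", daemon=True)
    thread.start()
    return thread
```

Because the delay is recomputed from the wall clock on every iteration, a worker restart does not shift the schedule, but a fire that falls due while the worker is down is skipped rather than replayed. That is the crash risk noted for this option.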

Recommendation: Option 2 (Render Cron). It has the least coupling (it runs regardless of worker uptime), is native to the platform we already deploy on, and its cost is negligible. Option 1 is fine if you want zero new services. Option 3 isn't worth the auth complexity for a once-a-week job.
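For Option 2, the `python -c` one-liner could be replaced by a small entry-point script that logs what happened and exits non-zero on failure, so the outcome should show up in the cron job's run history. This is a sketch: the file name (say, a hypothetical scripts/apple_music_cron.py) and main() are illustrative, and because enqueue_refresh_chain's return value isn't specified here, the script only logs it.

```python
import logging
import sys
from typing import Callable, Optional

log = logging.getLogger("apple_music_cron")


def main(enqueue: Optional[Callable[[], object]] = None) -> int:
    """Enqueue one catalog refresh chain; return a process exit code."""
    if enqueue is None:
        # Real entry point from #162; imported lazily so tests can inject a fake.
        from integrations.apple_music.refresh import enqueue_refresh_chain
        enqueue = enqueue_refresh_chain
    try:
        result = enqueue()
    except Exception:
        log.exception("failed to enqueue Apple Music refresh chain")
        return 1
    # Whether result distinguishes "enqueued" from "deduped" depends on the
    # actual enqueue_refresh_chain implementation; log it as-is.
    log.info("enqueue_refresh_chain() returned %r", result)
    return 0


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sys.exit(main())
```

Injecting the enqueue function keeps the script testable without a database, while the default path calls the same function the one-liner does.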

Tasks

  • Pick option (cron service vs in-worker scheduler vs Actions)
  • Implement: roughly 30 lines including config, plus a render.yaml entry if going with Option 2
  • Pick a cadence (weekly? Monday 05:00 UTC works, since Apple publishes refreshed feeds Sunday/Monday)
  • Document the cadence on the apple-music-catalog admin page so the user knows when the next run is
  • Confirm dedup works: trigger manually, then trigger via the schedule path while it's still running, observe a no-op (existing sentinel UUID dedup should handle this without changes)
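For the admin-page task, the next run time can be computed from the proposed Monday 05:00 UTC cadence. A minimal sketch follows; next_scheduled_run and the RUN_WEEKDAY/RUN_TIME constants are hypothetical and would need to be kept in sync with whatever schedule string the chosen option uses.

```python
from datetime import datetime, time, timedelta, timezone

RUN_WEEKDAY = 0  # Monday (datetime.weekday(): Monday == 0)
RUN_TIME = time(5, 0, tzinfo=timezone.utc)  # 05:00 UTC


def next_scheduled_run(now: datetime) -> datetime:
    """Return the first Monday 05:00 UTC strictly after `now`."""
    if now.tzinfo is None or now.utcoffset() is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    days_ahead = (RUN_WEEKDAY - now.weekday()) % 7
    # combine() picks up RUN_TIME's tzinfo, so candidate is aware (UTC).
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), RUN_TIME)
    if candidate <= now:
        # Today is Monday but 05:00 has already passed: use next week.
        candidate += timedelta(days=7)
    return candidate
```

The admin page could then render something like `f"Next scheduled refresh: {next_scheduled_run(datetime.now(timezone.utc)):%a %d %b %H:%M} UTC"`.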

Acceptance

  • A scheduled run fires weekly without human intervention
  • The status page shows whether the most recent refresh was scheduled or manual (easy to add as a flag in research_jobs.payload)
  • A scheduled run during an existing chain is a no-op (no duplicate work, no error)
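The scheduled-vs-manual flag could look roughly like this. It is a sketch assuming the payload is stored as JSON; the "trigger" key and both helper names are hypothetical. Jobs created before the flag existed carry no key, so the label treats them as manual.

```python
import json
from typing import Any, Dict, Mapping, Optional

VALID_TRIGGERS = ("manual", "scheduled")


def with_trigger(payload: Mapping[str, Any], trigger: str) -> Dict[str, Any]:
    """Return a copy of a job payload tagged with how the run was started."""
    if trigger not in VALID_TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger!r}")
    return {**payload, "trigger": trigger}


def trigger_label(payload_json: Optional[str]) -> str:
    """Label for the status page; untagged (older) jobs count as manual."""
    payload = json.loads(payload_json) if payload_json else {}
    return "Scheduled" if payload.get("trigger") == "scheduled" else "Manual"
```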

Out of scope

Failure alerting for a scheduled run that fails is tracked in #179 (a separate concern, orthogonal to scheduling).
