
Speed up the capture set list view #1301

Open

mihow wants to merge 6 commits into main from perf/sourceimagecollection-cached-counts

Conversation


@mihow mihow commented May 13, 2026

Summary

Two changes, both targeting SourceImageCollectionViewSet.list query cost on large M2M tables:

  1. Denormalize the 3 source-image counts as cached columns on SourceImageCollection. Replaces 3 per-row Subquery(Count(...)) annotations with a plain column SELECT; O(1) read regardless of M2M size.
  2. Introduce CachedCountField in ami/base/models.py — a marker subclass of IntegerField — and apply it to all 12 existing cached count columns project-wide. Discoverable via Model._meta.get_fields() + isinstance(f, CachedCountField) so future cross-cutting tasks (admin display, refresh jobs, list-endpoint defer) can enumerate them.

Stacked on #1300 (select_related on captures list). Base branch is perf/list-endpoint-subquery-rewrite; will retarget to main once #1300 merges.

Denormalization details

  • 3 new columns on SourceImageCollection: source_images_count, source_images_with_detections_count, source_images_processed_count. Schema migration 0085_denormalize_sourceimagecollection_counts creates them and backfills via a single GROUP BY + FILTER scan of the M2M (computes all three counts in one pass). atomic=False so the UPDATE can chunk on prod-sized data.
  • SourceImageCollection.get_source_image_counts() returns the 3 counts as a dict from a single aggregate query (no writes — pure compute, testable in isolation). update_calculated_fields(save=True) is the side-effecting wrapper that also writes the columns.
  • Signal handlers in ami/main/signals.py:
    • m2m_changed on SourceImageCollection.images.through for post_add, post_remove, post_clear.
    • Detection post_save / post_delete recompute for any collection containing the affected source_image.
  • pipeline.save_results() explicitly recomputes for affected collections after its bulk_create batch, since bulk_create skips signals.
  • Drops the 3 with_*_count() annotation methods on SourceImageCollectionQuerySet and removes them from the viewset chain.
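The single-pass backfill shape described above can be sketched in plain SQL. This is an illustrative schema with hypothetical table and column names (collection_images, detection), not the project's real ones; the point is that one GROUP BY with FILTER clauses yields all three counts per collection in a single scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the M2M and detection tables.
    CREATE TABLE collection_images (collection_id INTEGER, source_image_id INTEGER);
    CREATE TABLE detection (source_image_id INTEGER, bbox TEXT);

    INSERT INTO collection_images VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO detection VALUES (10, '[10, 10, 20, 20]');  -- real bbox
    INSERT INTO detection VALUES (11, NULL);                -- processed, null bbox
    -- image 12 has no detection rows at all
    """
)

# One grouped query computes all three counts: total images, images with at
# least one non-null-bbox detection, and images with any detection row at
# all ("processed", covering the null-bbox case).
rows = conn.execute(
    """
    SELECT ci.collection_id,
           COUNT(DISTINCT ci.source_image_id) AS source_images_count,
           COUNT(DISTINCT ci.source_image_id)
               FILTER (WHERE d.bbox IS NOT NULL) AS with_detections_count,
           COUNT(DISTINCT ci.source_image_id)
               FILTER (WHERE d.source_image_id IS NOT NULL) AS processed_count
    FROM collection_images ci
    LEFT JOIN detection d ON d.source_image_id = ci.source_image_id
    GROUP BY ci.collection_id
    """
).fetchall()
print(rows)  # [(1, 3, 1, 2)]
```

The real migration runs an equivalent grouped query as an UPDATE backfill; the "processed but not with-detections" distinction shown by image 11 mirrors the null-bbox case called out in the tests.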

CachedCountField details

  • New class in ami/base/models.py: class CachedCountField(models.IntegerField). Pure marker; no field-level behavior change.
  • Applied to: Deployment.{events,occurrences,captures,detections,taxa}_count, Event.{captures,detections,occurrences}_count, SourceImage.detections_count, SourceImageCollection.{source_images,source_images_with_detections,source_images_processed}_count.
  • Migration 0086_use_cached_count_field is recorded, but the generated SQL is a no-op for every AlterField, since the column type is unchanged. The migration only updates Django's recorded model state so future operations see the right field class.
  • Kept separate from the 0085 backfill migration so the schema-and-data migration is auditable on its own.
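The discovery mechanism is plain isinstance filtering over a marker subclass. A minimal sketch with stand-in classes (these are not the real Django field classes, and FakeModel.fields stands in for what Django exposes as Model._meta.get_fields()):

```python
class IntegerField:
    """Stand-in for django.db.models.IntegerField."""

class CachedCountField(IntegerField):
    """Marker subclass: adds no behavior, only an identifiable type."""

class FakeModel:
    # Stand-in for a model's declared fields.
    fields = [
        ("id", IntegerField()),
        ("events_count", CachedCountField()),
        ("captures_count", CachedCountField()),
    ]

def cached_count_fields(model):
    """Enumerate cached-count columns so cross-cutting tasks (admin
    display, refresh jobs, list-endpoint defer) can find them."""
    return [name for name, f in model.fields if isinstance(f, CachedCountField)]

print(cached_count_fields(FakeModel))  # ['events_count', 'captures_count']
```

Because the marker subclass changes nothing about the column, adopting it is free at the SQL level; it only buys this enumerability.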

Why denormalize rather than the subquery rewrite from the earlier draft of #1300

The collections list UI does not paginate. The paginator COUNT(*) win from rewriting the annotations as correlated subqueries (the original #1300 approach) is invisible to UX. The remaining cost was 3 per-row count subqueries against the M2M + main_detection; on a collection holding 500k+ images, each request still spent hundreds of ms per subquery.

Bench data from the earlier draft also showed the subquery rewrite was slower than the original Count(images, filter=..., distinct=True) on small projects and roughly equal on the heaviest project. Cached columns are an unambiguous win across all project sizes.

Precedent: Deployment already had 5 cached counts (events_count, occurrences_count, captures_count, detections_count, taxa_count). The CachedCountField marker formalizes that pattern.

Recompute granularity tradeoff

Per-Detection signal recompute fires on every individual Detection.save() / Detection.delete(). For ML pipeline writes this is a non-issue: Detection.objects.bulk_create() skips signals, and pipeline.save_results() batches one recompute per job. The per-row signal handles non-bulk paths (manual workflows, retroactive ML, admin actions).
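The batching behavior described above can be sketched as a toy dedup scheduler. This is a simplification with invented names (RecomputeScheduler, enqueue); the real code keys pending state per DB connection and flushes via transaction.on_commit into a Celery task:

```python
class RecomputeScheduler:
    """Collect (model_label, pk) pairs during a transaction and enqueue
    each unique pair exactly once at commit time."""

    def __init__(self, enqueue):
        self.enqueue = enqueue   # stand-in for a Celery task's .delay
        self.pending = set()

    def schedule(self, model_label, pk):
        # Cheap enough to call in a tight per-row signal loop: a set add.
        self.pending.add((model_label, pk))

    def flush(self):
        # The real code registers this via transaction.on_commit, so it
        # runs only after the surrounding transaction commits.
        for key in sorted(self.pending):
            self.enqueue(*key)
        self.pending.clear()


enqueued = []
scheduler = RecomputeScheduler(lambda label, pk: enqueued.append((label, pk)))

# 10k per-row Detection saves all touching the same collection...
for _ in range(10_000):
    scheduler.schedule("main.SourceImageCollection", 42)
scheduler.flush()

print(enqueued)  # [('main.SourceImageCollection', 42)]: one task, not 10k
```

The same dedup is why a non-bulk loop degrades gracefully: the per-row signal cost is a set insertion, and the expensive aggregate runs once per affected collection after commit.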

Tests

  • New TestSourceImageCollectionCountsDenormalize (7 cases) — image add/remove, Detection create/delete, null-bbox processed-but-not-with-detections case, update_calculated_fields() recovery from drift, pure-compute get_source_image_counts() returns dict without writes. A tautological test_initial_counts_zero was dropped during review — it asserted Django's default=0, not our code.
  • New TestSourceImageCollectionListQueryCount (3 cases) — page-size scaling, with_counts=true, ordering=-source_images_count (sort hits the cached column directly, no extra subquery).
  • Full ami.main + ami.ml + ami.jobs suites pass on local.

Rollout

  • 0085 creates columns with default=0 and backfills in one deploy. Backfill is the slow step on prod-sized data — measure on staging arbutus-2026 first.
  • 0086 is no-op SQL.
  • Signals fire on new writes immediately after deploy. The viewset switches from annotation to column reads in the same commit, so the deploy must include both the migration and the code.

Test plan

  • Local: full ami.main + ami.ml + ami.jobs test suites
  • Staging arbutus-2026: measure 0085 backfill duration on production-shape M2M
  • Staging: NR check that /api/v2/captures/collections/ p99 drops vs production

Summary by CodeRabbit

  • New Features

    • Added persistent cached count columns for collections, deployments, events and images plus transaction-aware async recompute to keep them up to date.
  • Improvements

    • Simplified collection list queries to rely on stored counts and prefetching for more predictable performance.
    • Added an integrity check and repair flow to detect and reconcile stale cached counts.
  • Tests

    • Expanded tests to validate count consistency across M2M, detection changes, and bulk operations.

Review Change Stack


netlify Bot commented May 13, 2026

Deploy Preview for antenna-preview ready!

Name Link
🔨 Latest commit 9ac2ce1
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a06c92eb7c50b0008eaea85
😎 Deploy Preview https://deploy-preview-1301--antenna-preview.netlify.app
Lighthouse
1 paths audited
Performance: 65 (no change from production)
Accessibility: 89 (no change from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)


coderabbitai Bot commented May 13, 2026

Walkthrough

Adds a CachedCountField marker and a per-transaction dedup scheduler that enqueues Celery recompute tasks; migrates and backfills denormalized count columns, implements model/queryset update_cached_counts, wires signals and tasks, adds reconciliation checks, and expands tests for correctness and query behavior.

Changes

Denormalized Count Field Infrastructure

  • CachedCountField & scheduler (ami/base/models.py, ami/base/cached_counts.py): Adds CachedCountField and a per-DB-connection schedule_recompute with an on_commit flush that enqueues one recompute_cached_counts_task per unique (model_label, pk).
  • Database schema and backfill migrations (ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py, ami/main/migrations/0086_use_cached_count_field.py): Adds three denormalized count columns to SourceImageCollection and backfills them with a grouped UPDATE; then converts cached count fields across models to CachedCountField.
  • Model conversions & SourceImageCollection computation (ami/main/models.py): Converts Deployment/Event/SourceImage count fields to CachedCountField; removes with_source_images_* queryset helpers; adds source_images_* cached fields on SourceImageCollection; implements get_source_image_counts() and update_calculated_fields(save=...).
  • Signals, ML pipeline, and API view (ami/main/signals.py, ami/ml/models/pipeline.py, ami/main/api/views.py): Adds m2m_changed and Detection post_save/post_delete handlers that call .update_cached_counts(); refreshes collections after bulk Detection creation; simplifies SourceImageCollectionViewSet.queryset.
  • Celery tasks & reconciliation checks (ami/main/tasks.py, ami/main/checks/cached_counts.py): Adds recompute_cached_counts_task and reconcile_cached_counts_task; implements discovery, stale-finding, and reconcile_cached_counts, which can repair synchronously via update_cached_counts(run_async=False).
  • Tests (ami/main/tests.py): Adds transaction-aware tests for list query counts, denormalization correctness under M2M/Detection changes, and cached-count reconciliation; adjusts query-count assertions and imports.
  • Design spec (docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md): Design document describing the scheduler, the update_cached_counts API, call-site changes, risks, migration path, and test plan (including a dedup-focused test).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • RolnickLab/antenna#1300: Prior rewrite of with_source_images_*_count annotation logic; this PR replaces that approach with denormalized cached fields.
  • RolnickLab/antenna#1093: Adds/uses the SourceImageCollection "processed" count; this PR migrates that field to denormalized storage and adjusts where it's computed/used.

Suggested labels

backend

Suggested reviewers

  • annavik

Poem

"🐰 I hop through counts both new and old,
Cached fields snug, their stories told,
Signals nudge and tasks dedupe,
Tests confirm we didn't goof,
Hooray — small carrots for the code!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 43.14%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check ✅ Passed: The title 'Speed up the capture set list view' accurately describes the main objective of this PR: optimizing the performance of the SourceImageCollection list view by denormalizing cached counts and introducing CachedCountField markers.
  • Description check ✅ Passed: The PR description comprehensively covers all required template sections: summary, detailed list of changes, denormalization/CachedCountField details with technical depth, rationale, test coverage, and rollout/deployment notes.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.


@mihow mihow changed the title from "Denormalize SourceImageCollection counts as cached columns" to "Speed up the capture set list view" on May 13, 2026
Comment thread ami/main/signals.py Outdated
Comment thread ami/ml/models/pipeline.py Outdated
@mihow mihow force-pushed the perf/sourceimagecollection-cached-counts branch from af24585 to 39a8f02 on May 13, 2026 23:08
Comment thread ami/main/models.py Outdated

# Denormalized counts. Kept in sync via m2m_changed and pipeline-completion
# hooks. Reads are O(1).
source_images_count = models.IntegerField(default=0)
mihow (Collaborator, Author) commented:

How can we indicate which fields are cached counts? Field properties? A prefix? I believe we have a cached_ prefix on other models, but then it makes the method replacements not 1:1. Maybe that's okay. Open to ideas.

mihow (Collaborator, Author) commented:

Claude says: Checked — no cached_ prefix exists on any model field in the codebase. All denormalized count fields use the bare <thing>_count pattern: Deployment.events_count, Deployment.captures_count, Event.captures_count, SourceImage.detections_count, etc. The new fields match this convention.

I'd lean against adding a cached_ prefix just to this PR's fields — it would create a split where some counts are prefixed and most aren't, and method names (occurrences_count, taxa_count on the same model) wouldn't be 1:1 anymore as you noted. If we want the prefix project-wide, that's a separate refactor PR covering all the count fields.

Happy to do either — flag if you'd prefer the new fields prefixed and we adopt project-wide later, or leave matching the existing convention.

mihow (Collaborator, Author) commented:

Is there a simple way to mark/tag/categorize/group model fields? To indicate these are cached counts that could be potentially stale or empty at any time? declare in a list on the model? add a model field property, etc?

Comment thread ami/main/models.py Outdated
@mihow mihow force-pushed the perf/sourceimagecollection-cached-counts branch from 39a8f02 to 401a613 on May 13, 2026 23:29
@mihow mihow mentioned this pull request May 14, 2026

mihow commented May 14, 2026

Claude says: PR body updated. Two new pieces since the initial draft:

  1. CachedCountField marker class added to ami/base/models.py and applied to all 12 existing cached-count columns project-wide (Deployment×5, Event×3, SourceImage×1, SourceImageCollection×3). Migration 0086_use_cached_count_field is recorded but the generated SQL is no-op for every AlterField — column type unchanged. Kept separate from the 0085 backfill migration on purpose.
  2. Dropped a tautological test_initial_counts_zero (asserted Django's default=0, not our code) and rewrote a handful of em-dash fragment comments added during earlier review rounds.

@coderabbitai full review please. Base branch is perf/list-endpoint-subquery-rewrite (sister PR #1300, which strips down to a single select_related line). Will retarget to main once #1300 merges.


coderabbitai Bot commented May 14, 2026

@mihow Sure, I'll kick off a full review of this PR now, taking into account the updated PR body and the two new additions (the CachedCountField marker class with its no-op migration 0086, and the dropped tautological test).

✅ Actions performed

Full review triggered.

Comment thread ami/base/models.py
from ami.users.models import User


class CachedCountField(models.IntegerField):
mihow (Collaborator, Author) commented:

@mohamedelabbas1996 I think we discussed a field type for cached counts last year. it finally happened!

mihow added a commit that referenced this pull request May 14, 2026
Three planning notes from 3f34b78 were committed under docs/planning/
(wrong convention — should be docs/claude/planning/ per CLAUDE.md) and
have since rotted: PR-A was superseded by the denormalize approach in
#1301; the .only() recommendation was reverted per review; PR-B1 ships
in this PR. Two genuinely useful items have been harvested:

- New Relic baseline numbers for 9 list endpoints (post-NR-12.1.0)
  → memory entry reference_nr_baseline_2026_05_11.md
- 22:22 UTC PG connection-pool burst hypothesis
  → issue #1302
- SourceImageViewSet.retrieve 571-call N+1 audit
  → issue #1303

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the perf/sourceimagecollection-cached-counts branch from d12f477 to f1019cd on May 14, 2026 01:13
mihow added a commit that referenced this pull request May 14, 2026
* perf(api): trim captures list SELECT via .only()

Replaces the unconditional wide SELECT (every column on main_sourceimage +
main_deployment + main_event) on SourceImageViewSet.list with a `.only()`
list scoped to fields SourceImageListSerializer actually reads. Row width
drops from 52 columns to 26.

select_related("deployment__data_source") added so SourceImage.public_url()'s
presigned-URL fallback (when public_base_url is blank) does not fire a per-row
lazy fetch.

SOURCE_IMAGE_LIST_ONLY_FIELDS is exposed as a module-level constant with a
maintenance docstring covering how to keep it in sync as the serializer or
model methods change. SOURCE_IMAGE_PUBLIC_URL_DEPENDENCIES is split out as a
sub-constant so the 11 fields that exist only to support presigned-URL
generation are visibly grouped — that whole chain goes away once images are
served through the upcoming image-resizing/CDN layer instead of directly
from source buckets.

Only applied to the list action; retrieve keeps the wide SELECT so the
detail serializer's broader field set is unaffected.

Tests:
- TestSourceImageListQueryCount extended with response-shape assertion
  guarding against `.only()` over-restriction (verifies url, size_display,
  deployment.name, event present without lazy loads).

Refs: docs/planning/2026-05-11-list-endpoint-perf-continuation-plan.md PR-B1.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: list-endpoint perf planning trail

Adds:
- 2026-05-11-newrelic-post-upgrade-findings.md — NR prod data after the
  agent upgrade that surfaced the list-endpoint hot paths.
- 2026-05-11-list-endpoint-perf-analysis.md — SQL-level root-cause
  analysis for the 3 endpoints (collection list, captures list, project
  charts).
- 2026-05-11-list-endpoint-perf-continuation-plan.md — sequenced PR plan.
  PR-A (collection subquery rewrite) superseded by the denormalize-counts
  approach on perf/sourceimagecollection-cached-counts. PR-B1 ships in
  this PR. PR-C/D/E queued.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(api): move SourceImage list-only field groups into viewset class

Per review on #1300: pulled `SOURCE_IMAGE_LIST_ONLY_FIELDS` /
`SOURCE_IMAGE_PUBLIC_URL_DEPENDENCIES` out of module scope and into
`SourceImageViewSet` as class attributes, grouped by what reads them
(core, deployment-nested, event-nested, public-url). Trimmed the
docstring and added a short note on why we whitelist with `.only()`
instead of blacklisting with `.defer()`, with a link to the Django ref.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(api): drop .only() — select_related is the N+1 fix

Per @mihow review on #1300: the `.only()` machinery added in this PR was
acting as a row-width trim on top of select_related, not as the N+1 fix.
The actual N+1 prevention is `select_related("deployment__data_source")`
ensuring `SourceImage.public_url()` doesn't lazy-load the data_source
chain per row. With that in place, dropping `.only()` keeps the captures
list query count flat (verified empirically — `TestSourceImageListQueryCount`
still passes at limit=1, 5, 25).

This removes the maintenance burden of keeping the field tuples in sync
with the serializer surface. django-zen-queries and the existing perf
tests are the safety net for regressions.

Renamed `test_list_response_shape_preserved_after_only` to
`test_list_response_shape_has_no_lazy_loads` so the contract it asserts
is independent of any specific deferred-fields mechanism.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: drop list-endpoint perf planning docs

Three planning notes from 3f34b78 were committed under docs/planning/
(wrong convention — should be docs/claude/planning/ per CLAUDE.md) and
have since rotted: PR-A was superseded by the denormalize approach in
#1301; the .only() recommendation was reverted per review; PR-B1 ships
in this PR. Two genuinely useful items have been harvested:

- New Relic baseline numbers for 9 list endpoints (post-NR-12.1.0)
  → memory entry reference_nr_baseline_2026_05_11.md
- 22:22 UTC PG connection-pool burst hypothesis
  → issue #1302
- SourceImageViewSet.retrieve 571-call N+1 audit
  → issue #1303

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Base automatically changed from perf/list-endpoint-subquery-rewrite to main May 14, 2026 01:18
mihow and others added 3 commits May 13, 2026 18:34
The collection list endpoint previously ran 3 correlated count subqueries
per row (source_images_count, source_images_with_detections_count,
source_images_processed_count). On large collections each subquery scans
the M2M and Detection tables. Reads are now O(1) against cached columns.

- Add 3 IntegerField columns on SourceImageCollection with default=0
- Migration 0085 backfills via a single GROUP BY over the M2M with FILTER
  clauses; atomic=False so prod-sized M2M can chunk
- update_calculated_fields() recomputes all 3 counts in one aggregate query
- m2m_changed on images.through keeps counts fresh on add/remove (both
  directions; reverse post_clear documented as caller responsibility)
- Detection post_save/post_delete recompute for collections containing the
  affected source image
- pipeline.save_results() recomputes for affected collections after the
  bulk_create batch (which skips signals)
- Drop the 3 with_*_count() annotation methods on SourceImageCollectionQuerySet
  and the chain in SourceImageCollectionViewSet.queryset
- TestSourceImageCollectionCountsDenormalize covers initial state, add/remove,
  Detection create/delete, null-bbox processed-but-not-with-detections case,
  and update_calculated_fields() recovery from drift
- TestSourceImageCollectionListQueryCount tightened: with_counts now <=10q
  (was <=15), ordering by source_images_count uses column directly

Follow-up to PR #1300 — the paginator COUNT(*) win there is mostly
invisible since the UI does not paginate collections; this is the actual
UX-visible improvement.

Co-Authored-By: Claude <noreply@anthropic.com>
…ormalized count columns

Add `CachedCountField(IntegerField)` to `ami/base/models.py` as a marker
subclass for cached aggregate columns. Swap it in for the 12 existing
denormalized count fields across Deployment, Event, SourceImage, and
SourceImageCollection.

Discoverable via `Model._meta.get_fields()` + `isinstance(f, CachedCountField)`
for future use (refresh tasks, admin display, list-endpoint defer()). The
DB column type is unchanged (IntegerField -> IntegerField), so the
generated migration `0086_use_cached_count_field` is a pure no-op at the
SQL level; it only updates Django's recorded model state.

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove `test_initial_counts_zero` from `TestSourceImageCollectionCountsDenormalize`;
  it asserted the `default=0` field-level default rather than any code path
  we own.
- Rewrite a handful of em-dash fragment comments added during PR review
  rounds (`get_source_image_counts` docstring, captures-list query-count
  assertion message, collection-list ordering comment, denormalize-test
  side-effect note) in normal prose.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the perf/sourceimagecollection-cached-counts branch from f1019cd to 2725599 on May 14, 2026 01:34

netlify Bot commented May 14, 2026

Deploy Preview for antenna-ssec ready!

Name Link
🔨 Latest commit 9ac2ce1
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a06c92e77c58b00085a5dfd
😎 Deploy Preview https://deploy-preview-1301--antenna-ssec.netlify.app


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2


Inline comments:
In `@ami/main/signals.py`:
- Around line 210-211: The signal currently calls
instance.update_calculated_fields(save=True) synchronously for actions
("post_add", "post_remove", "post_clear"), which can persist stale counts under
concurrent transactions; change these immediate calls to run after the DB
transaction commits by wrapping the call in django.db.transaction.on_commit
(e.g., transaction.on_commit(lambda:
instance.update_calculated_fields(save=True))). Apply the same replacement for
the other occurrence around lines referencing the same actions so recomputation
is deferred until successful commit.

In `@ami/main/tests.py`:
- Line 3386: Test fixtures are using normalized [0–1] bbox values but this repo
expects pixel-space coordinates; update the Detection/BoundingBox fixtures
(e.g., the Detection instances setting bbox=[0.1,0.1,0.2,0.2] and the other
occurrences at the mentioned sites) to use absolute pixel coordinates consistent
with the image dimensions used in these tests. Locate the test rows creating
Detection objects (look for Detection(..., bbox=...) and any direct
BoundingBox(...) usages) and replace the normalized floats with the
corresponding pixel values (or explicit integers) so the fixtures represent true
pixel-space bboxes across the cases mentioned (lines ~3386, ~3398, ~3424,
~3443).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e0f8a9e-539a-4e36-8acc-5da8d6ba207a

📥 Commits

Reviewing files that changed from the base of the PR and between 0dedd4c and 2725599.

📒 Files selected for processing (8)
  • ami/base/models.py
  • ami/main/api/views.py
  • ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py
  • ami/main/migrations/0086_use_cached_count_field.py
  • ami/main/models.py
  • ami/main/signals.py
  • ami/main/tests.py
  • ami/ml/models/pipeline.py

Comment thread ami/main/signals.py Outdated
Comment thread ami/main/tests.py Outdated
Address opus takeaway-review findings on the SourceImageCollection
denormalize work.

Signal handlers no longer block on the recompute:
- Detection post_save/post_delete dedupes affected source_image_ids
  on a per-connection set and fans out one task per affected collection
  via transaction.on_commit. A 10k-detection bulk_create-loop (where
  signals fire) now triggers at most one task per affected collection
  instead of 10k synchronous aggregates per detection.
- m2m_changed on SourceImageCollection.images defers to a Celery task
  via transaction.on_commit.

New ami.main.checks.cached_counts module catches drift from bulk write
paths that skip signals (bulk_create, bulk_update, raw SQL, ML post-
processors like small_size_filter). Generic across every model that
declares a CachedCountField column — discovered via
Model._meta.get_fields(). Layered defense: signals = best-effort sync,
periodic reconcile_cached_counts_task = safety net.

This also gives CachedCountField its first consumer, addressing the
"marker class with no consumer" critique.

Other tweaks from the same review pass:
- CachedCountField docstring documents migration-friction tradeoff
  (future AlterField for unrelated IntegerFields near cached ones).
- SourceImageCollection.update_calculated_fields docstring explains
  the .filter(pk=).update() vs .save() divergence on the other 3 models.
- TestSourceImageCollectionCountsDenormalize → TransactionTestCase +
  eager Celery so on_commit hooks actually fire in tests.
- TestCachedCountsIntegrityCheck (5 tests) covers discover / find /
  reconcile dry-run / reconcile repair / no-drift baseline.
- Detection bbox fixtures: [0.1...] → [10, 10, 20, 20] for consistency
  with codebase pixel-space convention (CodeRabbit nit).

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 14, 2026 01:50

Copilot AI left a comment


Pull request overview

This PR speeds up capture-set list queries by replacing per-row source-image count annotations with denormalized cached columns and adding infrastructure to discover/reconcile cached count drift.

Changes:

  • Adds cached count fields and migrations for SourceImageCollection, plus CachedCountField marker usage.
  • Updates list/query logic, signals, tasks, and pipeline bulk-save handling to refresh cached collection counts.
  • Adds integrity-check helpers and regression/query-count tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
ami/base/models.py Adds CachedCountField marker class.
ami/main/models.py Converts cached count fields and adds SourceImageCollection count recomputation.
ami/main/api/views.py Removes source-image count annotations from collection queryset.
ami/main/signals.py Adds M2M/detection signal handlers for collection count refresh.
ami/main/tasks.py Adds collection refresh and cached-count reconciliation Celery tasks.
ami/main/checks/cached_counts.py Adds cached-count discovery, drift detection, and reconciliation helpers.
ami/ml/models/pipeline.py Refreshes affected collection counts after bulk-created detections.
ami/main/tests.py Adds denormalized count, integrity-check, and list query-count tests.
ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py Adds/backfills SourceImageCollection cached count columns.
ami/main/migrations/0086_use_cached_count_field.py Updates migration state to use CachedCountField.
Comments suppressed due to low confidence (3)

ami/main/signals.py:268

  • This only refreshes collections for the detection's current source_image_id. Because Detection.source_image is writable through the serializer/API, moving a detection to another capture leaves collections containing the old source image with stale processed/with-detections counts; the old source image id needs to be captured before save and refreshed too when it changes.
```python
@receiver(post_save, sender=Detection)
@receiver(post_delete, sender=Detection)
def update_collection_counts_on_detection_change(sender, instance, **kwargs):
    """Schedule a collection-counts refresh for every collection containing the affected SourceImage.

    The dedup + on_commit indirection means even tight per-row Detection write
    loops fan out to at most one task per affected collection. ``bulk_create``
    / ``bulk_update`` skip signals entirely — those rely on the periodic
    reconciliation task to repair drift.
    """
    if not instance.source_image_id:
        return
    _schedule_collection_refresh_for_source_image(instance.source_image_id)
```

ami/main/signals.py:253

  • This does not cover SourceImage deletion: Django removes auto-created M2M through rows during cascade delete without sending m2m_changed, and the detection post_delete path can no longer discover the collections once the image membership is gone. Deleting captures will therefore leave source_images_count (and derived processed/detection counts) stale unless affected collection ids are captured before the image is deleted.
```python
@receiver(m2m_changed, sender=SourceImageCollection.images.through)
def update_collection_counts_on_m2m(sender, instance, action, **kwargs):
    """Recompute denormalized counts when images are added to or removed from a collection."""
    if action in ("post_add", "post_remove", "post_clear"):
        collection_pk = instance.pk
        transaction.on_commit(lambda: refresh_collection_cached_counts.delay(collection_pk))
```

ami/main/signals.py:253

  • The handler assumes instance is always a SourceImageCollection, but m2m_changed also fires for the reverse manager (source_image.collections.add/remove(...)) with instance as a SourceImage and the collection ids in pk_set. In that path this schedules a refresh for the source-image id instead of the affected collection ids, leaving the real collection counts stale.
```python
@receiver(m2m_changed, sender=SourceImageCollection.images.through)
def update_collection_counts_on_m2m(sender, instance, action, **kwargs):
    """Recompute denormalized counts when images are added to or removed from a collection."""
    if action in ("post_add", "post_remove", "post_clear"):
        collection_pk = instance.pk
        transaction.on_commit(lambda: refresh_collection_cached_counts.delay(collection_pk))
```
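The direction bug flagged here can be isolated as a pure function. The `reverse` and `pk_set` semantics below follow Django's documented `m2m_changed` arguments; the function name `affected_collection_pks` is illustrative, not the project's API:

```python
def affected_collection_pks(instance_pk, pk_set, reverse):
    """Return the SourceImageCollection PKs whose counts need a refresh.

    Forward path (collection.images.add(...)): instance IS the collection.
    Reverse path (image.collections.add(...)): instance is a SourceImage
    and the collection ids arrive in pk_set. Note post_clear sends
    pk_set=None, so on the reverse path the cleared collections must be
    captured in pre_clear instead; this sketch just returns an empty set.
    """
    if not reverse:
        return {instance_pk}
    return set(pk_set or ())

print(affected_collection_pks(7, {101, 102}, reverse=False))  # forward: the collection itself
print(affected_collection_pks(3, {10, 11}, reverse=True))     # reverse: collections from pk_set
```

A handler built on this would branch once on the `reverse` kwarg and schedule a refresh per returned PK, covering both manager directions.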


Comment thread ami/main/signals.py Outdated
Comment on lines +235 to +245
```python
pending = getattr(connection, _PENDING_SOURCE_IMAGE_IDS_ATTR, None)
is_new = pending is None
if is_new:
    pending = set()
    setattr(connection, _PENDING_SOURCE_IMAGE_IDS_ATTR, pending)
pending.add(source_image_id)
if is_new:
    # Outside an atomic block, ``on_commit`` fires synchronously at
    # registration time — so the ``add`` above must precede it or the
    # flush sees an empty set.
    transaction.on_commit(_flush_pending_collection_refreshes)
```
Comment on lines +88 to +89
for instance, stored, computed in find_stale_cached_counts(m, project_id=project_id):
result.checked += 1
Comment thread ami/main/models.py
Comment on lines +4184 to +4186
```python
source_images_count = CachedCountField(default=0)
source_images_with_detections_count = CachedCountField(default=0)
source_images_processed_count = CachedCountField(default=0)
```
Collaborator Author replied:
Great catch

Comment on lines +73 to +87
```python
migrations.AddField(
    model_name="sourceimagecollection",
    name="source_images_count",
    field=models.IntegerField(default=0),
),
migrations.AddField(
    model_name="sourceimagecollection",
    name="source_images_with_detections_count",
    field=models.IntegerField(default=0),
),
migrations.AddField(
    model_name="sourceimagecollection",
    name="source_images_processed_count",
    field=models.IntegerField(default=0),
),
```
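The single-pass `GROUP BY` + `FILTER` backfill that migration 0085 performs can be sketched against SQLite, which also supports `FILTER` on aggregates (3.30+). Table and column names here are illustrative, not the real schema, and "processed" is reduced to a boolean flag for the demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY,
                        detections_count INTEGER, processed INTEGER);
    CREATE TABLE collection_images (collection_id INTEGER, image_id INTEGER);
    INSERT INTO image VALUES (1, 2, 1), (2, 0, 1), (3, 0, 0);
    INSERT INTO collection_images VALUES (10, 1), (10, 2), (10, 3), (11, 1);
""")

# All three counts in one scan of the M2M table: one GROUP BY,
# with FILTER narrowing each aggregate independently.
rows = con.execute("""
    SELECT m.collection_id,
           COUNT(*)                                       AS total,
           COUNT(*) FILTER (WHERE i.detections_count > 0) AS with_detections,
           COUNT(*) FILTER (WHERE i.processed = 1)        AS processed
    FROM collection_images m
    JOIN image i ON i.id = m.image_id
    GROUP BY m.collection_id
    ORDER BY m.collection_id
""").fetchall()
print(rows)  # [(10, 3, 1, 2), (11, 1, 1, 1)]
```

The same shape lets the backfill `UPDATE` read each collection's three counts from one derived table instead of three correlated subqueries.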
Comment thread ami/ml/models/pipeline.py
Comment on lines +1002 to +1005
```python
# bulk_create above skips Detection signals; refresh affected collections explicitly.
source_image_ids = [img.pk for img in source_images]
for collection in SourceImageCollection.objects.filter(images__id__in=source_image_ids).distinct():
    collection.update_calculated_fields(save=True)
```
Comment thread ami/main/tests.py
```python
# occurrences/taxa subquery annotations.
self.assertLessEqual(count, 10, f"Collection list with_counts too many queries: {count}")

def test_list_query_count_ordering_by_annotated_count(self):
```
Comment thread ami/main/tests.py
Comment on lines +3325 to +3329
```python
def test_list_query_count_does_not_scale_with_page_size(self):
    small = self._list_query_count(f"/api/v2/captures/collections/?project_id={self.project.pk}&limit=1")
    large = self._list_query_count(f"/api/v2/captures/collections/?project_id={self.project.pk}&limit=25")
    print(f"\n[AUDIT] Collection list: limit=1 -> {small}q, limit=25 -> {large}q")
    self.assertLessEqual(large, small + 2, f"Collection list scaling: {small} -> {large} (likely N+1)")
```
@mihow (Collaborator, Author) commented May 14, 2026

Claude says: Self-suggestion for follow-up PR (or fold into this one): simplify the cached-counts plumbing introduced here. Current shape works but has three per-model pieces (dedup state attr + flush helper + per-model Celery task). Next cached-count addition repeats all three.

Proposed consolidation

One free helper + queryset/instance methods on BaseModel / BaseQuerySet. Fits the established run_async= flag pattern (Deployment.save(regroup_async=), Job.retry(async_task=), process_single_source_image(run_async=)).

```python
# ami/base/cached_counts.py
from celery import shared_task
from django.apps import apps
from django.db import connection, transaction

_PENDING_ATTR = "_pending_cached_count_recomputes"

def _schedule_recompute(model, pk):
    """Per-connection dedup + transaction.on_commit dispatch.

    Outside an atomic block, on_commit fires synchronously at registration
    time, so we set the pending set BEFORE registering the flush callback.
    """
    pending = getattr(connection, _PENDING_ATTR, None)
    is_new = pending is None
    if is_new:
        pending = set()
        setattr(connection, _PENDING_ATTR, pending)
    pending.add((model._meta.label, pk))
    if is_new:
        transaction.on_commit(_flush)

def _flush():
    pending = getattr(connection, _PENDING_ATTR, set())
    try:
        delattr(connection, _PENDING_ATTR)
    except AttributeError:
        pass
    for label, pk in pending:
        recompute_cached_counts_task.delay(label, pk)

@shared_task(ignore_result=True)
def recompute_cached_counts_task(model_label, pk):
    model = apps.get_model(model_label)
    try:
        instance = model.objects.get(pk=pk)
    except model.DoesNotExist:
        return
    instance.update_calculated_fields(save=True)
```

```python
# ami/base/models.py
from django.db import models
from django.db.models import QuerySet

from ami.base.cached_counts import _schedule_recompute

class BaseModel(models.Model):
    def update_cached_counts(self, run_async: bool = True) -> None:
        if run_async:
            _schedule_recompute(type(self), self.pk)
            return
        self.update_calculated_fields(save=True)

class BaseQuerySet(QuerySet):
    def update_cached_counts(self, run_async: bool = True) -> None:
        for pk in self.values_list("pk", flat=True):
            if run_async:
                _schedule_recompute(self.model, pk)
            else:
                self.model.objects.get(pk=pk).update_calculated_fields(save=True)
```
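The dedup behaviour this proposal relies on — many per-row writes collapsing to one task per (model, pk) — can be exercised without Django. In this toy model, `connection` stands in for the DB connection the pending set is stashed on, `commit_hooks` for `transaction.on_commit`, and `dispatched` records what would become Celery `.delay()` calls; all names are illustrative:

```python
class FakeConnection:
    pass

connection = FakeConnection()
commit_hooks = []
dispatched = []
_PENDING_ATTR = "_pending_cached_count_recomputes"

def schedule_recompute(label, pk):
    pending = getattr(connection, _PENDING_ATTR, None)
    is_new = pending is None
    if is_new:
        pending = set()
        setattr(connection, _PENDING_ATTR, pending)
    pending.add((label, pk))
    if is_new:
        commit_hooks.append(flush)  # stands in for transaction.on_commit(_flush)

def flush():
    pending = getattr(connection, _PENDING_ATTR, set())
    try:
        delattr(connection, _PENDING_ATTR)
    except AttributeError:
        pass
    for label, pk in sorted(pending):
        dispatched.append((label, pk))  # stands in for task.delay(label, pk)

# 10,000 per-row Detection writes touching only two collections...
for i in range(10_000):
    schedule_recompute("main.SourceImageCollection", 1 + (i % 2))

for hook in commit_hooks:  # the "commit"
    hook()

print(len(commit_hooks), dispatched)
```

Ten thousand schedule calls register the flush exactly once and dispatch exactly two recomputes, one per affected collection.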

Call sites collapse

Pipeline save_results:

```python
SourceImageCollection.objects.filter(
    images__id__in=source_image_ids
).update_cached_counts(run_async=False)
```

Signals:

```python
@receiver(post_save, sender=Detection)
@receiver(post_delete, sender=Detection)
def update_collection_counts_on_detection_change(sender, instance, **kwargs):
    if not instance.source_image_id:
        return
    SourceImageCollection.objects.filter(
        images__id=instance.source_image_id
    ).update_cached_counts()  # async by default
```

Reconcile loop in ami/main/checks/cached_counts.py swaps `update_calculated_fields(save=True)` → `update_cached_counts(run_async=False)`.

What goes away

  • _PENDING_SOURCE_IMAGE_IDS_ATTR + _flush_pending_collection_refreshes + _schedule_collection_refresh_for_source_image in signals.py (per-model state attrs disappear)
  • refresh_collection_cached_counts(collection_id) task in ami/main/tasks.py (replaced by generic recompute_cached_counts_task(model_label, pk))
  • Per-model signal handlers that need to know which Celery task to call

What stays

  • CachedCountField marker (still drives reconcile discovery)
  • Per-model update_calculated_fields(save=True) (recompute logic lives where the model lives)
  • Periodic reconcile_cached_counts_task (safety net for bulk-write paths that bypass signals)
  • .filter(pk=).update() vs .save() shape divergence in update_calculated_fields — documented inline, leaving alone

Next cached-count addition

Three things instead of six:

  1. Field: foo_count = CachedCountField()
  2. Recompute body in update_calculated_fields(save)
  3. One signal handler per write source, each ending in .update_cached_counts()

Reconcile + drift detection are free.

What I'd NOT do

  • Declarative invalidate_on=[...] on the field — hides signal wiring, adds magic
  • Drop signals entirely + rely on TTL reconciliation — breaks interactive freshness (just-uploaded image not showing in collection)
  • Lazy recompute-on-read — read path doing writes is its own footgun

Happy to land as a follow-up PR (cleaner review scope) or fold into this branch if reviewers prefer the unified pattern in one place.

mihow and others added 2 commits May 14, 2026 23:50
Captures the design behind replacing PR #1301's per-source-table dedup
state + per-model Celery refresh tasks with a generic instance/queryset
method on BaseModel/BaseQuerySet. Wraps update_calculated_fields(save=True)
with caller-controlled sync vs async (run_async=True default) and per-
(model, pk) dedup keyed on the DB connection.

Co-Authored-By: Claude <noreply@anthropic.com>
…t method

Replace the per-source-table dedup state + per-model Celery refresh tasks
introduced in 1a242aa with a generic `update_cached_counts(run_async=True)`
method on `BaseModel` and `BaseQuerySet`. Wraps the existing
`update_calculated_fields(save=True)` body; per-(model_label, pk) dedup
keyed on the active DB connection collapses high-volume signal fan-out to
one task per affected row.

Architecture:
- `ami/base/cached_counts.py` — `schedule_recompute(label, pk)` queues a
  recompute in a per-connection dedup set; `_flush_pending_recomputes` is
  registered via `transaction.on_commit` (once per call, idempotent on the
  drain side so it survives rollback-then-fresh-transaction in tests).
- `BaseModel.update_cached_counts(run_async=True)` — schedules a recompute
  for `self`. With `run_async=False` runs inline.
- `BaseQuerySet.update_cached_counts(run_async=True)` — schedules one per
  row in the queryset.
- `ami/main/tasks.py::recompute_cached_counts_task` — generic Celery task,
  loads model via `apps.get_model(label)` and calls
  `update_calculated_fields(save=True)`.

Call site changes:
- Detection signal handler is now one line:
  `SourceImageCollection.objects.filter(images__id=instance.source_image_id).update_cached_counts()`
- Collection m2m handler: `instance.update_cached_counts()`.
- `refresh_project_cached_counts` body keeps the existing
  `project.update_related_calculated_fields()` (preserves the bulk subquery
  UPDATE for `SourceImage.detections_count`); body no longer duplicates the
  Events/Deployments loop here.
- Reconcile loop in `ami/main/checks/cached_counts.py` calls
  `instance.update_cached_counts(run_async=False)` for stylistic unity.

Removed:
- `_PENDING_SOURCE_IMAGE_IDS_ATTR`, `_flush_pending_collection_refreshes`,
  `_schedule_collection_refresh_for_source_image` (signals.py)
- `refresh_collection_cached_counts` task (tasks.py)

Cost of adding the next cached-count column drops from ~50 LOC (new field +
recompute body + dedup attr + flush helper + Celery task + signal handler)
to ~10 LOC (new field + recompute body + signal handler calling the method).

Design rationale and trade-offs in
`docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md`.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow (Collaborator, Author) commented May 15, 2026

Claude says: Landed in 9ac2ce17. update_cached_counts(run_async=True) lives on BaseModel and BaseQuerySet; backing module is ami/base/cached_counts.py. Detection signal handler is one line. refresh_collection_cached_counts task is gone; refresh_project_cached_counts body unchanged from this PR (kept the bulk-subquery UPDATE for SourceImage.detections_count).

Test isolation note worth flagging: the per-connection dedup set survives TestCase rollback because connection attribute state is not reset between tests, but the registered on_commit callbacks are discarded. Fix is to register on_commit on every schedule_recompute call and make _flush_pending_recomputes idempotent on the drain side (first call drains, subsequent calls see an empty set and no-op). That keeps the dedup benefit (10k Detection writes still collapse to one task per collection) without the rollback leak.

Reconcile loop in ami/main/checks/cached_counts.py switched to update_cached_counts(run_async=False) for stylistic unity — same behavior since reconcile is already in a worker.

Design doc + trade-offs in docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md (committed in 451c8be).
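The register-every-call + idempotent-drain fix described above can be shown in miniature. Nothing here is Django or Celery; every name is a stand-in:

```python
pending = set()
dispatched = []

def schedule(key):
    pending.add(key)
    # In the real code, transaction.on_commit(flush) is registered here on
    # EVERY call; duplicate callbacks are harmless because the drain below
    # empties the set on its first run.

def flush():
    while pending:
        dispatched.append(pending.pop())

schedule("collection:1")
schedule("collection:1")
schedule("collection:2")

flush()  # first registered callback drains both keys
flush()  # a stale duplicate callback sees an empty set and no-ops

print(sorted(dispatched))
```

Stale callbacks left over from a rolled-back test transaction behave like the second `flush()` here: they find nothing pending and do nothing, so the dedup benefit survives without leaking work across tests.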

@coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
ami/base/models.py (1)

74-80: ⚡ Quick win

Optimize the sync path to avoid N+1 queries.

The sync path (lines 79-80) loads instances one at a time with self.model.objects.get(pk=pk), creating N+1 queries. Since the async path only needs PKs but the sync path needs full instances, branch on run_async and use self.iterator() for the sync path to load instances efficiently in a single query.

⚡ Proposed refactor
```diff
-        model_label = self.model._meta.label
-        for pk in self.values_list("pk", flat=True):
-            if run_async:
+        if run_async:
+            model_label = self.model._meta.label
+            for pk in self.values_list("pk", flat=True):
                 schedule_recompute(model_label, pk)
-            else:
-                instance = self.model.objects.get(pk=pk)
+        else:
+            for instance in self.iterator():
                 instance.update_calculated_fields(save=True)
```

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e47d60b5-368e-40bd-a816-14549cda47b0

📥 Commits

Reviewing files that changed from the base of the PR and between 2725599 and 9ac2ce1.

📒 Files selected for processing (8)
  • ami/base/cached_counts.py
  • ami/base/models.py
  • ami/main/checks/cached_counts.py
  • ami/main/models.py
  • ami/main/signals.py
  • ami/main/tasks.py
  • ami/main/tests.py
  • docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md
✅ Files skipped from review due to trivial changes (1)
  • docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • ami/main/signals.py
  • ami/main/models.py

Comment thread ami/main/tasks.py


```python
@celery_app.task(ignore_result=True)
def recompute_cached_counts_task(model_label: str, pk: int) -> None:
```
Contributor commented:

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Widen the pk type annotation to support non-integer primary keys.

The signature declares pk: int, but schedule_recompute in ami/base/cached_counts.py accepts pk: Any (line 27), and Django models can use UUIDField or other types as primary keys. While Celery serialization handles this at runtime, type checkers will flag the mismatch.

🔧 Proposed fix
```diff
-def recompute_cached_counts_task(model_label: str, pk: int) -> None:
+def recompute_cached_counts_task(model_label: str, pk: Any) -> None:
```

Add the import at the top if not already present:

```diff
+from typing import Any
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-def recompute_cached_counts_task(model_label: str, pk: int) -> None:
+from typing import Any
+
+def recompute_cached_counts_task(model_label: str, pk: Any) -> None:
```
