Improve CO save performance for bulk updates#8057

Open
acwhite211 wants to merge 60 commits into v7_12_0_6 from issue-8055

Conversation

@acwhite211
Member

@acwhite211 acwhite211 commented May 4, 2026

Fixes #8055

The new CollectionObject/Component catalognumber uniqueness rule was running during every CollectionObject save, even when the catalognumber and collection were unchanged. These unnecessary catalognumber checks were likely the main cause of the slowdown during batch edits, data set commits, and bulk copy operations.

The most straightforward fix seemed to be skipping the Component catalognumber duplicate check when a CO save does not change catalognumber or collection. I also added caching of the CollectionPreferences uniqueness setting during bulk create, bulk copy, and data set upload operations. One other change that might add some speedup is scoping Component duplicate lookups to just the current collection. I added that for now, but might remove it if it doesn't provide much speedup or if catalog numbers need to remain unique across collections.
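
The gating idea can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: `CatalogState` and `catalog_check_needed` are hypothetical names, and treating an empty catalognumber as "nothing to check" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogState:
    """Hypothetical snapshot of the two fields the duplicate check depends on."""
    catalognumber: Optional[str]
    collection_id: Optional[int]


def catalog_check_needed(saved: Optional[CatalogState], incoming: CatalogState) -> bool:
    """Return True when the Component duplicate lookup has to run."""
    if incoming.catalognumber is None:
        # Assumption: a CO without a catalognumber cannot collide with anything.
        return False
    if saved is None:
        # New record: there is no previous state to compare against.
        return True
    # Existing record: only re-check when catalognumber or collection changed.
    return saved != incoming


before = CatalogState("000123", 4)
print(catalog_check_needed(before, CatalogState("000123", 4)))  # False
print(catalog_check_needed(before, CatalogState("000124", 4)))  # True
print(catalog_check_needed(None, CatalogState("000123", 4)))    # True
```

With a gate like this, a batch edit that only touches a remarks field would never reach the Component lookup.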

Still need to do some more testing and performance checking to confirm this solution completely solves the issue.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add PR to documentation list

Testing instructions

#8055 (comment)

  • In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.
  • In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.
  • Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.
  • Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.
  • Confirm regular Collection Object create/edit workflows still save successfully.

Test cat num uniqueness feat from #6708

Uniqueness: Preference to have catalog number unique across CO and Component table

  • In your collection preference add:
    "uniqueCatalogNumberAccrossComponentAndCO": {"behavior": {"uniqueness": true}}
  • Save
  • Clear cache
  • Refresh
  • Create a new CO
  • Save
  • Open a new tab
  • Create a CO
  • Save
  • Add a component
  • Enter the cat num used in your first CO in your component cat number field
  • Try to save
  • Verify there is a save blocker saying that the cat num value is already in use
  • Do the same but this time use a cat num value from another component in your component cat num field
  • Verify there is also a save blocker when using another component cat num value

Summary by CodeRabbit

  • Performance Improvements

    • Added caching for bulk uploads and batch operations to reduce database queries during large data imports.
    • Optimized permission query lookups with result caching.
    • Cached remote preferences to improve view rendering performance.
  • Improvements

    • Enhanced catalog number uniqueness validation to be more collection-aware and performant.
  • Chores

    • Reorganized Redis cache utilities into a new module structure.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 104402fc-e096-459f-86d2-e8a0df917ea3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR introduces thread-local caching for business-rule and permission queries to address database query performance bottlenecks. It adds a ThreadCache wrapper around ContextVar for context-scoped memoization, refactors unique catalog-number preference lookups and uniqueness-rule evaluation to use cached metadata, consolidates Redis utilities into a new module structure, refactors remote preference access, and wires cache contexts into bulk API and upload flows to reduce repeated database queries during multi-object operations.
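
For readers unfamiliar with the pattern, a context-scoped cache built on `ContextVar` might look roughly like the sketch below. It is a minimal illustration, not the `ThreadCache` in specifyweb/backend/cache/thread.py: the class name `ThreadCacheSketch`, the `active()` method, and the exact method signatures are assumptions.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Callable, Dict, Generic, Hashable, Iterator, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")

# Sentinel that distinguishes "no entry" from a cached None or False.
MISSING = object()


class ThreadCacheSketch(Generic[K, V]):
    """Hypothetical memoization store that only exists inside a `with` block."""

    def __init__(self, name: str) -> None:
        # None means caching is not active in the current context.
        self._var: ContextVar[Optional[Dict[K, V]]] = ContextVar(name, default=None)

    @contextmanager
    def active(self) -> Iterator[None]:
        token = self._var.set({})
        try:
            yield
        finally:
            # Restore the previous state so nothing leaks past the block.
            self._var.reset(token)

    def get(self, key: K):
        store = self._var.get()
        if store is None:
            return MISSING
        return store.get(key, MISSING)

    def get_or_set(self, key: K, compute: Callable[[], V]) -> V:
        store = self._var.get()
        if store is None:
            # Outside an active context every call recomputes.
            return compute()
        if key not in store:
            store[key] = compute()
        return store[key]
```

Callers should compare against the sentinel with `is MISSING` rather than by truthiness, because a plain `object()` is truthy and a cached `False` is a legitimate value.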

Changes

Caching infrastructure and business-rule optimization

| Layer | File(s) | Summary |
|---|---|---|
| ThreadCache infrastructure | specifyweb/backend/cache/thread.py | Introduce ThreadCache[K, V], a generic wrapper around ContextVar holding an optional dict, with context-manager-activated caching, get/set/get_or_set operations using MISSING sentinels, and membership testing. |
| Unique catalog-number preference caching | specifyweb/backend/businessrules/utils.py | Cache unique-catalog-number preference lookups and component catalog-number existence queries via ThreadCache; add cache_unique_catnum_preferences() context manager, cached resolvers for Collection/Agent by id, and new public APIs get_cached_unique_catnum_across_comp_co_coll_pref() and get_unique_catnum_across_comp_co_coll_pref_by_ids(). |
| Uniqueness rules caching and validate_unique refactor | specifyweb/backend/businessrules/uniqueness_rules.py | Introduce CachedUniquenessRule dataclass, per-context caches for rule metadata (model, scope fields, all fields, field names), migration-applied detection; refactor validate_unique to gate on migration-applied check and skip evaluation using cached rule metadata; add _rule_fields_exactly_match() helper to prevent redundant rule creation/removal. |
| Permission query caching | specifyweb/backend/permissions/permissions.py | Introduce _permission_query_cache ThreadCache keyed by PermRequest; extract policy lookups into _get_spuser_policies and _get_role_policies helpers; wrap result with get_or_set() for per-request memoization; add cache_permission_queries() context manager. |
| Collection object catalog-check refactoring | specifyweb/backend/businessrules/rules/collectionobject_rules.py | Add _collection_object_catalog_check_needed() to gate when catalog-number uniqueness enforcement runs; set collectionmemberid from collection_id and collectionobjecttype_id via default resolver; simplify duplication checking to use cached preference lookup and component_catalog_number_exists() with collection scoping. |
| Component cache lifecycle management | specifyweb/backend/businessrules/rules/component_rules.py | Clear _component_catnum_cache before pre_save processing and add component_pre_delete handler to clear cache prior to deletion; switch to cached preference helper. |
| Bulk API endpoints cache wiring | specifyweb/backend/bulk_copy/bulk_copy.py, specifyweb/specify/api/dispatch.py | Wrap per-object creation loops in collection_dispatch_bulk_copy, collection_dispatch_bulk, and dispatch rows endpoint inside combined cache contexts to enable memoization across multiple object creations. |
| Upload flow cache wiring | specifyweb/backend/workbench/upload/upload.py | Extend do_upload to activate caching contexts around main upload savepoint to reduce repeated queries across multiple object updates. |
| Remote preferences refactoring | specifyweb/backend/context/remote_prefs.py, specifyweb/backend/context/views.py, specifyweb/backend/stored_queries/format.py, specifyweb/backend/workbench/upload/auditlog.py | Refactor remote/global preference access to use separate thread-local ThreadCache instances; add typed preference keys, cache_remote_preferences() context manager, generic parsing, and cache-aware single-key getters (get_remote_pref, get_global_pref) instead of bulk-fetch-and-regex pattern. |
| Non-cached collectionobject rules layer update | specifyweb/businessrules/rules/collectionobject_rules.py | Update to use get_cached_unique_catnum_across_comp_co_coll_pref instead of non-cached variant. |

Redis module consolidation and documentation

| Layer | File(s) | Summary |
|---|---|---|
| Redis module new structure and exports | specifyweb/backend/cache/redis/__init__.py, specifyweb/backend/cache/redis/connect.py, specifyweb/backend/cache/redis/store.py | Create new specifyweb/backend/cache/redis/ module structure; populate __init__.py to re-export Redis cache helpers; add documentation comments and REFACTOR note indicating connection-adapter transition. |
| Cache module README | specifyweb/backend/cache/README.md | Add description of cache utilities and adapters for performance improvements. |
| Redis import path updates throughout codebase | specifyweb/backend/setup_tool/\*, specifyweb/backend/trees/\*, specifyweb/specify/auth/support_login.py, specifyweb/specify/migrations/0043_normalize_many_to_many.py, specifyweb/specify/utils/autonumbering.py | Update all imports to use specifyweb.backend.cache.redis instead of old specifyweb.backend.redis_cache for consistency. |
| Old redis_cache module cleanup | specifyweb/backend/redis_cache/__init__.py | Remove re-exports of redis cache functions from old module. |

Frontend helpers and business-rule tests

| Layer | File(s) | Summary |
|---|---|---|
| Frontend batch-edit helper extraction | specifyweb/frontend/js_src/lib/components/BatchEdit/index.tsx | Extract buildBatchEditFromQueryBody helper to construct batch-edit POST payload; refactor BatchEditFromQuery to call the helper with stored query, field captions, dataset ids, tree defs filtering, relationships config, and limit parameter. |
| Unique catalog-number and uniqueness caching tests | specifyweb/backend/businessrules/tests/test_collectionobject.py, specifyweb/backend/businessrules/tests/test_uniqueness_rules.py | Add tests for stale-id handling, collection-scoped catalog caching, migration-applied rechecking, cached rule field preservation, and superset/subset rule matching; add _create_uniqueness_rule helper for test rule creation. |
| Permission query caching test | specifyweb/backend/permissions/tests.py | Add test verifying permission query updates in real time when policies are created/deleted. |
| Scoping comparison fix | specifyweb/specify/utils/scoping.py | Update scope equality to compare pk instead of id fields. |

Suggested reviewers

  • CarolineDenis
  • grantfitzsimmons
🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Testing Instructions | ⚠️ Warning | Documented testing instructions claimed in PR objectives do not exist. No manual testing steps for Batch Edit, Data Set, Component uniqueness, or workflows are present in PR or repository. | Document manual testing steps for Batch Edit, Data Set commits, Component/CO uniqueness, and create/edit workflows with/without catalog number changes. |
| Out of Scope Changes check | ❓ Inconclusive | The PR includes infrastructure changes to Redis cache imports (specifyweb.backend.cache.redis refactoring) and creates new caching primitives (ThreadCache, context managers) that support the main performance fix but are somewhat tangential to the core catalognumber optimization. | Clarify whether the Redis cache module reorganization and new ThreadCache abstractions are necessary dependencies for the performance fix, or whether they could be separated into a preparatory PR for better change isolation. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Improve CO save performance for bulk updates' directly summarizes the main change: optimizing CollectionObject save performance during bulk operations by skipping unnecessary uniqueness checks. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses #8055 by: (1) skipping catalognumber duplicate checks when unchanged, (2) adding CollectionPreferences caching during bulk operations, (3) implementing per-collection Component catalognumber caching, and (4) reverting case-sensitivity checks that contributed to the slowdown. |
| Automatic Tests | ✅ Passed | PR includes 7 substantive new test methods covering cache behavior, collection scoping, rule integrity, and data correctness. Tests use proper fixtures and assertions for core functionality. |


@acwhite211 acwhite211 changed the base branch from main to v7_12_0_6 May 4, 2026 22:20
@acwhite211
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/businessrules/utils.py (1)

33-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid caching the fallback False after a preference read/parsing failure.

Right now any exception path leaves unique_catnum_enabled = False and still stores that value in the context cache. A single transient get_app_resource(...)/JSON failure would then disable the uniqueness rule for every later save in the same bulk operation.

Suggested fix
 def get_unique_catnum_across_comp_co_coll_pref(collection, user) -> bool:
     import specifyweb.backend.context.app_resource as app_resource

     cache = _unique_catnum_pref_cache.get()
@@
-    unique_catnum_enabled: bool = False
+    unique_catnum_enabled: bool = False
+    cacheable = True
@@
     except json.JSONDecodeError:
+        cacheable = False
         logger.warning(f"Error: Could not decode JSON for collection preferences")
     except TypeError as e:
+        cacheable = False
         logger.warning(f"Error: Unexpected data structure in collection preferences: {e}")
     except Exception as e:
+        cacheable = False
         logger.warning(f"An unexpected error occurred: {e}")

-    if cache is not None:
+    if cache is not None and cacheable:
         cache[cache_key] = unique_catnum_enabled
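
The same "cache only on success" idea, reduced to a self-contained sketch. The function name `cached_pref`, the `(collection id, user id)` key shape, and the choice of which exceptions count as transient are assumptions made for illustration; they are not taken from utils.py.

```python
from typing import Callable, Dict, Optional, Tuple

PrefKey = Tuple[int, int]  # hypothetical (collection id, user id) cache key


def cached_pref(
    key: PrefKey,
    load: Callable[[], bool],
    cache: Optional[Dict[PrefKey, bool]],
) -> bool:
    """Return the preference, remembering it only when loading succeeded."""
    if cache is not None and key in cache:
        return cache[key]
    try:
        value = load()
    except (ValueError, TypeError):
        # Fall back for this call only; the failure is not written to the cache.
        # json.JSONDecodeError is a ValueError subclass, so it is covered here.
        return False
    if cache is not None:
        cache[key] = value
    return value


# A loader that fails once, then succeeds.
outcomes = iter([ValueError("transient"), True])


def flaky_load() -> bool:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


cache: Dict[PrefKey, bool] = {}
print(cached_pref((4, 1), flaky_load, cache))  # False (failure, not cached)
print(cached_pref((4, 1), flaky_load, cache))  # True (retried and cached)
```

Because the failed read is never stored, the second save in the same bulk operation retries the lookup instead of inheriting a disabled rule.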
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/businessrules/utils.py` around lines 33 - 57, The current
try/except always leaves unique_catnum_enabled=False on errors and then writes
that fallback into cache; change this so the cache is only updated when the
preference read/parse actually succeeded. For example, after calling
app_resource.get_app_resource and json.loads and validating the nested keys (the
logic that computes unique_catnum_enabled from collection_prefs_json,
unique_catalog_number_pref and behavior), set a local success flag (or move the
cache write) and only assign cache[cache_key] = unique_catnum_enabled when
success is True and cache is not None; do not write to cache inside any except
path that handles JSONDecodeError/TypeError/Exception. Ensure references:
unique_catnum_enabled, cache, cache_key, app_resource.get_app_resource,
collection_prefs_json, json.loads, unique_catalog_number_pref, behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b811e07-b495-4def-9f5b-ac142c11a48f

📥 Commits

Reviewing files that changed from the base of the PR and between 52eaa9d and b3b0b2c.

📒 Files selected for processing (5)
  • specifyweb/backend/bulk_copy/bulk_copy.py
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/businessrules/utils.py
  • specifyweb/backend/workbench/upload/upload.py
  • specifyweb/specify/api/dispatch.py

@acwhite211
Member Author

Thank you, Mr. Rabbit 🐰

@CarolineDenis CarolineDenis added this to the 7.12.0.6 milestone May 5, 2026
@acwhite211
Member Author

If this fix doesn't speed things up enough, I have another idea: a schema migration that adds an index on the Component.catalognumber field:

    operations = [
        migrations.AddIndex(
            model_name="component",
            index=models.Index(
                fields=["catalognumber"],
                name="ComponentCatNumIDX",
            ),
        ),
    ]

@acwhite211 acwhite211 marked this pull request as ready for review May 6, 2026 15:19
@acwhite211 acwhite211 requested review from a team May 6, 2026 15:19
Collaborator

@bhumikaguptaa bhumikaguptaa left a comment

  • In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.
  • In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.
  • Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.
  • Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.
  • Confirm regular Collection Object create/edit workflows still save successfully.

Everything works as expected, except that when I uploaded a dataset of approximately 12,000 records via the Workbench, validating and then uploading it took around 20 minutes (and it is still going). That is similar to the time it took on Demo fish/main. I am unsure whether it is supposed to take this long.

Image

Contributor

@kwhuber kwhuber left a comment

  • In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.

  • In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.

  • Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.

  • Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.

  • Confirm regular Collection Object create/edit workflows still save successfully.

  • For Data Set commit, I noticed a marginal improvement, but the wait wasn't very long to begin with. I altered a comments field on ~4800 CO records, testing on uw_geo. On main, about 17 records were committed every 3 seconds; on the issue-8055 branch, about 20-25 records were committed every 3 seconds, so slightly quicker.

@CarolineDenis CarolineDenis requested a review from melton-jason May 6, 2026 19:56
@acwhite211
Member Author

acwhite211 commented May 6, 2026

I worked some more on speeding up the data set commit by adding caching. I updated the fix so the bulk/dataset context now caches Component catalog numbers per collection.

Good to test out the speed of the data set commit again.

If the caching solution looks good, then maybe we can discuss whether adding an index to the component.catalognumber field would be better.

@CarolineDenis CarolineDenis requested review from a team, bhumikaguptaa and kwhuber May 7, 2026 08:44
@CarolineDenis
Contributor

Updated 1153 COs in less than 5 min

Contributor

@Iwantexpresso Iwantexpresso left a comment

  • In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.

  • In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.

  • Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.

  • Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.

  • Confirm regular Collection Object create/edit workflows still save successfully.

Dataset validation and upload of 4810 records completed in under 2 minutes.

Uniqueness: Preference to have catalog number unique across CO and Component table

  • Verify there is a save blocker saying that the cat num value is already in use

  • Verify there is also a save blocker when using another component cat num value

looking good so far!

@acwhite211
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 15, 2026

✅ Actions performed

Review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/backend/businessrules/utils.py`:
- Around line 194-196: The broad except Exception that logs and returns
unique_catnum_enabled should be tightened: replace the catch-all with only the
expected parsing/shape exceptions (e.g., ValueError, TypeError,
json.JSONDecodeError or the specific validation exception your parser raises) in
the try/except around the uniqueness-preference resolution, and either re-raise
any other exceptions or remove the generic except so unexpected errors propagate
(or raise a clear FatalError) instead of silently defaulting uniqueness to
False; update the logging in the handled-exception branch to include context
(function name and the offending input) and ensure the variable
unique_catnum_enabled is only returned after successful, validated resolution.
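
A narrower handler along these lines could look like the sketch below. It assumes the preference JSON has the shape shown in the testing instructions; the function name `parse_unique_catnum_pref`, the strict `is True` comparison, and treating empty text as "rule off" are assumptions, not the code under review.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Key copied from the testing instructions, spelling included.
PREF_KEY = "uniqueCatalogNumberAccrossComponentAndCO"


def parse_unique_catnum_pref(raw: Optional[str]) -> bool:
    """Resolve the uniqueness flag from raw collection-preference JSON."""
    if not raw:
        # Assumption: no preference text means the rule is off.
        return False
    try:
        prefs = json.loads(raw)
        section = prefs.get(PREF_KEY, {})
        return section.get("behavior", {}).get("uniqueness", False) is True
    except (json.JSONDecodeError, AttributeError, TypeError) as error:
        # Only malformed JSON or an unexpected shape is handled here;
        # any other exception propagates to the caller.
        logger.warning(
            "parse_unique_catnum_pref: bad collection preferences %r: %s", raw, error
        )
        return False


enabled = parse_unique_catnum_pref(
    '{"uniqueCatalogNumberAccrossComponentAndCO": {"behavior": {"uniqueness": true}}}'
)
print(enabled)  # True
```

Anything outside the listed exception types, such as a database error raised while fetching the resource, would surface instead of silently turning the rule off.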
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f2d1fc72-6b15-43c3-8fb8-e1bc832d1fa4

📥 Commits

Reviewing files that changed from the base of the PR and between 2a87b7d and 495dc74.

📒 Files selected for processing (8)
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/businessrules/tests/test_collectionobject.py
  • specifyweb/backend/businessrules/tests/test_uniqueness_rules.py
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/backend/businessrules/utils.py
  • specifyweb/backend/permissions/tests.py
  • specifyweb/backend/workbench/upload/tests/test_batch_edit_table.py
  • specifyweb/backend/workbench/upload/upload.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/workbench/upload/upload.py

Comment thread specifyweb/backend/businessrules/utils.py Outdated
@acwhite211
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 15, 2026

✅ Actions performed

Review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/backend/businessrules/uniqueness_rules.py`:
- Around line 277-280: The fast-path guard in the uniqueness check (the if using
instance.pk and changed_fields_include(instance, rule.all_fields)) ignores
fields that determine whether a rule applies (see _rule_applies_to_instance),
such as discipline/collection, allowing re-parenting to bypass validation;
update the guard to also include those applicability-driving fields (e.g., the
discipline/collection fields or a rule-provided list like
rule.applicability_fields) in the changed_fields_include test, or alternatively
disable the fast path for non-global rules (e.g., when not rule.is_global) so
that _rule_applies_to_instance is always re-evaluated on updates.
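
The widened guard can be sketched in isolation. The function name `can_skip_uniqueness_check` and the idea of passing `applicability_fields` as a separate set are assumptions for illustration; the real guard works on model instances rather than plain sets of field names.

```python
from typing import FrozenSet, Iterable, Optional


def can_skip_uniqueness_check(
    pk: Optional[int],
    changed_fields: Iterable[str],
    rule_fields: FrozenSet[str],
    applicability_fields: FrozenSet[str],
) -> bool:
    """Return True only when no field that matters to the rule was touched."""
    if pk is None:
        # Unsaved instance: always validate.
        return False
    # A change to a scoping field (e.g. collection) can make the rule newly
    # applicable, so those fields count alongside the rule's own fields.
    relevant = rule_fields | applicability_fields
    return relevant.isdisjoint(changed_fields)


rule = frozenset({"catalognumber"})
scope = frozenset({"collection"})
print(can_skip_uniqueness_check(7, ["remarks"], rule, scope))     # True
print(can_skip_uniqueness_check(7, ["collection"], rule, scope))  # False
```

The second call shows the re-parenting case from the review comment: the rule's own field is unchanged, but the record moved to another collection, so validation still runs.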
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7bd49f5f-82eb-4ef5-a6be-cb5c712bfadc

📥 Commits

Reviewing files that changed from the base of the PR and between 2a87b7d and 6ae086c.

📒 Files selected for processing (8)
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/businessrules/tests/test_collectionobject.py
  • specifyweb/backend/businessrules/tests/test_uniqueness_rules.py
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/backend/businessrules/utils.py
  • specifyweb/backend/permissions/tests.py
  • specifyweb/backend/workbench/upload/tests/test_batch_edit_table.py
  • specifyweb/backend/workbench/upload/upload.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • specifyweb/backend/permissions/tests.py
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/businessrules/tests/test_collectionobject.py
  • specifyweb/backend/workbench/upload/upload.py
  • specifyweb/backend/workbench/upload/tests/test_batch_edit_table.py
  • specifyweb/backend/businessrules/utils.py

Comment thread specifyweb/backend/businessrules/uniqueness_rules.py Outdated
@acwhite211 acwhite211 mentioned this pull request May 18, 2026
9 tasks
@melton-jason
Contributor

melton-jason commented May 19, 2026

Hi all, to keep this PR somewhat short I've moved the BatchEdit-specific changes to issue-8055-bulk-batch-edit.
The divergence point from this branch (i.e., the youngest common ancestor) is 3eb2c32.

Unfortunately, I think there might still be some more work to do on UniquenessRules in this PR.
Primarily, before 5563add (this PR), Uniqueness Rules were not being checked on the backend.

You can observe that since 5563add (this PR), the slowdown has returned (because the previously skipped uniqueness rule(s) are now being run).

The performance optimizations thus far target the fetching of the rule, not the evaluation of the rule itself.
I think we may need to consider optimizing the checking of uniqueness rule(s), as that seems to be where the performance bottleneck lies when comparing the two versions (v7.11.4 and v7.12.0.5).
That is, the prior optimizations are nice to have, but the issue will still persist without further changes.

Specifically, I'm pretty sure the slowdown when checking Uniqueness Rules between v7.11.4 and v7.12.0.5 comes from the changes in #7470, where we implemented a binary cast when checking certain rules to enforce case-sensitivity:

    if apply_case_sensitive:
        conflicts_query, transformed_filters = apply_case_sensitive_filters(
            conflicts_query,
            model,
            matchable,
            filter_kwargs,
        )
        if transformed_filters:
            filter_kwargs.update(transformed_filters)
    conflicts = conflicts_query.filter(**filter_kwargs)

I have some ideas and a general design in mind for optimizing the checking of an arbitrary number of uniqueness rules. Currently, the number of uniqueness rules someone has defined can significantly increase the time it takes to evaluate the rules. We can significantly reduce this time by batching the separate queries for each rule into a single query for all rules (i.e., eliminate the "pseudo" N + 1 problem).
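
One way the batching idea could work, sketched with the standard-library `sqlite3` module rather than the Django ORM. Everything here is hypothetical: the `violated_rules` helper, the table layout, and the choice to attribute matches to rules in Python are assumptions, and the sketch leaves out excluding the record being saved.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[str, ...]  # field names that must be unique together


def violated_rules(
    conn: sqlite3.Connection,
    table: str,
    rules: Sequence[Rule],
    candidate: Dict[str, object],
) -> List[Rule]:
    """Check every rule with one query instead of one query per rule."""
    if not rules:
        return []
    clauses: List[str] = []
    params: List[object] = []
    for rule in rules:
        clauses.append("(" + " AND ".join(f"{field} = ?" for field in rule) + ")")
        params.extend(candidate[field] for field in rule)
    fields = sorted({field for rule in rules for field in rule})
    # Table and field names are assumed to come from trusted rule metadata,
    # never from user input; only the values are bound as parameters.
    sql = f"SELECT {', '.join(fields)} FROM {table} WHERE {' OR '.join(clauses)}"
    rows = [dict(zip(fields, row)) for row in conn.execute(sql, params)]
    # Attribute each returned row back to the rule(s) it conflicts with.
    return [
        rule
        for rule in rules
        if any(
            all(row[f] is not None and row[f] == candidate[f] for f in rule)
            for row in rows
        )
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE co (catalognumber TEXT, guid TEXT, collection_id INTEGER)")
conn.executemany("INSERT INTO co VALUES (?, ?, ?)", [("0001", "g-1", 4), ("0002", "g-2", 4)])
rules = [("catalognumber", "collection_id"), ("guid",)]
new_record = {"catalognumber": "0001", "guid": "g-9", "collection_id": 4}
print(violated_rules(conn, "co", rules, new_record))  # [('catalognumber', 'collection_id')]
```

The number of round trips stays at one regardless of how many rules are defined; whether a single OR-ed query is actually faster than N indexed lookups would need to be measured on the real schema.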

I do not yet have a solution or design when it comes to the case-sensitivity problem introduced in #7470 (maybe we can do the comparison outside of the database. That is, fetch a superset of values we would already be fetching from the database and reduce the set in the backend).
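
A rough sketch of that "fetch a superset, reduce in the backend" idea, again using `sqlite3` so it runs on its own. `COLLATE NOCASE` stands in for a case-insensitive database collation; the `component` table layout and the `exact_case_conflicts` name are assumptions.

```python
import sqlite3
from typing import List


def exact_case_conflicts(conn: sqlite3.Connection, value: str) -> List[str]:
    """Find case-sensitive duplicates without a binary cast in the query."""
    # Step 1: let the database do its cheap, case-insensitive comparison.
    rows = conn.execute(
        "SELECT catalognumber FROM component WHERE catalognumber = ? COLLATE NOCASE",
        (value,),
    ).fetchall()
    superset = [row[0] for row in rows]
    # Step 2: reduce the superset in the backend with an exact comparison.
    return [found for found in superset if found == value]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE component (catalognumber TEXT)")
conn.executemany("INSERT INTO component VALUES (?)", [("abc-1",), ("ABC-1",), ("xyz",)])
print(exact_case_conflicts(conn, "ABC-1"))  # ['ABC-1']
print(exact_case_conflicts(conn, "Abc-1"))  # []
```

The superset is normally tiny (values that differ only by case), so the Python-side filter adds little work while the query itself can keep using an ordinary index.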

@grantfitzsimmons
Member

grantfitzsimmons commented May 19, 2026

I do not yet have a solution or design when it comes to the case-sensitivity problem introduced in #7470 (maybe we can do the comparison outside of the database. That is, fetch a superset of values we would already be fetching from the database and reduce the set in the backend).

@melton-jason says "this is 90+% of the bottleneck"

We need to remove this. It is not proven, no use cases have been given, and it would simply revert to the behavior it had prior to 7.12. Jason will remove and test. We can come up with a more narrow fix for #3920 (comment), but it was reported only once in years.

Can you open a PR for issue-8055-bulk-batch-edit for 7.12.1 this week?

@melton-jason
Contributor

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/backend/businessrules/uniqueness_rules.py`:
- Around line 104-118: The function _cached_businessrules_migration_applied
incorrectly treats the cache "is_set" sentinel as truthy and can return True for
an unmigrated state; update the conditional in
_cached_businessrules_migration_applied to only return the cached value when the
cache is active and the retrieved marker is not the missing sentinel (use
_uniqueness_migration_cache.MISSING to check), i.e., check "cache_is_active and
is_set is not _uniqueness_migration_cache.MISSING" before returning; keep the
rest of the logic (calling _initial_businessrules_migration_applied and setting
the cache when is_applied is True) unchanged.
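
A minimal sketch of the sentinel check described above. `ScopedCache` is a hypothetical stand-in for `_uniqueness_migration_cache` (its real interface is not shown in this thread), and `migration_applied` takes the database check as a callable purely so the example can run on its own.

```python
class ScopedCache:
    """Hypothetical stand-in for _uniqueness_migration_cache."""

    # A bare object() is truthy, which is why testing the cached value for
    # truthiness cannot distinguish "missing" from "cached True".
    MISSING = object()

    def __init__(self):
        self.active = False
        self._values = {}

    def get(self, key):
        return self._values.get(key, self.MISSING)

    def set(self, key, value):
        self._values[key] = value


_cache = ScopedCache()


def migration_applied(check_database):
    cached = _cache.get("applied")
    # Only trust the cache when it is active AND actually holds a value.
    if _cache.active and cached is not ScopedCache.MISSING:
        return cached
    is_applied = check_database()
    if _cache.active and is_applied:
        _cache.set("applied", True)
    return is_applied


calls = []


def fake_check(result):
    def check():
        calls.append(result)
        return result
    return check


_cache.active = True
first = migration_applied(fake_check(False))   # nothing cached, stays False
second = migration_applied(fake_check(True))   # checks, then caches True
third = migration_applied(fake_check(False))   # served from the cache
```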

In `@specifyweb/backend/context/remote_prefs.py`:
- Around line 59-63: In get_preference, the regex treats dots as metacharacters
and uses a greedy (.+) that disallows empty values and can overrun into
following preferences; change the pattern to escape the key (use re.escape on
the key), anchor it so it matches either start or a delimiter (e.g. start or
';') before the key, and capture up to the next delimiter or end allowing empty
values (use a non-greedy or character-class like [^;]*); update the re.search in
get_preference to use that new pattern and keep returning match.group(1) when
present.
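
A sketch of the pattern this comment asks for, assuming `;` as the delimiter because that is the example the comment gives. The function signature and the preference keys in the sample text are hypothetical and may differ from the real `get_preference`.

```python
import re


def get_preference(prefs_text, key):
    # re.escape makes dots in the key literal; (?:^|;) anchors the key at the
    # start of the text or right after a delimiter; [^;]* stops at the next
    # delimiter and also allows an empty value.
    pattern = r"(?:^|;)" + re.escape(key) + r"=([^;]*)"
    match = re.search(pattern, prefs_text)
    return match.group(1) if match else None


prefs = "ui.date.format=yyyy-MM-dd;auditing.enabled=;uixdate.format=bad"
date_format = get_preference(prefs, "ui.date.format")
empty_value = get_preference(prefs, "auditing.enabled")
partial_key = get_preference(prefs, "ui.date")
```

If the preferences are newline-delimited instead, the same idea would apply with `re.MULTILINE`, `^` as the anchor, and `[^\n]*` as the capture.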

In `@specifyweb/backend/workbench/upload/auditlog.py`:
- Around line 153-161: Validate and sanitize audit_lifespan from
get_global_pref('AUDIT_LIFESPAN_MONTHS') by parsing it to an integer and
ensuring it is non-negative, then compute a cutoff datetime in Python (e.g.,
current date minus that many months using datetime/relativedelta) instead of
concatenating into SQL; use a parameterized cursor.execute for both delete
statements against spauditlogfield and spauditlog with a WHERE timestampcreated
< %s and pass the computed cutoff value as the parameter to prevent SQL
injection and avoid calling .lower() on the raw preference.
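
A sketch of the validation and parameterized delete described above, using only the standard library (manual month arithmetic in place of `relativedelta`). The helper names and `RecordingCursor` are hypothetical; the `%s` placeholder style assumes a Django or MySQL style cursor, and treating a lifespan of `0` as valid is an assumption of this sketch.

```python
import calendar
from datetime import datetime


def parse_lifespan_months(raw):
    """Return a non-negative int, or None when the value is unset or invalid."""
    try:
        months = int(str(raw).strip())
    except (TypeError, ValueError):
        return None
    return months if months >= 0 else None


def months_before(now, months):
    """Subtract whole months, clamping the day to the target month's length."""
    total = now.year * 12 + (now.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


def purge_audit_log(cursor, raw_lifespan, now):
    months = parse_lifespan_months(raw_lifespan)
    if months is None:
        return False  # nothing is deleted for a missing or malformed value
    cutoff = months_before(now, months)
    # Table names are fixed literals; only the cutoff is passed as a parameter.
    for table in ("spauditlogfield", "spauditlog"):
        cursor.execute(
            f"DELETE FROM {table} WHERE timestampcreated < %s", [cutoff]
        )
    return True


class RecordingCursor:
    """Hypothetical cursor that records calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


cursor = RecordingCursor()
now = datetime(2026, 5, 31)
purged = purge_audit_log(cursor, "3", now)
rejected = purge_audit_log(RecordingCursor(), "1; DROP TABLE spauditlog", now)
```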
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d3b8aa45-e4a8-4b40-8710-5a9284f70621

📥 Commits

Reviewing files that changed from the base of the PR and between 6ae086c and dc3025f.

📒 Files selected for processing (13)
  • specifyweb/backend/bulk_copy/bulk_copy.py
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py
  • specifyweb/backend/businessrules/tests/test_uniqueness_rules.py
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/backend/businessrules/utils.py
  • specifyweb/backend/context/remote_prefs.py
  • specifyweb/backend/context/views.py
  • specifyweb/backend/permissions/permissions.py
  • specifyweb/backend/stored_queries/format.py
  • specifyweb/backend/workbench/upload/auditlog.py
  • specifyweb/backend/workbench/upload/upload.py
  • specifyweb/frontend/js_src/lib/components/BatchEdit/index.tsx
  • specifyweb/specify/utils/scoping.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • specifyweb/backend/bulk_copy/bulk_copy.py
  • specifyweb/backend/businessrules/tests/test_uniqueness_rules.py
  • specifyweb/backend/businessrules/rules/collectionobject_rules.py

Comment thread specifyweb/backend/businessrules/uniqueness_rules.py
Comment thread specifyweb/backend/context/remote_prefs.py
Comment thread specifyweb/backend/workbench/upload/auditlog.py Outdated

Labels

None yet

Projects

Status: Dev Attention Needed

Development

Successfully merging this pull request may close these issues.

v7.12 slow down

8 participants