fix: detect IDB backing store corruption and heal or degrade gracefully #780

Draft
leshniak wants to merge 2 commits into Expensify:main from callstack-internal:fix/idb-corruption-detect-and-heal

leshniak commented Apr 28, 2026

Details

Fixes three bugs that prevent Onyx from handling Chromium IDB corruption (UnknownError: Internal error opening backing store for indexedDB.open.). Per the linked investigation, this error accounts for 884K errors/month, 26.3% of all storage errors.

Bug 1 — Cached rejected promise: createStore's getDB() caches the dbp promise. If indexedDB.open() rejects, dbp is a rejected promise that's truthy, so every subsequent call returns the same rejection forever. This creates infinite error loops on idle tabs (observed in Fullstory: errors every 30–60s with zero user interaction).
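The caching problem and the shape of the fix can be illustrated with a minimal sketch. This is not the PR's createStore code: `createCachedGetter` and its `open` callback are hypothetical stand-ins for getDB() and indexedDB.open(), chosen so the example runs without a browser.

```typescript
// Hypothetical stand-in for getDB(): caches the promise returned by `open`,
// but forgets it again if it rejects, so the next call can retry.
function createCachedGetter<T>(open: () => Promise<T>): () => Promise<T> {
    let cached: Promise<T> | undefined;

    return () => {
        if (cached) {
            return cached;
        }

        const attempt = open();
        cached = attempt;

        // Without this handler a rejected promise stays cached (it is truthy),
        // and every later call would return the same rejection.
        attempt.catch(() => {
            if (cached === attempt) {
                cached = undefined;
            }
        });

        return attempt;
    };
}
```

A successful open stays cached as before; only a rejection clears the slot, so callers still receive the original error once and the next call opens afresh.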

Bug 2 — No backing store handling: createStore's catch handler only retries InvalidStateError. The UnknownError backing store corruption falls through to throw error with no heal attempt.
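One way to tell backing store corruption apart from other storage errors is sketched below. The function name `isBackingStoreCorruption` and the matching rule (error name plus a message substring taken from the error string quoted above) are assumptions for illustration; the check in the PR may differ.

```typescript
// Assumption: corruption is recognised by the error name together with the
// "backing store" text from the message quoted in this PR.
function isBackingStoreCorruption(error: unknown): boolean {
    if (typeof error !== 'object' || error === null) {
        return false;
    }

    // Read the fields structurally so both Error and DOMException-like
    // objects are handled the same way.
    const {name, message} = error as {name?: unknown; message?: unknown};

    return name === 'UnknownError' && typeof message === 'string' && message.includes('backing store');
}
```

Under this rule a QuotaExceededError is not classified as corruption, which matches the intent that quota problems should not trigger a database delete.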

Bug 3 — Async escape: tryOrDegradePerformance wraps fn() in a sync try/catch, but IDB operations return promises. The async rejection passes through resolve(fn()) and never hits the catch block, so degradePerformance is never called.
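The difference between the two wrapper shapes can be shown in isolation. Both functions below are simplified, hypothetical reductions (`runSyncGuarded`, `runAsyncGuarded`, and the `onError` hook are illustrative names, not Onyx APIs); `onError` stands in for the call to degradePerformance.

```typescript
// Old shape (simplified): try/catch inside a Promise executor.
// Only a synchronous throw from fn() reaches the catch block.
function runSyncGuarded<T>(fn: () => Promise<T>, onError: (error: unknown) => void): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        try {
            // A rejected promise returned by fn() is adopted by resolve();
            // the rejection surfaces later and never enters the catch below.
            resolve(fn());
        } catch (error) {
            onError(error);
            reject(error);
        }
    });
}

// New shape (simplified): calling fn inside .then() turns a sync throw into
// a rejection, so a single .catch() observes both failure modes.
function runAsyncGuarded<T>(fn: () => Promise<T>, onError: (error: unknown) => void): Promise<T> {
    return Promise.resolve()
        .then(fn)
        .catch((error: unknown) => {
            onError(error);
            throw error;
        });
}
```

With a function that returns a rejected promise, `runSyncGuarded` never invokes `onError`, while `runAsyncGuarded` does. This sketch rethrows after the hook; what happens after degrading (for example, retrying against the fallback provider) is left out here.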

Changes:

  • createStore.ts: Clear dbp on rejection so next operation retries fresh. Detect backing store corruption → attempt indexedDB.deleteDatabase() heal → retry once. If heal fails, error propagates.
  • storage/index.ts: Rewrite tryOrDegradePerformance from new Promise + try/catch to .then().catch() — catches both sync throws and async rejections. Falls back to MemoryOnlyProvider as last resort.
  • 4 new tests covering: async rejection catch, heal+recover, unrecoverable degradation, error classification (QuotaExceeded doesn't trigger heal).
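The detect, heal, retry-once flow described for createStore.ts can be sketched as follows. `openWithHeal` and its three callbacks are hypothetical names: in a browser, `open` would wrap indexedDB.open(), `heal` would wrap indexedDB.deleteDatabase(), and `isCorruption` would be the error classifier. They are passed in as parameters so the sketch runs without IndexedDB.

```typescript
// A minimal sketch of "heal once, then retry once", under the assumptions above.
async function openWithHeal<T>(
    open: () => Promise<T>,
    heal: () => Promise<void>,
    isCorruption: (error: unknown) => boolean,
): Promise<T> {
    try {
        return await open();
    } catch (error) {
        // Errors that are not corruption (e.g. quota errors) are not healed.
        if (!isCorruption(error)) {
            throw error;
        }

        // If the heal itself rejects, that rejection propagates to the caller.
        await heal();

        // Exactly one retry; a second failure also propagates.
        return open();
    }
}
```

Because any failure after the first attempt propagates, an outer wrapper such as tryOrDegradePerformance remains the place where the fallback to MemoryOnlyProvider is decided.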

Related Issues

Expensify/App#87862

Automated Tests

4 new tests in tests/unit/storage/StorageCorruptionTest.ts:

  1. Async error handling — verifies tryOrDegradePerformance catches async IDB rejections and degrades to MemoryOnlyProvider
  2. Heal + recover — simulates backing store corruption via indexedDB.open mock, verifies deleteDatabase('OnyxDB') is called, operation succeeds after heal
  3. Unrecoverable degradation — simulates permanent corruption where deleteDatabase also fails, verifies fallback to MemoryOnlyProvider
  4. Error classification — verifies QuotaExceededError does NOT trigger corruption healing or degradation

All 439 tests pass.

Manual Tests

  1. Verify npm run typecheck passes
  2. Verify npm run lint passes
  3. Verify npm test passes (439/439)
  4. Integrate with Expensify/App and verify storage operations still work correctly on all platforms
  5. Simulate IDB corruption via DevTools (Application → IndexedDB → delete OnyxDB LevelDB files) and verify heal or degrade behavior

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained why the code was doing something instead of only explaining what the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named index.js. All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: Native

N/A — library-level change, no UI

Android: mWeb Chrome

N/A — library-level change, no UI

iOS: Native

N/A — library-level change, no UI

iOS: mWeb Safari

N/A — library-level change, no UI

MacOS: Chrome / Safari

N/A — library-level change, no UI

Three bugs prevented Onyx from handling Chromium IDB corruption
(884K errors/month, 26.3% of all storage errors):

1. createStore cached rejected promises — once indexedDB.open() failed,
   every subsequent operation returned the same rejection forever,
   creating infinite error loops on idle tabs.

2. createStore only retried InvalidStateError — the backing store
   UnknownError fell through unhandled with no heal attempt.

3. tryOrDegradePerformance used try/catch which only catches sync
   throws, but IDB errors are async rejections that bypassed it
   entirely, so degradePerformance was never called.

Fix: clear cached rejected dbp on failure, detect backing store
corruption and attempt deleteDatabase heal in createStore, rewrite
tryOrDegradePerformance to catch async rejections as last-resort
fallback to MemoryOnlyProvider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>