[inscriptive] Several managers silently swallow sled iteration errors at construction (partial loads) #14

Description

@GideonBature

Summary

A handful of managers load their durable state into memory at new() by iterating the sled DB and discarding any iteration errors (.filter_map(|r| r.ok()) or .iter().flatten()). A transient disk I/O error mid-iteration is therefore silently dropped, and the manager loads with a partial view of its data — account balances, contract state, archival history, or UTXOs can be missing with no signal. The reference manager in this codebase (coin_manager) does this correctly, so this is an inconsistency to fix toward the better pattern.

Affected file(s) and locations

Manager           File                                                  Line(s)  Pattern
State manager     src/inscriptive/state_manager/state_manager.rs        ~69-73   .filter_map(|res| res.ok()) on the per-contract tree iteration
Archival manager  src/inscriptive/archival_manager/archival_manager.rs  ~85      .filter_map(|r| r.ok()) over all key/value pairs
UTXO set          src/inscriptive/utxo_set/utxo_set.rs                  ~50      for (key, val) in utxos_db.iter().flatten()
Graveyard         src/inscriptive/graveyard/graveyard.rs                ~57      for (key, val) in graveyard_db.iter().flatten()

Reference (the correct pattern)

src/inscriptive/coin_manager/coin_manager.rs:121-130 returns the iteration error as a typed construction error instead of swallowing it:

for (index, item) in tree.iter().enumerate() {
    let (key, value) = match item {
        Ok((k, v)) => (k, v),
        Err(e) => {
            return Err(CMConstructionError::AccountConstructionError(
                CMConstructionAccountError::TreeIterError(index, e),
            ));
        }
    };
    ...
}

And again at lines ~225-234 for the contract tree. Registry (registry.rs ~lines 153-160, 309-316) also does this correctly.
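The four flagged loaders could adopt the same shape. The following is a minimal sketch, not the repository's code: `LoadError` and `load_all` are hypothetical names, and the loader is written against any iterator of `Result` items so it compiles without sled. Since `sled::Iter` yields `Result<(IVec, IVec), sled::Error>`, a call such as `load_all(tree.iter())` should fit this signature, but that is an assumption to verify against the sled version in use.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical construction error; a real manager would use its own typed error.
#[derive(Debug)]
enum LoadError<E> {
    /// Index of the entry that failed, plus the underlying iteration error.
    TreeIterError(usize, E),
}

/// Loads every key/value pair into memory, returning the first iteration
/// error instead of skipping the failed entry.
fn load_all<I, K, V, E>(iter: I) -> Result<HashMap<K, V>, LoadError<E>>
where
    I: Iterator<Item = Result<(K, V), E>>,
    K: Eq + Hash,
{
    let mut map = HashMap::new();
    for (index, item) in iter.enumerate() {
        // `?` propagates the error; nothing is silently dropped.
        let (key, value) = item.map_err(|e| LoadError::TreeIterError(index, e))?;
        map.insert(key, value);
    }
    Ok(map)
}
```

Keeping the entry index in the error, as coin_manager does, makes it possible to tell from a log line how far the load got before it failed.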

Root cause / analysis

.flatten() and .filter_map(|r| r.ok()) are convenient for the happy path, but both discard every Err that sled::Iter yields: the failed entry is skipped and the calling loop never sees the error. sled can yield Err for real I/O reasons (page cache miss on a corrupt page, lock contention under flush, etc.). Dropping it means:

  • A contract loads with only its first k state keys.
  • An account balance is loaded as 0 or stale because its tree iteration errored.
  • The archival history is missing N most-recent batches.
  • The UTXO set is missing the exact outputs needed to validate a Lift prevout (validate_lifts, utxo_set.rs:164, would then reject a valid lift).

In all cases the node continues running, now holding a wrong view of the chain.
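The difference between the lossy adapters and a fail-fast load can be reproduced without sled. This is a minimal sketch, assuming a plain `Vec` of `Result` values as a stand-in for `sled::Iter`; `Entry`, `sample_entries`, `load_lossy`, `load_lossy_filter_map`, and `load_strict` are hypothetical names, not code from this repository.

```rust
use std::io;

/// Stand-in for one item of a tree iterator: a key/value pair or an error.
type Entry = Result<(Vec<u8>, Vec<u8>), io::Error>;

/// Three entries where the middle one fails, mimicking a read error mid-iteration.
fn sample_entries() -> Vec<Entry> {
    vec![
        Ok((b"a".to_vec(), b"1".to_vec())),
        Err(io::Error::new(io::ErrorKind::Other, "simulated read failure")),
        Ok((b"c".to_vec(), b"3".to_vec())),
    ]
}

/// Lossy load: `flatten()` skips the `Err` item and continues with the next one.
fn load_lossy(entries: Vec<Entry>) -> Vec<(Vec<u8>, Vec<u8>)> {
    entries.into_iter().flatten().collect()
}

/// Lossy load, second spelling: `filter_map(|r| r.ok())` behaves the same way.
fn load_lossy_filter_map(entries: Vec<Entry>) -> Vec<(Vec<u8>, Vec<u8>)> {
    entries.into_iter().filter_map(|r| r.ok()).collect()
}

/// Strict load: collecting into a `Result` stops at the first `Err` and returns it.
fn load_strict(entries: Vec<Entry>) -> Result<Vec<(Vec<u8>, Vec<u8>)>, io::Error> {
    entries.into_iter().collect()
}
```

With the sample input, both lossy loaders return two of the three entries and report nothing, while `load_strict` returns the simulated error to the caller.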

Impact

  • Silent, unrecoverable state corruption at startup or after a disk hiccup.
  • The node appears healthy but holds a subset of its data.
  • Hardest possible failure mode to diagnose because there is no error and no log.
