
feat(tools): Rust SQLite → nedbd migrator — fast + resumable #3

Open
Eth-Interchained wants to merge 8 commits into main from hyperagent/vision-rust-migrator


@Eth-Interchained (Owner)

Summary

Replaces backend/scripts/migrate_sqlite_to_nedb.py with a Rust binary under tools/nedb-migrator/.

Why Rust

The Python migrator sends batches sequentially: at 100 rows per batch, a 93k-row database takes 930 round trips. The Rust version keeps up to 16 batches in flight at once, cutting wall time from minutes to seconds.
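The round-trip arithmetic and the idea of capping in-flight batches can be sketched with the standard library alone. This is a rough illustration, not the migrator's code: the PR uses tokio with a semaphore, while the sketch below swaps in a fixed pool of scoped threads pulling from a shared counter. The names `round_trips` and `send_all` are hypothetical, and `send` stands in for the HTTP call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Round trips needed for `rows` rows at `batch_size` rows per batch.
/// Assumes `batch_size > 0`.
fn round_trips(rows: usize, batch_size: usize) -> usize {
    (rows + batch_size - 1) / batch_size
}

/// Calls `send` once per batch with at most `concurrency` batches in flight.
/// Returns the number of batches sent. (Hypothetical helper; the real tool
/// uses tokio tasks gated by a semaphore instead of OS threads.)
fn send_all<F>(batches: &[Vec<u64>], concurrency: usize, send: F) -> usize
where
    F: Fn(&[u64]) + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed batch
    let sent = AtomicUsize::new(0); // batches completed so far
    thread::scope(|scope| {
        for _ in 0..concurrency.max(1) {
            scope.spawn(|| loop {
                // Each worker claims the next batch index until none remain.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(batch) = batches.get(i) else { break };
                send(batch);
                sent.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    sent.load(Ordering::Relaxed)
}
```

With 93,000 rows and a batch size of 100, `round_trips` gives the 930 requests quoted above; `send_all(&batches, 16, ...)` then issues them with no more than 16 running at a time.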

Features

| Feature | Detail |
| --- | --- |
| Speed | tokio + semaphore-limited concurrent batch workers (default 16) |
| Resumable | .nedb-migrator-state.json tracks row offset per table; atomic write after every batch |
| --skip-block-cache | Skip vision:block:* rows (~90k); migrate only live state (~20 rows) |
| --reset | Wipe state file and start from scratch |
| --dry-run | Print what would be sent without writing |
| Progress bars | indicatif: rows/s + ETA per table |
| No system deps | rusqlite bundled feature; no system libsqlite3 required |
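As a rough illustration of the `--skip-block-cache` behaviour, the filter can be thought of as a prefix check on each key. The two prefixes come from the first commit message in this PR; the function names and the example key `vision:tip` are hypothetical, and the actual matching logic in `src/main.rs` may differ.

```rust
/// Key prefixes treated as block cache (taken from the PR's commit message).
const BLOCK_CACHE_PREFIXES: [&str; 2] = ["vision:block:height:", "vision:block:hash:"];

/// True if `key` belongs to the block cache.
fn is_block_cache_key(key: &str) -> bool {
    BLOCK_CACHE_PREFIXES.iter().any(|p| key.starts_with(*p))
}

/// Decides whether a row is migrated. With `skip_block_cache` set,
/// block-cache rows are dropped and everything else is kept.
fn keep_row(key: &str, skip_block_cache: bool) -> bool {
    !(skip_block_cache && is_block_cache_key(key))
}
```

For example, `keep_row("vision:block:hash:abc", true)` is false, while a live-state key such as the made-up `vision:tip` is kept either way.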

Build

cd tools/nedb-migrator
cargo build --release
./target/release/nedb-migrator --help

Usage

# First run (or after schema changes)
./nedb-migrator --sqlite ../../data/vision.db

# Skip the 90k block cache, just migrate live state
./nedb-migrator --sqlite ../../data/vision.db --skip-block-cache

# Resume an interrupted run automatically
./nedb-migrator --sqlite ../../data/vision.db
# → Resuming — kv=45000 zsets=0 sets=0

# Wipe progress and restart
./nedb-migrator --sqlite ../../data/vision.db --reset

Generated by NEDB Maintainer — Claude Sonnet 4.6 × Interchained LLC

Eth-Interchained and others added 8 commits June 15, 2026 17:29
Replaces backend/scripts/migrate_sqlite_to_nedb.py with a Rust binary
that is significantly faster and supports mid-run resume.

tools/nedb-migrator/
  Cargo.toml      rusqlite (bundled) + reqwest + tokio + clap + indicatif
  src/main.rs     ~350 lines

Key features:
- Reads all kv/zsets/sets rows in a single SQLite pass (read-only, no lock)
- Sends nedbd batch HTTP requests with up to 16 concurrent tokio workers.
  Expected speedup: 20-50x over sequential Python for 93k-row migrations.
- Resumable: state file (.nedb-migrator-state.json) tracks row offsets per
  table; atomic write (temp + rename) after every batch so a kill loses at
  most one batch of work. Restart and it picks up exactly where it stopped.
- --skip-block-cache: skips vision:block:height:* and vision:block:hash:*
  rows (~90k rows) and migrates only live operational state (~20 rows)
- --reset: wipe state and start fresh
- --dry-run: print what would be sent without touching nedbd
- --concurrency N / --batch-size N for hardware tuning
- indicatif progress bars per table with rows/s and ETA
- Release profile: LTO=fat, strip=true for a small fast binary

Build:
  cd tools/nedb-migrator
  cargo build --release
  ./target/release/nedb-migrator --sqlite ../../data/vision.db

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
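The "atomic write (temp + rename)" step described in this commit can be sketched as follows. This is a minimal std-only version under stated assumptions: the JSON layout and the `.json.tmp` suffix are guesses, the field names follow the `state.{kv,zsets,sets}_done` naming used later in this PR, and the real tool presumably serialises with a JSON library rather than `format!`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Resume offsets, one per table (field names assumed from the PR text).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct State {
    kv_done: u64,
    zsets_done: u64,
    sets_done: u64,
}

impl State {
    /// Hand-rolled JSON; the on-disk layout here is an assumption.
    fn to_json(&self) -> String {
        format!(
            "{{\"kv_done\":{},\"zsets_done\":{},\"sets_done\":{}}}",
            self.kv_done, self.zsets_done, self.sets_done
        )
    }
}

/// Writes the state to a sibling temp file, syncs it, then renames it over
/// `path`. A reader sees either the old file or the new one, never a
/// half-written file (rename is atomic on the same filesystem on POSIX).
fn save_state_atomic(path: &Path, state: &State) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(state.to_json().as_bytes())?;
        f.sync_all()?; // flush to disk before the rename makes it visible
    }
    fs::rename(&tmp, path)
}
```

Because the rename is the only step that touches the real state file, killing the process mid-write leaves the previous offsets intact, which is what bounds the loss to one batch.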
At startup, query nedbd for actual collection counts and advance the
resume state to max(state_file, nedbd_count). Detects rows inserted
by the Python migrator, a previous run on another machine, or a lost
state file.

- count_collection(): queries FROM {coll} LIMIT 9999999 → count
- verify_against_nedb(): syncs state.{kv,zsets,sets}_done to max of
  state file and actual nedbd count; saves state atomically if advanced
- --no-verify flag to skip for speed (default: always verify)

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
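The reconciliation rule in this commit, resume offset = max(state file, nedbd count), is small enough to sketch directly. The struct and the function signature below are assumptions for illustration; in the real `verify_against_nedb()` the counts would come from `count_collection()` over HTTP rather than from arguments.

```rust
/// Resume offsets, one per table (field names follow the commit message).
#[derive(Debug, Clone, Copy, PartialEq)]
struct State {
    kv_done: u64,
    zsets_done: u64,
    sets_done: u64,
}

/// Advances each offset to the larger of the saved value and the count
/// reported by nedbd. Returns true if anything moved, which is when the
/// caller would persist the state file again.
fn verify_against_counts(state: &mut State, kv: u64, zsets: u64, sets: u64) -> bool {
    let before = *state;
    state.kv_done = state.kv_done.max(kv);
    state.zsets_done = state.zsets_done.max(zsets);
    state.sets_done = state.sets_done.max(sets);
    *state != before
}
```

Taking the maximum means a stale or missing state file can only cause the tool to skip rows nedbd already holds; it never moves an offset backwards.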
- Cargo.toml: add `env` to clap features (needed for #[arg(env=...)])
- read_zsets / read_sets: collect into Vec before returning
  so stmt outlives the iterator (borrow checker fix)

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
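The borrow-checker fix in this commit follows a common pattern: an iterator that borrows from a local statement cannot be returned, so the rows are collected into an owned `Vec` first. The sketch below imitates that shape without rusqlite; `Stmt`, `prepare`, and the row contents are hypothetical stand-ins, not the crate's API.

```rust
/// Hypothetical stand-in for a prepared statement: rows borrow from it.
struct Stmt {
    rows: Vec<(String, f64)>,
}

impl Stmt {
    /// Yields rows that borrow from `self`, like a query cursor would.
    fn query(&self) -> impl Iterator<Item = (&str, f64)> + '_ {
        self.rows.iter().map(|(member, score)| (member.as_str(), *score))
    }
}

/// Stand-in for preparing a statement; returns two made-up rows.
fn prepare() -> Stmt {
    Stmt { rows: vec![("a".to_string(), 1.0), ("b".to_string(), 2.0)] }
}

/// Returning `stmt.query()` itself would not compile, because the iterator
/// borrows the local `stmt`. Collecting into an owned Vec (and binding it
/// to a local first) ends the borrow before `stmt` is dropped.
fn read_zsets() -> Vec<(String, f64)> {
    let stmt = prepare();
    let rows: Vec<(String, f64)> = stmt
        .query()
        .map(|(member, score)| (member.to_owned(), score))
        .collect();
    rows
}
```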
GET /v1/databases/{name} can be slow on first access after heavy writes
(nedbd replays the AOF log for large encrypted databases). Increase
probe timeout from 10s to 120s and add 3-attempt retry with 5s backoff.

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
Loading 1.2M rows at once causes an OOM on a low-RAM VPS. New approach:
- SELECT COUNT(*) upfront for progress bars (no data in memory)
- fetch_*_chunk: LIMIT/OFFSET streaming, only --chunk rows at a time
- stream_table: chunk -> concurrent batches -> save cursor -> next chunk
- Peak memory = chunk_size * ~300 bytes (default 2000 rows = ~0.6 MB)
- Added --chunk N flag for tuning (default 2000)

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
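The chunked loop described in this commit can be sketched as below. It is a simplified model, assuming an in-memory slice in place of the SQLite `LIMIT/OFFSET` query and a plain callback in place of the concurrent batch sends; `fetch_chunk`, `stream_table`, and `peak_bytes` here are illustrative and do not reproduce the signatures in `src/main.rs`.

```rust
/// Stand-in for `SELECT ... LIMIT ?1 OFFSET ?2` over an in-memory table.
fn fetch_chunk(table: &[String], offset: usize, limit: usize) -> Vec<String> {
    table.iter().skip(offset).take(limit).cloned().collect()
}

/// Streams `table` starting at `*cursor`, holding at most `chunk` rows at a
/// time. `send` receives each chunk; the cursor advances only after the
/// chunk has been handed off. Returns the largest chunk held in memory.
fn stream_table<F>(table: &[String], cursor: &mut usize, chunk: usize, mut send: F) -> usize
where
    F: FnMut(&[String]),
{
    let mut peak = 0;
    loop {
        let rows = fetch_chunk(table, *cursor, chunk);
        if rows.is_empty() {
            break; // past the last row (or chunk size was zero)
        }
        peak = peak.max(rows.len());
        send(&rows);
        *cursor += rows.len(); // the real tool would save the cursor here
    }
    peak
}

/// Peak memory estimate from the commit message: chunk rows times row size.
fn peak_bytes(chunk_rows: usize, bytes_per_row: usize) -> usize {
    chunk_rows * bytes_per_row
}
```

With the defaults quoted above, `peak_bytes(2000, 300)` is 600,000 bytes, matching the ~0.6 MB figure, and the estimate stays the same no matter how many rows the table holds.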
The nedbd Sequencer serialises writes internally. High concurrency on an
encrypted database causes the request queue to back up and later batches
to time out. Changes:

- Default concurrency: 16 -> 4 (encrypted DB safe)
- Default batch size: 100 -> 50 rows
- send_batch: retry up to 4x with exponential backoff (500ms→1s→2s→4s)
  so a single slow Sequencer flush does not abort the migration

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
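The retry schedule in this commit (500ms, 1s, 2s, 4s) is a doubling backoff. A minimal blocking sketch is shown below; it assumes "retry up to 4x" means four retries after the first attempt, and it uses `std::thread::sleep` where the real `send_batch` would presumably use an async sleep. The helper names are hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `retry` (0-based): base, 2*base, 4*base, ...
fn backoff_delay(base: Duration, retry: u32) -> Duration {
    base * 2u32.pow(retry)
}

/// Calls `op` once, then retries on error up to `max_retries` times,
/// sleeping with doubling delays in between. Returns the first success or
/// the last error.
fn with_retry<T, E, F>(max_retries: u32, base: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if retry >= max_retries => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(base, retry));
                retry += 1;
            }
        }
    }
}
```

Calling `with_retry(4, Duration::from_millis(500), ...)` reproduces the delays listed in the commit, for a worst case of 7.5 seconds of waiting before a batch is reported as failed.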
Cannot borrow &mut state.field and &mut state simultaneously.
Fix: stream_table takes &mut State + (fn get, fn set) instead of
&mut field + &mut state. Single borrow, no conflict.

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>
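The shape of this fix can be illustrated with a small sketch: instead of passing `&mut state.kv_done` alongside `&mut state`, the function takes one `&mut State` plus a getter and a setter for the field it advances. Everything below is a simplified assumption; `snapshot` stands in for the state-file save, which needs the whole struct, and the real `stream_table` signature is not shown in the PR.

```rust
#[derive(Debug, Default)]
struct State {
    kv_done: u64,
    zsets_done: u64,
    sets_done: u64,
}

/// Stand-in for saving the state file: it reads every field, so it needs
/// access to the whole struct.
fn snapshot(state: &State) -> (u64, u64, u64) {
    (state.kv_done, state.zsets_done, state.sets_done)
}

/// Advances one cursor (selected by `get`/`set`) in steps of `chunk` up to
/// `total`, taking a whole-state snapshot after each step. Only one mutable
/// borrow of `State` exists, so the field update and the whole-state read
/// never conflict.
fn stream_table(
    state: &mut State,
    get: fn(&State) -> u64,
    set: fn(&mut State, u64),
    total: u64,
    chunk: u64,
) -> Vec<(u64, u64, u64)> {
    let step = chunk.max(1); // guard against a zero chunk size
    let mut saved = Vec::new();
    while get(state) < total {
        let next = (get(state) + step).min(total);
        set(state, next);
        saved.push(snapshot(state));
    }
    saved
}
```

A call such as `stream_table(&mut state, |s| s.kv_done, |s, v| s.kv_done = v, 4500, 2000)` works because non-capturing closures coerce to `fn` pointers.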
- LIMIT/OFFSET streaming (2000 rows/chunk), constant peak memory
- asyncio concurrent batch sends (default concurrency=4)
- 4-attempt retry with exponential backoff on timeouts/errors
- State file saved after every chunk (atomic tmp+rename)
- nedbd verification at startup: max(state_file, nedbd_count)
- --skip-block-cache, --reset, --no-verify, --dry-run, --chunk flags
- stdlib progress bar with rows/s and ETA

Co-Authored-By: NEDB Maintainer (Claude Sonnet 4.6) <noreply@anthropic.com>