Skip to content

Fix spurious logout on fly.io caused by SQLITE_BUSY#53

Merged
cubny merged 4 commits into
masterfrom
fix-sqlite-busy-logout
May 20, 2026
Merged

Fix spurious logout on fly.io caused by SQLITE_BUSY#53
cubny merged 4 commits into
masterfrom
fix-sqlite-busy-logout

Conversation

@cubny

@cubny cubny commented May 18, 2026

Copy link
Copy Markdown
Owner

Summary

  • Open SQLite with WAL + busy_timeout=5000 + synchronous=NORMAL so concurrent requests on Fly volumes don't hit SQLITE_BUSY.
  • Auth middleware now only returns 401 on sql.ErrNoRows; transient DB errors return 500 instead of logging the user out.
  • fly.toml: min_machines_running = 1 so cold-start request replays don't compound the issue.

Why

Moving a feed to a folder on fly.io kicked the user to the login page. Reorder fan-out fires many parallel PATCH /feeds/:id requests; on the network volume those collide on the sessions table and SQLite returns a busy error. The middleware treated any error from GetSession as "invalid token" → 401 → frontend clears token → redirect to /login. Localhost on fast SSD never tripped the contention window.

Test plan

  • go build ./...
  • go test ./...
  • Deploy to fly.io and verify moving a feed to a folder no longer logs the user out
  • Verify reorder fan-out across many feeds returns 204 consistently

🤖 Generated with Claude Code

Concurrent PATCH /feeds/:id requests (e.g. reorder fan-out) on the Fly.io
volume hit SQLITE_BUSY because the DB was opened without WAL or a busy
timeout. The auth middleware then mapped any GetSession error to 401, so
the frontend cleared the token and bounced the user to /login.

- Open SQLite with WAL, busy_timeout=5000, synchronous=NORMAL.
- Auth middleware: only 401 on sql.ErrNoRows; other DB errors return 500.
- fly.toml: min_machines_running=1 so cold-start replays don't compound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses spurious logout behavior during SQLite contention by distinguishing invalid sessions from transient DB failures and configuring SQLite/Fly.io for more resilient operation.

Changes:

  • Auth middleware now returns 401 only for missing sessions and 500 for other session lookup errors.
  • SQLite is opened with WAL, busy timeout, synchronous NORMAL, and foreign keys enabled.
  • Fly.io config keeps one machine running to reduce cold-start replay effects.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
internal/infra/http/api/middleware/middleware.go Changes session lookup error handling in auth middleware.
internal/app.go Adds SQLite DSN pragmas for WAL and contention handling.
fly.toml Keeps one Fly machine running.

Comment thread internal/app.go Outdated
// WAL + a generous busy_timeout are required when the DB lives on
// slower storage (e.g. Fly.io volumes); without them, concurrent
// requests hit SQLITE_BUSY and surface to callers as generic errors.
dsn := a.cfg.DB.Path + "?_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)&_pragma=synchronous(NORMAL)&_pragma=foreign_keys(ON)"

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 19c365a. The DSN now sets busy_timeout(5000) before journal_mode(WAL), so the WAL PRAGMA on each newly opened pooled connection also waits on a held lock instead of failing with SQLITE_BUSY.

Comment on lines +80 to +85
if err != nil {
// Only treat "row not found" as an auth failure; transient
// DB errors (e.g. SQLITE_BUSY on slow volumes) must not
// log the user out.
if errors.Is(err, sql.ErrNoRows) {
writeUnauthorizedResponse(w, "Unauthorized - Invalid token")

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 19c365a — middleware_test.go now covers both branches: sql.ErrNoRows and a nil session return 401, while a transient DB error (e.g. SQLITE_BUSY) returns 500 without logging the user out. Also covers public-path skipping and userID context propagation.

cubny and others added 3 commits May 20, 2026 08:20
- Set busy_timeout before journal_mode(WAL) in the DSN so the WAL PRAGMA
  on each pooled connection also waits on a held lock.
- Add middleware tests covering the 401 (sql.ErrNoRows / nil session) vs
  500 (transient DB error) split and public-path skipping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract repeated "Bearer tok" string to a constant (goconst) and use
http.NoBody instead of nil request bodies (gocritic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cubny cubny merged commit b9dc254 into master May 20, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants