Rewrite descendant cloudPath on folder move and repair stale rows#457
Rewrite descendant cloudPath on folder move and repair stale rows#457tobihagemann wants to merge 2 commits into
Conversation
moveItemLocally only updated the moved folder's row, so every descendant's cloudPath went stale while its parentID chain stayed correct. Path-keyed lookups (getCachedMetadata(for: CloudPath), getAllCachedMetadata(inside:), checkLocalItemCollision, getItemIdentifier(for:)) then saw inconsistent state. Rewrite descendants recursively after the folder row is updated, using parentID to walk the subtree. A v5 repair migration fixes already-corrupted databases by walking the tree from root and rewriting any cloudPath whose canonical value disagrees with the stored value; unreachable rows and rows whose canonical slot is occupied are left alone. Refs #450.
WalkthroughThis PR implements a dual-layer approach to address cloudPath/parentID inconsistency in File Provider metadata. A v5 database migration performs breadth-first traversal repair on existing rows, recomputing stale cloudPath values from parent paths and item names while handling uniqueness conflicts and logging inconsistencies. Concurrently, the FileProvider adapter's moveItemLocally method now recursively rewrites descendant cloudPath entries during folder moves to prevent future inconsistency. Comprehensive tests validate the migration repair logic across multiple structural scenarios and edge cases, and verify that folder moves correctly update descendant paths through nested levels. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CryptomatorFileProvider/DB/DatabaseHelper.swift`:
- Around line 237-258: The SELECT used in Row.fetchAll that fetches siblings for
repair (the query with WHERE parentID = ? AND id != ? used with arguments
[parentID, NSFileProviderItemIdentifier.rootContainerDatabaseValue]) has no
ORDER BY, causing non-deterministic conflict resolution; modify that SQL in
DatabaseHelper (the Row.fetchAll call) to include "ORDER BY id" so siblings are
processed in a stable order during the repair loop that updates cloudPath and
enqueues folders (the code handling visitedCount, canonical, storedCloudPath,
and queue.append((id, canonical))).
In `@CryptomatorFileProvider/FileProviderAdapter.swift`:
- Around line 579-604: The parent row is persisted before descendants are
rewritten, which can leave a partially-updated subtree if descendant path
collisions occur; either preflight-validate all descendant target CloudPaths for
conflicts before calling updateMetadata(itemMetadata) and committing the
MoveItemLocallyResult/taskRecord, or perform the parent update plus
rewriteDescendantCloudPaths updates inside a single DB transaction so they all
rollback on failure. Concretely: use
rewriteDescendantCloudPaths(ofFolderID:newParentCloudPath:visited:) (or a new
non-mutating collector variant) to compute all descendant target CloudPaths,
check each against the metadata store for existing conflicting rows via
itemMetadataManager lookup, and only then call
itemMetadataManager.updateMetadata(itemMetadata) and persist the reparent task;
alternatively move the parent update and calls to rewriteDescendantCloudPaths
into the same DB transaction boundary so updateMetadata(child) failures roll
back parent update and taskRecord persistence.
In `@CryptomatorFileProviderTests/DB/DatabaseHelperMigrationTests.swift`:
- Around line 135-143: The test testRepairMigrationCreatesParentIDIndex is a
false-positive because setUpWithError already runs DatabaseHelper.migrate which
may create the index; before calling DatabaseHelper.repairCloudPathsMigration in
the test, explicitly DROP or assert the absence of the index
(itemMetadata_parentID) within the test's database context (use
database.write/database.read as in the test), then call
DatabaseHelper.repairCloudPathsMigration and finally assert the index now
exists—this ensures the test verifies repairCloudPathsMigration actually creates
the index rather than relying on prior migrate() behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0aa69c54-f7e4-4800-91c0-94388751947e
📒 Files selected for processing (5)
Cryptomator.xcodeproj/project.pbxprojCryptomatorFileProvider/DB/DatabaseHelper.swiftCryptomatorFileProvider/FileProviderAdapter.swiftCryptomatorFileProviderTests/DB/DatabaseHelperMigrationTests.swiftCryptomatorFileProviderTests/FileProviderAdapter/FileProviderAdapterMoveItemTests.swift
| let rows = try Row.fetchAll(db, sql: """ | ||
| SELECT id, name, type, cloudPath | ||
| FROM itemMetadata | ||
| WHERE parentID = ? AND id != ? | ||
| """, arguments: [parentID, NSFileProviderItemIdentifier.rootContainerDatabaseValue]) | ||
| rows: for row in rows { | ||
| let id: Int64 = row["id"] | ||
| let name: String = row["name"] | ||
| let itemType: CloudItemType = row["type"] | ||
| let storedCloudPath: CloudPath = row["cloudPath"] | ||
| visitedCount += 1 | ||
| let canonical = parentPath.appendingPathComponent(name) | ||
| if storedCloudPath != canonical { | ||
| do { | ||
| try db.execute(sql: "UPDATE itemMetadata SET cloudPath = ? WHERE id = ?", arguments: [canonical, id]) | ||
| } catch let error as DatabaseError where error.extendedResultCode == .SQLITE_CONSTRAINT_UNIQUE { | ||
| DDLogError("Repair migration: cloudPath \(canonical) already occupied; leaving id=\(id) at \(storedCloudPath)") | ||
| continue rows | ||
| } | ||
| } | ||
| if itemType == .folder { | ||
| queue.append((id, canonical)) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In SQLite, are rows returned by a SELECT without ORDER BY guaranteed to have a stable order across runs or versions?
💡 Result:
No. In SQLite, if a SELECT returns more than one row and you do not specify an ORDER BY clause, the order in which rows are returned is undefined (so it is not guaranteed stable across runs, data changes, query-plan changes, or SQLite versions) [1][2]. What you can rely on instead: - Add ORDER BY to specify the ordering you need [1]. - Note that even when two rows compare equal with respect to all ORDER BY expressions, SQLite states that the relative order of those rows is undefined unless you fully break ties with additional ORDER BY terms [1]. So for “SELECT … FROM table” without ORDER BY: you should not assume a stable order, even if it appears consistent during testing [1].
Citations:
Add ORDER BY to ensure deterministic conflict repair.
Without an ORDER BY clause on line 237, the query returns siblings in undefined order. When canonical paths conflict, whichever row SQLite yields first claims the slot, making the migration outcome non-deterministic across runs and devices. Since conflicted folders are skipped entirely, different orderings can cause different subtrees to be repaired. Add ORDER BY id to guarantee consistent repair behavior.
Suggested change
let rows = try Row.fetchAll(db, sql: """
SELECT id, name, type, cloudPath
FROM itemMetadata
WHERE parentID = ? AND id != ?
+ ORDER BY id
""", arguments: [parentID, NSFileProviderItemIdentifier.rootContainerDatabaseValue])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let rows = try Row.fetchAll(db, sql: """ | |
| SELECT id, name, type, cloudPath | |
| FROM itemMetadata | |
| WHERE parentID = ? AND id != ? | |
| """, arguments: [parentID, NSFileProviderItemIdentifier.rootContainerDatabaseValue]) | |
| rows: for row in rows { | |
| let id: Int64 = row["id"] | |
| let name: String = row["name"] | |
| let itemType: CloudItemType = row["type"] | |
| let storedCloudPath: CloudPath = row["cloudPath"] | |
| visitedCount += 1 | |
| let canonical = parentPath.appendingPathComponent(name) | |
| if storedCloudPath != canonical { | |
| do { | |
| try db.execute(sql: "UPDATE itemMetadata SET cloudPath = ? WHERE id = ?", arguments: [canonical, id]) | |
| } catch let error as DatabaseError where error.extendedResultCode == .SQLITE_CONSTRAINT_UNIQUE { | |
| DDLogError("Repair migration: cloudPath \(canonical) already occupied; leaving id=\(id) at \(storedCloudPath)") | |
| continue rows | |
| } | |
| } | |
| if itemType == .folder { | |
| queue.append((id, canonical)) | |
| let rows = try Row.fetchAll(db, sql: """ | |
| SELECT id, name, type, cloudPath | |
| FROM itemMetadata | |
| WHERE parentID = ? AND id != ? | |
| ORDER BY id | |
| """, arguments: [parentID, NSFileProviderItemIdentifier.rootContainerDatabaseValue]) | |
| rows: for row in rows { | |
| let id: Int64 = row["id"] | |
| let name: String = row["name"] | |
| let itemType: CloudItemType = row["type"] | |
| let storedCloudPath: CloudPath = row["cloudPath"] | |
| visitedCount += 1 | |
| let canonical = parentPath.appendingPathComponent(name) | |
| if storedCloudPath != canonical { | |
| do { | |
| try db.execute(sql: "UPDATE itemMetadata SET cloudPath = ? WHERE id = ?", arguments: [canonical, id]) | |
| } catch let error as DatabaseError where error.extendedResultCode == .SQLITE_CONSTRAINT_UNIQUE { | |
| DDLogError("Repair migration: cloudPath \(canonical) already occupied; leaving id=\(id) at \(storedCloudPath)") | |
| continue rows | |
| } | |
| } | |
| if itemType == .folder { | |
| queue.append((id, canonical)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CryptomatorFileProvider/DB/DatabaseHelper.swift` around lines 237 - 258, The
SELECT used in Row.fetchAll that fetches siblings for repair (the query with
WHERE parentID = ? AND id != ? used with arguments [parentID,
NSFileProviderItemIdentifier.rootContainerDatabaseValue]) has no ORDER BY,
causing non-deterministic conflict resolution; modify that SQL in DatabaseHelper
(the Row.fetchAll call) to include "ORDER BY id" so siblings are processed in a
stable order during the repair loop that updates cloudPath and enqueues folders
(the code handling visitedCount, canonical, storedCloudPath, and
queue.append((id, canonical))).
| try itemMetadataManager.updateMetadata(itemMetadata) | ||
| if itemMetadata.type == .folder, let id = itemMetadata.id { | ||
| try rewriteDescendantCloudPaths(ofFolderID: id, newParentCloudPath: cloudPath) | ||
| } | ||
|
|
||
| let localCachedFileInfo = try cachedFileManager.getLocalCachedFileInfo(for: itemMetadata) | ||
| let item = FileProviderItem(metadata: itemMetadata, domainIdentifier: domainIdentifier, localCachedFileInfo: localCachedFileInfo) | ||
| return MoveItemLocallyResult(item: item, reparentTaskRecord: taskRecord) | ||
| } | ||
|
|
||
| private func rewriteDescendantCloudPaths(ofFolderID folderID: Int64, newParentCloudPath: CloudPath) throws { | ||
| var visited: Set<Int64> = [folderID] | ||
| try rewriteDescendantCloudPaths(ofFolderID: folderID, newParentCloudPath: newParentCloudPath, visited: &visited) | ||
| } | ||
|
|
||
| private func rewriteDescendantCloudPaths(ofFolderID folderID: Int64, newParentCloudPath: CloudPath, visited: inout Set<Int64>) throws { | ||
| for child in try itemMetadataManager.getCachedMetadata(withParentID: folderID) { | ||
| guard let childID = child.id, visited.insert(childID).inserted else { continue } | ||
| let childCloudPath = newParentCloudPath.appendingPathComponent(child.name) | ||
| child.cloudPath = childCloudPath | ||
| try itemMetadataManager.updateMetadata(child) | ||
| if child.type == .folder { | ||
| try rewriteDescendantCloudPaths(ofFolderID: childID, newParentCloudPath: childCloudPath, visited: &visited) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Preflight descendant path collisions before persisting the folder move.
Line 579 writes the moved folder row before Lines 595-601 rewrite descendants. If a leftover orphan/conflict row already occupies one of those descendant target paths, updateMetadata(child) will fail only after the parent move and reparent task record have been committed, leaving the subtree partially rewritten again. Please validate descendant target paths up front, or apply the whole rewrite in a single DB transaction/rollback boundary.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CryptomatorFileProvider/FileProviderAdapter.swift` around lines 579 - 604,
The parent row is persisted before descendants are rewritten, which can leave a
partially-updated subtree if descendant path collisions occur; either
preflight-validate all descendant target CloudPaths for conflicts before calling
updateMetadata(itemMetadata) and committing the
MoveItemLocallyResult/taskRecord, or perform the parent update plus
rewriteDescendantCloudPaths updates inside a single DB transaction so they all
rollback on failure. Concretely: use
rewriteDescendantCloudPaths(ofFolderID:newParentCloudPath:visited:) (or a new
non-mutating collector variant) to compute all descendant target CloudPaths,
check each against the metadata store for existing conflicting rows via
itemMetadataManager lookup, and only then call
itemMetadataManager.updateMetadata(itemMetadata) and persist the reparent task;
alternatively move the parent update and calls to rewriteDescendantCloudPaths
into the same DB transaction boundary so updateMetadata(child) failures roll
back parent update and taskRecord persistence.
| func testRepairMigrationCreatesParentIDIndex() throws { | ||
| try database.write { db in | ||
| try DatabaseHelper.repairCloudPathsMigration(db) | ||
| } | ||
| let indexName = try database.read { db in | ||
| try String.fetchOne(db, sql: "SELECT name FROM sqlite_master WHERE type='index' AND name='itemMetadata_parentID'") | ||
| } | ||
| XCTAssertEqual("itemMetadata_parentID", indexName) | ||
| } |
There was a problem hiding this comment.
testRepairMigrationCreatesParentIDIndex is currently a false-positive risk.
This test can pass even if repairCloudPathsMigration stops creating the index, because setUpWithError already runs DatabaseHelper.migrate (which may have created it). Drop/assert-absent first, then run the repair and assert it exists.
Suggested tightening
func testRepairMigrationCreatesParentIDIndex() throws {
try database.write { db in
+ try db.execute(sql: "DROP INDEX IF EXISTS itemMetadata_parentID")
try DatabaseHelper.repairCloudPathsMigration(db)
}
let indexName = try database.read { db in
try String.fetchOne(db, sql: "SELECT name FROM sqlite_master WHERE type='index' AND name='itemMetadata_parentID'")
}
XCTAssertEqual("itemMetadata_parentID", indexName)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func testRepairMigrationCreatesParentIDIndex() throws { | |
| try database.write { db in | |
| try DatabaseHelper.repairCloudPathsMigration(db) | |
| } | |
| let indexName = try database.read { db in | |
| try String.fetchOne(db, sql: "SELECT name FROM sqlite_master WHERE type='index' AND name='itemMetadata_parentID'") | |
| } | |
| XCTAssertEqual("itemMetadata_parentID", indexName) | |
| } | |
| func testRepairMigrationCreatesParentIDIndex() throws { | |
| try database.write { db in | |
| try db.execute(sql: "DROP INDEX IF EXISTS itemMetadata_parentID") | |
| try DatabaseHelper.repairCloudPathsMigration(db) | |
| } | |
| let indexName = try database.read { db in | |
| try String.fetchOne(db, sql: "SELECT name FROM sqlite_master WHERE type='index' AND name='itemMetadata_parentID'") | |
| } | |
| XCTAssertEqual("itemMetadata_parentID", indexName) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CryptomatorFileProviderTests/DB/DatabaseHelperMigrationTests.swift` around
lines 135 - 143, The test testRepairMigrationCreatesParentIDIndex is a
false-positive because setUpWithError already runs DatabaseHelper.migrate which
may create the index; before calling DatabaseHelper.repairCloudPathsMigration in
the test, explicitly DROP or assert the absence of the index
(itemMetadata_parentID) within the test's database context (use
database.write/database.read as in the test), then call
DatabaseHelper.repairCloudPathsMigration and finally assert the index now
exists—this ensures the test verifies repairCloudPathsMigration actually creates
the index rather than relying on prior migrate() behavior.
Closes #450's stale-descendant bug.
moveItemLocallyonly updated the moved folder's row, so every descendant'scloudPathwent stale while itsparentIDchain stayed correct — every path-keyed lookup (getCachedMetadata(for: CloudPath),getAllCachedMetadata(inside:),checkLocalItemCollision,getItemIdentifier(for:)) then saw inconsistent state. Cached files on disk are fine because they're id-keyed.What changed
FileProviderAdapter.moveItemLocally: after the moved folder's row is updated, recursively rewrite every descendant'scloudPathby walkingparentID. Avisitedset is threaded through the recursion to defend against pre-existingparentIDcycles (the recursion would otherwise stack-overflow on a corrupted chain).v5) inDatabaseHelper: walks the tree breadth-first from the root viaparentIDand rewrites anycloudPathwhose canonical value disagrees with the stored value, so already-corrupted user databases self-heal on next launch. Orphans and rows whose canonical slot is occupied by another row are left alone (logged). Also creates an index onitemMetadata.parentIDso futureparentID-keyed lookups don't table-scan.DatabaseHelperMigrationTestscovers the v5 migration across stale-descendant, deep subtree, branching subtree, orphan/cycle, canonical-conflict skip, parentID-index, and end-to-end migrator-registration paths (including pre-existing-orphan replay).What this PR explicitly does not address
The companion comment on #450 also flagged a few adjacent issues that I decided to keep out of scope to keep the diff focused:
moveItemLocallyis not atomic — the reparent-task insert, metadata update, and descendant rewrite are separatedatabase.writeblocks. The pre-existing flow had the same hazard between the first two writes. Making the FileProvider DB-manager flows transactional is a project-wide concern (I considereddatabase: Databasesiblings on the manager protocols and chose not to introduce a one-off pattern here).getAllCachedMetadata(inside:)usesLIKEwithout an escape clause.cacheMetadataupserts by byte-exactcloudPathwhilegetCachedMetadata(for:)reads case-insensitively.visitedguard only prevents infinite recursion if such corruption already exists.