feat(clickhouse): add opt-in async cleanup via setAsyncCleanup() #117
Greptile Summary
This PR makes …
Confidence Score: 5/5
Safe to merge: the change is purely opt-in and the default synchronous path is entirely unchanged. The …
No files require special attention. The only changed file is …
`cleanup()` defaults to ClickHouse's synchronous mutation behavior, so
existing tests and callers see no change. Consumers that just need to
schedule the DELETE (e.g. maintenance workers running per-project on a
shared multi-tenant table) can call `setAsyncCleanup(true)` to append
`SETTINGS lightweight_deletes_sync = 0` and have the HTTP call return
once the mutation is queued.
This avoids the 30s client timeout observed in production when the
per-tenant DELETE serialized N mutations through Keeper:
    Failed to cleanup ClickHouse audit logs for project <id>:
    ClickHouse query execution failed: Operation timed out after
    30026 milliseconds with 0 bytes received
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
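A minimal PHP sketch of what the opt-in changes in the generated statement. It assumes a simplified builder: the class `CleanupQuerySketch`, its `buildCleanupQuery()` method, the `time` column and the query-parameter placeholders are hypothetical; only `setAsyncCleanup()`, the `tenant` predicate, and the `SETTINGS lightweight_deletes_sync = 0` suffix come from this change. The real adapter may assemble its query differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified stand-in for the adapter's cleanup query builder.
 * Only setAsyncCleanup() and the SETTINGS suffix come from this change; the
 * class name, buildCleanupQuery() and the `time` column are assumptions.
 */
final class CleanupQuerySketch
{
    // Off by default, so the DELETE keeps ClickHouse's default (synchronous) behavior.
    private bool $asyncCleanup = false;

    public function setAsyncCleanup(bool $asyncCleanup): static
    {
        $this->asyncCleanup = $asyncCleanup;

        return $this;
    }

    public function buildCleanupQuery(string $table): string
    {
        // {name:Type} placeholders are ClickHouse query parameters, bound separately by the client.
        $sql = "DELETE FROM {$table} WHERE tenant = {tenant:String} AND time < {before:DateTime64(3)}";

        if ($this->asyncCleanup) {
            // Return as soon as the lightweight delete is queued instead of waiting for it.
            $sql .= ' SETTINGS lightweight_deletes_sync = 0';
        }

        return $sql;
    }
}

$builder = new CleanupQuerySketch();
echo $builder->buildCleanupQuery('audit'), PHP_EOL;
echo $builder->setAsyncCleanup(true)->buildCleanupQuery('audit'), PHP_EOL;
```

Leaving the flag `false` by default is what keeps existing callers and tests on the synchronous path; only code that explicitly calls the setter gets the `SETTINGS` suffix.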
Picks up the opt-in async cleanup setter (utopia-php/audit#117) needed for the deletes worker on the cloud side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Add `setAsyncCleanup(bool)` to the ClickHouse adapter. When enabled, `cleanup()` appends `SETTINGS lightweight_deletes_sync = 0` so the HTTP call returns once the mutation is queued, instead of blocking until ClickHouse finishes the DELETE. Default behavior is unchanged.
Why
In production, the maintenance worker invokes `cleanup()` per project against a shared multi-tenant audit table. The synchronous DELETE routinely exceeded the 30s client timeout. Two factors compound:
- `tenant` is not in the table's sort key or skip-index set, so the per-tenant predicate forces a row-level scan inside surviving parts.
- `SharedMergeTree` serializes mutations per table through Keeper, so N per-project mutations queue up and the HTTP client times out long before they drain.

The maintenance worker only needs to schedule cleanup, not wait for it.
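A sketch of that calling pattern: the worker opts in once, then schedules one cleanup per project. It assumes a hypothetical `ProjectCleanup` contract; the `cleanupProject()` method, its signature, and the `RecordingCleanup` test double are invented for illustration, and only `setAsyncCleanup()` comes from this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical contract standing in for the audit adapter. Only setAsyncCleanup()
// comes from this PR; cleanupProject() and its signature are invented for illustration.
interface ProjectCleanup
{
    public function setAsyncCleanup(bool $async): static;

    public function cleanupProject(string $projectId, DateTimeImmutable $before): void;
}

// Test double that records each request instead of sending a DELETE to ClickHouse.
final class RecordingCleanup implements ProjectCleanup
{
    private bool $async = false;

    /** @var list<array{project: string, before: string, async: bool}> */
    public array $calls = [];

    public function setAsyncCleanup(bool $async): static
    {
        $this->async = $async;

        return $this;
    }

    public function cleanupProject(string $projectId, DateTimeImmutable $before): void
    {
        // A real adapter would send the DELETE here; in async mode that call would
        // return once ClickHouse has queued the mutation, not after it finishes.
        $this->calls[] = [
            'project' => $projectId,
            'before' => $before->format(DATE_ATOM),
            'async' => $this->async,
        ];
    }
}

/**
 * Maintenance-worker sketch: opt in once, then schedule one cleanup per project.
 *
 * @param list<string> $projectIds
 */
function scheduleProjectCleanups(ProjectCleanup $adapter, array $projectIds, DateTimeImmutable $before): void
{
    $adapter->setAsyncCleanup(true);

    foreach ($projectIds as $projectId) {
        $adapter->cleanupProject($projectId, $before);
    }
}

$adapter = new RecordingCleanup();
scheduleProjectCleanups($adapter, ['p1', 'p2', 'p3'], new DateTimeImmutable('-90 days'));
```

In this sketch the flag lives on the adapter instance, so only code that calls the setter on that instance leaves the synchronous default.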
Why opt-in (not default)
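The first iteration of this PR made async unconditional and broke `testCleanup`, `testFind`, and `testCount`, which assert on row counts immediately after `cleanup()`. Making it opt-in via a setter keeps the synchronous default (so those tests and existing callers see no change), follows the `setSharedTables`/`setTenant` configuration style on this adapter, and lets a consumer enable it with a single call: `$auditAdapter->setAsyncCleanup(true)`.
Test plan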
- `composer lint` (Pint)
- `composer check` (PHPStan level max, no errors)
- … `setAsyncCleanup(true)` and verifies the 30s timeout is gone in staging

🤖 Generated with Claude Code