Skip to content

feat(clickhouse): add opt-in async cleanup via setAsyncCleanup()#117

Merged
lohanidamodar merged 1 commit into
mainfrom
chore/cleanup-async-deletes
May 14, 2026
Merged

feat(clickhouse): add opt-in async cleanup via setAsyncCleanup()#117
lohanidamodar merged 1 commit into
mainfrom
chore/cleanup-async-deletes

Conversation

@lohanidamodar
Copy link
Copy Markdown
Contributor

@lohanidamodar lohanidamodar commented May 14, 2026

Summary

Add setAsyncCleanup(bool) to the ClickHouse adapter. When enabled, cleanup() appends SETTINGS lightweight_deletes_sync = 0 so the HTTP call returns once the mutation is queued, instead of blocking until ClickHouse finishes the DELETE. Default behavior is unchanged.

Why

In production, the maintenance worker invokes cleanup() per project against a shared multi-tenant audit table. The synchronous DELETE routinely exceeded the 30s client timeout:

Failed to cleanup ClickHouse audit logs for project <id>:
ClickHouse query execution failed: Operation timed out after 30026 milliseconds with 0 bytes received

Two factors compound:

  1. tenant is not in the table's sort key or skip-index set, so the per-tenant predicate forces a row-level scan inside surviving parts.
  2. SharedMergeTree serializes mutations per table through Keeper, so N per-project mutations queue up and the HTTP client times out long before they drain.

The maintenance worker only needs to schedule cleanup, not wait for it.

Why opt-in (not default)

The first iteration of this PR made async unconditional and broke testCleanup, testFind, testCount — they assert on row counts immediately after cleanup(). Making it opt-in via a setter:

  • preserves the existing contract for every other consumer (and for the test suite),
  • matches the existing setSharedTables / setTenant configuration style on this adapter,
  • lets cloud's deletes worker opt in with a single line: $auditAdapter->setAsyncCleanup(true).

Test plan

  • composer lint (Pint)
  • composer check (PHPStan level max, no errors)
  • CI: existing test suite passes (default-sync behavior is untouched)
  • Cloud follow-up: deletes worker calls setAsyncCleanup(true) and verifies the 30s timeout is gone in staging

🤖 Generated with Claude Code

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR makes cleanup() non-blocking on the ClickHouse adapter by appending SETTINGS lightweight_deletes_sync = 0 to the lightweight DELETE statement when the new asyncCleanup flag is enabled. The default remains synchronous (false), so existing behavior and tests are unaffected.

  • Adds $asyncCleanup property (default false) with setAsyncCleanup/isAsyncCleanup accessors following the same pattern as the existing sharedTables configuration.
  • When enabled, the query-level SETTINGS lightweight_deletes_sync = 0 is appended after the WHERE clause, which is the correct position in ClickHouse's DELETE syntax.

Confidence Score: 5/5

Safe to merge — the change is purely opt-in and the default synchronous path is entirely unchanged.

The asyncCleanup flag defaults to false, leaving all existing tests and callers on the synchronous code path. The SETTINGS lightweight_deletes_sync = 0 clause is placed correctly at the end of the DELETE statement after the WHERE clause, which matches ClickHouse's lightweight-delete syntax. The accessor API follows the same pattern already used for sharedTables. No logic, no data integrity concern, and no test breakage on the current code paths.

No files require special attention. The only changed file is src/Audit/Adapter/ClickHouse.php and the modification is minimal and well-scoped.

Important Files Changed

Filename Overview
src/Audit/Adapter/ClickHouse.php Adds opt-in async cleanup via setAsyncCleanup(true); appends SETTINGS lightweight_deletes_sync = 0 correctly after the WHERE clause in the DELETE statement. Default is synchronous, so existing behavior is unchanged.

Reviews (2): Last reviewed commit: "feat(clickhouse): add opt-in async clean..." | Re-trigger Greptile

`cleanup()` defaults to ClickHouse's synchronous mutation behavior, so
existing tests and callers see no change. Consumers that just need to
schedule the DELETE (e.g. maintenance workers running per-project on a
shared multi-tenant table) can call `setAsyncCleanup(true)` to append
`SETTINGS lightweight_deletes_sync = 0` and have the HTTP call return
once the mutation is queued.

This avoids the 30s client timeout observed in production when the
per-tenant DELETE serialized N mutations through Keeper:

    Failed to cleanup ClickHouse audit logs for project <id>:
    ClickHouse query execution failed: Operation timed out after
    30026 milliseconds with 0 bytes received

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lohanidamodar lohanidamodar force-pushed the chore/cleanup-async-deletes branch from 2b17a1c to a889e2e Compare May 14, 2026 03:37
@lohanidamodar lohanidamodar changed the title fix(clickhouse): make cleanup() non-blocking via lightweight_deletes_sync=0 feat(clickhouse): add opt-in async cleanup via setAsyncCleanup() May 14, 2026
@lohanidamodar lohanidamodar merged commit 807d14e into main May 14, 2026
4 checks passed
@lohanidamodar lohanidamodar deleted the chore/cleanup-async-deletes branch May 14, 2026 03:55
lohanidamodar added a commit to appwrite/appwrite that referenced this pull request May 14, 2026
Picks up the opt-in async cleanup setter (utopia-php/audit#117)
needed for the deletes worker on the cloud side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants