feat(clickhouse): Query::select + notStartsWith / notEndsWith / regex / orderRandom #116
…pter

Adds the missing Query types so audit callers can opt in to slim projections and the full filter/order menu rather than being limited to a subset:

- `Query::select(['col', ...])`: column projection. Multiple `select()` calls combine; `id` is always projected so the Log model still has its identifier. Each requested column is validated against the schema and identifier-escaped at SQL build time. Without `select()`, the existing full-column behaviour is unchanged.
- `Query::notStartsWith(...)` / `Query::notEndsWith(...)`: symmetric with the existing `startsWith` / `endsWith`, emitted as `NOT startsWith(col, val)` / `NOT endsWith(col, val)`.
- `Query::regex(col, pattern)`: compiled to ClickHouse's `match(haystack, pattern)`. Pattern is parameter-bound, never inlined.
- `Query::orderRandom()`: `ORDER BY rand()`. Mutually exclusive with cursor pagination; combining the two throws, since cursor needs a stable order to anchor the next page on.

`select`, `regex`, `notStartsWith`, `notEndsWith` are added to `VALUE_REQUIRED_METHODS` so they fail loudly when given an empty values array, matching the existing contract.

Skipped: full-text `search`/`notSearch`, `exists`/`notExists`, `containsAny`/`containsAll`/`elemMatch`, vector / spatial types. None map cleanly to the audit table's scalar columns. `and` / `or` logical combinations are filed as a separate follow-up because they need a recursive filter compiler that we don't have today.

Eight new tests cover happy paths, unknown-column rejection, empty-values rejection, and the cursor + random incompatibility.
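As a rough illustration of what "parameter-bound, never inlined" means for the new filters, the following is a minimal sketch and not the adapter's actual code. The function name `compileFilter`, the `p0`-style parameter names, and the `{name:String}` placeholder form (ClickHouse's query-parameter syntax) are assumptions made for this example; the real adapter may bind values differently, and its identifier escaping is likely more thorough than the plain backtick wrapping used here.

```php
<?php

/**
 * Hypothetical helper: compile one filter into a SQL fragment plus the
 * parameters to bind. The user-supplied value never appears in the SQL text.
 *
 * @param array<int, string> $allowedColumns stand-in for the schema check
 * @return array{0: string, 1: array<string, string>}
 */
function compileFilter(
    string $method,
    string $column,
    string $value,
    array $allowedColumns,
    int $index = 0
): array {
    // Reject columns that are not part of the (assumed) schema.
    if (!in_array($column, $allowedColumns, true)) {
        throw new InvalidArgumentException("Unknown column: {$column}");
    }

    // Simplified identifier quoting; only validated names reach this point.
    $col = '`' . $column . '`';

    // Assumed placeholder format: ClickHouse-style {name:Type}.
    $param = 'p' . $index;
    $placeholder = '{' . $param . ':String}';

    $sql = match ($method) {
        'notStartsWith' => "NOT startsWith({$col}, {$placeholder})",
        'notEndsWith' => "NOT endsWith({$col}, {$placeholder})",
        'regex' => "match({$col}, {$placeholder})",
        default => throw new InvalidArgumentException("Unsupported method: {$method}"),
    };

    return [$sql, [$param => $value]];
}

// Usage: the pattern travels as a bound parameter, not as SQL text.
[$regexSql, $regexParams] = compileFilter('regex', 'event', '^user\.', ['id', 'event', 'userId']);
```

Keeping the value out of the SQL string means a pattern containing quotes or parentheses cannot change the shape of the query, which is the property the commit message is describing.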
When `sharedTables` is enabled the full-projection path already appends the `tenant` column to every SELECT, so the slim `Query::select(...)` path should match — `tenant` is metadata callers expect on every row regardless of which columns they explicitly listed. Force-include it alongside `id` (which was already always-projected so the Log model keeps its identifier).
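The forced-column rule can be sketched as follows. This is an illustrative stand-in, not the adapter's `buildProjection`: the function name `sketchProjection`, the decision to place forced columns first, and the plain backtick quoting are all assumptions for the example. It only covers the slim `select()` path; the full-column path is left out.

```php
<?php

/**
 * Hypothetical sketch of the slim projection: always include `id`, and also
 * `tenant` when shared tables are enabled, then drop duplicates.
 *
 * @param array<int, string> $selected      columns requested via select()
 * @param array<int, string> $schemaColumns stand-in for the schema check
 */
function sketchProjection(array $selected, array $schemaColumns, bool $sharedTables): string
{
    // Forced columns come first here; the real ordering is not specified.
    $forced = $sharedTables ? ['id', 'tenant'] : ['id'];

    // array_unique keeps the first occurrence, so a caller who also lists
    // `id` or `tenant` does not get it twice.
    $columns = array_values(array_unique(array_merge($forced, $selected)));

    foreach ($columns as $column) {
        if (!in_array($column, $schemaColumns, true)) {
            throw new InvalidArgumentException("Unknown column: {$column}");
        }
    }

    // Simplified identifier quoting for validated names.
    return implode(', ', array_map(
        static fn (string $c): string => '`' . $c . '`',
        $columns
    ));
}

// Usage with an assumed schema.
$schema = ['id', 'tenant', 'time', 'event', 'userId', 'resource', 'data'];
$shared = sketchProjection(['time', 'event', 'id'], $schema, true);
$single = sketchProjection(['time', 'event'], $schema, false);
```

With these inputs `$shared` is the list `id`, `tenant`, `time`, `event` and `$single` is `id`, `time`, `event`, each name backtick-quoted.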
Greptile Summary

This PR fills out the
Confidence Score: 5/5

Safe to merge; all SQL is parameterised, identifier-escaped, and schema-validated. The one edge case (select projection missing a cursor order column) throws a clear exception rather than silently corrupting results.

The change is well-contained: new query types are added to an isolated switch, forced columns are always injected, and incompatible combinations (orderRandom+cursor, orderRandom+orderBy) are explicitly rejected. The single usability gap (a caller omitting a cursor order column from their projection) surfaces as a named exception on the second page call, not as wrong data.

No files require special attention; the select+cursor interaction in ClickHouse.php is the only area worth a second look.

Important Files Changed
Reviews (3): Last reviewed commit: "Merge branch 'main' into feat-clickhouse..."
Two follow-ups from greptile on c1b85ad:

- `buildProjection` no longer re-validates user-supplied select columns. `parseQueries` already calls `validateAttributeName` on each column inside the `TYPE_SELECT` branch, so the second walk through `getAttributes()` in `buildProjection` was wasted work. The forced columns (`id`, `tenant`) still get the defensive check since they're injected here, not user input.
- `orderRandom` combined with `orderAsc`/`orderDesc` now throws. Previously `rand()` silently took precedence over the requested column order, which is inconsistent with how the cursor + random combination is rejected. The new guard mirrors that pattern so callers see the conflict explicitly rather than getting unexpected results.
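The two `orderRandom` guards described above could look roughly like this. It is a sketch under stated assumptions: the function name `assertOrderCompatible`, the boolean flags, the `LogicException` type, and the message wording are all hypothetical and are not taken from the adapter.

```php
<?php

/**
 * Hypothetical guard: reject query combinations where random ordering
 * conflicts with another ordering requirement.
 */
function assertOrderCompatible(bool $hasOrderRandom, bool $hasCursor, bool $hasOrderBy): void
{
    if (!$hasOrderRandom) {
        return; // Nothing to check without orderRandom.
    }

    // A cursor needs a stable order to anchor the next page on.
    if ($hasCursor) {
        throw new LogicException('orderRandom cannot be combined with cursor pagination.');
    }

    // Previously rand() silently won; failing makes the conflict visible.
    if ($hasOrderBy) {
        throw new LogicException('orderRandom cannot be combined with orderAsc/orderDesc.');
    }
}

// Usage: random order on its own passes the check.
assertOrderCompatible(true, false, false);
```

Throwing in both cases keeps the behaviour consistent: a caller who asks for two incompatible orderings gets an error up front instead of results in an order they did not request.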
Summary
Fills out the supported `Query` method set on the ClickHouse adapter so callers can opt into slim projections and the rest of the filter / order menu. Previously these methods were silently ignored when passed to `find()` (the projection always returned every column, etc.).

Added:

- `Query::select(['col', ...])`: column projection. Multiple `select()` calls combine; duplicates dropped; `id` is always projected so the `Log` model still has its identifier without callers having to remember. Each requested column is validated against the schema (`validateAttributeName`) and identifier-escaped at SQL build time. Without `select()`, the previous full-column behaviour is unchanged.
- `Query::notStartsWith(...)` / `Query::notEndsWith(...)`: symmetric with the existing `startsWith` / `endsWith`, compiled as `NOT startsWith(col, val)` / `NOT endsWith(col, val)`.
- `Query::regex(col, pattern)`: compiled to ClickHouse's `match(haystack, pattern)`. Pattern is parameter-bound, never inlined.
- `Query::orderRandom()`: `ORDER BY rand()`. Mutually exclusive with cursor pagination; combining the two throws because cursor needs a stable order to anchor the next page on.

`select`, `regex`, `notStartsWith`, `notEndsWith` were added to `VALUE_REQUIRED_METHODS` so they fail loudly with `Select queries require at least one value.` (etc.) on an empty values array, matching the existing contract.

Why
Most directly, the cloud audit-events list endpoint is doing a `SELECT *` on rows with a sizeable `data` JSON payload. For UI listing pages that only need `id`, `time`, `event`, `userId`, `resource`, projecting just those columns avoids the dominant I/O cost. `Query::select` was the missing piece. The rest are added while we're in here so the supported set matches `utopia-php/query`'s contract.

Not in this PR
- `search` / `notSearch`: no fulltext index on the audit table
- `exists` / `notExists`: doc-store concept, doesn't map cleanly
- `containsAny` / `containsAll` / `elemMatch`: array-column methods, audit columns are scalar
- `and` / `or` logical combinations: would need a recursive filter compiler that doesn't exist today; filed as a follow-up

API
Test plan
- `composer lint` passes
- `composer check` (PHPStan max) passes

🤖 Generated with Claude Code