feat: warnings as a kubernetes events#209
Conversation
|
|
There was a problem hiding this comment.
Pull request overview
This PR surfaces rows from ClickHouse's system.warnings table as Kubernetes Warning events on the ClickHouseCluster object. A new reconcile step (reconcileWarnings) polls every replica's system.warnings in parallel and emits one event per warning message via the existing event recorder, using a hash of the message as the event "action" to drive aggregation. An e2e test triggers a warning by setting max_table_num_to_warn: 1 and asserts both the row in system.warnings and the corresponding ClickHouseWarning event.
Changes:
- New
Warningsstep in the ClickHouse reconcile chain that calls a newcommander.Warnings()helper and emitsClickHouseWarningevents with hashed actions for dedup. - New
EventWarningRecordreason constant,WarningsPollInterval(30s) constant, and unconditional 30s requeue. - e2e coverage plus a
cert-managerinstall-wait timeout bump from 5m to 10m.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| api/v1alpha1/events.go | Adds EventWarningRecord event reason constant. |
| internal/controller/constants.go | Adds WarningsPollInterval = 30s. |
| internal/controller/clickhouse/commands.go | Adds commander.Warnings() querying system.warnings. |
| internal/controller/clickhouse/sync.go | Adds reconcileWarnings step + warningAction hash helper; emits per-warning events and unconditionally requeues. |
| test/e2e/clickhouse_e2e_test.go | New e2e test verifying warnings flow end-to-end. |
| test/testutil/utils.go | Bumps cert-manager webhook wait timeout from 5m to 10m. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…rns in commands.go
| return fmt.Sprintf("Warning-%016x", h.Sum64()) | ||
| } | ||
|
|
||
| func (r *clickhouseReconciler) reconcileWarnings(ctx context.Context, log ctrlutil.Logger) (chctrl.StepResult, error) { |
There was a problem hiding this comment.
This will send an event for every warning on every reconciliation. We need a way to deduplicate it
There was a problem hiding this comment.
kubernetes does deduplication by itself - https://github.com/kubernetes/design-proposals-archive/blob/main/instrumentation/events-redesign.md
Deduplication logic groups together Event which have the same:
Source Component and Host
InvolvedObject Kind, Namespace, Name, API version and UID,
Type (works as Severity)
Reason
There was a problem hiding this comment.
btw that's why we need to add a hash in warningAction. Otherwise it will merge all messages into one and my test will become flaky
Why
We want to track warnings from pods. This PR allows to add records from the
system.warningtable into k8s events