Skip to content

HYPERFLEET-859 - docs: revise delete/update test cases for hard-delete design and terminology#84

Open
kuudori wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
kuudori:HYPERFLEET-859
Open

HYPERFLEET-859 - docs: revise delete/update test cases for hard-delete design and terminology#84
kuudori wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
kuudori:HYPERFLEET-859

Conversation

@kuudori
Copy link
Copy Markdown
Contributor

@kuudori kuudori commented Apr 27, 2026

Align test case documents with the implemented hard-delete mechanism and current API terminology after recent feature merges.

  • Replace "Ready" with "Reconciled" as the primary convergence condition
  • Remove stale preconditions (hard-delete endpoints, mutation guard TODOs)
  • Add namespace cleanup verification steps (Step 5) to deletion tests
  • Add hard-delete mechanism note linking to architecture design doc
  • Add three new test cases: LIST soft-deleted clusters, cascade DELETE while nodepool already deleting, cascade DELETE during nodepool update
  • Fix recreate-same-name test to explicitly reuse cluster name via jq
  • Update cross-references and hard-delete descriptions across all files

Summary

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

Summary by CodeRabbit

  • Documentation

    • Standardized lifecycle checks to use Reconciled/Available instead of Ready; clarified hard-delete is computed inline and tests should poll until GET returns 404. Relaxed strict timestamp/finalizer matching and removed several implementation preconditions.
  • Tests

    • Added deterministic soft-delete visibility tests (LIST/GET show tombstoned entries), cascade-delete edge cases, labels-only and no-op PATCH generation tests, and explicit Kubernetes namespace-cleanup polling. Updated many lifecycle tests to assert Reconciled/observed_generation.
  • Bug Fix

    • Improved namespace-not-found error messaging for clearer diagnostics.

@openshift-ci openshift-ci Bot requested review from ma-hill and pnguyen44 April 27, 2026 18:52
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rh-amarin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request revises test specifications to treat hard-delete as an inline operation computed during POST /adapter_statuses when Reconciled=True, replacing many Ready-based waits with Reconciled (and Available where added). Tests now poll GET until HTTP 404 for hard-delete completion and assert Kubernetes namespace cleanup by polling until NotFound. Soft-delete visibility is made deterministic by fencing reconciliation (scaling sentinels), then observing tombstoned resources with deleted_time before resuming reconciliation. Several new cascade and visibility test cases were added. A client change wraps namespace not-found errors.

Sequence Diagram(s)

sequenceDiagram
participant Tester as Test Runner / Client
participant API as Control Plane API
participant Adapter as Adapter (adapter_statuses)
participant Store as Persistent Store
participant K8s as Kubernetes Namespace

Tester->>API: DELETE /clusters/{id} (soft-delete)
API->>Store: mark deleted_time (soft-deleted)
API-->>Tester: 202 / 200 (soft-deleted representation)

Tester->>API: GET /clusters/{id} or LIST
API->>Store: fetch (include soft-deleted until hard-delete)
alt soft-deletion window
API-->>Tester: 200 with deleted_time
else after hard-delete completion
API-->>Tester: 404 Not Found
end

Adapter->>API: POST /adapter_statuses (reports Reconciled state)
alt Adapter reports Reconciled=True for deleted resource
API->>Store: compute inline hard-delete (remove resource)
API->>K8s: delete associated namespace
K8s-->>API: 202 Accepted (async deletion)
end

loop wait for namespace removal
API->>K8s: GET Namespace
K8s-->>API: 404 Not Found
end

API-->>Tester: subsequent GET returns 404 (terminal removal)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: revising test case documentation for hard-delete design and updated terminology.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
test-design/testcases/update-delete-test-matrix.md (1)

37-40: Clarify or normalize non-contiguous test IDs

The matrix skips #19 and #20 without explanation. Either renumber contiguously or add a short note that IDs are intentionally stable/reserved, so cross-file references stay unambiguous.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/update-delete-test-matrix.md` around lines 37 - 40, The
test matrix currently omits IDs 19 and 20 causing non-contiguous numbering;
either renumber the rows so IDs are contiguous (e.g., change 21→19, 22→20,
23→21) or add a short clarifying note above the table stating that IDs 19–20 are
intentionally reserved/stable for cross-file references; update the table header
or nearby text accordingly and ensure links (e.g., the references in the rows
currently labeled 21, 22, 23) remain correct after renumbering or unchanged if
you choose the reservation note approach.
test-design/testcases/update-cluster.md (1)

502-522: No-op PATCH command examples are not directly executable

<canonical_spec_from_step_1>, <canonical_spec_keys_reordered_or_pretty_printed>, and ... make these steps ambiguous for automation and manual runs. Prefer concrete shell-variable examples (or jq-generated payloads piped to curl -d @-``) so the test is reproducible without interpretation.

Also applies to: 544-551

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/update-cluster.md` around lines 502 - 522, Replace
ambiguous placeholders like `<canonical_spec_from_step_1>`,
`<canonical_spec_keys_reordered_or_pretty_printed>`, and ellipses with concrete,
executable examples or instructions: show how to capture the canonical spec into
a shell variable (e.g., CANONICAL_SPEC) or generate it with `jq` and then pipe
to `curl -d `@-``, and provide an explicit example of a reordered-key payload
(e.g., using `jq` to reorder keys) for both Case A and Case B so the PATCH
commands are copy-paste runnable; update the test text around those symbols to
include the exact shell-variable names and one `jq`-piped payload example for
reproducibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 133-138: The test currently uses
Eventually(...).ShouldNot(Succeed()) which only checks for any error; update the
assertion to specifically wait for a NotFound result from h.GetNamespace(ctx,
clusterID) instead. Replace the generic ShouldNot(Succeed()) with an assertion
that the call returns an error and that the error satisfies the Kubernetes "not
found" predicate (e.g., errors.Is(err, apierrors.NewNotFound(...)) /
apierrors.IsNotFound(err) or your project's helper for IsNotFound) while keeping
the same Eventually timeout/polling (h.Cfg.Timeouts.Adapter.Processing and
h.Cfg.Polling.Interval) and the same invocation of h.GetNamespace so the test
fails on other errors like auth/network/API faults.

In `@test-design/testcases/delete-nodepool.md`:
- Around line 558-585: The GET (Step 3) and LIST (Step 4) checks race with
hard-delete; combine them into a single polling block using the test framework's
Eventually/timeout (e.g., the existing polling used in Step 3) that on each poll
performs both: GET /nodepools/{deleted_nodepool_id} and LIST /nodepools, and
succeeds if either (a) GET returns 200 with deleted_time and LIST contains both
{active_nodepool_id} (deleted_time null) and {deleted_nodepool_id} (deleted_time
set), or (b) both endpoints return the terminal 404 indicating hard-delete
completed; update the test flow to stop running Step 4 separately and instead
use this unified check to avoid the narrow race window.

---

Nitpick comments:
In `@test-design/testcases/update-cluster.md`:
- Around line 502-522: Replace ambiguous placeholders like
`<canonical_spec_from_step_1>`,
`<canonical_spec_keys_reordered_or_pretty_printed>`, and ellipses with concrete,
executable examples or instructions: show how to capture the canonical spec into
a shell variable (e.g., CANONICAL_SPEC) or generate it with `jq` and then pipe
to `curl -d `@-``, and provide an explicit example of a reordered-key payload
(e.g., using `jq` to reorder keys) for both Case A and Case B so the PATCH
commands are copy-paste runnable; update the test text around those symbols to
include the exact shell-variable names and one `jq`-piped payload example for
reproducibility.

In `@test-design/testcases/update-delete-test-matrix.md`:
- Around line 37-40: The test matrix currently omits IDs 19 and 20 causing
non-contiguous numbering; either renumber the rows so IDs are contiguous (e.g.,
change 21→19, 22→20, 23→21) or add a short clarifying note above the table
stating that IDs 19–20 are intentionally reserved/stable for cross-file
references; update the table header or nearby text accordingly and ensure links
(e.g., the references in the rows currently labeled 21, 22, 23) remain correct
after renumbering or unchanged if you choose the reservation note approach.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: f64e226b-6e49-410d-a6d3-6c44d3620b7e

📥 Commits

Reviewing files that changed from the base of the PR and between 4a8d1a8 and 32d7df6.

📒 Files selected for processing (5)
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/delete-nodepool.md
  • test-design/testcases/update-cluster.md
  • test-design/testcases/update-delete-test-matrix.md
  • test-design/testcases/update-nodepool.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • test-design/testcases/update-nodepool.md

Comment thread test-design/testcases/delete-cluster.md
Comment thread test-design/testcases/delete-nodepool.md
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
test-design/testcases/delete-cluster.md (1)

255-264: ⚠️ Potential issue | 🟠 Major

Namespace cleanup assertion can false-pass on unrelated errors

Line 255–Line 264 still uses ShouldNot(Succeed()), which passes on any error (auth/network/transient failures included) and does not prove namespace deletion. Assert explicit NotFound semantics instead.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/delete-cluster.md` around lines 255 - 264, The current
Eventually block uses ShouldNot(Succeed()) which false‑passes on any error;
change the assertion to explicitly assert Kubernetes NotFound semantics. Modify
the Eventually call around h.GetNamespace(ctx, clusterID) to return a boolean
(or convert the returned error) and assert apierrors.IsNotFound(err) (from
k8s.io/apimachinery/pkg/api/errors) — e.g., call h.GetNamespace in the closure,
capture err, and use apierrors.IsNotFound(err) with .Should(BeTrue()) (keep the
same timeouts: h.Cfg.Timeouts.Adapter.Processing and h.Cfg.Polling.Interval and
the same clusterID/Eventually wrapper).
test-design/testcases/delete-nodepool.md (1)

558-585: ⚠️ Potential issue | 🟠 Major

Unify GET/LIST soft-delete checks into a single polling block to avoid flake

Line 558–Line 585 split transient tombstone assertions across two steps. In fast runs, hard-delete can finish between them, so LIST fails even when behavior is correct. Merge these into one Eventually block that, per poll, evaluates both GET and LIST and accepts either (a) tombstone-visible state or (b) terminal 404 state.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/delete-nodepool.md` around lines 558 - 585, Merge the
separate soft-delete assertions into a single Eventually polling block that, on
each poll, performs both the GET (for {deleted_nodepool_id}) and the LIST (for
the cluster) and accepts either: (a) tombstone-visible state (GET returns 200
with deleted_time set and LIST contains both {active_nodepool_id} with no
deleted_time and {deleted_nodepool_id} with deleted_time) or (b) terminal 404
from GET indicating hard-delete completed; use the framework-configured
polling/timeout (e.g., Eventually with 500ms interval, 10s timeout) and
short-circuit the Eventually when either condition is satisfied so the test is
robust to fast hard-delete timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 132-139: The FetchNamespace function currently returns a new error
that discards the original Kubernetes error type, causing
apierrors.IsNotFound(err) to fail; update the error return in FetchNamespace
(pkg/client/kubernetes/client.go) to wrap the original err using fmt.Errorf with
the %w verb (e.g., "namespace %s not found: %w") so the sentinel NotFound type
is preserved for apierrors.IsNotFound(), or if you prefer not to change
FetchNamespace, update the test in delete-cluster.md to assert on the error
string instead of using apierrors.IsNotFound().

---

Duplicate comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 255-264: The current Eventually block uses ShouldNot(Succeed())
which false‑passes on any error; change the assertion to explicitly assert
Kubernetes NotFound semantics. Modify the Eventually call around
h.GetNamespace(ctx, clusterID) to return a boolean (or convert the returned
error) and assert apierrors.IsNotFound(err) (from
k8s.io/apimachinery/pkg/api/errors) — e.g., call h.GetNamespace in the closure,
capture err, and use apierrors.IsNotFound(err) with .Should(BeTrue()) (keep the
same timeouts: h.Cfg.Timeouts.Adapter.Processing and h.Cfg.Polling.Interval and
the same clusterID/Eventually wrapper).

In `@test-design/testcases/delete-nodepool.md`:
- Around line 558-585: Merge the separate soft-delete assertions into a single
Eventually polling block that, on each poll, performs both the GET (for
{deleted_nodepool_id}) and the LIST (for the cluster) and accepts either: (a)
tombstone-visible state (GET returns 200 with deleted_time set and LIST contains
both {active_nodepool_id} with no deleted_time and {deleted_nodepool_id} with
deleted_time) or (b) terminal 404 from GET indicating hard-delete completed; use
the framework-configured polling/timeout (e.g., Eventually with 500ms interval,
10s timeout) and short-circuit the Eventually when either condition is satisfied
so the test is robust to fast hard-delete timing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 527d626b-9807-4706-a702-3a96b214f6dd

📥 Commits

Reviewing files that changed from the base of the PR and between 32d7df6 and e47f412.

📒 Files selected for processing (5)
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/delete-nodepool.md
  • test-design/testcases/update-cluster.md
  • test-design/testcases/update-delete-test-matrix.md
  • test-design/testcases/update-nodepool.md

Comment thread test-design/testcases/delete-cluster.md
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
test-design/testcases/delete-cluster.md (1)

260-264: ⚠️ Potential issue | 🟠 Major

Namespace deletion check can false-pass on non-NotFound errors

Line 260 currently succeeds on any error (ShouldNot(Succeed())), so auth/network/API failures can be miscounted as successful namespace deletion.

Proposed doc fix
-Eventually(func() error {
-    _, err := h.GetNamespace(ctx, clusterID)
-    return err
-}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).ShouldNot(Succeed())
+Eventually(func(g Gomega) {
+    _, err := h.GetNamespace(ctx, clusterID)
+    g.Expect(err).To(HaveOccurred())
+    g.Expect(apierrors.IsNotFound(err)).To(BeTrue(), "expected NotFound from GetNamespace, got: %v", err)
+}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).Should(Succeed())

Verification (read-only) to find remaining broad matcher usage:

#!/bin/bash
rg -n -C2 'Eventually\(func\(\) error \{' test-design/testcases/delete-cluster.md
rg -n 'ShouldNot\(Succeed\(\)\)' test-design/testcases/delete-cluster.md
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/delete-cluster.md` around lines 260 - 264, The current
Eventually block treats any error from h.GetNamespace(ctx, clusterID) as success
because it uses ShouldNot(Succeed()); change the predicate to only consider a
NotFound response as the success condition: call h.GetNamespace inside the
Eventually anonymous function and if the call returns a NotFound error return
nil (so Eventually succeeds), otherwise return the actual error (so
network/auth/API errors keep retrying/fail properly). Keep the same timeout and
interval (h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval) and
reference the existing symbols: Eventually, h.GetNamespace, clusterID, ctx, and
the time config values.
test-design/testcases/delete-nodepool.md (1)

567-574: ⚠️ Potential issue | 🟠 Major

GET visibility step can pass without observing the soft-deleted nodepool

Line 567 allows the step to succeed when hard-delete already returned 404. That skips the core assertion that the nodepool is observable in tombstoned state before hard-delete.

Proposed doc adjustment
-- Poll GET with `Eventually` until either (a) the tombstoned nodepool is observed via HTTP 200, or (b) hard-delete completes via HTTP 404.
+- Poll GET with `Eventually` until the tombstoned nodepool is observed via HTTP 200 with `deleted_time` populated.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/delete-nodepool.md` around lines 567 - 574, The GET
visibility step currently allows success if any poll returns 404 (hard-delete)
before observing the soft-deleted/tombstoned nodepool; update the
polling/assertion logic for the step that uses Eventually against the GET
${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{deleted_nodepool_id}
so it requires observing at least one HTTP 200 response with the nodepool and
deleted_time populated before treating a subsequent HTTP 404 as terminal.
Concretely, modify the Eventually condition (the test step that describes "Poll
GET with `Eventually`..." / the GET visibility assertion) to first wait for and
record a 200-with-deleted_time event, and only then allow the sequence to end on
404 (hard-delete) or timeout; keep the existing framework polling/timeout
values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 340-347: Update the polling acceptance criteria in the "Poll GET
with `Eventually`..." step so an immediate HTTP 404 cannot satisfy the test:
require observing an HTTP 200 response with the cluster object and
`deleted_time` (and `generation == 2`) at least once before treating a
subsequent HTTP 404 as success; implement this by changing the Eventually
predicate for the GET ${API_URL}/api/hyperfleet/v1/clusters/{cluster_id} to
track whether a 200 has been seen and only allow final success on 404 if that
flag is set; apply the same change to the corresponding block referenced at
lines 1460-1469.

---

Duplicate comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 260-264: The current Eventually block treats any error from
h.GetNamespace(ctx, clusterID) as success because it uses ShouldNot(Succeed());
change the predicate to only consider a NotFound response as the success
condition: call h.GetNamespace inside the Eventually anonymous function and if
the call returns a NotFound error return nil (so Eventually succeeds), otherwise
return the actual error (so network/auth/API errors keep retrying/fail
properly). Keep the same timeout and interval
(h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval) and reference the
existing symbols: Eventually, h.GetNamespace, clusterID, ctx, and the time
config values.

In `@test-design/testcases/delete-nodepool.md`:
- Around line 567-574: The GET visibility step currently allows success if any
poll returns 404 (hard-delete) before observing the soft-deleted/tombstoned
nodepool; update the polling/assertion logic for the step that uses Eventually
against the GET
${API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{deleted_nodepool_id}
so it requires observing at least one HTTP 200 response with the nodepool and
deleted_time populated before treating a subsequent HTTP 404 as terminal.
Concretely, modify the Eventually condition (the test step that describes "Poll
GET with `Eventually`..." / the GET visibility assertion) to first wait for and
record a 200-with-deleted_time event, and only then allow the sequence to end on
404 (hard-delete) or timeout; keep the existing framework polling/timeout
values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: e10d1715-0d97-43ca-9300-2fea77892ea1

📥 Commits

Reviewing files that changed from the base of the PR and between e47f412 and f490bd1.

📒 Files selected for processing (6)
  • pkg/client/kubernetes/client.go
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/delete-nodepool.md
  • test-design/testcases/update-cluster.md
  • test-design/testcases/update-delete-test-matrix.md
  • test-design/testcases/update-nodepool.md
✅ Files skipped from review due to trivial changes (1)
  • pkg/client/kubernetes/client.go

Comment thread test-design/testcases/delete-cluster.md Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test-design/testcases/delete-cluster.md (1)

258-264: ⚠️ Potential issue | 🟠 Major

Namespace cleanup check can false-pass on non-NotFound errors

Line 258–264 uses ShouldNot(Succeed()), which succeeds on any error, not specifically namespace deletion. This can mask transient/auth/API faults as a pass.

Suggested fix
-Eventually(func() error {
-    _, err := h.GetNamespace(ctx, clusterID)
-    return err
-}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).ShouldNot(Succeed())
+Eventually(func(g Gomega) {
+    _, err := h.GetNamespace(ctx, clusterID)
+    g.Expect(err).To(HaveOccurred())
+    g.Expect(apierrors.IsNotFound(err)).To(BeTrue(), "expected NotFound from GetNamespace, got: %v", err)
+}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).Should(Succeed())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-design/testcases/delete-cluster.md` around lines 258 - 264, The current
Eventually(...) closure uses ShouldNot(Succeed()) which treats any error as
success and can false-pass; change the closure around h.GetNamespace to
explicitly detect a NotFound by using the Kubernetes API error helper
(apierrors.IsNotFound / k8s.io/apimachinery/pkg/api/errors.IsNotFound): call _,
err := h.GetNamespace(ctx, clusterID); if apierrors.IsNotFound(err) { return nil
} else { return err } and then assert Eventually(..., ...).Should(Succeed()) so
the wait only succeeds when a NotFound is observed and surfaces other errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test-design/testcases/delete-cluster.md`:
- Around line 258-264: The current Eventually(...) closure uses
ShouldNot(Succeed()) which treats any error as success and can false-pass;
change the closure around h.GetNamespace to explicitly detect a NotFound by
using the Kubernetes API error helper (apierrors.IsNotFound /
k8s.io/apimachinery/pkg/api/errors.IsNotFound): call _, err :=
h.GetNamespace(ctx, clusterID); if apierrors.IsNotFound(err) { return nil } else
{ return err } and then assert Eventually(..., ...).Should(Succeed()) so the
wait only succeeds when a NotFound is observed and surfaces other errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 60205f43-6e25-4b0b-9ed7-a68f34af98ba

📥 Commits

Reviewing files that changed from the base of the PR and between f490bd1 and b9ce779.

📒 Files selected for processing (6)
  • pkg/client/kubernetes/client.go
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/delete-nodepool.md
  • test-design/testcases/update-cluster.md
  • test-design/testcases/update-delete-test-matrix.md
  • test-design/testcases/update-nodepool.md
✅ Files skipped from review due to trivial changes (1)
  • pkg/client/kubernetes/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test-design/testcases/update-nodepool.md

**Action:**
- Poll using `h.GetNamespace()` within `Eventually()` until a `NotFound` error is returned:
```go
Eventually(func(g Gomega) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of Eventually

Comment thread test-design/testcases/delete-cluster.md Outdated

**Action:**
- Retrieve the cluster immediately after soft-delete (before hard-delete completes):
- Poll GET with `Eventually` until the tombstoned cluster is observed via HTTP 200 with `deleted_time` populated. While the Sentinel fence is active, HTTP 404 in this step is a failure (it means visibility was not proven). Use framework-configured polling/timeout values (for example, `500ms` interval and `10s` timeout):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the term tombstone/tombstoned a lot here, I know its mentioned in the tickets, but I think we removed this term from the architecture repo. Might be worth to normalize it here, just to avoid future drift.

Personally I like the term, reminds me of the undertaker from WWE, but might be best to move away from the informal term and stick with the aligned architecture repo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word tombstone has been tombstoned 🪦

Comment thread test-design/testcases/delete-cluster.md Outdated
Eventually(func() error {
_, err := h.GetNamespace(ctx, clusterID)
return err
}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).ShouldNot(Succeed())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

Blocking

Category: Inconsistency

Test #2 (cascade delete) Step 5 still uses the weaker ShouldNot(Succeed()) namespace assertion that was fixed in test #1. The happy-path test at line 138 correctly uses:

Eventually(func(g Gomega) {
    _, err := h.GetNamespace(ctx, clusterID)
    g.Expect(err).To(HaveOccurred())
    g.Expect(apierrors.IsNotFound(err)).To(BeTrue(), "expected NotFound from GetNamespace, got: %v", err)
}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).Should(Succeed())

But the cascade test Step 5 at line 263 still has the old pattern that passes on any error:

Eventually(func() error {
    _, err := h.GetNamespace(ctx, clusterID)
    return err
}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).ShouldNot(Succeed())

Should use the same func(g Gomega) + IsNotFound pattern from test #1 Step 5 for consistency and correctness.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Definitely need to revise Gomega docs, fixed 🙂

…e design and terminology

Align test case documents with the implemented hard-delete mechanism
and current API terminology after recent feature merges.

- Replace "Ready" with "Reconciled" as the primary convergence condition
- Remove stale preconditions (hard-delete endpoints, mutation guard TODOs)
- Add namespace cleanup verification steps (Step 5) to deletion tests
- Add hard-delete mechanism note linking to architecture design doc
- Add three new test cases: LIST soft-deleted clusters, cascade DELETE
  while nodepool already deleting, cascade DELETE during nodepool update
- Fix recreate-same-name test to explicitly reuse cluster name via jq
- Update cross-references and hard-delete descriptions across all files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants