OCPBUGS-92792: vsphere - fix CNS volume destroy to aggregate errors #10658

Open
jcpowermac wants to merge 1 commit into openshift:main from jcpowermac:OCPBUGS-92792/vsphere-cns-destroy-error-handling

Conversation

@jcpowermac

@jcpowermac jcpowermac commented Jun 25, 2026

Contributor

deleteCnsVolumes returned on the first error, which silently skipped remaining volumes on that vCenter and all volumes on subsequent vCenters. GetCnsVolumes had the same early-return on per-volume query failures, discarding already-queried results.

Align both functions with the error aggregation pattern used by deleteVirtualMachines: collect errors, attempt all items, and return utilerrors.NewAggregate. Also log deleted/found counts at Info level so partial cleanup is visible without --log-level=debug.
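
The collect-and-continue pattern described above can be sketched roughly as follows. This is a minimal illustration, not the installer's code: `deleteAll` and `failOn` are hypothetical names, and the standard library's `errors.Join` (Go 1.20+) stands in for `utilerrors.NewAggregate` so the example runs without external modules.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteAll is a hypothetical helper, not the installer's code. It attempts
// every item, records each failure, and reports all failures together.
func deleteAll(ids []string, del func(id string) error) (int, error) {
	var errs []error
	deleted := 0
	for _, id := range ids {
		if err := del(id); err != nil {
			// Record the failure and keep going instead of returning early.
			errs = append(errs, fmt.Errorf("volume %s: %w", id, err))
			continue
		}
		deleted++
	}
	// errors.Join returns nil when errs is empty, so full success yields nil.
	return deleted, errors.Join(errs...)
}

// failOn returns a delete function that fails only for the given ID.
func failOn(bad string) func(string) error {
	return func(id string) error {
		if id == bad {
			return errors.New("simulated delete failure")
		}
		return nil
	}
}

func main() {
	deleted, err := deleteAll([]string{"vol-1", "vol-2", "vol-3"}, failOn("vol-2"))
	fmt.Printf("deleted %d of 3 volumes, err: %v\n", deleted, err)
}
```

With an early return, `vol-3` would never be attempted once `vol-2` fails; here it is still deleted and the caller sees both the count and the failure.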

Summary by CodeRabbit

  • Bug Fixes
    • Improved vSphere CNS volume cleanup to be best-effort: it now continues processing when individual CNS volume lookups or deletions fail.
    • Volume discovery and deletion now handle partial results, preserving successful deletions while reporting all encountered issues.
    • Added clearer per-vCenter summary reporting during cleanup runs.
  • Tests
    • Expanded vSphere cleanup test coverage for partial failures and multi-client error handling scenarios.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 25, 2026
@openshift-ci-robot

Contributor

@jcpowermac: This pull request references Jira Issue OCPBUGS-92792, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@jcpowermac jcpowermac changed the title from "OCPBUGS-92792: vsphere: fix CNS volume destroy to aggregate errors" to "OCPBUGS-92792: vsphere - fix CNS volume destroy to aggregate errors" Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 78feee82-f1dc-47e3-9ed1-84e2658ee2a7

📥 Commits

Reviewing files that changed from the base of the PR and between 8e38d64 and 0223f4f.

📒 Files selected for processing (3)
  • pkg/destroy/vsphere/client.go
  • pkg/destroy/vsphere/vsphere.go
  • pkg/destroy/vsphere/vsphere_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/destroy/vsphere/vsphere.go
  • pkg/destroy/vsphere/vsphere_test.go
  • pkg/destroy/vsphere/client.go

📝 Walkthrough

CNS volume lookup now keeps successful matches while aggregating per-volume query errors. Deletion continues across all vCenter clients, aggregates failures, and is covered for deletion failures and partial query results.
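
A lookup that keeps successful matches while aggregating per-volume query errors might look like the sketch below. The `volume` type, `findClusterVolumes`, and `fakeQuery` are hypothetical stand-ins rather than the real `GetCnsVolumes` signature, and `errors.Join` is used in place of `utilerrors.NewAggregate` to keep the example standard-library only.

```go
package main

import (
	"errors"
	"fmt"
)

// volume is a hypothetical stand-in for a CNS volume record.
type volume struct {
	ID      string
	Cluster string
}

// findClusterVolumes is an illustrative sketch. It queries each candidate ID,
// keeps the volumes that belong to clusterID, and returns them together with
// a joined error describing any queries that failed.
func findClusterVolumes(ids []string, clusterID string, query func(id string) (volume, error)) ([]volume, error) {
	var (
		matched []volume
		errs    []error
	)
	for _, id := range ids {
		v, err := query(id)
		if err != nil {
			// Do not discard earlier matches; note the failure and continue.
			errs = append(errs, fmt.Errorf("querying volume %s: %w", id, err))
			continue
		}
		if v.Cluster == clusterID {
			matched = append(matched, v)
		}
	}
	return matched, errors.Join(errs...)
}

// fakeQuery simulates a backend in which "vol-2" cannot be queried.
func fakeQuery(id string) (volume, error) {
	if id == "vol-2" {
		return volume{}, errors.New("simulated query failure")
	}
	return volume{ID: id, Cluster: "demo"}, nil
}

func main() {
	vols, err := findClusterVolumes([]string{"vol-1", "vol-2", "vol-3"}, "demo", fakeQuery)
	fmt.Printf("matched %d volumes, err: %v\n", len(vols), err)
}
```

Callers of a function shaped like this need to treat a non-nil error as "results may be incomplete" rather than "results are empty", and act on the returned slice either way.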

Changes

vSphere CNS volume deletion resilience

| Layer / File(s) | Summary |
| --- | --- |
| CNS lookup aggregation (pkg/destroy/vsphere/client.go) | GetCnsVolumes logs per-volume query failures, keeps matched volumes, and returns an aggregated error when any query fails. |
| Best-effort CNS deletion (pkg/destroy/vsphere/vsphere.go) | deleteCnsVolumes continues across all vCenter clients, tracks deleted-versus-found counts, logs per-client summaries, and returns aggregated errors. |
| Deletion failure coverage (pkg/destroy/vsphere/vsphere_test.go) | Tests cover deletion failures, client query failures, and partial query results while processing continues. |
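
The per-client behavior summarized in the table could be sketched as below. The `client` interface and `fakeClient` are hypothetical and do not mirror the installer's API; the standard `log` package stands in for the logrus logging the PR uses, and `errors.Join` stands in for `utilerrors.NewAggregate`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// client is a hypothetical interface, not the installer's API.
type client interface {
	Name() string
	// ListVolumes may return partial results together with a non-nil error.
	ListVolumes() ([]string, error)
	DeleteVolume(id string) error
}

// cleanupAll visits every client even when an earlier one fails, and returns
// the total number of deleted volumes with all errors joined.
func cleanupAll(clients []client) (int, error) {
	var errs []error
	total := 0
	for _, c := range clients {
		ids, err := c.ListVolumes()
		if err != nil {
			// Keep the error, but still delete whatever was found.
			errs = append(errs, fmt.Errorf("%s: %w", c.Name(), err))
		}
		deleted := 0
		for _, id := range ids {
			if err := c.DeleteVolume(id); err != nil {
				errs = append(errs, fmt.Errorf("%s: volume %s: %w", c.Name(), id, err))
				continue
			}
			deleted++
		}
		// Per-client summary so partial cleanup is visible in normal logs.
		log.Printf("%s: deleted %d of %d volumes found", c.Name(), deleted, len(ids))
		total += deleted
	}
	return total, errors.Join(errs...)
}

// fakeClient is a test double with canned responses.
type fakeClient struct {
	name    string
	ids     []string
	listErr error
	failID  string
}

func (f fakeClient) Name() string                   { return f.name }
func (f fakeClient) ListVolumes() ([]string, error) { return f.ids, f.listErr }
func (f fakeClient) DeleteVolume(id string) error {
	if id == f.failID {
		return errors.New("simulated delete failure")
	}
	return nil
}

func main() {
	clients := []client{
		fakeClient{name: "vcenter-a", listErr: errors.New("simulated query failure")},
		fakeClient{name: "vcenter-b", ids: []string{"vol-1", "vol-2"}, failID: "vol-2"},
	}
	total, err := cleanupAll(clients)
	fmt.Printf("deleted %d volumes in total, err: %v\n", total, err)
}
```

In this run the first client fails its lookup and the second fails one deletion, yet `vol-1` on the second client is still removed and both failures are reported to the caller.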

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: vSphere CNS volume destroy now aggregates errors instead of failing fast. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | The changed tests use static, descriptive subtest titles; no dynamic values, timestamps, or generated identifiers appear in test names. |
| Test Structure And Quality | ✅ Passed | The added tests are isolated mock-based subtests, use fresh gomock controllers per case, and have no cluster ops or waits that need extra timeouts/cleanup. |
| Microshift Test Compatibility | ✅ Passed | Added tests are plain Go unit tests with gomock/testify, not Ginkgo e2e tests, and they reference no MicroShift-unsupported OpenShift APIs. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; pkg/destroy/vsphere/vsphere_test.go is unit tests with gomock/testify and no SNO assumptions. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Touched code is vSphere destroy logic only; no manifests, replicas, affinity, node selectors, or topology-aware scheduling constraints were added. |
| Ote Binary Stdout Contract | ✅ Passed | The PR only adds structured logrus output in destroy helpers; the main binary already discards logrus/klog stdout and routes logs to stderr, so no new process-level stdout writes. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PASS: The PR only adds Go unit tests in pkg/destroy/vsphere; no Ginkgo e2e tests, IPv4 literals, or external connectivity are introduced. |
| No-Weak-Crypto | ✅ Passed | Changed code only adjusts CNS cleanup error handling/logging; no weak crypto, custom crypto, or secret comparisons appear in touched files. |
| Container-Privileges | ✅ Passed | The PR only changes vSphere destroy Go logic/tests; no container/K8s manifests were touched and no privileged/hostPID/hostNetwork/allowPrivilegeEscalation settings appear in changed files. |
| No-Sensitive-Data-In-Logs | ✅ Passed | New logs only add opaque CNS volume IDs at debug level and deletion counts; no passwords, tokens, PII, session IDs, or hostnames were added. |
Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from rvanderp3 and vr4manta June 25, 2026 21:14
@jcpowermac

Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jun 25, 2026
@openshift-ci-robot

Contributor

@jcpowermac: This pull request references Jira Issue OCPBUGS-92792, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jun 25, 2026
@openshift-ci-robot

Contributor

@jcpowermac: This pull request references Jira Issue OCPBUGS-92792, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

deleteCnsVolumes returned on the first error, which silently skipped remaining volumes on that vCenter and all volumes on subsequent vCenters. GetCnsVolumes had the same early-return on per-volume query failures, discarding already-queried results.

Align both functions with the error aggregation pattern used by deleteVirtualMachines: collect errors, attempt all items, and return utilerrors.NewAggregate. Also log deleted/found counts at Info level so partial cleanup is visible without --log-level=debug.

Summary by CodeRabbit

  • Bug Fixes
    • Improved vSphere volume cleanup to continue working through partial failures instead of stopping at the first error.
    • Volume discovery and deletion now attempt to process all available clients and volumes, while still reporting any issues encountered.
    • Successful deletions are preserved even when some individual operations fail, giving more complete cleanup results.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/destroy/vsphere/vsphere_test.go (1)

727-812: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

LGTM! The three cases map cleanly onto the new behaviors: continue-after-delete-failure, all-clients-processed-after-first-fails, and partial-results-still-reported. Mock expectations (Times) precisely pin the continuation semantics.

Optional: the new cases only call assert.Error, whereas the existing ones use assert.Regexp to pin the source message. Matching the failing volume's error text would tighten regression detection, but it's not blocking.
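
To illustrate the difference, the sketch below checks an aggregated error's message against a pattern instead of only checking that it is non-nil. It uses the standard library's `regexp` package in place of testify's `assert.Regexp`, and `errorMatches`, `joinedFailure`, and the error messages are hypothetical, not taken from the PR's tests.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errorMatches reports whether err is non-nil and its message matches pattern.
func errorMatches(err error, pattern string) bool {
	return err != nil && regexp.MustCompile(pattern).MatchString(err.Error())
}

// joinedFailure builds an aggregate similar in spirit to what a best-effort
// cleanup would return. The messages are made up for this example.
func joinedFailure() error {
	return errors.Join(
		fmt.Errorf("volume vol-1: %w", errors.New("simulated delete failure")),
		fmt.Errorf("volume vol-2: %w", errors.New("simulated query failure")),
	)
}

func main() {
	err := joinedFailure()
	// A bare non-nil check passes for any failure at all.
	fmt.Println(err != nil)
	// A pattern check pins which volume failed and why.
	fmt.Println(errorMatches(err, `vol-1: simulated delete failure`))
	// It also catches a regression where the wrong volume is reported.
	fmt.Println(errorMatches(err, `vol-3`))
}
```

The first two checks print `true` and the last prints `false`, which is the kind of distinction a plain error assertion cannot make.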

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/destroy/vsphere/vsphere_test.go` around lines 727 - 812, The new
deleteCnsVolumes test cases only assert that an error occurred, but they should
also verify the propagated error text like the existing assertions do. Update
the three t.Run blocks in deleteCnsVolumes tests to assert on the specific
failing message/source using assert.Regexp or an equivalent message match, so
regressions in error reporting from
deleteCnsVolumes/GetCnsVolumes/DeleteCnsVolumes are caught. Use the existing
test names and mock expectations in deleteCnsVolumes as the anchor points.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a9f450eb-b6c0-44ac-beb5-a30f7833fb1e

📥 Commits

Reviewing files that changed from the base of the PR and between 16a0282 and 8e38d64.

📒 Files selected for processing (3)
  • pkg/destroy/vsphere/client.go
  • pkg/destroy/vsphere/vsphere.go
  • pkg/destroy/vsphere/vsphere_test.go

@jcpowermac

Contributor Author

/assign @AnnaZivkovic @vr4manta

deleteCnsVolumes returned on the first error, which silently skipped
remaining volumes on that vCenter and all volumes on subsequent
vCenters. GetCnsVolumes had the same early-return on per-volume query
failures, discarding already-queried results.

Align both functions with the error aggregation pattern used by
deleteVirtualMachines: collect errors, attempt all items, and return
utilerrors.NewAggregate. Also log deleted/found counts at Info level
so partial cleanup is visible without --log-level=debug.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jcpowermac jcpowermac force-pushed the OCPBUGS-92792/vsphere-cns-destroy-error-handling branch from 8e38d64 to 0223f4f Compare June 26, 2026 11:48
@jcpowermac

Contributor Author

/cherry-pick release-4.22

@openshift-cherrypick-robot


@jcpowermac: once the present PR merges, I will cherry-pick it on top of release-4.22 in a new PR and assign it to you.


@vr4manta

Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2026
@jcpowermac

Contributor Author

/verify by @jcpowermac via ci and unit tests

@vr4manta

Contributor

/approve

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vr4manta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2026
@jcpowermac

Contributor Author

/verified by @jcpowermac via ci and unit tests

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 26, 2026
@openshift-ci-robot

Contributor

@jcpowermac: This PR has been marked as verified by @jcpowermac via ci and unit tests.


@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 6831905 and 2 for PR HEAD 0223f4f in total

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 4644679 and 1 for PR HEAD 0223f4f in total

@openshift-ci

openshift-ci Bot commented Jun 27, 2026

Contributor

@jcpowermac: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn-devpreview | 0223f4f | link | false | /test e2e-vsphere-ovn-devpreview |
| ci/prow/e2e-aws-ovn | 0223f4f | link | unknown | /test e2e-aws-ovn |
| ci/prow/e2e-vsphere-multi-vcenter-ovn | 0223f4f | link | false | /test e2e-vsphere-multi-vcenter-ovn |


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

5 participants