
docs(cozystack-upgrade): add KubeVirt 1.6→1.8 VM cold-restart workflow #7

Draft

kvaps wants to merge 1 commit into main from feat/kubevirt-1.6-to-1.8-vm-restart

Conversation


kvaps (Member) commented Apr 28, 2026

Summary

Adds a procedure to the cozystack-upgrade skill for the KubeVirt 1.6.x → 1.8.x bump that's coming with Cozystack release-1.4 (cozystack/cozystack#2502).

When that upgrade is applied via helm upgrade cozystack, every VM that was running pre-upgrade fails to live-migrate afterwards because the new QEMU can't reload the old in-memory virtio-net device state (kubevirt/kubevirt#16386). KubeVirt's workloadUpdateMethods machinery keeps retrying the failed migrations, so the cluster ends up flapping.

Validated end-to-end on staging (hidora-hikube-lab) and production (hidora-hikube): 161 running VMs, ~85 minutes total, no customer-visible incidents.

Changes

  • references/known-failures.md — new entry #8 with the exact pre-upgrade prep (workloadUpdateMethods: [], suspend the kubevirt HR; sketched below), the paced cold-restart loop, post-upgrade verification, and the steady-state cleanup.
  • SKILL.md — adds a red-flag table row and a top-level "KubeVirt 1.6.x → 1.8.x special handling" note so the skill catches this before running helm upgrade.
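
A minimal sketch of that pre-upgrade prep, assuming the KubeVirt CR and its HelmRelease are both named kubevirt in the cozy-kubevirt namespace and reconciled by Flux (those names are assumptions; the authoritative steps live in references/known-failures.md):

```bash
# Stop KubeVirt from auto-migrating running VMs onto the new launcher image;
# workloadUpdateMethods sits under spec.workloadUpdateStrategy in the KubeVirt CR.
kubectl -n cozy-kubevirt patch kubevirt kubevirt --type=merge \
  -p '{"spec":{"workloadUpdateStrategy":{"workloadUpdateMethods":[]}}}'

# Suspend the HelmRelease so Flux doesn't reconcile the patch away mid-upgrade.
flux suspend helmrelease kubevirt -n cozy-kubevirt
```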

The flow is built around the conventional Cozystack upgrade path (helm upgrade cozystack ...), not ad-hoc make apply. Coordination with VM owners is the main requirement: every non-excluded VM takes ~30-60s of downtime, in a controlled order (see the sketch below).
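
For illustration, the paced restart loop could look roughly like this; the 30 s pacing and the assumption that each VMI is owned by a VirtualMachine of the same name are mine, not the skill's:

```bash
EXCLUDED_NS=""   # comma-separated namespaces whose VMs must keep running

kubectl get vmi -A --no-headers \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
while read -r ns vmi; do
  # Skip tenants that can't take the downtime window.
  case ",$EXCLUDED_NS," in *",$ns,"*) echo "skip $ns/$vmi"; continue ;; esac
  echo "cold-restarting $ns/$vmi"
  virtctl restart "$vmi" -n "$ns"   # assumes the VM shares the VMI's name
  sleep 30                          # pace restarts so only one VM is down at a time
done
```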

Why "do not merge"

Blocked on cozystack/cozystack#2502 (the actual KubeVirt 1.8.2 bump). This skill change describes the workflow for a Cozystack release that doesn't exist yet — merging earlier would point users at a procedure they don't need.

Merge condition: merge once cozystack/cozystack#2502 lands in a Cozystack release (currently targeted at release-1.4). If a better upstream fix appears for kubevirt/kubevirt#16386 before then (e.g. a way to pin per-VMI launcher images so existing VMs don't need cold-restart), revisit this PR — the workflow may no longer be needed.

Cozystack release-1.4 will bump KubeVirt from 1.6.3 to 1.8.2 (cozystack PR
#2502). Every VM that was running before the upgrade then fails to live-migrate
because the in-memory QEMU device state can't be reloaded by the new QEMU on
the target launcher (kubevirt/kubevirt#16386, virtio-net specifically).

Add a known-failures entry covering:
- pre-upgrade: set workloadUpdateMethods=[] and suspend the kubevirt HelmRelease
- post-upgrade: paced cold-restart of all running VMs (with an exclusion list
  for tenants who can't take the downtime window)
- steady state: re-enable workloadUpdateMethods once the cluster is uniformly
  on the new launcher image

Also add a SKILL.md red-flag row and a top-level "KubeVirt 1.6.x → 1.8.x
special handling" note so the operator catches this before running helm upgrade.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
kvaps added the documentation (Improvements or additions to documentation) and do not merge (Do not merge until linked dependency is resolved) labels on Apr 28, 2026.
coderabbitai Bot commented Apr 28, 2026

Review skipped: draft detected. To trigger a single review, invoke the @coderabbitai review command.



gemini-code-assist Bot left a comment


Code Review

This pull request adds documentation and a detailed recovery procedure for KubeVirt upgrades from version 1.6.x to 1.8.x, addressing a known issue where live-migrations fail due to a QEMU version bump. The feedback suggests improving the provided bash scripts by using a generic default for the namespace exclusion list and adding a status filter to ensure only running pods are targeted during the phased cold-restart process.

```bash
# 6. Build the worklist of VMIs to restart. Excludes any that the operator
# must leave alone (replace EXCLUDED_NS as needed).
EXCLUDED_NS=tenant-edoors # comma-separated if more than one; adjust grep below
```

Severity: medium

The example value tenant-edoors is very specific and might be accidentally used if the user copy-pastes the block. It's better to provide an empty default. Also, the comment mentions "adjust grep below" but the implementation uses awk.

Suggested change:

```diff
-EXCLUDED_NS=tenant-edoors # comma-separated if more than one; adjust grep below
+EXCLUDED_NS="" # comma-separated list of namespaces to exclude
```

Comment on lines +330 to +331:

```bash
pod=$(kubectl -n "$ns" get pods -l kubevirt.io=virt-launcher,vm.kubevirt.io/name="$vmi" \
  -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
```
Severity: medium

To ensure the script targets the active workload and avoids issues with pods in Terminating or Failed states (which might exist if a VM is undergoing issues), it's safer to filter for Running pods.

Suggested change:

```diff
-pod=$(kubectl -n "$ns" get pods -l kubevirt.io=virt-launcher,vm.kubevirt.io/name="$vmi" \
-  -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+pod=$(kubectl -n "$ns" get pods -l kubevirt.io=virt-launcher,vm.kubevirt.io/name="$vmi" \
+  --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
```
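
Taken together with these suggestions, a post-restart verification pass might look like the sketch below; the v1.6 image substring and the HelmRelease location are assumptions, so substitute the tags and names your cluster actually uses:

```bash
# Any virt-launcher pod still on a 1.6.x image means a VM was missed.
kubectl get pods -A -l kubevirt.io=virt-launcher -o \
  jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
  | grep 'virt-launcher:v1.6' || echo "all launchers are on the new image"

# Once the output is clean, resume reconciliation so the steady-state
# workloadUpdateMethods value is restored from the HelmRelease values.
flux resume helmrelease kubevirt -n cozy-kubevirt
```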
