
Fix system.vhd loss during failed MSI upgrade #40524

Open
yeelam-gordon wants to merge 3 commits into master from fix/system-vhd-rollback-and-checks


@yeelam-gordon
Contributor

@yeelam-gordon yeelam-gordon commented May 13, 2026

Fix: all installed files lost on failed MSI upgrade

Fixes #40488

Problem

MajorUpgrade with no Schedule attribute (our previous state) uses the WiX default afterInstallValidate, which removes the old product outside the MSI transaction. If the new install fails afterward, all old files (~30 files, ~1.1GB including system.vhd) are gone with no rollback.

Even worse: once the files are lost, reinstalling does not recover them. Running msiexec /i again reports success but does nothing -- MSI thinks the product is already installed and skips all file operations. Recovery requires msiexec /fa (repair), REINSTALL=ALL, or a full uninstall + reinstall -- none of which a typical user would know to try.

Fix

Add Schedule="afterInstallInitialize" -- this moves old product removal inside the transaction. On failure, MSI restores all files from .rbf backups automatically. The unrecoverable state is never reached.

Tradeoff: ~700 MB of extra temporary disk space during the upgrade (freed on commit). No other downsides identified -- all custom actions are guarded by (not UPGRADINGPRODUCTCODE).
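For readers unfamiliar with WiX, the scheduling change described above might look roughly like the fragment below. This is a sketch, not the actual contents of msipackage/package.wix.in: only the Schedule attribute and its value come from this PR. The DowngradeErrorMessage text is a placeholder, included because WiX expects MajorUpgrade to carry either that attribute or AllowDowngrades.

```xml
<!-- Sketch only. The Schedule value is the change described in this PR;
     every other attribute is a placeholder and may differ from package.wix.in. -->

<!-- Without a Schedule attribute, WiX defaults to afterInstallValidate:
     the old product is removed before the MSI transaction starts. -->

<!-- With Schedule="afterInstallInitialize", RemoveExistingProducts runs
     inside the transaction, so a failed upgrade rolls back the removal. -->
<MajorUpgrade
    Schedule="afterInstallInitialize"
    DowngradeErrorMessage="A newer version is already installed." />
```

The element sits directly under the top-level Product element (WiX v3) or Package element (WiX v4 and later), depending on the toolset version in use.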

Not in scope

Locked-file reboot-pending deletes (kernel-mode lock holders that MSIRMSHUTDOWN can't kill). That's a separate issue.

Test results

| Scenario | Without fix (default) | With fix (afterInstallInitialize) |
| --- | --- | --- |
| Upgrade fails mid-install | ❌ WSL completely broken -- all files gone (system.vhd, wsl.exe, wslservice.exe, ...) | ✅ All files automatically restored, WSL still works |
| User runs installer again to fix it | ❌ Installer reports success but does nothing -- files still missing | ✅ Not needed -- files were never lost |

How the fix works (observed via filesystem monitoring):

During upgrade, old files are atomically renamed to .rbf backups (not deleted). On failure, MSI restores them from the .rbf files automatically.

Copilot AI review requested due to automatic review settings May 13, 2026 14:54
Contributor

Copilot AI left a comment


Pull request overview

This PR hardens WSL against unrecoverable failures when an MSI major upgrade fails mid-install by (1) moving old-product removal into the MSI transaction for rollback safety and (2) adding a dedicated, localized error path when required packaged VHD files (e.g., system.vhd, modules.vhd) are missing so users get an actionable message instead of a generic HCS failure.

Changes:

  • Adjust WiX MajorUpgrade scheduling so RemoveExistingProducts runs inside the MSI transaction (rollback restores the previous install on failure).
  • Introduce WSL_E_SYSTEM_DISTRO_MISSING and a localized message for missing packaged files.
  • Add runtime existence checks in VM startup paths and wire the new HRESULT into common error-string handling.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| msipackage/package.wix.in | Schedules MajorUpgrade removal to occur inside the MSI transaction for rollback protection. |
| src/windows/service/inc/wslservice.idl | Adds new HRESULT WSL_E_SYSTEM_DISTRO_MISSING (0x33). |
| src/windows/service/exe/WslCoreVm.cpp | Replaces debug-only asserts with production checks that throw a user-facing localized error when packaged VHDs are missing. |
| src/windows/service/exe/HcsVirtualMachine.cpp | Adds packaged-file existence validation before attaching boot VHDs (currently without setting a user-facing message). |
| src/windows/common/wslutil.cpp | Adds the new HRESULT to common error mappings and returns a localized fallback string (currently hard-coded to system.vhd). |
| localization/strings/en-US/Resources.resw | Adds MessageSystemDistroMissing localized string resource. |

Comment thread src/windows/service/exe/HcsVirtualMachine.cpp Outdated
Comment thread src/windows/common/wslutil.cpp Outdated
Comment thread src/windows/service/exe/WslCoreVm.cpp Outdated
@benhillis
Member

Thanks for investigating, would it be possible to try to root cause the issue instead of a band-aid? A slightly better error message isn’t going to help users that get into this state.

@yeelam-gordon yeelam-gordon force-pushed the fix/system-vhd-rollback-and-checks branch 2 times, most recently from 2ad3626 to 05c4925 Compare May 14, 2026 03:37
@benhillis benhillis added msix Installer issue. file system labels May 17, 2026
Move MajorUpgrade Schedule to afterInstallInitialize so RemoveExistingProducts
runs inside the MSI transaction. On upgrade failure, the old product is restored
instead of leaving files permanently deleted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 18, 2026 03:35
@yeelam-gordon yeelam-gordon force-pushed the fix/system-vhd-rollback-and-checks branch from 05c4925 to c1f0d2c Compare May 18, 2026 03:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@yeelam-gordon
Contributor Author

Apologies for the earlier noise — I've cleaned this up. The PR is now a single-line MSI fix (the root cause), no runtime checks. The previous defense-in-depth changes have been removed to keep this focused on what actually prevents the data loss.

@yeelam-gordon yeelam-gordon marked this pull request as ready for review May 19, 2026 01:11
@yeelam-gordon yeelam-gordon requested a review from a team as a code owner May 19, 2026 01:11
Copilot AI review requested due to automatic review settings May 19, 2026 01:11
@microsoft-github-policy-service
Contributor

Hello! Could you please provide more logs to help us better diagnose your issue?

To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed please upload the output files to this GitHub issue.

See Collect WSL logs (recommended method).

If you choose to email these logs instead of attaching them to the bug, please send them to wsl-gh-logs@microsoft.com with the GitHub issue number in the subject, and include a link to your GitHub issue comment in the message body.

Thank you!

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Contributor

@ptrivedi ptrivedi left a comment


Seems to be a righteous fix targeted at preventing data loss. It would be good to keep an eye out for why some of these installs fail as we look at more issues.


Development

Successfully merging this pull request may close these issues.

Cannot start WSL: Wsl/Service/CreateInstance/CreateVm/MountVhd/HCS/ERROR_FILE_NOT_FOUND

4 participants