Fix system.vhd loss during failed MSI upgrade#40524
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens WSL against unrecoverable failures when an MSI major upgrade fails mid-install by (1) moving old-product removal into the MSI transaction for rollback safety and (2) adding a dedicated, localized error path when required packaged VHD files (e.g., system.vhd, modules.vhd) are missing so users get an actionable message instead of a generic HCS failure.
Changes:
- Adjust WiX
MajorUpgradescheduling soRemoveExistingProductsruns inside the MSI transaction (rollback restores the previous install on failure). - Introduce
WSL_E_SYSTEM_DISTRO_MISSINGand a localized message for missing packaged files. - Add runtime existence checks in VM startup paths and wire the new HRESULT into common error-string handling.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| msipackage/package.wix.in | Schedules MajorUpgrade removal to occur inside the MSI transaction for rollback protection. |
| src/windows/service/inc/wslservice.idl | Adds new HRESULT WSL_E_SYSTEM_DISTRO_MISSING (0x33). |
| src/windows/service/exe/WslCoreVm.cpp | Replaces debug-only asserts with production checks that throw a user-facing localized error when packaged VHDs are missing. |
| src/windows/service/exe/HcsVirtualMachine.cpp | Adds packaged-file existence validation before attaching boot VHDs (currently without setting a user-facing message). |
| src/windows/common/wslutil.cpp | Adds the new HRESULT to common error mappings and returns a localized fallback string (currently hard-coded to system.vhd). |
| localization/strings/en-US/Resources.resw | Adds MessageSystemDistroMissing localized string resource. |
|
Thanks for investigating, would it be possible to try to root cause the issue instead of a band-aid? A slightly better error message isn’t going to help users that get into this state. |
2ad3626 to
05c4925
Compare
Move MajorUpgrade Schedule to afterInstallInitialize so RemoveExistingProducts runs inside the MSI transaction. On upgrade failure, the old product is restored instead of leaving files permanently deleted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
05c4925 to
c1f0d2c
Compare
|
Apologies for the earlier noise — I've cleaned this up. The PR is now a single-line MSI fix (the root cause), no runtime checks. The previous defense-in-depth changes have been removed to keep this focused on what actually prevents the data loss. |
|
Hello! Could you please provide more logs to help us better diagnose your issue? To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative powershell prompt: The script will output the path of the log file once done. Once completed please upload the output files to this GitHub issue. See Collect WSL logs (recommended method). If you choose to email these logs instead of attaching them to the bug, please send them to wsl-gh-logs@microsoft.com with the GitHub issue number in the subject, and include a link to your GitHub issue comment in the message body. Thank you! |
ptrivedi
left a comment
There was a problem hiding this comment.
Seems to be a righteous fix targeted towards preventing data loss. It would be good to keep an eye out at figuring out why some of these installs fail as we look at more issues.
Fix: all installed files lost on failed MSI upgrade
Fixes #40488
Problem
MajorUpgradewith noScheduleattribute (our previous state) uses the WiX defaultafterInstallValidate, which removes the old product outside the MSI transaction. If the new install fails afterward, all old files (~30 files, ~1.1GB including system.vhd) are gone with no rollback.Even Worse: once files are lost, reinstalling does not recover them. Running
msiexec /iagain reports success but does nothing -- MSI thinks the product is already installed and skips all file operations. Recovery requiresmsiexec /fa(repair),REINSTALL=ALL, or full uninstall + reinstall -- none of which a typical user would know to try.Fix
Add
Schedule="afterInstallInitialize"-- this moves old product removal inside the transaction. On failure, MSI restores all files from.rbfbackups automatically. The unrecoverable state is never reached.Tradeoff: ~700MB extra temporary disk during upgrade (freed on commit). No other downsides identified -- all custom actions are guarded by
(not UPGRADINGPRODUCTCODE).Not in scope
Locked-file reboot-pending deletes (kernel-mode lock holders that
MSIRMSHUTDOWNcan't kill). That's a separate issue.Test results
afterInstallInitialize)How the fix works (observed via filesystem monitoring):
During upgrade, old files are atomically renamed to
.rbfbackups (not deleted). On failure, MSI restores them from the.rbffiles automatically.