Surface MSI reboot-required failures instead of silently dropping VHDs#40609
Surface MSI reboot-required failures instead of silently dropping VHDs#40609benhillis wants to merge 2 commits into
Conversation
When the embedded MSI in the Microsoft Store MSIX cannot replace a locked file (most commonly system.vhd, which is mounted whenever a WSL2 instance is running), Windows Installer renames the existing file to C:\Windows\Installer\Config.Msi\<hex>.rbf, schedules the new file via MoveFileEx(MOVEFILE_DELAY_UNTIL_REBOOT), and returns ERROR_SUCCESS_REBOOT_REQUIRED (3010). InstallMsipackageImpl in wslinstaller silently converted 3010 to ERROR_SUCCESS and returned success to its caller. The user was told nothing; their next wsl invocation hit a now-empty C:\Program Files\WSL install dir (system.vhd physically gone until reboot) and produced a confusing "vhd missing" failure - perceived as data loss. This change: * Stops swallowing 3010 in InstallMsipackageImpl. The MSI log is now preserved on 3010 (previously deleted) to aid diagnostics. * Sets a volatile registry marker (HKLM\Software\Microsoft\Windows\CurrentVersion\Lxss\MSI\RebootPending) using REG_OPTION_VOLATILE so the kernel auto-clears it on reboot. No cleanup path is needed; the marker is gone iff the user has rebooted. * Adds a marker check in LxssUserSession::_CreateInstance (service side) which throws a localized user-facing error (MessageUpdateRebootRequired) any time a client tries to launch a distro while the reboot is pending. This catches all distro-launching client paths through the single service entry point: wsl.exe (lifted and MSI-installed), wslg.exe, bash.exe, VS Code Remote-WSL, etc. * Also checks the marker on entry to CallMsiPackage and throws on 3010 in WaitForMsiInstall, so the wsl --update / lifted-client paths surface the same error. User-visible behavior: > wsl WSL was updated but a system restart is required to complete the installation. Please reboot your machine and try again. Error code: Wsl/Service/CreateInstance/0x80070bc2 The user reboots; the volatile key is destroyed by the kernel; Windows' pending file-rename queue swaps the staged file into place; WSL works again. Adds an integration test, InstallerTests::UpgradeWithLockedFileReportsRebootRequired, that exercises the full path: uninstalls the MSI, memory-maps a dummy file at the install path to make it unrenameable, runs the MSIX installer to drive the WslInstaller service, polls for the marker, then verifies wsl echo OK fails with the expected message before cleaning up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves resiliency and user-facing diagnostics when the embedded MSI upgrade returns ERROR_SUCCESS_REBOOT_REQUIRED (3010), ensuring WSL does not continue operating against a half-upgraded installation (e.g., missing/renamed system.vhd) and instead prompts the user to reboot.
Changes:
- Propagate MSI exit code 3010 (instead of treating it as success) and persist MSI logs for post-mortem diagnostics.
- Add a volatile registry “reboot required” marker and enforce it by blocking distro instance creation with a localized user error until reboot.
- Add an end-to-end installer test and a new localized string for the reboot-required message.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/windows/InstallerTests.cpp | Adds an end-to-end test that simulates a locked system.vhd upgrade and verifies reboot-required behavior is surfaced. |
| src/windows/wslinstaller/exe/WslInstaller.cpp | Stops collapsing 3010 to success; sets reboot marker and retains MSI logs for 3010 cases. |
| src/windows/service/exe/LxssUserSession.cpp | Blocks instance creation when the reboot-required marker is present, surfacing a clear user error. |
| src/windows/common/install.h | Declares marker helper APIs (SetRebootRequiredMarker, IsRebootRequired). |
| src/windows/common/install.cpp | Implements marker helpers and enforces reboot-required behavior during MSI package dispatch. |
| localization/strings/en-US/Resources.resw | Adds MessageUpdateRebootRequired localized string. |
| // If a previous MSI install returned ERROR_SUCCESS_REBOOT_REQUIRED, files like system.vhd | ||
| // are pending delayed-rename and the install directory is incomplete. Block early rather | ||
| // than launching against a broken install. | ||
| if (IsRebootRequired()) | ||
| { | ||
| THROW_HR_WITH_USER_ERROR(HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED), wsl::shared::Localization::MessageUpdateRebootRequired()); | ||
| } |
…er on success The original CallMsiPackage() early-throw blocked every MSIX-lifted wsl.exe invocation (including --version, --list, --shutdown, --update) regardless of whether it would touch the half-installed files. Remove the early throw and rely on the service-side _CreateInstance check, which already gates exactly the distro-launching paths that actually depend on the broken install dir. Also add ClearRebootRequiredMarker() and call it from any MSI install path that completes with ERROR_SUCCESS, so a 'wsl --shutdown; wsl --update' flow can self-recover without requiring an actual reboot. Extend the integration test to verify wsl --version still succeeds with the marker set, and that the marker is cleared after a clean reinstall. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Good catch — fixed in 02969ca. Removed the early IsRebootRequired() throw in CallMsiPackage(); the service-side check in _CreateInstance covers the distro-launching paths that actually depend on the half-installed files, and read-only / cleanup commands (--version, --shutdown, --list, --update) now work as expected. Also added ClearRebootRequiredMarker() and wired it into the ERROR_SUCCESS paths of both the COM-driven and wsl --update install flows, so a user who shuts WSL down and re-runs the install can self-recover without an actual reboot. Extended the integration test to verify wsl --version succeeds with the marker set and that the marker is cleared after a clean reinstall. |
Summary of the Pull Request
Stop dropping
ERROR_SUCCESS_REBOOT_REQUIRED(3010) on the floor when the embedded MSI inside the Microsoft Store MSIX can't replace a locked file (most commonlysystem.vhdwhile a WSL2 distro is running). Today this manifests to users assystem.vhddisappearing after a Store update — Windows Installer has renamed it toC:\Windows\Installer\Config.Msi\*.rbfand scheduled the new file viaMoveFileEx(MOVEFILE_DELAY_UNTIL_REBOOT), butwslinstallersilently turns 3010 into success, so the user is never told to reboot and just sees a broken WSL.PR Checklist
InstallerTests::UpgradeWithLockedFileReportsRebootRequiredMessageUpdateRebootRequiredstring added with{Locked="WSL"}commentDetailed Description of the Pull Request / Additional comments
Defense-in-depth fix across three layers:
WslInstaller::InstallMsipackageImplno longer converts 3010 → 0. Propagates the real HRESULT and now keeps the MSI log on 3010 for diagnostics.SetRebootRequiredMarker()/IsRebootRequired()ininstall.cppwrite a volatile registry key (REG_OPTION_VOLATILE) atHKLM\Software\Microsoft\Windows\CurrentVersion\Lxss\MSI\RebootPending. The kernel auto-clears it on reboot, so there is no cleanup code path and no stuck-marker failure mode.LxssUserSession::_CreateInstance(the single funnel for all distro-starting service calls) checks the marker on entry and throwsTHROW_HR_WITH_USER_ERROR(HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED), MessageUpdateRebootRequired()). This catches every client path uniformly — liftedwsl.exe, MSI-installedwsl.exe,wslg.exe,bash.exe, VS Code Remote-WSL, Terminal, etc. — without any new IDL/ABI surface.CallMsiPackagealso checks the marker as a belt-and-braces measure for the lifted client.UX
During the Store auto-update: nothing visible (no toast, no prompt). On the user's next
wsl <anything>invocation:After reboot: kernel destroys the volatile key, Windows pending-rename queue swaps the staged file into place,
wslworks again. Zero cleanup code, no follow-up state to manage.Commands that don't start a distro (
wsl --version,wsl --update,wsl --shutdown,wsl --list) still work in this state — which is what we want, particularly for--updateand--shutdown.Out of scope (follow-ups)
system.vhdvia MSI (architectural — generate/repair it from a sidecar at first-run instead). Would eliminate the class of bug entirely.The fix in this PR is necessary as a backstop regardless of either follow-up: silent data-loss-shaped bugs are unacceptable.
Validation Steps Performed
InstallerTests::UpgradeWithLockedFileReportsRebootRequiredpasses end-to-end:MapViewOfFile— required to actually defeat SYSTEM-level renames; plainCreateFileWwithoutFILE_SHARE_DELETEis not sufficient), fakes the version to be older than the MSIX-embedded MSI, then drives the install throughInstallMsix()→WslInstallerservice.wsl echo OKfails with output containing "restart".cmake --build . -- -m) succeeds.0x80070bc2=HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED)) is consistent across all paths in this change.