Skip to content

Surface MSI reboot-required failures instead of silently dropping VHDs#40609

Draft
benhillis wants to merge 2 commits into
masterfrom
benhillis/reboot-required-marker
Draft

Surface MSI reboot-required failures instead of silently dropping VHDs#40609
benhillis wants to merge 2 commits into
masterfrom
benhillis/reboot-required-marker

Conversation

@benhillis
Copy link
Copy Markdown
Member

Summary of the Pull Request

Stop dropping ERROR_SUCCESS_REBOOT_REQUIRED (3010) on the floor when the embedded MSI inside the Microsoft Store MSIX can't replace a locked file (most commonly system.vhd while a WSL2 distro is running). Today this manifests to users as system.vhd disappearing after a Store update — Windows Installer has renamed it to C:\Windows\Installer\Config.Msi\*.rbf and scheduled the new file via MoveFileEx(MOVEFILE_DELAY_UNTIL_REBOOT), but wslinstaller silently turns 3010 into success, so the user is never told to reboot and just sees a broken WSL.

PR Checklist

  • Closes: Link to issue #xxx
  • Communication: Discussed offline
  • Tests: Added InstallerTests::UpgradeWithLockedFileReportsRebootRequired
  • Localization: New MessageUpdateRebootRequired string added with {Locked="WSL"} comment
  • Dev docs: N/A
  • Documentation updated: N/A

Detailed Description of the Pull Request / Additional comments

Defense-in-depth fix across three layers:

  1. WslInstaller::InstallMsipackageImpl no longer converts 3010 → 0. Propagates the real HRESULT and now keeps the MSI log on 3010 for diagnostics.
  2. New SetRebootRequiredMarker() / IsRebootRequired() in install.cpp write a volatile registry key (REG_OPTION_VOLATILE) at HKLM\Software\Microsoft\Windows\CurrentVersion\Lxss\MSI\RebootPending. The kernel auto-clears it on reboot, so there is no cleanup code path and no stuck-marker failure mode.
  3. LxssUserSession::_CreateInstance (the single funnel for all distro-starting service calls) checks the marker on entry and throws THROW_HR_WITH_USER_ERROR(HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED), MessageUpdateRebootRequired()). This catches every client path uniformly — lifted wsl.exe, MSI-installed wsl.exe, wslg.exe, bash.exe, VS Code Remote-WSL, Terminal, etc. — without any new IDL/ABI surface. CallMsiPackage also checks the marker as a belt-and-braces measure for the lifted client.

UX

During the Store auto-update: nothing visible (no toast, no prompt). On the user's next wsl <anything> invocation:

> wsl
WSL was updated but a system restart is required to complete the installation. Please reboot your machine and try again.
Error code: Wsl/Service/CreateInstance/0x80070bc2

After reboot: kernel destroys the volatile key, Windows pending-rename queue swaps the staged file into place, wsl works again. Zero cleanup code, no follow-up state to manage.

Commands that don't start a distro (wsl --version, wsl --update, wsl --shutdown, wsl --list) still work in this state — which is what we want, particularly for --update and --shutdown.

Out of scope (follow-ups)

  • Defer the silent update when WSL is actively in use so we don't have to reach this code path at all. Worth investigating as a follow-up.
  • Stop shipping system.vhd via MSI (architectural — generate/repair it from a sidecar at first-run instead). Would eliminate the class of bug entirely.

The fix in this PR is necessary as a backstop regardless of either follow-up: silent data-loss-shaped bugs are unacceptable.

Validation Steps Performed

  • New automated test InstallerTests::UpgradeWithLockedFileReportsRebootRequired passes end-to-end:
    • Uninstalls the MSI, places a dummy file at the install path, memory-maps it (MapViewOfFile — required to actually defeat SYSTEM-level renames; plain CreateFileW without FILE_SHARE_DELETE is not sufficient), fakes the version to be older than the MSIX-embedded MSI, then drives the install through InstallMsix()WslInstaller service.
    • Polls for the volatile marker, then verifies wsl echo OK fails with output containing "restart".
    • Cleans up (deletes marker, reinstalls MSI). End state is identical to start.
  • Full Windows build (cmake --build . -- -m) succeeds.
  • Manually verified the error code (0x80070bc2 = HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED)) is consistent across all paths in this change.

When the embedded MSI in the Microsoft Store MSIX cannot replace a locked
file (most commonly system.vhd, which is mounted whenever a WSL2 instance
is running), Windows Installer renames the existing file to
C:\Windows\Installer\Config.Msi\<hex>.rbf, schedules the new file via
MoveFileEx(MOVEFILE_DELAY_UNTIL_REBOOT), and returns
ERROR_SUCCESS_REBOOT_REQUIRED (3010).

InstallMsipackageImpl in wslinstaller silently converted 3010 to
ERROR_SUCCESS and returned success to its caller. The user was told
nothing; their next wsl invocation hit a now-empty C:\Program Files\WSL
install dir (system.vhd physically gone until reboot) and produced a
confusing "vhd missing" failure - perceived as data loss.

This change:

* Stops swallowing 3010 in InstallMsipackageImpl. The MSI log is now
  preserved on 3010 (previously deleted) to aid diagnostics.
* Sets a volatile registry marker
  (HKLM\Software\Microsoft\Windows\CurrentVersion\Lxss\MSI\RebootPending)
  using REG_OPTION_VOLATILE so the kernel auto-clears it on reboot. No
  cleanup path is needed; the marker is gone iff the user has rebooted.
* Adds a marker check in LxssUserSession::_CreateInstance (service
  side) which throws a localized user-facing error
  (MessageUpdateRebootRequired) any time a client tries to launch a
  distro while the reboot is pending. This catches all distro-launching
  client paths through the single service entry point: wsl.exe (lifted
  and MSI-installed), wslg.exe, bash.exe, VS Code Remote-WSL, etc.
* Also checks the marker on entry to CallMsiPackage and throws on 3010
  in WaitForMsiInstall, so the wsl --update / lifted-client paths
  surface the same error.

User-visible behavior:

  > wsl
  WSL was updated but a system restart is required to complete the
  installation. Please reboot your machine and try again.
  Error code: Wsl/Service/CreateInstance/0x80070bc2

The user reboots; the volatile key is destroyed by the kernel; Windows'
pending file-rename queue swaps the staged file into place; WSL works
again.

Adds an integration test, InstallerTests::UpgradeWithLockedFileReportsRebootRequired,
that exercises the full path: uninstalls the MSI, memory-maps a dummy
file at the install path to make it unrenameable, runs the MSIX
installer to drive the WslInstaller service, polls for the marker, then
verifies wsl echo OK fails with the expected message before cleaning up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 20:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves resiliency and user-facing diagnostics when the embedded MSI upgrade returns ERROR_SUCCESS_REBOOT_REQUIRED (3010), ensuring WSL does not continue operating against a half-upgraded installation (e.g., missing/renamed system.vhd) and instead prompts the user to reboot.

Changes:

  • Propagate MSI exit code 3010 (instead of treating it as success) and persist MSI logs for post-mortem diagnostics.
  • Add a volatile registry “reboot required” marker and enforce it by blocking distro instance creation with a localized user error until reboot.
  • Add an end-to-end installer test and a new localized string for the reboot-required message.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/windows/InstallerTests.cpp Adds an end-to-end test that simulates a locked system.vhd upgrade and verifies reboot-required behavior is surfaced.
src/windows/wslinstaller/exe/WslInstaller.cpp Stops collapsing 3010 to success; sets reboot marker and retains MSI logs for 3010 cases.
src/windows/service/exe/LxssUserSession.cpp Blocks instance creation when the reboot-required marker is present, surfacing a clear user error.
src/windows/common/install.h Declares marker helper APIs (SetRebootRequiredMarker, IsRebootRequired).
src/windows/common/install.cpp Implements marker helpers and enforces reboot-required behavior during MSI package dispatch.
localization/strings/en-US/Resources.resw Adds MessageUpdateRebootRequired localized string.

Comment thread src/windows/common/install.cpp Outdated
Comment on lines +269 to +275
// If a previous MSI install returned ERROR_SUCCESS_REBOOT_REQUIRED, files like system.vhd
// are pending delayed-rename and the install directory is incomplete. Block early rather
// than launching against a broken install.
if (IsRebootRequired())
{
THROW_HR_WITH_USER_ERROR(HRESULT_FROM_WIN32(ERROR_SUCCESS_REBOOT_REQUIRED), wsl::shared::Localization::MessageUpdateRebootRequired());
}
…er on success

The original CallMsiPackage() early-throw blocked every MSIX-lifted wsl.exe invocation (including --version, --list, --shutdown, --update) regardless of whether it would touch the half-installed files. Remove the early throw and rely on the service-side _CreateInstance check, which already gates exactly the distro-launching paths that actually depend on the broken install dir.

Also add ClearRebootRequiredMarker() and call it from any MSI install path that completes with ERROR_SUCCESS, so a 'wsl --shutdown; wsl --update' flow can self-recover without requiring an actual reboot.

Extend the integration test to verify wsl --version still succeeds with the marker set, and that the marker is cleared after a clean reinstall.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@benhillis
Copy link
Copy Markdown
Member Author

Good catch — fixed in 02969ca. Removed the early IsRebootRequired() throw in CallMsiPackage(); the service-side check in _CreateInstance covers the distro-launching paths that actually depend on the half-installed files, and read-only / cleanup commands (--version, --shutdown, --list, --update) now work as expected.

Also added ClearRebootRequiredMarker() and wired it into the ERROR_SUCCESS paths of both the COM-driven and wsl --update install flows, so a user who shuts WSL down and re-runs the install can self-recover without an actual reboot.

Extended the integration test to verify wsl --version succeeds with the marker set and that the marker is cleared after a clean reinstall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants