Protect critical WSL processes under heavy memory load with cgroup & isolate distro cgroups #40519
Open
chemwolf6922 wants to merge 30 commits into
Co-authored-by: Copilot <copilot@github.com>
Contributor (Author)
If we want this change, please help determine the reserved memory size. 128M might be too much.
Contributor
Pull request overview
This PR aims to improve WSL2 reliability under heavy memory pressure by placing user/workload processes into a cgroup v2 with a memory cap, while keeping critical WSL system processes outside that cap to reduce the chance of catastrophic failures after OOM events.
Changes:
- Add cgroup path constants for a `wsl-user` cgroup and its `cgroup.procs`/`memory.max` control files.
- Create and configure the `wsl-user` cgroup at mini_init startup, setting `memory.max` to `totalram - 128MB`.
- Move key workload processes (session leaders, boot command, systemd-spawned workload) into `wsl-user` by writing `0` to `cgroup.procs`.
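The second bullet derives `memory.max` from the total RAM minus a fixed reservation. A minimal sketch of that arithmetic follows, assuming the total comes from `sysinfo(2)`; `TotalRamBytes`, `ComputeMemoryMax`, and the fallback to `"max"` on machines smaller than the reservation are illustrative choices, not taken from the PR. The reservation is a parameter because the overview mentions 128 MB while the PR description uses 32 MiB.

```cpp
#include <cstdint>
#include <string>
#include <sys/sysinfo.h>

// Total physical RAM in bytes, or 0 if sysinfo(2) fails.
std::uint64_t TotalRamBytes()
{
    struct sysinfo Info {};
    if (sysinfo(&Info) != 0)
    {
        return 0;
    }

    // totalram is expressed in units of mem_unit bytes.
    return static_cast<std::uint64_t>(Info.totalram) * Info.mem_unit;
}

// Hypothetical helper: the value to write to memory.max so that ReserveBytes
// stay available to processes outside the limited cgroup.
std::string ComputeMemoryMax(std::uint64_t TotalBytes, std::uint64_t ReserveBytes)
{
    // Assumption: on a machine smaller than the reservation, leave the cgroup
    // unlimited rather than writing 0, which would make it unusable.
    if (TotalBytes <= ReserveBytes)
    {
        return "max";
    }

    return std::to_string(TotalBytes - ReserveBytes);
}
```

For example, with 8 GiB of RAM and a 128 MiB reservation, `ComputeMemoryMax` returns `"8455716864"`.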
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/linux/init/util.h | Adds constants for the wsl-user cgroup paths. |
| src/linux/init/util.cpp | Moves create-process children into wsl-user cgroup before exec. |
| src/linux/init/main.cpp | Adds cgroup setup routine and invokes it after mounting cgroup2. |
| src/linux/init/init.cpp | Moves session leaders and systemd into wsl-user cgroup. |
| src/linux/init/config.cpp | Moves boot command into wsl-user cgroup. |
Comments suppressed due to low confidence (2)
src/linux/init/main.cpp:3944
- Enabling the memory controller in cgroup v2 via /sys/fs/cgroup/cgroup.subtree_control generally requires the parent cgroup to have no internal processes ("no internal process" rule). At this point mini_init is in the root cgroup, and wsl-user is created before enabling the controller, so the write is likely to fail (e.g., EBUSY) and the feature becomes a no-op. Consider creating a dedicated top-level cgroup hierarchy (e.g., move system processes into a sibling cgroup and leave the root empty), and enable the controller before creating/using children so the memory controller is actually available.
```cpp
if (UtilMkdir(WSL_USER_CGROUP_PATH, 0755) < 0)
{
    LOG_ERROR("Failed to create wsl-user cgroup directory {}", errno);
    return;
}
if (WriteToFile(CGROUP_MOUNTPOINT "/cgroup.subtree_control", "+memory") < 0)
{
    LOG_ERROR("Failed to enable memory controller {}", errno);
    return;
}
```
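The comment above concerns cgroup v2's "no internal process" rule. The kernel's cgroup v2 documentation exempts the root cgroup from that rule, but it does apply whenever the parent is a non-root cgroup (for example, inside a cgroup namespace). A minimal sketch of an ordering that is safe in either case: move processes out of the parent into a sibling first, then enable `+memory`, then create the limited child. The sibling name `wsl-system` and the helpers `WriteControlFile`/`SetupHierarchy` are hypothetical, not the PR's code, and a real setup must move every process out of a non-root parent, not only the caller.

```cpp
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Write Value to a control file. Returns 0 on success or an errno value.
// O_CREAT only matters when Root is a plain directory (e.g. in a test);
// on cgroupfs the control files already exist.
int WriteControlFile(const std::string& Path, const std::string& Value)
{
    const int Fd = open(Path.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (Fd < 0)
    {
        return errno;
    }

    const ssize_t Written = write(Fd, Value.data(), Value.size());
    int Error = 0;
    if (Written < 0)
    {
        Error = errno;
    }
    else if (static_cast<size_t>(Written) != Value.size())
    {
        Error = EIO;
    }

    close(Fd);
    return Error;
}

// Sketch: keep the parent free of processes before enabling a domain controller.
int SetupHierarchy(const std::string& Root)
{
    const std::string SystemCgroup = Root + "/wsl-system"; // hypothetical name
    const std::string UserCgroup = Root + "/wsl-user";

    // 1. Sibling cgroup for processes that must stay outside the memory cap.
    if (mkdir(SystemCgroup.c_str(), 0755) < 0 && errno != EEXIST)
    {
        return errno;
    }

    // 2. Move the calling process out of the parent ("0" means the writer itself).
    if (const int Error = WriteControlFile(SystemCgroup + "/cgroup.procs", "0"); Error != 0)
    {
        return Error;
    }

    // 3. Enable the memory controller for the parent's children.
    if (const int Error = WriteControlFile(Root + "/cgroup.subtree_control", "+memory"); Error != 0)
    {
        return Error;
    }

    // 4. Memory-limited cgroup for user workloads; memory.max is written afterwards.
    if (mkdir(UserCgroup.c_str(), 0755) < 0 && errno != EEXIST)
    {
        return errno;
    }

    return 0;
}
```

Taking the root path as a parameter lets the same sequence be exercised against an ordinary directory, where the writes land in regular files.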
src/linux/init/init.cpp:1265
- This cgroup move is relied on to ensure session leaders are subject to the memory cap, but the return value from WriteToFile is ignored. If it fails, the session leader will stay in the root cgroup and can still starve system processes. Consider checking the return and logging a warning/error (or propagating failure) so the protection isn't silently skipped.
```cpp
"SessionLeader", [ListenSocket = std::move(ListenSocket), &Channel, &Config, Mask = Config.Umask, SocketAddress]() {
    // Move session leader into the memory-limited user cgroup.
    WriteToFile(WSL_USER_CGROUP_PROCS, "0");
```
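The comment above points out that the return value of the cgroup move is ignored. A minimal sketch of a checked move, assuming a hypothetical helper `TryMoveSelfToCgroup`: writing `"0"` to a `cgroup.procs` file moves the writing process itself, and a failure is reported as a warning that states the impact instead of aborting the caller. It prints to stderr only so that it compiles standalone; the real code would use the project's own logging macros.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Move the calling process into the cgroup that owns ProcsPath.
// Returns true on success. On failure, logs a warning naming the process
// (What) and the consequence, then lets the caller continue.
bool TryMoveSelfToCgroup(const std::string& ProcsPath, const std::string& What)
{
    int Error = 0;
    const int Fd = open(ProcsPath.c_str(), O_WRONLY | O_CLOEXEC);
    if (Fd < 0)
    {
        Error = errno;
    }
    else
    {
        // "0" is interpreted by the kernel as "the writing process".
        const ssize_t Written = write(Fd, "0", 1);
        if (Written != 1)
        {
            Error = (Written < 0) ? errno : EIO;
        }
        close(Fd);
    }

    if (Error != 0)
    {
        // Non-critical: the process keeps running, just without the memory cap.
        std::fprintf(stderr, "warning: failed to move %s into the user cgroup (%s); it will not be memory-limited\n",
                     What.c_str(), std::strerror(Error));
        return false;
    }

    return true;
}
```

The same helper would cover the session leader, the boot command, and systemd, so every call site reports the failure consistently.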
benhillis reviewed May 19, 2026
Contributor
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
src/linux/init/init.cpp:1272
- This cgroup move is treated as non-critical but logs failures with LOG_ERROR. To avoid noisy/false errors, consider downgrading to LOG_WARNING/LOG_INFO and stating the impact (session leader not placed in wsl-user cgroup).
```cpp
// Move session leader into the memory-limited user cgroup.
if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
{
    // Non-critical.
    LOG_ERROR("Failed to move session leader into user cgroup, {}", errno);
}
```
src/linux/init/init.cpp:2413
- This cgroup move is non-critical but uses LOG_ERROR on failure. Consider logging at warning/info level instead and clarifying that memory protection for systemd (and its subtree) is disabled if the move fails.
```cpp
// Move systemd into the memory-limited user cgroup.
if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
{
    // Non-critical.
    LOG_ERROR("Failed to move systemd to user cgroup {}", errno);
}
```
Comment on lines +2245 to +2247
```cpp
auto MiniInitDirectChildPidPath = std::filesystem::read_symlink(PROCFS_PATH "/self");
pid_t MiniInitDirectChildPid = std::stoul(MiniInitDirectChildPidPath.string());
```
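These lines convert the target of the `/proc/self` symlink (a decimal PID such as `1234`) with `std::stoul`, which throws on malformed input and returns an `unsigned long` that is then narrowed to `pid_t`. As one hedged alternative (not necessarily what the reviewer suggested), C++17 `std::from_chars` parses without exceptions and makes it easy to reject trailing garbage and non-positive values. `ParsePid` is a hypothetical name.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <sys/types.h>

// Parse a PID string such as the target of readlink("/proc/self").
// Returns std::nullopt for empty, non-numeric, out-of-range, or non-positive input.
std::optional<pid_t> ParsePid(std::string_view Text)
{
    pid_t Pid = 0;
    const char* const Begin = Text.data();
    const char* const End = Text.data() + Text.size();

    const auto [Ptr, Ec] = std::from_chars(Begin, End, Pid);
    if (Ec != std::errc{} || Ptr != End || Pid <= 0)
    {
        return std::nullopt;
    }

    return Pid;
}
```

At the call site this would read `ParsePid(MiniInitDirectChildPidPath.string())`, with the caller deciding how to handle `std::nullopt`.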
Summary of the Pull Request
Issue 1:
Under heavy memory load, critical WSL processes can fail in various ways due to out-of-memory (OOM) conditions, and may remain in an error state after the memory pressure subsides, causing "Catastrophic Failure" errors afterwards.
Issue 2:
The systemd instances of all distros share the same set of cgroups, which causes conflicts. For example, when multiple WSL distros boot at the same time, only one of them can create the systemd user session successfully.
Changes
This PR reorganizes the WSL cgroups into the following structure:
The `wsl-user` cgroup has `memory.max` and `cpu.max` set so that it can use at most the total RAM minus 32 MiB and the total CPU capacity minus 0.05 cores, effectively reserving 32 MiB of RAM and 0.05 cores for processes outside this cgroup.
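In cgroup v2, `cpu.max` takes the form `"$QUOTA $PERIOD"` (both in microseconds, default period 100000), so reserving 0.05 cores means granting a quota of (CPUs - 0.05) x period. A minimal sketch of that computation, assuming the CPU count comes from `sysconf(_SC_NPROCESSORS_ONLN)`; `ComputeCpuMax` and `OnlineCpuCount` are hypothetical helpers, and the memory side is the analogous subtraction of 32 MiB from the total RAM.

```cpp
#include <cstdint>
#include <string>
#include <unistd.h>

// Build a cpu.max value that lets a cgroup use all CPUs except
// ReserveMilliCores thousandths of a core. Integer math avoids
// floating-point rounding in the quota.
std::string ComputeCpuMax(std::uint64_t Cpus, std::uint64_t ReserveMilliCores, std::uint64_t PeriodUs = 100000)
{
    const std::uint64_t TotalMilliCores = Cpus * 1000;
    if (TotalMilliCores <= ReserveMilliCores)
    {
        // Nothing left to reserve from; leave the cgroup unlimited.
        return "max " + std::to_string(PeriodUs);
    }

    const std::uint64_t QuotaUs = (TotalMilliCores - ReserveMilliCores) * PeriodUs / 1000;
    return std::to_string(QuotaUs) + " " + std::to_string(PeriodUs);
}

// Number of online CPUs; falls back to 1 if sysconf fails.
std::uint64_t OnlineCpuCount()
{
    const long Count = sysconf(_SC_NPROCESSORS_ONLN);
    return Count > 0 ? static_cast<std::uint64_t>(Count) : 1;
}
```

With 8 CPUs and a 50-millicore (0.05-core) reservation, `ComputeCpuMax(8, 50)` yields `"795000 100000"`, i.e. 7.95 cores' worth of time per 100 ms period.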
Note
Because this uses cgroup v2, enabling it prevents the use of cgroup v1. A `.wslconfig` option, `IsolateDistroCgroup` (default `true`), is added so users can opt out and keep using cgroup v1.
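With the unified hierarchy, `/sys/fs/cgroup` is itself a `cgroup2` filesystem, whereas a cgroup v1 setup mounts per-controller hierarchies beneath it. A minimal sketch for checking which mode a distro ended up in, assuming `statfs(2)` and the `CGROUP2_SUPER_MAGIC` constant from `<linux/magic.h>`; `IsCgroup2Mount` is a hypothetical helper, not part of the PR.

```cpp
#include <string>
#include <linux/magic.h>
#include <sys/vfs.h>

// Returns true if Path lives on a cgroup v2 (unified) filesystem.
// Any statfs failure (e.g. a missing path) is treated as "not cgroup v2".
bool IsCgroup2Mount(const std::string& Path)
{
    struct statfs Fs {};
    if (statfs(Path.c_str(), &Fs) != 0)
    {
        return false;
    }

    return Fs.f_type == CGROUP2_SUPER_MAGIC;
}
```

Calling `IsCgroup2Mount("/sys/fs/cgroup")` inside a distro would be expected to return `true` with the default `IsolateDistroCgroup=true`, and `false` for a pure cgroup v1 setup.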
PR Checklist
Detailed Description of the Pull Request / Additional comments
Validation Steps Performed
Issue 1
Validated by the OP of the issue: #40458 (comment).
The original issue is nondeterministic and hard to reproduce, so no new test is added.
Issue 2
The test_distro does not reproduce the issue, so no new test is added.
Validated manually by launching Ubuntu and Debian at the same time.
General
Added tests to validate that the cgroup isolation works:
- UnitTests::UnitTests::IsolatedCgroupLayout
- UnitTests::UnitTests::IsolatedCgroupLayoutSystemd
- UnitTests::UnitTests::IsolatedCgroupLayoutDisabled
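The contents of these tests are not shown here. As a hedged illustration of how a layout check might determine which cgroup a process belongs to: under cgroup v2, `/proc/<pid>/cgroup` contains a unified-hierarchy entry of the form `0::<path>` (hybrid setups list v1 entries alongside it). The helpers below are hypothetical and only parse that entry; comparing the result against `/wsl-user` is an assumption based on the overview above.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Extract the cgroup v2 path from the contents of /proc/<pid>/cgroup.
std::optional<std::string> UnifiedCgroupPath(const std::string& ProcCgroupContents)
{
    std::istringstream Stream(ProcCgroupContents);
    std::string Line;
    while (std::getline(Stream, Line))
    {
        // The unified hierarchy always uses hierarchy ID 0 and no controller list.
        if (Line.rfind("0::", 0) == 0)
        {
            return Line.substr(3);
        }
    }

    return std::nullopt;
}

// Cgroup v2 path of the calling process, if it can be determined.
std::optional<std::string> CurrentUnifiedCgroupPath()
{
    std::ifstream In("/proc/self/cgroup");
    if (!In)
    {
        return std::nullopt;
    }

    std::ostringstream Contents;
    Contents << In.rdbuf();
    return UnifiedCgroupPath(Contents.str());
}
```

Keeping the parsing separate from the file read means the parser can be checked against fixed strings without depending on the host's cgroup layout.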