Protect critical WSL processes under heavy memory load with cgroup & isolate distro cgroups #40519

Open
chemwolf6922 wants to merge 30 commits into master from user/chemwolf6922/protect-wsl-core-processes-with-memory-cgroup
@chemwolf6922 chemwolf6922 commented May 13, 2026

Summary of the Pull Request

Issue 1:

Under heavy memory load, critical WSL processes can fail with various OOM errors and may remain in an error state after the memory storm has ended, causing "Catastrophic Failures" afterwards.

Issue 2:

All distros' systemd instances share the same set of cgroups, which causes conflicts. For example, when multiple WSL distros boot at the same time, only one of them can create its systemd user session successfully. The others fail with:

systemd[1023]: user@1000.service: Failed to attach to cgroup /user.slice/user-1000.slice/user@1000.service: Device or resource busy

Changes

This PR reorganizes the WSL cgroups into the following structure:

# When systemd is enabled
/                                                  # critical wsl processes live in root with no resource limit.
    --wsl-user                                     # resource limited cgroup
        -- non-distro                              # for processes not belonging to a distro, like plugins
        -- distro-<init pid>                       # per distro root
            -- non-systemd                         # for non-systemd user processes
            -- systemd                             # for systemd and its cgroup tree

# When systemd is not enabled
/                                                  # critical wsl processes live in root with no resource limit.
    --wsl-user                                     # resource limited cgroup
        -- non-distro                              # for processes not belonging to a distro, like plugins
        -- distro-<init pid>                       # per distro cgroup
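
As a rough illustration of how the layout above maps to filesystem paths, here is a minimal sketch. It assumes cgroup v2 is mounted at /sys/fs/cgroup; the helper names (`NonDistroCgroupPath`, `DistroCgroupPath`, `DistroLeafCgroupPath`) and the `DistroSubtree` enum are hypothetical and are not taken from the PR's code.

```cpp
#include <string>
#include <sys/types.h>

// Assumption: the cgroup v2 hierarchy is mounted here.
const std::string c_cgroupRoot = "/sys/fs/cgroup";

// Hypothetical selector for the two per-distro leaves used when systemd is enabled.
enum class DistroSubtree
{
    NonSystemd,
    Systemd
};

// Cgroup for processes that do not belong to any distro (for example plugins).
std::string NonDistroCgroupPath()
{
    return c_cgroupRoot + "/wsl-user/non-distro";
}

// Per-distro cgroup, keyed by the pid of the distro's init process.
std::string DistroCgroupPath(pid_t initPid)
{
    return c_cgroupRoot + "/wsl-user/distro-" + std::to_string(initPid);
}

// Cgroup a process should be placed in. Without systemd the per-distro
// cgroup itself is the leaf; with systemd it is split into two children.
std::string DistroLeafCgroupPath(pid_t initPid, bool systemdEnabled, DistroSubtree subtree)
{
    const std::string path = DistroCgroupPath(initPid);
    if (!systemdEnabled)
    {
        return path;
    }

    return path + (subtree == DistroSubtree::Systemd ? "/systemd" : "/non-systemd");
}
```

Because each distro gets its own `distro-<init pid>` subtree, two systemd instances no longer compete for the same `/user.slice/...` cgroup.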

The wsl-user cgroup has memory.max and cpu.max set so that it can use at most (total RAM - 32 MiB) of memory and (total cores - 0.05) of CPU, effectively reserving 32 MiB of RAM and 0.05 cores for processes outside this cgroup.
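
For reference, a minimal sketch of how the two limit values could be computed. The function and constant names are hypothetical, the 100000 microsecond period is the cgroup v2 default for cpu.max rather than something stated in this PR, and the fallback to "max" on very small machines is an assumption for illustration only.

```cpp
#include <cstdint>
#include <string>

// Reservation described above: 32 MiB of RAM and 0.05 of a CPU core.
constexpr std::uint64_t c_reservedBytes = 32ull * 1024 * 1024;
// Assumption: use the cgroup v2 default period of 100000 microseconds.
constexpr std::uint64_t c_cpuPeriodUs = 100000;
// 0.05 core expressed as microseconds of CPU time per period.
constexpr std::uint64_t c_reservedCpuUs = c_cpuPeriodUs / 20;

// Value to write to memory.max: total RAM minus the reservation, in bytes.
// Assumption: fall back to "max" (no limit) if RAM does not exceed the reservation.
std::string MemoryMaxValue(std::uint64_t totalRamBytes)
{
    if (totalRamBytes <= c_reservedBytes)
    {
        return "max";
    }

    return std::to_string(totalRamBytes - c_reservedBytes);
}

// Value to write to cpu.max, in the "<quota> <period>" format (microseconds).
// The quota is the CPU time of all cores per period minus the reserved share.
std::string CpuMaxValue(unsigned cpuCount)
{
    const std::uint64_t totalUs = cpuCount * c_cpuPeriodUs;
    const std::string period = std::to_string(c_cpuPeriodUs);
    if (totalUs <= c_reservedCpuUs)
    {
        return "max " + period;
    }

    return std::to_string(totalUs - c_reservedCpuUs) + " " + period;
}
```

For a VM with 4 cores and 1 GiB of RAM this yields `395000 100000` for cpu.max and `1040187392` for memory.max.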

Note

Since this uses cgroup v2, enabling it rules out the use of cgroup v1. A .wslconfig option, IsolateDistroCgroup (default true), is added so users can opt out of this and keep using cgroup v1.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

Issue 1

Validated by the OP of this issue: #40458 (comment)
The original issue is non-deterministic and hard to reproduce, so no new test is added.

Issue 2

The test_distro does not reproduce the issue, so no new test is added.
Validated manually by launching Ubuntu and Debian at the same time.

General

Added tests to validate that the cgroup isolation works:
UnitTests::UnitTests::IsolatedCgroupLayout
UnitTests::UnitTests::IsolatedCgroupLayoutSystemd
UnitTests::UnitTests::IsolatedCgroupLayoutDisabled

@chemwolf6922 chemwolf6922 requested a review from a team as a code owner May 13, 2026 05:59
Copilot AI review requested due to automatic review settings May 13, 2026 05:59
@chemwolf6922
Contributor Author

If we want this change, please help determine the reserved memory size. 128M might be too much.

Copilot AI left a comment

Pull request overview

This PR aims to improve WSL2 reliability under heavy memory pressure by placing user/workload processes into a cgroup v2 with a memory cap, while keeping critical WSL system processes outside that cap to reduce the chance of catastrophic failures after OOM events.

Changes:

  • Add cgroup path constants for a wsl-user cgroup and its cgroup.procs / memory.max control files.
  • Create and configure the wsl-user cgroup at mini_init startup, setting memory.max to totalram - 128MB.
  • Move key workload processes (session leaders, boot command, systemd-spawned workload) into wsl-user by writing 0 to cgroup.procs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/linux/init/util.h | Adds constants for the wsl-user cgroup paths. |
| src/linux/init/util.cpp | Moves create-process children into wsl-user cgroup before exec. |
| src/linux/init/main.cpp | Adds cgroup setup routine and invokes it after mounting cgroup2. |
| src/linux/init/init.cpp | Moves session leaders and systemd into wsl-user cgroup. |
| src/linux/init/config.cpp | Moves boot command into wsl-user cgroup. |
Comments suppressed due to low confidence (2)

src/linux/init/main.cpp:3944

  • Enabling the memory controller in cgroup v2 via /sys/fs/cgroup/cgroup.subtree_control generally requires the parent cgroup to have no internal processes ("no internal process" rule). At this point mini_init is in the root cgroup, and wsl-user is created before enabling the controller, so the write is likely to fail (e.g., EBUSY) and the feature becomes a no-op. Consider creating a dedicated top-level cgroup hierarchy (e.g., move system processes into a sibling cgroup and leave the root empty), and enable the controller before creating/using children so the memory controller is actually available.
    if (UtilMkdir(WSL_USER_CGROUP_PATH, 0755) < 0)
    {
        LOG_ERROR("Failed to create wsl-user cgroup directory {}", errno);
        return;
    }

    if (WriteToFile(CGROUP_MOUNTPOINT "/cgroup.subtree_control", "+memory") < 0)
    {
        LOG_ERROR("Failed to enable memory controller {}", errno);
        return;
    }

src/linux/init/init.cpp:1265

  • This cgroup move is relied on to ensure session leaders are subject to the memory cap, but the return value from WriteToFile is ignored. If it fails, the session leader will stay in the root cgroup and can still starve system processes. Consider checking the return and logging a warning/error (or propagating failure) so the protection isn't silently skipped.
            "SessionLeader", [ListenSocket = std::move(ListenSocket), &Channel, &Config, Mask = Config.Umask, SocketAddress]() {
                // Move session leader into the memory-limited user cgroup.
                WriteToFile(WSL_USER_CGROUP_PROCS, "0");
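
A minimal sketch of what a checked move could look like, using plain POSIX calls. `MoveSelfToCgroup` is a hypothetical helper written for illustration; it is not the PR's `WriteToFile`, whose exact signature and return convention are not shown here.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: move the calling process into the cgroup whose
// cgroup.procs file is at procsPath. Returns 0 on success, otherwise the
// errno value describing the failure, so the caller can log it.
int MoveSelfToCgroup(const std::string& procsPath)
{
    // No O_CREAT: the file must already exist, as it does in a real cgroup.
    const int fd = open(procsPath.c_str(), O_WRONLY | O_CLOEXEC);
    if (fd < 0)
    {
        return errno;
    }

    // Writing "0" asks the kernel to migrate the writing process itself.
    const ssize_t written = write(fd, "0", 1);
    const int error = (written == 1) ? 0 : (written < 0 ? errno : EIO);
    close(fd);
    return error;
}
```

A caller would typically log a non-zero result and continue, since a failed move leaves the process in its parent's cgroup rather than breaking it.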

Comment thread src/linux/init/main.cpp
Comment thread src/linux/init/main.cpp Outdated
Comment thread src/linux/init/util.cpp Outdated
Comment thread src/linux/init/init.cpp Outdated
Comment thread src/linux/init/init.cpp Outdated
Comment thread src/linux/init/config.cpp Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/linux/init/main.cpp Outdated
Comment thread src/linux/init/main.cpp Outdated
Copilot AI review requested due to automatic review settings May 20, 2026 08:36
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

src/linux/init/init.cpp:1272

  • This cgroup move is treated as non-critical but logs failures with LOG_ERROR. To avoid noisy/false errors, consider downgrading to LOG_WARNING/LOG_INFO and stating the impact (session leader not placed in wsl-user cgroup).
                // Move session leader into the memory-limited user cgroup.
                if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
                {
                    // Non-critical.
                    LOG_ERROR("Failed to move session leader into user cgroup, {}", errno);
                }

src/linux/init/init.cpp:2413

  • This cgroup move is non-critical but uses LOG_ERROR on failure. Consider logging at warning/info level instead and clarifying that memory protection for systemd (and its subtree) is disabled if the move fails.
            // Move systemd into the memory-limited user cgroup.
            if (WriteToFile(WSL_USER_CGROUP_PROCS, "0") != 0)
            {
                // Non-critical.
                LOG_ERROR("Failed to move systemd to user cgroup {}", errno);
            }

Comment thread src/linux/init/util.cpp Outdated
Comment thread src/linux/init/init.cpp Outdated
Comment thread src/linux/init/config.cpp Outdated
Comment thread src/linux/init/main.cpp Outdated
Copilot AI review requested due to automatic review settings May 21, 2026 06:58
@chemwolf6922 chemwolf6922 changed the title from "Protect critical WSL processes under heavy memory load with cgroup" to "Protect critical WSL processes under heavy memory load with cgroup & isolate distro cgroups" May 21, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Comment thread src/linux/init/util.cpp
Comment thread src/linux/init/main.cpp
Comment thread src/linux/init/main.cpp Outdated
Comment thread src/linux/init/main.cpp Outdated
Comment thread src/linux/init/main.cpp Outdated
Comment thread src/linux/init/init.cpp
Copilot AI review requested due to automatic review settings May 21, 2026 09:18
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread src/linux/init/util.h Outdated
Comment thread src/linux/init/main.cpp
Copilot AI review requested due to automatic review settings May 22, 2026 03:07
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comment thread src/linux/init/init.cpp Outdated
Copilot AI review requested due to automatic review settings May 22, 2026 04:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comment thread src/linux/init/main.cpp
Comment thread src/linux/init/init.cpp
Comment thread test/windows/UnitTests.cpp Outdated
Comment thread src/linux/init/util.cpp
Copilot AI review requested due to automatic review settings May 22, 2026 05:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comment thread test/windows/UnitTests.cpp Outdated
Comment thread src/linux/init/main.cpp
Copilot AI review requested due to automatic review settings May 22, 2026 05:24
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Comment thread src/linux/init/main.cpp
Comment on lines +2245 to +2247
auto MiniInitDirectChildPidPath = std::filesystem::read_symlink(PROCFS_PATH "/self");
pid_t MiniInitDirectChildPid = std::stoul(MiniInitDirectChildPidPath.string());
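
The review comment attached to these lines is not shown here. One thing worth noting about the snippet is that `std::stoul` throws on non-numeric input and its `unsigned long` result is narrowed to `pid_t`. The following is a hedged sketch of a non-throwing alternative, assuming C++17; `ParsePid` is a hypothetical helper and not part of the PR.

```cpp
#include <charconv>
#include <limits>
#include <optional>
#include <string_view>
#include <system_error>
#include <sys/types.h>

// Hypothetical helper: parse the target of the /proc/self symlink (a decimal
// pid such as "1234") without throwing. Returns std::nullopt on empty input,
// trailing garbage, non-positive values, or values that do not fit in pid_t.
std::optional<pid_t> ParsePid(std::string_view text)
{
    long value = 0;
    const char* const end = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec != std::errc{} || ptr != end)
    {
        return std::nullopt;
    }

    if (value <= 0 || value > std::numeric_limits<pid_t>::max())
    {
        return std::nullopt;
    }

    return static_cast<pid_t>(value);
}
```

The caller can then treat `std::nullopt` as a recoverable error instead of relying on exception handling during early init.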
