fix(bwrap): default to deny-by-default filesystem (mirror seatbelt) #482

Open
caarlos0 wants to merge 10 commits into microsoft:main from caarlos0:bwrap-deny-default
Conversation

@caarlos0 caarlos0 commented Jun 3, 2026

📖 Description

Change the Bubblewrap backend's default filesystem posture from "host root mounted read-only" to deny-by-default, matching the macOS Seatbelt backend's (deny default) baseline.

Why

bwrap_command::build_args used to emit:

args.extend(["--ro-bind".into(), "/".into(), "/".into()]);

That bind-mounted the entire host root read-only into every sandbox, so the caller's $HOME/.aws/credentials, $HOME/.ssh/id_*, browser cookies, etc. were readable inside the sandbox by default. The Seatbelt backend on macOS starts from (deny default) and only allows narrow system paths (SYSTEM_READ_ALLOW in src/backends/seatbelt/common/src/profile_builder.rs), so the two backends had a meaningful asymmetry in the confidentiality guarantees they offered. This PR closes that gap.

What changes

New baseline (BASELINE_RO_BIND_PATHS) — mirrors seatbelt's SYSTEM_READ_ALLOW:

  • Top-level dirs: /bin, /sbin, /lib, /lib32, /lib64, /libx32 (symlinks under /usr on merged-usr distros; bwrap follows source-side symlinks so both real-dir and symlinked distros work).
  • /usr subpaths: /usr/bin, /usr/sbin, /usr/lib, /usr/lib32, /usr/lib64, /usr/libexec, /usr/share — deliberately not /usr wholesale, so /usr/local is not implicitly exposed.
  • /etc — whole, like seatbelt's /private/etc. Files with restrictive perms (/etc/shadow, /etc/sudoers, /etc/ssh/ssh_host_*_key) stay unreadable to a non-root caller because user-namespace UID mapping does not bypass kernel DAC.
  • DNS stub-resolver dirs: /run/systemd/resolve, /run/NetworkManager, /run/resolvconf — needed when /etc/resolv.conf is a symlink. Narrow subpaths so /run/user/<uid> (D-Bus session, keyring, ssh-agent sockets) stays hidden.
  • /etc/resolv.conf symlinks outside /run: also synthesise a /var/run -> /run compat symlink (for /var/run/...-routed targets — older RHEL/CentOS-era and some container images) and --ro-bind-try /mnt/wsl/resolv.conf (for WSL), so DNS keeps working under deny-by-default without exposing host /var or /mnt contents.

All emitted via --ro-bind-try so missing paths are silently skipped (e.g. /lib32 on x86_64-only systems, /run/systemd/resolve on hosts without systemd-resolved).
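
As a rough illustration of the baseline described above, the following sketch emits one `--ro-bind-try` triple per path. The path list is copied from this description; the helper name `push_baseline_mounts` is hypothetical, and the real constant in `bwrap_command.rs` may differ in contents or ordering.

```rust
/// Baseline read-only paths, copied from the list in this PR description.
/// Assumption: the real constant may differ in contents or ordering.
const BASELINE_RO_BIND_PATHS: &[&str] = &[
    // Top-level executable/library dirs.
    "/bin", "/sbin", "/lib", "/lib32", "/lib64", "/libx32",
    // /usr subpaths (deliberately not /usr itself, so /usr/local stays hidden).
    "/usr/bin", "/usr/sbin", "/usr/lib", "/usr/lib32", "/usr/lib64",
    "/usr/libexec", "/usr/share",
    // System configuration.
    "/etc",
    // DNS stub-resolver dirs.
    "/run/systemd/resolve", "/run/NetworkManager", "/run/resolvconf",
];

/// Hypothetical helper: append `--ro-bind-try SRC DEST` for every baseline
/// path. bwrap skips a `--ro-bind-try` entry whose source does not exist.
fn push_baseline_mounts(args: &mut Vec<String>) {
    for path in BASELINE_RO_BIND_PATHS {
        args.extend([
            "--ro-bind-try".to_string(),
            path.to_string(),
            path.to_string(),
        ]);
    }
}
```

Because each entry is bound onto the same path it came from, the sandbox sees these directories at their usual locations and nothing else from the host root.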

What disappears from sandbox by default

$HOME, /root, /home/*, /opt, /srv, /mnt, /media, /var, /sys, /usr/local, /run/user/<uid>, /run/dbus. Callers who legitimately need any of these must list them under readonlyPaths or readwritePaths.

What's preserved

  • readwritePaths / readonlyPaths / deniedPaths semantics — unchanged.
  • --unshare-* flags, network policy handling, proxy env-var injection, working-dir, env clearing — unchanged.
  • Standard --dev /dev / --proc /proc / --tmpfs /tmp overlay — unchanged.

Drive-by build fix

The second commit (fix(nanvix): compile as build-dep from non-Linux/Windows hosts) adds empty/zero fallbacks for REQUIRED_BINARIES and NANVIXD_BINARY so nanvix_common compiles on macOS hosts when pulled in as a [build-dependency] of lxc / wxc during cross-compile. Zero runtime impact on supported platforms — the consuming build scripts already gate the surrounding logic behind cfg(target_os = "linux"/"windows") and feature = "microvm". Separated out so it can be reviewed (or split into its own PR) independently.

Breaking change for users

This is a behavior change. Configs that implicitly relied on $HOME (or /opt, /var, /usr/local, …) being readable will start failing. The migration is to list the directory in readonlyPaths:

{
  "filesystem": {
    "readonlyPaths": ["/home/alice/project", "/usr/local"]
  }
}

Documented in the updated "How It Works → Deny-by-default filesystem" and "Limitations" sections of docs/bwrap-support/bubblewrap-backend.md.
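
To make the opt-in concrete, here is a minimal sketch of how policy paths might be turned into mount arguments after the baseline. The `FilesystemPolicy` struct and `push_policy_mounts` helper are hypothetical; the `--ro-bind` mapping for `readonlyPaths` follows the test description in this PR, while the `--bind` mapping for `readwritePaths` is an assumption.

```rust
/// Hypothetical, simplified view of the `filesystem` section of a config.
struct FilesystemPolicy {
    readonly_paths: Vec<String>,
    readwrite_paths: Vec<String>,
}

/// Hypothetical helper: append policy mounts. Intended to run after the
/// baseline mounts have been pushed, because bwrap applies mount arguments
/// in order and a later mount can shadow an earlier one.
fn push_policy_mounts(args: &mut Vec<String>, policy: &FilesystemPolicy) {
    for p in &policy.readonly_paths {
        args.extend(["--ro-bind".to_string(), p.clone(), p.clone()]);
    }
    for p in &policy.readwrite_paths {
        // Assumption: read-write paths map to bwrap's `--bind`.
        args.extend(["--bind".to_string(), p.clone(), p.clone()]);
    }
}
```

With the JSON above, this would append `--ro-bind /home/alice/project /home/alice/project` and `--ro-bind /usr/local /usr/local`.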

🔗 References

No tracking issue — this came out of a direct comparison between the seatbelt and bwrap baselines while reviewing the two unprivileged backends.

Related follow-up (out of scope for this PR):

🔍 Validation

Unit tests (cargo test -p bwrap_common from src/) — 25/25 pass, including new regression tests covering the new contract:

  • baseline_does_not_bind_mount_host_root — regression test for the old --ro-bind / / default.
  • baseline_emits_required_ro_bind_try_paths: /bin, /sbin, /lib, /lib64, /usr/bin, /usr/lib, /usr/share, /etc all emitted.
  • baseline_does_not_expose_usr_local — no --ro-bind /usr /usr and no explicit /usr/local entry.
  • baseline_excludes_confidential_paths — no /home, /root, /opt, /srv, /var, /sys, /run/user, /run/dbus bind-mounts.
  • baseline_includes_dns_stub_resolver_dirs — all three DNS dirs emitted via --ro-bind-try.
  • baseline_mounts_precede_policy_mounts — policy mounts can still shadow baseline.
  • baseline_recreates_var_run_compat_symlink — emits --symlink /run /var/run (and never binds host /var) so /var/run/...-routed resolv.conf symlinks resolve.
  • baseline_includes_wsl_resolv_conf — emits --ro-bind-try /mnt/wsl/resolv.conf (and never exposes /mnt wholesale) so WSL DNS works.

Plus updated filesystem_policy_produces_correct_mounts to match the new contract (a bare --ro-bind /data /data is now unambiguously the policy mount).
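
A sketch of the kind of check these regression tests perform, assuming the argument vector is a flat `Vec<String>`. The helper `has_mount` and the stand-in `sample_args` are hypothetical and are not the tests in this PR; scanning three-element windows means a path that merely appears elsewhere (for example in a script body) is not mistaken for a mount.

```rust
/// Hypothetical test helper: true if `args` contains the consecutive
/// triple `flag src dest`.
fn has_mount(args: &[String], flag: &str, src: &str, dest: &str) -> bool {
    args.windows(3)
        .any(|w| w[0] == flag && w[1] == src && w[2] == dest)
}

/// Stand-in for the output of `build_args`, reduced to two baseline entries.
fn sample_args() -> Vec<String> {
    ["--ro-bind-try", "/bin", "/bin", "--ro-bind-try", "/etc", "/etc"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Mirrors the intent of `baseline_does_not_bind_mount_host_root`:
/// the old `--ro-bind / /` default must not be present.
fn check_no_host_root_bind(args: &[String]) -> bool {
    !has_mount(args, "--ro-bind", "/", "/")
}
```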

Lint / format: cargo clippy -p bwrap_common --all-targets -- -D warnings clean, cargo fmt --all -- --check clean.

Bubblewrap behavior — verified empirically against bwrap 0.8.0 that the /var/run -> /run symlink makes /var/run/NetworkManager/resolv.conf resolve into the bound /run/NetworkManager, that a WSL-style /etc/resolv.conf -> /mnt/wsl/resolv.conf is readable, and that the control case (no symlink) fails — reproducing the original gap.

Linux VM verification: cross-compiled lxc-exec for aarch64-unknown-linux-gnu and ran a 6-config smoke suite on a Linux VM (see src/target/vm-test-bundle/ locally; gitignored). The suite plants TOP_SECRET=hunter2 in /home/SENTINEL_DO_NOT_LEAK.txt on the host and verifies that the secret does not appear in sandbox output without an explicit readonlyPaths: ["/home"], then verifies that the opt-in does expose it. It also covers /opt, /var, /sys, /root, and /usr/local being hidden, DNS resolution working with network allowed, and /etc/shadow staying unreadable via DAC. (Will paste the run output as a PR comment once the VM run is complete.)

✅ Checklist

📋 Issue Type

  • Bug fix
  • Feature
  • Task

caarlos0 and others added 2 commits June 3, 2026 09:39
The Bubblewrap backend used to bind-mount the entire host root read-only
into every sandbox (`--ro-bind / /`), so the caller's $HOME, /root,
/opt, /var, /sys, /run/user/<uid>, and everything else readable by the
calling uid was visible inside the sandbox by default. The macOS Seatbelt
backend, by contrast, starts from `(deny default)` and only allows a
narrow system baseline -- bwrap now matches that posture.

The new baseline (`BASELINE_RO_BIND_PATHS`) mirrors seatbelt's
`SYSTEM_READ_ALLOW` allowlist: top-level executable/library dirs
(/bin, /sbin, /lib*), the /usr subpaths that seatbelt allows (without
/usr/local), /etc, and the DNS stub-resolver directories under /run
(/run/systemd/resolve, /run/NetworkManager, /run/resolvconf) so
/etc/resolv.conf symlinks still resolve when network is allowed.
$HOME, /opt, /usr/local, /var, /sys, and /run/user/<uid> are no
longer visible until the caller opts in via `readonlyPaths` /
`readwritePaths`.

Paths are emitted via `--ro-bind-try` so missing entries are silently
skipped (e.g. /lib32 on x86_64-only systems, /run/systemd/resolve on
hosts without systemd-resolved).

Files in /etc with restrictive perms (/etc/shadow, /etc/sudoers,
/etc/ssh/ssh_host_*_key) remain unreadable to a non-root caller even
though /etc is bound whole -- user-namespace UID mapping does not
bypass kernel DAC.

Updated the existing `filesystem_policy_produces_correct_mounts` test
and added 6 new tests covering the new contract (no host-root bind,
required baseline paths emitted, /usr/local not exposed, confidential
paths excluded, DNS dirs included, baseline precedes policy mounts).

Docs in docs/bwrap-support/bubblewrap-backend.md updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
`nanvix_common` is a `[build-dependency]` of `lxc` and `wxc`. Build
deps are compiled for the host, so cross-compiling lxc-exec from macOS
to aarch64-unknown-linux-gnu pulled nanvix_common into a host build
where `target_os` was neither "windows" nor "linux" -- the
`REQUIRED_BINARIES` and `NANVIXD_BINARY` constants then had no
definition and the crate failed to compile.

Add empty/zero fallbacks for non-Windows/Linux hosts. The empty slice
is correct because:
- NanVix only runs on Windows and Linux, so iterating `REQUIRED_BINARIES`
  on other hosts must be a no-op.
- The consuming build scripts (e.g. `src/core/lxc/build.rs`) already
  gate the surrounding logic behind `cfg(target_os = "linux")` and
  `feature = "microvm"`, so the fallback values are never reached
  in practice.

Zero runtime impact on supported platforms; pure build-time
portability fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
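
The fallback described in the commit message above can be sketched with `cfg` attributes as follows. The binary names are placeholders (the real values are not shown on this page), and `missing_binaries` is a hypothetical consumer added only to show that the empty slice turns the iteration into a no-op.

```rust
// Placeholder binary names: the real values are not shown in this PR page.
#[cfg(target_os = "linux")]
pub const REQUIRED_BINARIES: &[&str] = &["nanvixd"];
#[cfg(target_os = "windows")]
pub const REQUIRED_BINARIES: &[&str] = &["nanvixd.exe"];
// Fallback for hosts such as macOS, where the crate is compiled only as a
// build dependency during cross-compilation.
#[cfg(not(any(target_os = "linux", target_os = "windows")))]
pub const REQUIRED_BINARIES: &[&str] = &[];

#[cfg(target_os = "linux")]
pub const NANVIXD_BINARY: &str = "nanvixd";
#[cfg(target_os = "windows")]
pub const NANVIXD_BINARY: &str = "nanvixd.exe";
// Empty fallback, as described in this commit.
#[cfg(not(any(target_os = "linux", target_os = "windows")))]
pub const NANVIXD_BINARY: &str = "";

/// Hypothetical consumer: returns the required binaries absent from
/// `present`. With the empty fallback this is always empty on other hosts.
pub fn missing_binaries(present: &[&str]) -> Vec<&'static str> {
    REQUIRED_BINARIES
        .iter()
        .copied()
        .filter(|b| !present.contains(b))
        .collect()
}
```

Exactly one definition of each constant is active for any given host, so the crate compiles everywhere without changing behavior on Linux or Windows.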
Copilot AI review requested due to automatic review settings June 3, 2026 12:40
@caarlos0 caarlos0 requested a review from a team as a code owner June 3, 2026 12:40

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR tightens the Bubblewrap backend’s default filesystem exposure by switching from a full host-root bind mount to a minimal allowlist baseline, adds regression tests for the new deny-by-default posture, and updates docs accordingly. It also adds NanVix constant fallbacks so the NanVix common crate can compile on non-Windows/Linux hosts when used as a build dependency.

Changes:

  • Bubblewrap: replace --ro-bind / / with a minimal baseline set of --ro-bind-try mounts and add targeted regression tests.
  • NanVix: add non-Windows/Linux fallbacks for REQUIRED_BINARIES and NANVIXD_BINARY to support host builds on macOS/BSD.
  • Docs: document the Bubblewrap deny-by-default filesystem model and its consequences.
Summary per file:

| File | Description |
| --- | --- |
| src/backends/nanvix/common/src/lib.rs | Adds non-Windows/Linux fallbacks for NanVix host-compiled constants to keep builds working when cross-compiling. |
| src/backends/bubblewrap/common/src/bwrap_command.rs | Introduces a minimal baseline allowlist (deny-by-default) via --ro-bind-try and expands/updates tests. |
| docs/bwrap-support/bubblewrap-backend.md | Documents the new baseline filesystem behavior and user-facing implications. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 5

Comment thread src/backends/nanvix/common/src/lib.rs Outdated
Comment thread src/backends/bubblewrap/common/src/bwrap_command.rs
Comment thread src/backends/bubblewrap/common/src/bwrap_command.rs Outdated
Comment thread src/backends/bubblewrap/common/src/bwrap_command.rs Outdated
Comment thread src/backends/bubblewrap/common/src/bwrap_command.rs Outdated
@bbonaby

bbonaby commented Jun 3, 2026

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread docs/bwrap-support/bubblewrap-backend.md
Comment thread docs/bwrap-support/bubblewrap-backend.md
Comment thread src/backends/bubblewrap/common/src/bwrap_command.rs
caarlos0 and others added 2 commits June 9, 2026 08:35
- nanvix: use a descriptive sentinel for NANVIXD_BINARY on unsupported
  hosts instead of an empty string, so any accidental Command use fails
  with a named program rather than an empty one.
- bwrap: soften "mirrors seatbelt's baseline exactly" comment to
  "aligned with" to avoid implying exact, lasting parity.
- bwrap test: drop the brittle `assert!(ro_pos > 0)` — the preceding
  `.expect(...)` already guarantees the mount exists.
- bwrap test: restrict the /usr/local check to mount-argument windows so
  a script body mentioning /usr/local cannot cause a false positive.
- docs: note the deny-by-default baseline requires bwrap 0.3.0+ for
  `--ro-bind-try`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@bbonaby

bbonaby commented Jun 10, 2026

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

The previous MXC-PR-Build run (149353501) failed only on the Linux 1ES
agents with transient network errors in the same time window:
- SDK Unit Tests (linux): "The SSL connection could not be established"
- x64/arm64 LXC builds: cargo exited 101 (dependency fetch failure)

All equivalent Windows/macOS jobs passed, and the exact Linux build/test
commands plus the SDK unit tests reproduce cleanly and pass locally, so
the branch changes are not the cause. Empty commit to re-run CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0

Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 482 in repo microsoft/mxc

caarlos0 added 2 commits June 17, 2026 11:08
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Comment on lines +79 to +85
// DNS stub-resolver directories. /etc/resolv.conf is usually a
// symlink into one of these on modern Linux distros (systemd-resolved
// / NetworkManager / resolvconf). We bind the narrow subdirectories
// rather than all of /run to avoid exposing /run/user/<uid>.
"/run/systemd/resolve",
"/run/NetworkManager",
"/run/resolvconf",

Member

⚠️ Deny-by-default can silently break DNS when /etc/resolv.conf is a symlink to a path outside the baseline

Severity: Medium (the failure is silent and cryptic, which is what keeps it from Low)

Mechanism

/etc is bound whole, so a /etc/resolv.conf symlink is preserved as a symlink — bwrap doesn't dereference it. When a sandboxed process opens it, the kernel resolves the target inside the sandbox mount namespace, where only the baseline paths exist. Because /var and /mnt are never mounted, any resolv.conf target routed through them is a dangling link → glibc falls back to nameserver 127.0.0.1 (usually nothing listening) → name resolution fails with no obvious cause. The sandbox still starts successfully, so this surfaces as a confusing "DNS doesn't work" rather than a clear error.

A subtle trap: binding the canonical /run/NetworkManager here does not rescue a link written as /var/run/NetworkManager/resolv.conf, because the intermediate /var/run compat symlink lives in the unmounted /var.

What actually works vs. breaks

The common modern configs are fine (the baseline was clearly designed around them):

| Config | /etc/resolv.conf | Result |
| --- | --- | --- |
| systemd-resolved (Ubuntu 18.04+, most cloud images) | /run/systemd/resolve/stub-resolv.conf | ✅ works (/run/systemd/resolve bound) |
| resolvconf | /run/resolvconf/resolv.conf | ✅ works (/run/resolvconf bound) |
| NetworkManager default mode | real file written into /etc | ✅ works (inside bound /etc) |
| NM symlink mode, canonical | /run/NetworkManager/resolv.conf | ✅ works (/run/NetworkManager bound) |
| static /etc/resolv.conf | real file | ✅ works |
| WSL | /mnt/wsl/resolv.conf | ❌ breaks (/mnt unmounted) |
| /var/run-routed (older RHEL/CentOS-era) | /var/run/NetworkManager/resolv.conf | ❌ breaks (/var unmounted) |
| admin-custom target | outside the 17 baseline paths → e.g. /etc/dns/resolv.conf on a non-bound mount | ❌ breaks |

So this is a real but minority break, not the broad regression it might first appear — the dominant systemd-resolved / resolvconf / NM-default paths are all covered. Worth noting the PR's VM smoke test ("DNS working with network allowed") almost certainly ran on a systemd-resolved host, i.e. the case that does work, which is why the gap wasn't surfaced.
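
The works/breaks split above can be modelled with a small prefix check. This is only a sketch of the reasoning, not code from the PR: `VISIBLE_PREFIXES` and `target_is_reachable` are hypothetical names, the list covers just the DNS-relevant baseline entries, and the model ignores intermediate symlinks and separate mounts below a bound directory (so it does not capture the admin-custom row).

```rust
use std::path::Path;

/// DNS-relevant subset of the paths visible inside the sandbox under the
/// baseline as originally proposed (no /var, no /mnt).
const VISIBLE_PREFIXES: &[&str] = &[
    "/etc",
    "/run/systemd/resolve",
    "/run/NetworkManager",
    "/run/resolvconf",
];

/// Hypothetical check: a resolv.conf symlink target resolves inside the
/// sandbox only if it sits under a mounted prefix. `Path::starts_with`
/// compares whole components, so "/runx" does not match "/run".
fn target_is_reachable(target: &str) -> bool {
    VISIBLE_PREFIXES
        .iter()
        .any(|prefix| Path::new(target).starts_with(*prefix))
}
```

Under this model, `/run/systemd/resolve/stub-resolv.conf` is reachable, while `/mnt/wsl/resolv.conf` and `/var/run/NetworkManager/resolv.conf` are not, which matches the rows marked as breaking.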

Suggested fix (cheap, and worth doing regardless):

  • Preferred: at runtime, readlink /etc/resolv.conf and --ro-bind-try its canonical target before emitting the baseline.
  • Or statically: add --symlink /run /var/run (rescues the entire /var/run/... family without exposing /var's contents) plus /mnt/wsl/resolv.conf to the baseline.

Either way, a regression test that points /etc/resolv.conf at a /var/run/... target and asserts the resolved target ends up reachable would lock this in.

Author

Fixed in 9ae2486 — went with the static option since build_args is deliberately pure/platform-agnostic and unit-testable on every host:

  • /var/run/...-routed targets: synthesise a --symlink /run /var/run compat symlink. /var/run/NetworkManager/resolv.conf (and the rest of the /var/run/... family) now resolve into the already-bound /run/* DNS dirs. bwrap synthesises an empty /var for the symlink, so no host /var contents are exposed.
  • WSL: --ro-bind-try /mnt/wsl/resolv.conf (single file, skipped on non-WSL hosts — no /mnt exposure).

Added two regression tests (baseline_recreates_var_run_compat_symlink, baseline_includes_wsl_resolv_conf) and updated the backend docs. Verified the symlink + bind resolution empirically against bwrap 0.8.0 (and confirmed the control case fails without the symlink).

Truly custom out-of-baseline targets (e.g. an admin-set /etc/dns/resolv.conf on an unbound mount) still need a readonlyPaths entry; that residual caveat is now documented.
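
A minimal sketch of the two extra baseline entries described in this reply, assuming the same flat `Vec<String>` argument list. The helper name `push_dns_compat` is hypothetical; the flags and paths are the ones quoted above.

```rust
/// Hypothetical helper sketching the static DNS fix.
fn push_dns_compat(args: &mut Vec<String>) {
    // `--symlink SRC DEST` creates a symlink at DEST pointing to SRC, so
    // /var/run/... targets resolve into the already-bound /run/* DNS dirs.
    args.extend(["--symlink", "/run", "/var/run"].map(String::from));
    // Single-file bind for WSL; skipped by bwrap when the file is absent.
    args.extend(
        ["--ro-bind-try", "/mnt/wsl/resolv.conf", "/mnt/wsl/resolv.conf"]
            .map(String::from),
    );
}
```

Neither entry binds host /var or /mnt as a whole, which is the property the two new regression tests are described as asserting.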

@microsoft-github-policy-service microsoft-github-policy-service Bot added the Needs-Author-Feedback Issue needs attention from issue or PR author label Jun 18, 2026
… baseline

The deny-by-default baseline never mounts /var or /mnt, so an
/etc/resolv.conf symlink routed through /var/run/... (older RHEL/CentOS,
some container images) or /mnt/wsl/resolv.conf (WSL) would dangle inside
the sandbox and silently break name resolution.

Cover the two common out-of-baseline targets without exposing host /var
or /mnt contents:
- synthesise a `/var/run -> /run` compat symlink so /var/run/...-routed
  resolv.conf targets resolve into the already-bound /run/* DNS dirs;
- `--ro-bind-try` /mnt/wsl/resolv.conf so WSL DNS works (skipped on
  non-WSL hosts).

Add regression tests for both and update the backend docs. Verified the
symlink/bind behavior empirically with bwrap 0.8.0.

Addresses review feedback from @MGudgin on PR microsoft#482.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@microsoft-github-policy-service microsoft-github-policy-service Bot added Needs-Attention Issue needs attention from Microsoft and removed Needs-Author-Feedback Issue needs attention from issue or PR author labels Jun 19, 2026
@caarlos0 caarlos0 requested a review from MGudgin June 19, 2026 15:21
Labels

Needs-Attention Issue needs attention from Microsoft

4 participants