Allowlist mesa CVE-2026-40393 for PT 2.10 SM training (CPU + GPU cu130) #6265

Open
bhanutejagk wants to merge 7 commits into aws:master from bhanutejagk:patch/pt-2.10-sm-allowlist-mesa-cve-2026-40393


@bhanutejagk

Contributor

Adds an os_scan_allowlist entry for mesa CVE-2026-40393 (CVSS v3 9.8 CRITICAL, WebGPU OOB) to the PyTorch 2.10 SageMaker training CPU and GPU (cu130) allowlists. mesa is pulled in transitively (libgl1-mesa-glx for OpenCV/visualization deps); DLC training containers do not expose a WebGPU/rendering surface to untrusted content, so the vulnerable code path is not reachable. Awaiting upstream Ubuntu 22.04 fix.
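To make the shape of such an entry concrete, here is a minimal Python sketch of an allowlist keyed by package name, plus a lookup helper. The field names (`vulnerability_id`, `severity`, `cvss_v3_score`, `reason`) and the helper `is_allowlisted` are assumptions for illustration only; they are not taken from the repository's actual os_scan_allowlist schema.

```python
import json

# Hypothetical allowlist layout: package name -> list of allowlisted findings.
# Field names are illustrative assumptions, not the repository's real schema.
allowlist = {
    "mesa": [
        {
            "vulnerability_id": "CVE-2026-40393",
            "severity": "CRITICAL",
            "cvss_v3_score": 9.8,
            "reason": (
                "mesa is a transitive dependency; training containers do not "
                "expose a WebGPU/rendering surface to untrusted content. "
                "Awaiting upstream Ubuntu 22.04 fix."
            ),
        }
    ]
}


def is_allowlisted(allowlist: dict, package: str, cve_id: str) -> bool:
    """Return True if cve_id is listed under package in the allowlist."""
    return any(
        entry.get("vulnerability_id") == cve_id
        for entry in allowlist.get(package, [])
    )


print(json.dumps(allowlist, indent=2))
print(is_allowlisted(allowlist, "mesa", "CVE-2026-40393"))
```

Keeping the justification next to the CVE ID, as in the `reason` field here, makes it easier to revisit the entry once the upstream fix lands.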

Purpose

Test Plan

Test Result


Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)
How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true
How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
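
The rules above (a commented-out command falls back to its default) can be sketched in Python as follows. This is a hypothetical parser written for illustration; the function name `parse_pr_commands` and its return shape are assumptions, and the CI's real parsing logic is not shown on this page.

```python
# Default test list as stated in the PR template text.
DEFAULT_TESTS = [
    "sanity", "security", "ec2", "ecs", "eks", "sagemaker", "sagemaker-local",
]


def parse_pr_commands(description: str) -> dict:
    """Extract uncommented /buildspec and /tests commands from a PR description.

    A buildspec of None means "fall back to dlc_developer_config.toml".
    """
    result = {"buildspec": None, "tests": list(DEFAULT_TESTS)}
    for raw_line in description.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # still commented out: keep the default
        if line.startswith("/buildspec "):
            result["buildspec"] = line.split(maxsplit=1)[1].strip()
        elif line.startswith("/tests "):
            result["tests"] = line.split()[1:]
    return result


example = """
/buildspec pytorch/training/buildspec.yml
# /tests sanity security ec2
"""
print(parse_pr_commands(example))
```

In this example only `/buildspec` is uncommented, so the buildspec path is picked up while the test list stays at its default.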
Toggle if you are merging into main Branch

PR Checklist

  • [] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details).

Bhanu Teja Goshikonda added 6 commits June 18, 2026 09:36
… allowlist verification

Sets build_frameworks=["pytorch"], build_training=true, build_inference=false,
do_build=true, points dlc-pr-pytorch-training at buildspec-2-10-sm.yml, and
turns on the full standard test suite (sanity, security with safety_check +
ecr_scan_allowlist, ecs, eks, ec2, sagemaker_local, sagemaker_remote) so the
mesa CVE-2026-40393 allowlist takes effect during PR CI. Heavy/opt-in suites
(sagemaker_efa, sagemaker_rc, sagemaker_benchmark, ec2_benchmark,
ec2_tests_on_heavy_instances, nightly_pr_test_mode) remain off.
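
For a quick overview, the settings this commit message describes can be written out as a plain Python mapping. The grouping into `build`, `buildspec_override`, and `suites` and the helper `enabled_suites` are assumptions made for this sketch; the actual layout of dlc_developer_config.toml is not shown on this page, and the suite names are copied from the commit message rather than from the file.

```python
# Settings as listed in the commit message. The dictionary layout is an
# illustrative assumption, not the real dlc_developer_config.toml structure.
config = {
    "build": {
        "build_frameworks": ["pytorch"],
        "build_training": True,
        "build_inference": False,
        "do_build": True,
    },
    "buildspec_override": {
        # File name as given in the commit message; its full path is not stated.
        "dlc-pr-pytorch-training": "buildspec-2-10-sm.yml",
    },
    "suites": {
        # Standard suites turned on for this PR.
        "sanity": True,
        "security": True,
        "ecs": True,
        "eks": True,
        "ec2": True,
        "sagemaker_local": True,
        "sagemaker_remote": True,
        # Heavy/opt-in suites left off.
        "sagemaker_efa": False,
        "sagemaker_rc": False,
        "sagemaker_benchmark": False,
        "ec2_benchmark": False,
        "ec2_tests_on_heavy_instances": False,
        "nightly_pr_test_mode": False,
    },
}


def enabled_suites(config: dict) -> list:
    """Return the names of all suites switched on, in sorted order."""
    return sorted(name for name, on in config["suites"].items() if on)


print(enabled_suites(config))
```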
Adds an os_scan_allowlist entry for mesa CVE-2026-40393 (CVSS v3 9.8
CRITICAL, WebGPU OOB) to the PyTorch 2.10 EC2 training CPU and GPU
(cu130) allowlists. mesa is pulled in transitively (libgl1-mesa-glx for
OpenCV/visualization deps); DLC training containers do not expose a
WebGPU/rendering surface to untrusted content, so the vulnerable code
path is not reachable. Awaiting upstream Ubuntu 22.04 fix.

flash_attn 2.8.3 CVE-2026-31253 (GPU only) is already allowlisted on
the GPU EC2 file from a prior change. Upstream has no patched 2.x
release; the CVE targets repo training scripts (checkpoint.py, eval.py)
and not files shipped inside the installed flash_attn wheel.
…ning py3

Adds two SFTY allowlist entries to IGNORE_SAFETY_IDS["pytorch"]["training"]["py3"]:

- SFTY-20250331-30014: torch DoS via torch.jit.script (CVE-2025-3000).
  Affects torch 2.10.0+cpu; no upstream fix available for the pinned version
  in this container.

- SFTY-20240604-95861: mlflow path traversal / arbitrary file read
  (CVE-2024-37058). Affects mlflow 3.14.0; no upstream fix, broad-spec
  advisory.
Add SFTY-20260511-67155 (CVE-2026-31253) to the PyTorch training py3
ignore list. The finding flags insecure torch.load(weights_only=False)
in flash-attn 2.8.3's checkpoint loader; there is no upstream fix
available. Observed on the GPU image where flash-attn is installed.
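
The key path `IGNORE_SAFETY_IDS["pytorch"]["training"]["py3"]` suggests a nested mapping of framework, job type, and Python version. The sketch below shows one way such a structure could hold the three IDs named in these commits. Storing the IDs as a list of strings and the helper `is_ignored` are assumptions for illustration; the repository's actual definition may differ.

```python
# Nested mapping: framework -> job type -> Python version -> ignored Safety IDs.
# The list-of-strings value shape is an assumption made for this sketch.
IGNORE_SAFETY_IDS = {
    "pytorch": {
        "training": {
            "py3": [
                "SFTY-20250331-30014",  # torch, CVE-2025-3000
                "SFTY-20240604-95861",  # mlflow, CVE-2024-37058
                "SFTY-20260511-67155",  # flash-attn, CVE-2026-31253
            ]
        }
    }
}


def is_ignored(safety_id: str, framework: str, job_type: str, python_version: str) -> bool:
    """Return True if safety_id is on the ignore list for the given image type."""
    ids = (
        IGNORE_SAFETY_IDS.get(framework, {})
        .get(job_type, {})
        .get(python_version, [])
    )
    return safety_id in ids


print(is_ignored("SFTY-20260511-67155", "pytorch", "training", "py3"))
print(is_ignored("SFTY-20260511-67155", "pytorch", "inference", "py3"))
```

Scoping the ignore list by framework, job type, and Python version keeps an exception for training images from silently applying to inference images.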