
fix(pytorch-2.10-ec2): allowlist mesa CVE-2026-40393 for training images #6271

Open
bhanutejagk wants to merge 4 commits into aws:master from bhanutejagk:patch/pt210-ec2-mesa-cve-2026-40393

@bhanutejagk

Contributor

Adds an os_scan_allowlist entry for mesa CVE-2026-40393 (CVSS v3 9.8 CRITICAL, WebGPU out-of-bounds) to the PyTorch 2.10 EC2 training CPU and GPU (cu130) allowlists. mesa is pulled in transitively via libgl1-mesa-glx; training containers do not expose a WebGPU/rendering surface to untrusted content, so the vulnerable code path is not reachable. A patched mesa package for Ubuntu 22.04 is still pending (the upstream fix is in mesa 25.3.6 / 26.0.1).
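For reviewers unfamiliar with allowlisting, here is a minimal sketch of what such an entry and its matching logic might look like. The field names (`vulnerability_id`, `package_name`, `reason_to_ignore`, and so on) and the helper `is_allowlisted` are illustrative assumptions, not the repository's confirmed schema or scanner code; the actual allowlist files in this PR are authoritative.

```python
import json

# Hypothetical allowlist entry. Field names are assumptions for illustration only;
# the values are taken from the PR description.
entry = {
    "vulnerability_id": "CVE-2026-40393",
    "package_name": "mesa",
    "cvss_v3_score": 9.8,
    "cvss_v3_severity": "CRITICAL",
    "reason_to_ignore": (
        "Pulled in transitively via libgl1-mesa-glx; training containers do not "
        "expose a WebGPU/rendering surface to untrusted content."
    ),
}

# Assumed layout: entries grouped by package name.
allowlist = {"mesa": [entry]}


def is_allowlisted(finding, allowlist):
    """Return True if a scan finding matches an entry by package name and CVE id."""
    entries = allowlist.get(finding["package_name"], [])
    return any(e["vulnerability_id"] == finding["vulnerability_id"] for e in entries)


if __name__ == "__main__":
    print(json.dumps(allowlist, indent=2))
    finding = {"package_name": "mesa", "vulnerability_id": "CVE-2026-40393"}
    print(is_allowlisted(finding, allowlist))
```

Matching on both package name and CVE id keeps the exemption narrow: a different CVE in mesa, or the same id reported against another package, would still be flagged under this sketch.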

Also scopes dlc_developer_config.toml to the PyTorch training EC2 buildspec only, enables sanity/security/EC2 tests, and disables SageMaker test suites for this verification PR.

Purpose

Test Plan

Test Result


Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)

How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true
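As a rough illustration of the note above, the following sketch checks whether the four SageMaker flags are all enabled. It assumes the relevant settings have already been loaded into a plain dict; the helper name `disabled_sagemaker_flags` is hypothetical and is not part of the repository's tooling.

```python
# Flags named in the note above; all four should be true for a new framework version PR.
REQUIRED_SAGEMAKER_FLAGS = (
    "sagemaker_remote_tests",
    "sagemaker_efa_tests",
    "sagemaker_rc_tests",
    "sagemaker_local_tests",
)


def disabled_sagemaker_flags(test_settings):
    """Return the required flags that are missing or not set to True."""
    return [
        flag
        for flag in REQUIRED_SAGEMAKER_FLAGS
        if test_settings.get(flag) is not True
    ]


if __name__ == "__main__":
    # Example settings resembling a verification PR with SageMaker suites disabled.
    settings = {
        "sanity_tests": True,
        "security_tests": True,
        "ec2_tests": True,
        "sagemaker_remote_tests": False,
        "sagemaker_efa_tests": False,
        "sagemaker_rc_tests": False,
        "sagemaker_local_tests": False,
    }
    print(disabled_sagemaker_flags(settings))
```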
How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
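The fallback behaviour described above can be sketched as follows. This is an assumption about the semantics only (a commented-out command is ignored, and the default test set applies when `/tests` is absent); the function `parse_pr_commands` is hypothetical and does not reflect how the CodeBuild job actually reads the description.

```python
# Default test set listed in the PR template.
DEFAULT_TESTS = ["sanity", "security", "ec2", "ecs", "eks", "sagemaker", "sagemaker-local"]


def parse_pr_commands(description):
    """Extract uncommented /buildspec and /tests commands from a PR description.

    Returns (buildspec, tests). buildspec is None when the command is commented
    out or absent, meaning dlc_developer_config.toml would be used instead.
    """
    buildspec = None
    tests = list(DEFAULT_TESTS)
    for raw in description.splitlines():
        line = raw.strip()
        # Lines beginning with "#" are commented out and fall through untouched.
        if line.startswith("/buildspec "):
            buildspec = line.split(maxsplit=1)[1].strip()
        elif line.startswith("/tests "):
            tests = line.split()[1:]
    return buildspec, tests


if __name__ == "__main__":
    commented = "# /buildspec <buildspec_path>\n# /tests <test_list>"
    print(parse_pr_commands(commented))

    active = "/buildspec pytorch/training/buildspec.yml\n/tests sanity security ec2"
    print(parse_pr_commands(active))
```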
Toggle if you are merging into main Branch

PR Checklist

  • [ ] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details).

Bhanu Teja Goshikonda added 3 commits June 19, 2026 09:07
…files

Install nest-asyncio==1.6.0 explicitly in the common stage of the CPU
and GPU Dockerfiles. Restores the package previously pulled in
transitively; release-baseline regression test now matches.

Add SFTY-20250331-30014 (torch <=2.12.0 torch.jit.script memory
corruption) to IGNORE_SAFETY_IDS for PT 2.10 EC2 training CPU/GPU.
Affected code path is not exercised in DLC training; awaiting upstream
patched torch.

Add SFTY-20260511-67155 to IGNORE_SAFETY_IDS for PT EC2 training py3.
The CVE is for a commit made after flash-attn 2.8.3 was released and
affects upstream repo training scripts, not the installed package.
Mirrors the existing entry in the ECR enhanced-scan allowlist.