Skip to content

Hugging Face Model integration in Superbench#803

Open
Aishwarya-Tonpe wants to merge 1 commit into
mainfrom
hf-models-clean
Open

Hugging Face Model integration in Superbench#803
Aishwarya-Tonpe wants to merge 1 commit into
mainfrom
hf-models-clean

Conversation

@Aishwarya-Tonpe
Copy link
Copy Markdown
Contributor

@Aishwarya-Tonpe Aishwarya-Tonpe commented Apr 13, 2026

Adds support for loading and benchmarking models from HuggingFace Hub across Inference micro-benchmarks -ORT/TensorRT inference. Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.

SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.

Key Changes

New modules:

  • HuggingFaceModelLoader — Downloads, caches, and loads models from HF Hub. Estimates parameter count from model config (few KB) and checks GPU
    memory before downloading full weights to avoid failed multi-GB downloads.

  • ModelSourceConfig — Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.

    Micro-benchmarks (inference):

  • ORT inference — Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs
    automatically.

  • TensorRT inference — Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the
    exported ONNX graph.

  • ONNX exporter — New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models
    (>2GB).

Testing

  • test_model_source_config.py — Unit tests for validation, defaults, and edge cases.
  • test_huggingface_loader.py — Unit tests for dtype conversion, model size calculation, memory estimation, and param count estimation.
  • test_huggingface_e2e.py — End-to-end integration tests covering micro-benchmarks with real HF models.

Usage

Training benchmark

ORT inference
python examples/benchmarks/ort_inference_performance.py
--model_source huggingface --model_identifier bert-base-uncased

TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py
--model_source huggingface --model_identifier microsoft/resnet-50

Gated models
export HF_TOKEN=hf_xxxxx

@Aishwarya-Tonpe Aishwarya-Tonpe requested a review from a team as a code owner April 13, 2026 17:36
Copilot AI review requested due to automatic review settings April 13, 2026 17:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).

Changes:

  • Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
  • Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
  • Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/benchmarks/micro_benchmarks/test_model_source_config.py Adds unit tests for ModelSourceConfig validation/defaulting.
tests/benchmarks/micro_benchmarks/test_huggingface_loader.py Adds unit tests for HF loader dtype handling, load flow, and size estimation.
tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Adds integration tests that download real HF models and validate basic forward pass.
superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark.
superbench/benchmarks/model_benchmarks/pytorch_lstm.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_llama.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_gpt2.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_cnn.py Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path.
superbench/benchmarks/model_benchmarks/pytorch_bert.py Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_base.py Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier.
superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command.
superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling.
superbench/benchmarks/micro_benchmarks/model_source_config.py New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs.
superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks.
superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output.
examples/benchmarks/tensorrt_inference_performance.py Updates example script to show in-house vs HF usage via CLI.
examples/benchmarks/pytorch_huggingface_models.py New example demonstrating HF-backed training benchmarks, incl. distributed option.
examples/benchmarks/ort_inference_performance.py Updates ORT example script to show in-house vs HF usage via CLI.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread tests/benchmarks/micro_benchmarks/test_model_source_config.py
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
@Aishwarya-Tonpe Aishwarya-Tonpe changed the title Hf models clean Hugging Face Model integration in Superbench Apr 14, 2026
Copilot AI review requested due to automatic review settings April 14, 2026 17:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Outdated
Copilot AI review requested due to automatic review settings April 14, 2026 20:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_loader.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
…e benchmarks

- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Extract loading parameters
return self.load_model(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelSourceConfig.hf_token (and cache_dir) never get forwarded here, so passing a token via config has no effect unless it also exists in env vars.

That breaks private/gated model loading for callers that rely on explicit config instead of process environment. Consider wiring these fields through explicitly (for example by constructing HuggingFaceModelLoader(cache_dir=config.cache_dir, token=config.hf_token) or forwarding token/cache args into load_model).

elif 'input_ids' in input_names:
# NLP model: use input_ids and attention_mask
seq_len = getattr(self._args, 'seq_length', 512)
input_ids = np.random.randint(0, 30000, (self._args.batch_size, seq_len)).astype(np.int64)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a fixed upper bound (30000) for synthetic token IDs can produce out-of-range indices for models with smaller vocabularies, causing runtime failures in embedding lookup.

Could we derive the max token ID from the model config/tokenizer vocab size (or use a conservative low bound) before generating input_ids?

# Load config (use pre-downloaded config if provided)
if config is None:
logger.info('Loading model configuration...')
config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trust_remote_code=True executes repository-provided Python during config/tokenizer/model load. Since model_identifier comes from CLI input, this effectively allows arbitrary code execution from untrusted model repos.

Recommend defaulting this to False and requiring an explicit opt-in flag for remote code execution (with clear warning in docs).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Comment on lines +6 to +10
import os
import re
from pathlib import Path

import torch
Comment on lines +325 to +328
if 'pixel_values' in input_names:
# Vision model: use pixel_values (batch_size, 3, 224, 224)
pixel_values = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)
inputs = {'pixel_values': pixel_values}
}

if normalized_dtype not in dtype_map:
raise ValueError(f"Invalid dtype '{dtype_str}'.Must be one of {list(dtype_map.keys())}")
Comment on lines +27 to +29
def loader(self):
"""Create a loader instance."""
return HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache')
Comment thread tests/helper/decorator.py

pytorch_test = unittest.skipIf(os.environ.get('SB_TEST_PYTORCH', '1') == '0', 'Skip PyTorch tests.')
directx_test = unittest.skipIf(os.environ.get('SB_TEST_DIRECTX', '0') == '0', 'Skip DirectX tests.')
hf_e2e_test = unittest.skipUnless(os.environ.get('SB_TEST_HF_E2E', '0') == '1', 'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')
from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig


@pytest.mark.skipif(os.environ.get('SB_TEST_HF_E2E', '0') != '1', reason='Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')
Comment on lines +205 to +215
# Extract loading parameters
return self.load_model(
model_identifier=config.identifier,
torch_dtype=config.torch_dtype,
device=device,
revision=config.revision,
device_map=config.device_map,
config=config_pretrained,
**config.additional_kwargs
)

Comment on lines +115 to +130
config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
else:
logger.info('Using pre-downloaded model configuration.')

# Load tokenizer (may fail for some models, that's ok)
tokenizer = None
try:
logger.info('Loading tokenizer...')
tokenizer = AutoTokenizer.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
except Exception as e:
logger.warning(f'Could not load tokenizer: {e}. Continuing without tokenizer.')

# Load model
logger.info(f'Loading model weights (dtype={torch_dtype}, device={device})...')
model_kwargs = load_kwargs.copy()
model_kwargs['trust_remote_code'] = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] Unconditional trust_remote_code=True enables RCE from a CLI argument

Every AutoConfig/AutoTokenizer/AutoModel.from_pretrained call in the loader (and the duplicate calls in _preprocess_huggingface_models in both ORT and TRT benchmarks) hardcodes trust_remote_code=True. The model identifier flows directly from the --model_identifier CLI flag — there is no allow-list, no opt-in flag, and no warning. HF then downloads and exec()s arbitrary configuration_*.py/modeling_*.py/tokenization_*.py from the supplied repo. Since SuperBench is commonly run as root in containers on GPU hosts with credentialed network access, the blast radius is full host compromise.

Default trust_remote_code=False. Add a separate --allow_remote_code opt-in flag, and log a prominent warning when it is set. Recommend pinning --revision <sha> in the example docs.

trust = bool(self._args.allow_remote_code)
if trust:
    logger.warning('allow_remote_code=True: executing arbitrary Python from HF repo %r', model_identifier)
config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=trust, **load_kwargs)

# Determine input format based on what the model expects
if 'pixel_values' in input_names:
# Vision model: use pixel_values (batch_size, 3, 224, 224)
pixel_values = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] ORT inference hardcodes (B, 3, 224, 224) for all vision HF models

The exporter (export_huggingface_model) correctly derives (C, H, W) from model.config.num_channels and image_size, so a 384×384 ViT or 1-channel medical model exports with the right shape — but __inference then always feeds (batch, 3, 224, 224) whenever pixel_values is an input. ORT will fail with a shape mismatch on any vision model that is not (3, 224, 224). The TRT path already reads dims from the ONNX graph; ORT must do the same.

meta = next(i for i in ort_sess.get_inputs() if i.name == 'pixel_values')
batch = self._args.batch_size
dims = [d if isinstance(d, int) else None for d in meta.shape]
_, c, h, w = (dims + [None]*4)[:4]
c, h, w = c or 3, h or 224, w or 224
pixel_values = np.random.randn(batch, c, h, w).astype(precision)
inputs = {'pixel_values': pixel_values}

'--explicitBatch',
f'--optShapes=input:{input_shape}',
'--workspace=8192',
'--memPoolSize=workspace:8192M',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] --memPoolSize requires TensorRT 8.4+; in-house TRT path now silently breaks on older runtimes

The in-house branch had --explicitBatch --workspace=8192; this PR removes --explicitBatch and replaces --workspace= with --memPoolSize=workspace:8192M. --memPoolSize was introduced in TensorRT 8.4; --workspace= was deprecated in 8.4 but still accepted through TRT 10. The PR description does not declare a minimum TensorRT version bump, so any environment still on TRT 8.0–8.3 (common in CUDA 11.x base images) hits an unknown-argument error.

Either gate the flag on detected trtexec version (keep --workspace=8192 for < 8.4), or explicitly bump and document the minimum TensorRT version in README + Dockerfiles + changelog.

Comment on lines +206 to +213
return self.load_model(
model_identifier=config.identifier,
torch_dtype=config.torch_dtype,
device=device,
revision=config.revision,
device_map=config.device_map,
config=config_pretrained,
**config.additional_kwargs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] load_model_from_config silently drops config.hf_token and config.cache_dir

ModelSourceConfig exposes hf_token and cache_dir as part of its documented contract, but load_model_from_config extracts only identifier, torch_dtype, revision, device_map, and additional_kwargs. The loader then falls back to self.token/self.cache_dir populated from env vars in __init__. A caller who instantiates HuggingFaceModelLoader() (no args) and supplies a token only via ModelSourceConfig(hf_token=...) will get a silent 401 on private/gated models — exactly the use case the PR description advertises.

return self.load_model(
    model_identifier=config.identifier,
    torch_dtype=config.torch_dtype,
    device=device,
    revision=config.revision,
    device_map=config.device_map,
    config=config_pretrained,
    token=config.hf_token or self.token,
    cache_dir=config.cache_dir or self.cache_dir,
    **config.additional_kwargs,
)

Comment on lines +51 to +53
valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8']
if self.torch_dtype not in valid_dtypes:
raise ValueError(f"Invalid torch_dtype '{self.torch_dtype}'. Must be one of {valid_dtypes}.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] ModelSourceConfig validates 'int8' but _get_torch_dtype rejects it

valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8'] accepts 'int8' at construction time, but HuggingFaceModelLoader._get_torch_dtype('int8') raises ValueError. Two sources of truth disagree, producing a confusing late-stage failure. The ORT preprocess works around this by mapping INT8→FLOAT32 before constructing the config; any other caller will fail.

Drop 'int8' from valid_dtypes (INT8 is handled via post-export quantize_dynamic, not via torch_dtype) — or implement it consistently in the loader.

elif 'input_ids' in input_names:
# NLP model: use input_ids and attention_mask
seq_len = getattr(self._args, 'seq_length', 512)
input_ids = np.random.randint(0, 30000, (self._args.batch_size, seq_len)).astype(np.int64)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] Random input_ids bound 30000 can exceed model vocabulary

np.random.randint(0, 30000, ...) produces out-of-range token IDs for any model with vocab_size < 30000 (many distilled / domain-specific HF models). Embedding Gather on out-of-range indices is undefined on CUDA — silent NaNs, garbage timings, or device-side assertions. The exporter already loaded the HF config; carry vocab_size through.

vocab_size = getattr(self._hf_config, 'vocab_size', 30000)
input_ids = np.random.randint(0, vocab_size, (self._args.batch_size, seq_len), dtype=np.int64)

f'--onnx={onnx_path}',
f'--optShapes={input_shapes}',
'--memPoolSize=workspace:8192M',
None if self._args.precision == 'fp32' else f'--{self._args.precision}',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] TRT --int8 is emitted without calibration data — INT8 numbers are not meaningful

The HF TRT args list emits f'--{self._args.precision}', so for INT8 trtexec receives --int8 but no --calib=<file> and no Q/DQ ONNX. trtexec falls back to fake dynamic ranges and reports latencies, but accuracy is undefined and the benchmark is misleading.

Either reject INT8 in the HF TRT path with a clear "requires calibration" error, generate a small synthetic calibration cache from dummy inputs and pass --calib=, or run ORT static quantization first (Q/DQ embedded) and feed the quantized ONNX to trtexec.

Comment on lines +278 to +282
# Require CUDAExecutionProvider — this benchmark targets GPU inference
available = ort.get_available_providers()
if 'CUDAExecutionProvider' not in available:
logger.error(f'CUDAExecutionProvider is not available (available: {available}).')
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] _benchmark hard-fails without CUDAExecutionProvider — behavioral regression for CPU/ROCm smoke tests

The previous behavior allowed ORT to fall back to whatever provider was registered. The new hard return False breaks CPU-only smoke tests, ROCm hosts using onnxruntime-rocm, and future provider extensions. Even if CUDA is the intended target, this should not be a silent fail-closed.

Either downgrade to a warning and let InferenceSession pick its own providers, or gate the strict check on an explicit --require_cuda flag so the default in-house path keeps its prior tolerance.

Comment on lines +221 to +222
import onnx as onnx_lib
onnx_model = onnx_lib.load(onnx_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] onnx.load(onnx_path) will OOM/fail for >2GB external-data models

The exporter switches to external-data format when model size > 2GB (LLaMA-7B and up). onnx.load() defaults to load_external_data=True and will try to materialize all sidecar tensor data into memory — solely to read graph input metadata. This crashes or OOMs the host node on exactly the models the external-data branch was written for.

onnx_model = onnx_lib.load(onnx_path, load_external_data=False)

from superbench.benchmarks.micro_benchmarks._export_torch_to_onnx import torch2onnxExporter
exporter = torch2onnxExporter()

model_name = self._args.model_identifier.replace('/', '_')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] Insufficient sanitization of --model_identifier before use in filesystem paths and from_pretrained

model_name = self._args.model_identifier.replace('/', '_') strips / but not .., \ (Windows separator), :, or control characters. The same model_identifier is also passed to AutoConfig.from_pretrained, which will load from a local directory if the string resolves as a path — letting a crafted argument load attacker-staged files from a mounted volume.

Validate at the CLI boundary with a strict allow-list and assert path containment:

import re
_SAFE_ID = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._-]{0,127}(/[A-Za-z0-9._-]{1,128})?$')
if not _SAFE_ID.match(model_identifier):
    raise ValueError(f'Invalid model_identifier: {model_identifier!r}')
out = (proc_output_path / (model_name + '.onnx')).resolve()
assert proc_output_path.resolve() in out.parents

Comment thread tests/helper/decorator.py

pytorch_test = unittest.skipIf(os.environ.get('SB_TEST_PYTORCH', '1') == '0', 'Skip PyTorch tests.')
directx_test = unittest.skipIf(os.environ.get('SB_TEST_DIRECTX', '0') == '0', 'Skip DirectX tests.')
hf_e2e_test = unittest.skipUnless(os.environ.get('SB_TEST_HF_E2E', '0') == '1', 'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] New hf_e2e_test is dead code and exceeds the repo's 120-char flake8 limit

The new hf_e2e_test = unittest.skipUnless(...) line is 138 characters (repo flake8 max-line-length is 120), and the symbol is not referenced anywhere — tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py uses its own inline @pytest.mark.skipif.

Either delete the helper, or wire TestHuggingFaceE2E to use it (one source of truth). Either way, wrap the line:

hf_e2e_test = unittest.skipUnless(
    os.environ.get('SB_TEST_HF_E2E', '0') == '1',
    'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.',
)

Returns:
bool: True if preprocessing succeeds.
"""
import os
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] Duplicate os import in TRT module

This PR adds a module-level import os, but _preprocess_huggingface_models also has a local import os. Lint (F401/F811) will flag at least one. Keep the module-level import; remove the inner one.

Comment on lines +27 to +29
def loader(self):
"""Create a loader instance."""
return HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD-FIX] E2E test fixture hardcodes /tmp/hf_test_cache

HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache') is non-portable (fails on Windows runners), predictable on shared CI (symlink/squatting risk), and collides across pytest-xdist workers. The loader-unit-test file already uses tmp_path — be consistent.

@pytest.fixture
def loader(self, tmp_path):
    return HuggingFaceModelLoader(cache_dir=str(tmp_path / 'hf_cache'))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants