Hugging Face Model integration in Superbench by Aishwarya-Tonpe · Pull Request #803 · microsoft/superbenchmark

Aishwarya-Tonpe · 2026-04-13T17:35:59Z

Adds support for loading and benchmarking models from HuggingFace Hub across Inference micro-benchmarks -ORT/TensorRT inference. Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.

SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.

Key Changes

New modules:

HuggingFaceModelLoader — Downloads, caches, and loads models from HF Hub. Estimates parameter count from model config (few KB) and checks GPU
memory before downloading full weights to avoid failed multi-GB downloads.
ModelSourceConfig — Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.

Micro-benchmarks (inference):
ORT inference — Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs
automatically.
TensorRT inference — Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the
exported ONNX graph.
ONNX exporter — New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models
(>2GB).

Testing

test_model_source_config.py — Unit tests for validation, defaults, and edge cases.
test_huggingface_loader.py — Unit tests for dtype conversion, model size calculation, memory estimation, and param count estimation.
test_huggingface_e2e.py — End-to-end integration tests covering micro-benchmarks with real HF models.

Usage

Training benchmark

ORT inference
python examples/benchmarks/ort_inference_performance.py
--model_source huggingface --model_identifier bert-base-uncased

TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py
--model_source huggingface --model_identifier microsoft/resnet-50

Gated models
export HF_TOKEN=hf_xxxxx

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).

Changes:

Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/benchmarks/micro_benchmarks/test_model_source_config.py	Adds unit tests for `ModelSourceConfig` validation/defaulting.
tests/benchmarks/micro_benchmarks/test_huggingface_loader.py	Adds unit tests for HF loader dtype handling, load flow, and size estimation.
tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py	Adds integration tests that download real HF models and validate basic forward pass.
superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py	Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark.
superbench/benchmarks/model_benchmarks/pytorch_lstm.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_llama.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_gpt2.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_cnn.py	Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path.
superbench/benchmarks/model_benchmarks/pytorch_bert.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_base.py	Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier.
superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py	Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command.
superbench/benchmarks/micro_benchmarks/ort_inference_performance.py	Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling.
superbench/benchmarks/micro_benchmarks/model_source_config.py	New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs.
superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py	New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks.
superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py	Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output.
examples/benchmarks/tensorrt_inference_performance.py	Updates example script to show in-house vs HF usage via CLI.
examples/benchmarks/pytorch_huggingface_models.py	New example demonstrating HF-backed training benchmarks, incl. distributed option.
examples/benchmarks/ort_inference_performance.py	Updates ORT example script to show in-house vs HF usage via CLI.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…e benchmarks - Add HuggingFaceModelLoader for downloading and caching models from HF Hub - Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models - Add model_source and model_identifier parameters to TensorRT/ORT benchmarks - Add ONNX export pipeline for HuggingFace models with dynamic axes - Derive vision input shapes from ONNX graph dims with HF config fallback - Filter ONNX initializers from graph.input for correct NLP input handling - Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format) - Add example script, unit tests, and config schema updates - Support HF_TOKEN env var for gated model access

polarG · 2026-05-26T17:26:28Z

+            device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+        # Extract loading parameters
+        return self.load_model(


ModelSourceConfig.hf_token (and cache_dir) never get forwarded here, so passing a token via config has no effect unless it also exists in env vars.

That breaks private/gated model loading for callers that rely on explicit config instead of process environment. Consider wiring these fields through explicitly (for example by constructing HuggingFaceModelLoader(cache_dir=config.cache_dir, token=config.hf_token) or forwarding token/cache args into load_model).

polarG · 2026-05-26T17:26:29Z

+        elif 'input_ids' in input_names:
+            # NLP model: use input_ids and attention_mask
+            seq_len = getattr(self._args, 'seq_length', 512)
+            input_ids = np.random.randint(0, 30000, (self._args.batch_size, seq_len)).astype(np.int64)


Using a fixed upper bound (30000) for synthetic token IDs can produce out-of-range indices for models with smaller vocabularies, causing runtime failures in embedding lookup.

Could we derive the max token ID from the model config/tokenizer vocab size (or use a conservative low bound) before generating input_ids?

polarG · 2026-05-26T17:26:31Z

+            # Load config (use pre-downloaded config if provided)
+            if config is None:
+                logger.info('Loading model configuration...')
+                config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)


trust_remote_code=True executes repository-provided Python during config/tokenizer/model load. Since model_identifier comes from CLI input, this effectively allows arbitrary code execution from untrusted model repos.

Recommend defaulting this to False and requiring an explicit opt-in flag for remote code execution (with clear warning in docs).

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

+import os
 import re
 from pathlib import Path

+import torch


+        if 'pixel_values' in input_names:
+            # Vision model: use pixel_values (batch_size, 3, 224, 224)
+            pixel_values = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)
+            inputs = {'pixel_values': pixel_values}


+        }
+
+        if normalized_dtype not in dtype_map:
+            raise ValueError(f"Invalid dtype '{dtype_str}'.Must be one of {list(dtype_map.keys())}")


+    def loader(self):
+        """Create a loader instance."""
+        return HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache')



 pytorch_test = unittest.skipIf(os.environ.get('SB_TEST_PYTORCH', '1') == '0', 'Skip PyTorch tests.')
 directx_test = unittest.skipIf(os.environ.get('SB_TEST_DIRECTX', '0') == '0', 'Skip DirectX tests.')
+hf_e2e_test = unittest.skipUnless(os.environ.get('SB_TEST_HF_E2E', '0') == '1', 'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')


+from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig
+
+
+@pytest.mark.skipif(os.environ.get('SB_TEST_HF_E2E', '0') != '1', reason='Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')


+        # Extract loading parameters
+        return self.load_model(
+            model_identifier=config.identifier,
+            torch_dtype=config.torch_dtype,
+            device=device,
+            revision=config.revision,
+            device_map=config.device_map,
+            config=config_pretrained,
+            **config.additional_kwargs
+        )
+


guoshzhao · 2026-05-26T20:48:47Z

+                config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
+            else:
+                logger.info('Using pre-downloaded model configuration.')
+
+            # Load tokenizer (may fail for some models, that's ok)
+            tokenizer = None
+            try:
+                logger.info('Loading tokenizer...')
+                tokenizer = AutoTokenizer.from_pretrained(model_identifier, trust_remote_code=True, **load_kwargs)
+            except Exception as e:
+                logger.warning(f'Could not load tokenizer: {e}. Continuing without tokenizer.')
+
+            # Load model
+            logger.info(f'Loading model weights (dtype={torch_dtype}, device={device})...')
+            model_kwargs = load_kwargs.copy()
+            model_kwargs['trust_remote_code'] = True


[BLOCKER] Unconditional trust_remote_code=True enables RCE from a CLI argument

Every AutoConfig/AutoTokenizer/AutoModel.from_pretrained call in the loader (and the duplicate calls in _preprocess_huggingface_models in both ORT and TRT benchmarks) hardcodes trust_remote_code=True. The model identifier flows directly from the --model_identifier CLI flag — there is no allow-list, no opt-in flag, and no warning. HF then downloads and exec()s arbitrary configuration_*.py/modeling_*.py/tokenization_*.py from the supplied repo. Since SuperBench is commonly run as root in containers on GPU hosts with credentialed network access, the blast radius is full host compromise.

Default trust_remote_code=False. Add a separate --allow_remote_code opt-in flag, and log a prominent warning when it is set. Recommend pinning --revision <sha> in the example docs.

trust = bool(self._args.allow_remote_code) if trust: logger.warning('allow_remote_code=True: executing arbitrary Python from HF repo %r', model_identifier) config = AutoConfig.from_pretrained(model_identifier, trust_remote_code=trust, **load_kwargs)

guoshzhao · 2026-05-26T20:48:48Z

+        # Determine input format based on what the model expects
+        if 'pixel_values' in input_names:
+            # Vision model: use pixel_values (batch_size, 3, 224, 224)
+            pixel_values = np.random.randn(self._args.batch_size, 3, 224, 224).astype(dtype=precision)


[BLOCKER] ORT inference hardcodes (B, 3, 224, 224) for all vision HF models

The exporter (export_huggingface_model) correctly derives (C, H, W) from model.config.num_channels and image_size, so a 384×384 ViT or 1-channel medical model exports with the right shape — but __inference then always feeds (batch, 3, 224, 224) whenever pixel_values is an input. ORT will fail with a shape mismatch on any vision model that is not (3, 224, 224). The TRT path already reads dims from the ONNX graph; ORT must do the same.

meta = next(i for i in ort_sess.get_inputs() if i.name == 'pixel_values') batch = self._args.batch_size dims = [d if isinstance(d, int) else None for d in meta.shape] _, c, h, w = (dims + [None]*4)[:4] c, h, w = c or 3, h or 224, w or 224 pixel_values = np.random.randn(batch, c, h, w).astype(precision) inputs = {'pixel_values': pixel_values}

guoshzhao · 2026-05-26T20:48:50Z

-                '--explicitBatch',
                f'--optShapes=input:{input_shape}',
-                '--workspace=8192',
+                '--memPoolSize=workspace:8192M',


[BLOCKER] --memPoolSize requires TensorRT 8.4+; in-house TRT path now silently breaks on older runtimes

The in-house branch had --explicitBatch --workspace=8192; this PR removes --explicitBatch and replaces --workspace= with --memPoolSize=workspace:8192M. --memPoolSize was introduced in TensorRT 8.4; --workspace= was deprecated in 8.4 but still accepted through TRT 10. The PR description does not declare a minimum TensorRT version bump, so any environment still on TRT 8.0–8.3 (common in CUDA 11.x base images) hits an unknown-argument error.

Either gate the flag on detected trtexec version (keep --workspace=8192 for < 8.4), or explicitly bump and document the minimum TensorRT version in README + Dockerfiles + changelog.

guoshzhao · 2026-05-26T20:48:51Z

+        return self.load_model(
+            model_identifier=config.identifier,
+            torch_dtype=config.torch_dtype,
+            device=device,
+            revision=config.revision,
+            device_map=config.device_map,
+            config=config_pretrained,
+            **config.additional_kwargs


[SHOULD-FIX] load_model_from_config silently drops config.hf_token and config.cache_dir

ModelSourceConfig exposes hf_token and cache_dir as part of its documented contract, but load_model_from_config extracts only identifier, torch_dtype, revision, device_map, and additional_kwargs. The loader then falls back to self.token/self.cache_dir populated from env vars in __init__. A caller who instantiates HuggingFaceModelLoader() (no args) and supplies a token only via ModelSourceConfig(hf_token=...) will get a silent 401 on private/gated models — exactly the use case the PR description advertises.

return self.load_model( model_identifier=config.identifier, torch_dtype=config.torch_dtype, device=device, revision=config.revision, device_map=config.device_map, config=config_pretrained, token=config.hf_token or self.token, cache_dir=config.cache_dir or self.cache_dir, **config.additional_kwargs, )

guoshzhao · 2026-05-26T20:48:52Z

+        valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8']
+        if self.torch_dtype not in valid_dtypes:
+            raise ValueError(f"Invalid torch_dtype '{self.torch_dtype}'. Must be one of {valid_dtypes}.")


[SHOULD-FIX] ModelSourceConfig validates 'int8' but _get_torch_dtype rejects it

valid_dtypes = ['float32', 'float16', 'bfloat16', 'int8'] accepts 'int8' at construction time, but HuggingFaceModelLoader._get_torch_dtype('int8') raises ValueError. Two sources of truth disagree, producing a confusing late-stage failure. The ORT preprocess works around this by mapping INT8→FLOAT32 before constructing the config; any other caller will fail.

Drop 'int8' from valid_dtypes (INT8 is handled via post-export quantize_dynamic, not via torch_dtype) — or implement it consistently in the loader.

guoshzhao · 2026-05-26T20:48:53Z

+        elif 'input_ids' in input_names:
+            # NLP model: use input_ids and attention_mask
+            seq_len = getattr(self._args, 'seq_length', 512)
+            input_ids = np.random.randint(0, 30000, (self._args.batch_size, seq_len)).astype(np.int64)


[SHOULD-FIX] Random input_ids bound 30000 can exceed model vocabulary

np.random.randint(0, 30000, ...) produces out-of-range token IDs for any model with vocab_size < 30000 (many distilled / domain-specific HF models). Embedding Gather on out-of-range indices is undefined on CUDA — silent NaNs, garbage timings, or device-side assertions. The exporter already loaded the HF config; carry vocab_size through.

vocab_size = getattr(self._hf_config, 'vocab_size', 30000) input_ids = np.random.randint(0, vocab_size, (self._args.batch_size, seq_len), dtype=np.int64)

guoshzhao · 2026-05-26T20:48:54Z

+                f'--onnx={onnx_path}',
+                f'--optShapes={input_shapes}',
+                '--memPoolSize=workspace:8192M',
+                None if self._args.precision == 'fp32' else f'--{self._args.precision}',


[SHOULD-FIX] TRT --int8 is emitted without calibration data — INT8 numbers are not meaningful

The HF TRT args list emits f'--{self._args.precision}', so for INT8 trtexec receives --int8 but no --calib=<file> and no Q/DQ ONNX. trtexec falls back to fake dynamic ranges and reports latencies, but accuracy is undefined and the benchmark is misleading.

Either reject INT8 in the HF TRT path with a clear "requires calibration" error, generate a small synthetic calibration cache from dummy inputs and pass --calib=, or run ORT static quantization first (Q/DQ embedded) and feed the quantized ONNX to trtexec.

guoshzhao · 2026-05-26T20:48:55Z

+        # Require CUDAExecutionProvider — this benchmark targets GPU inference
+        available = ort.get_available_providers()
+        if 'CUDAExecutionProvider' not in available:
+            logger.error(f'CUDAExecutionProvider is not available (available: {available}).')
+            return False


[SHOULD-FIX] _benchmark hard-fails without CUDAExecutionProvider — behavioral regression for CPU/ROCm smoke tests

The previous behavior allowed ORT to fall back to whatever provider was registered. The new hard return False breaks CPU-only smoke tests, ROCm hosts using onnxruntime-rocm, and future provider extensions. Even if CUDA is the intended target, this should not be a silent fail-closed.

Either downgrade to a warning and let InferenceSession pick its own providers, or gate the strict check on an explicit --require_cuda flag so the default in-house path keeps its prior tolerance.

guoshzhao · 2026-05-26T20:48:56Z

+            import onnx as onnx_lib
+            onnx_model = onnx_lib.load(onnx_path)


[SHOULD-FIX] onnx.load(onnx_path) will OOM/fail for >2GB external-data models

The exporter switches to external-data format when model size > 2GB (LLaMA-7B and up). onnx.load() defaults to load_external_data=True and will try to materialize all sidecar tensor data into memory — solely to read graph input metadata. This crashes or OOMs the host node on exactly the models the external-data branch was written for.

onnx_model = onnx_lib.load(onnx_path, load_external_data=False)

guoshzhao · 2026-05-26T20:48:57Z

+            from superbench.benchmarks.micro_benchmarks._export_torch_to_onnx import torch2onnxExporter
+            exporter = torch2onnxExporter()
+
+            model_name = self._args.model_identifier.replace('/', '_')


[SHOULD-FIX] Insufficient sanitization of --model_identifier before use in filesystem paths and from_pretrained

model_name = self._args.model_identifier.replace('/', '_') strips / but not .., \ (Windows separator), :, or control characters. The same model_identifier is also passed to AutoConfig.from_pretrained, which will load from a local directory if the string resolves as a path — letting a crafted argument load attacker-staged files from a mounted volume.

Validate at the CLI boundary with a strict allow-list and assert path containment:

import re _SAFE_ID = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._-]{0,127}(/[A-Za-z0-9._-]{1,128})?$') if not _SAFE_ID.match(model_identifier): raise ValueError(f'Invalid model_identifier: {model_identifier!r}') out = (proc_output_path / (model_name + '.onnx')).resolve() assert proc_output_path.resolve() in out.parents

guoshzhao · 2026-05-26T20:48:58Z


 pytorch_test = unittest.skipIf(os.environ.get('SB_TEST_PYTORCH', '1') == '0', 'Skip PyTorch tests.')
 directx_test = unittest.skipIf(os.environ.get('SB_TEST_DIRECTX', '0') == '0', 'Skip DirectX tests.')
+hf_e2e_test = unittest.skipUnless(os.environ.get('SB_TEST_HF_E2E', '0') == '1', 'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')


[SHOULD-FIX] New hf_e2e_test is dead code and exceeds the repo's 120-char flake8 limit

The new hf_e2e_test = unittest.skipUnless(...) line is 138 characters (repo flake8 max-line-length is 120), and the symbol is not referenced anywhere — tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py uses its own inline @pytest.mark.skipif.

Either delete the helper, or wire TestHuggingFaceE2E to use it (one source of truth). Either way, wrap the line:

hf_e2e_test = unittest.skipUnless( os.environ.get('SB_TEST_HF_E2E', '0') == '1', 'Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.', )

guoshzhao · 2026-05-26T20:49:00Z

+        Returns:
+            bool: True if preprocessing succeeds.
+        """
+        import os


[SHOULD-FIX] Duplicate os import in TRT module

This PR adds a module-level import os, but _preprocess_huggingface_models also has a local import os. Lint (F401/F811) will flag at least one. Keep the module-level import; remove the inner one.

guoshzhao · 2026-05-26T20:49:01Z

+    def loader(self):
+        """Create a loader instance."""
+        return HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache')


[SHOULD-FIX] E2E test fixture hardcodes /tmp/hf_test_cache

HuggingFaceModelLoader(cache_dir='/tmp/hf_test_cache') is non-portable (fails on Windows runners), predictable on shared CI (symlink/squatting risk), and collides across pytest-xdist workers. The loader-unit-test file already uses tmp_path — be consistent.

@pytest.fixture def loader(self, tmp_path): return HuggingFaceModelLoader(cache_dir=str(tmp_path / 'hf_cache'))

Aishwarya-Tonpe requested a review from a team as a code owner April 13, 2026 17:36

Copilot AI review requested due to automatic review settings April 13, 2026 17:36

Copilot AI reviewed Apr 13, 2026

View reviewed changes

Copilot started reviewing on behalf of Aishwarya-Tonpe April 13, 2026 17:48 View session

Aishwarya-Tonpe force-pushed the hf-models-clean branch from f689460 to 2a47dc8 Compare April 13, 2026 19:33

Aishwarya-Tonpe changed the title ~~Hf models clean~~ Hugging Face Model integration in Superbench Apr 14, 2026

Copilot AI review requested due to automatic review settings April 14, 2026 17:30

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 2a47dc8 to a61db26 Compare April 14, 2026 17:30

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Aishwarya-Tonpe force-pushed the hf-models-clean branch from a61db26 to 6bebb38 Compare April 14, 2026 18:27

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 18:34 View session

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 6bebb38 to 4eec2f7 Compare April 14, 2026 20:05

Copilot AI review requested due to automatic review settings April 14, 2026 20:34

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 4eec2f7 to 2f24e0f Compare April 14, 2026 20:34

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 2f24e0f to 18df07b Compare April 14, 2026 20:47

Copilot AI review requested due to automatic review settings April 14, 2026 20:51

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 22:44 View session

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 23:07 View session

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 12a65ad to 7632427 Compare April 20, 2026 18:28

Copilot AI review requested due to automatic review settings April 23, 2026 22:31

Copilot started reviewing on behalf of Aishwarya-Tonpe April 23, 2026 22:31 View session

Aishwarya-Tonpe force-pushed the hf-models-clean branch 2 times, most recently from dca9515 to 7094628 Compare April 23, 2026 22:37

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 7094628 to 2ca3e68 Compare April 29, 2026 23:39

polarG reviewed May 26, 2026

View reviewed changes

polarG requested a review from Copilot May 26, 2026 17:31

Copilot started reviewing on behalf of polarG May 26, 2026 17:32 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

guoshzhao reviewed May 26, 2026

View reviewed changes

		from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig


		@pytest.mark.skipif(os.environ.get('SB_TEST_HF_E2E', '0') != '1', reason='Skip HF E2E tests. Set SB_TEST_HF_E2E=1 to enable.')

		import onnx as onnx_lib
		onnx_model = onnx_lib.load(onnx_path)

Conversation

Aishwarya-Tonpe commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key Changes

Testing

Usage

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Aishwarya-Tonpe commented Apr 13, 2026 •

edited

Loading