feat: add HuggingFace Hub integration by AbhiPrasad · Pull Request #452 · braintrustdata/braintrust-sdk-python

Abhijeet Prasad (AbhiPrasad) · 2026-05-21T15:15:44Z

resolves #278

Adds native Braintrust tracing for the Hugging Face Hub Python SDK (huggingface_hub) via the integrations API. The integration supports huggingface-hub>=0.32.0 and is included in auto_instrument() by default.

How to enable it:

import braintrust
from braintrust import init_logger
from huggingface_hub import InferenceClient

logger = init_logger(project="my-project")
braintrust.auto_instrument()  # enables huggingface_hub unless disabled

client = InferenceClient(provider="auto")
with logger.start_span(name="hf request"):
    response = client.chat_completion(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello"}],
        max_tokens=32,
    )

The integration can also be enabled explicitly or disabled from the global auto-instrumentation call:

from braintrust.auto import auto_instrument

auto_instrument(huggingface_hub=True)   # opt in explicitly
auto_instrument(huggingface_hub=False)  # opt out

For manual wrapping, wrap individual sync or async clients:

from braintrust.integrations.huggingface_hub import wrap_huggingface_hub
from huggingface_hub import AsyncInferenceClient, InferenceClient

client = wrap_huggingface_hub(InferenceClient(provider="auto"))
async_client = wrap_huggingface_hub(AsyncInferenceClient(provider="auto"))

Features added:

- Traces sync and async InferenceClient calls for chat_completion, text_generation, feature_extraction, and sentence_similarity.
- Covers the OpenAI-compatible chat alias client.chat.completions.create(...) because it proxies through chat_completion.
- Supports non-streaming and streaming chat completions, including async streams, context manager finalization, early stream close handling, and nesting under parent Braintrust spans.
- Captures span inputs, outputs, allowlisted request metadata, provider/model routing metadata, response identifiers, finish reasons, and token metrics from Hugging Face response usage/details fields.
- Logs provider errors to the span before re-raising.
- Adds VCR-backed coverage for latest and 0.32.0 Hugging Face Hub SDKs, plus an auto-instrumentation smoke test.
- Adds the test_huggingface_hub nox session, dependency matrix entries, and cassette-directory mapping for provider-versioned cassette hygiene.
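
For reviewers who want a mental model of the streaming behavior listed above (finalization, early close handling, and logging errors before re-raising), here is a minimal sketch. It is not the code in this PR: `RecordingSpan` and `traced_stream` are hypothetical names, the span class is a stand-in rather than the Braintrust span API, and chunks are simplified to plain strings.

```python
from typing import Any, Dict, Iterable, Iterator, List


class RecordingSpan:
    """Hypothetical stand-in for a tracing span; not the Braintrust Span API."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []
        self.ended = False

    def log(self, **fields: Any) -> None:
        # Store each logged event so it can be inspected later.
        self.events.append(fields)

    def end(self) -> None:
        self.ended = True


def traced_stream(chunks: Iterable[str], span: RecordingSpan) -> Iterator[str]:
    """Yield chunks unchanged, then log what was seen and end the span."""
    seen: List[str] = []
    try:
        for chunk in chunks:
            seen.append(chunk)
            yield chunk
    except Exception as exc:
        # Record the upstream error on the span, then let it propagate.
        span.log(error=repr(exc))
        raise
    finally:
        # Runs on normal exhaustion, on error, and when the consumer
        # calls close() after the stream has started.
        span.log(output="".join(seen))
        span.end()


if __name__ == "__main__":
    span = RecordingSpan()
    stream = traced_stream(iter(["Hel", "lo", "!"]), span)
    next(stream)    # consume only the first chunk
    stream.close()  # early close still finalizes the span
    print(span.events, span.ended)
```

The `finally` block is what makes early close safe in this sketch: `close()` raises `GeneratorExit` at the paused `yield`, which skips the `except Exception` branch (it is a `BaseException`) but still runs the finalizer. A generator that is closed before its first `next()` never enters the body, so a real wrapper would need to handle that case separately.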


Abhijeet Prasad (AbhiPrasad) requested a review from Luca Forstner (lforst) May 21, 2026 15:15

Abhijeet Prasad (AbhiPrasad) self-assigned this May 21, 2026

Abhijeet Prasad (AbhiPrasad) force-pushed the abhi-huggingface-py-sdk branch from 584eeed to d403163 Compare May 21, 2026 18:32

Abhijeet Prasad (AbhiPrasad) wants to merge 1 commit into main from abhi-huggingface-py-sdk
