feat: add HuggingFace Hub integration #452

Open
Abhijeet Prasad (AbhiPrasad) wants to merge 1 commit into main from abhi-huggingface-py-sdk

@AbhiPrasad Abhijeet Prasad (AbhiPrasad) commented May 21, 2026

resolves #278

Adds native Braintrust tracing for the Hugging Face Hub Python SDK
(`huggingface_hub`) via the integrations API. The integration supports
`huggingface-hub>=0.32.0` and is included in `auto_instrument()` by default.

How to enable it:

```python
import braintrust
from braintrust import init_logger
from huggingface_hub import InferenceClient

logger = init_logger(project="my-project")
braintrust.auto_instrument()  # enables huggingface_hub unless disabled

client = InferenceClient(provider="auto")
with logger.start_span(name="hf request"):
    response = client.chat_completion(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello"}],
        max_tokens=32,
    )
```

The integration can also be explicitly enabled or disabled through the global
`auto_instrument()` call:

```python
from braintrust.auto import auto_instrument

# Call one of the following, not both:
auto_instrument(huggingface_hub=True)   # opt in explicitly
auto_instrument(huggingface_hub=False)  # opt out
```

For manual wrapping, wrap individual sync or async clients:

```python
from braintrust.integrations.huggingface_hub import wrap_huggingface_hub
from huggingface_hub import AsyncInferenceClient, InferenceClient

client = wrap_huggingface_hub(InferenceClient(provider="auto"))
async_client = wrap_huggingface_hub(AsyncInferenceClient(provider="auto"))
```
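
To make the manual-wrapping path more concrete, here is a minimal offline sketch of the general proxy pattern a wrapper like this could use. It is not the integration's actual implementation: `TracedClient`, `FakeClient`, and the in-memory `spans` list are hypothetical stand-ins, chosen so the example runs without network access or a Braintrust project. Only the traced method names come from this PR description.

```python
from typing import Any

# Method names taken from the feature list in this PR description.
TRACED_METHODS = {
    "chat_completion",
    "text_generation",
    "feature_extraction",
    "sentence_similarity",
}

spans: list[dict[str, Any]] = []  # hypothetical in-memory stand-in for a span sink


class TracedClient:
    """Hypothetical proxy: forwards every attribute, records traced method calls."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on the proxy itself.
        attr = getattr(self._client, name)
        if name not in TRACED_METHODS or not callable(attr):
            return attr  # untraced attributes pass through unchanged

        def traced(*args: Any, **kwargs: Any) -> Any:
            span = {"name": name, "input": kwargs, "output": None, "error": None}
            spans.append(span)
            try:
                span["output"] = attr(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)  # record the error, then re-raise it
                raise

        return traced


class FakeClient:
    """Stand-in for InferenceClient so the sketch runs offline."""

    def chat_completion(self, *, model: str, messages: list[dict[str, str]]) -> str:
        if not messages:
            raise ValueError("messages must not be empty")
        return f"echo: {messages[-1]['content']}"

    def list_models(self) -> list[str]:
        return ["demo"]


client = TracedClient(FakeClient())
reply = client.chat_completion(
    model="demo", messages=[{"role": "user", "content": "hi"}]
)
```

Because the proxy forwards unknown attributes untouched, a wrapped client in this sketch stays a drop-in replacement for the original; only the allowlisted methods gain tracing behavior.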

Features added:

- Traces sync and async `InferenceClient` calls for `chat_completion`,
  `text_generation`, `feature_extraction`, and `sentence_similarity`.
- Covers the OpenAI-compatible chat alias
  `client.chat.completions.create(...)` because it proxies through
  `chat_completion`.
- Supports non-streaming and streaming chat completions, including async
  streams, context manager finalization, early stream close handling, and
  nesting under parent Braintrust spans.
- Captures span inputs, outputs, allowlisted request metadata, provider/model
  routing metadata, response identifiers, finish reasons, and token metrics
  from Hugging Face response usage/details fields.
- Logs provider errors to the span before re-raising.
- Adds VCR-backed coverage for latest and 0.32.0 Hugging Face Hub SDKs, plus
  an auto-instrumentation smoke test.
- Adds the `test_huggingface_hub` nox session, dependency matrix entries, and
  cassette-directory mapping for provider-versioned cassette hygiene.
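
The streaming and token-metric bullets above can be illustrated with a small offline sketch. It assumes chunks are plain dicts shaped loosely like OpenAI-compatible chat stream chunks (`choices[0].delta.content` plus an optional `usage` object). `traced_stream`, `fake_stream`, and the `finished` list are hypothetical names, and the metric keys are an assumption rather than the integration's confirmed output. The point is the `try`/`finally`: the span is finalized whether the consumer exhausts the stream or closes it early.

```python
from typing import Any, Iterator

finished: list[dict[str, Any]] = []  # hypothetical stand-in for logged spans

# Assumed mapping from response usage fields to span metric names.
USAGE_TO_METRIC = (
    ("prompt_tokens", "prompt_tokens"),
    ("completion_tokens", "completion_tokens"),
    ("total_tokens", "tokens"),
)


def traced_stream(chunks: Iterator[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield chunks unchanged while accumulating text and usage for one span."""
    text_parts: list[str] = []
    metrics: dict[str, int] = {}
    try:
        for chunk in chunks:
            for choice in chunk.get("choices", []):
                content = choice.get("delta", {}).get("content")
                if content:
                    text_parts.append(content)
            usage = chunk.get("usage") or {}
            for src, dst in USAGE_TO_METRIC:
                if src in usage:
                    metrics[dst] = usage[src]
            yield chunk
    finally:
        # Runs on normal exhaustion and on generator.close() (early stop).
        finished.append({"output": "".join(text_parts), "metrics": metrics})


def fake_stream() -> Iterator[dict[str, Any]]:
    """Offline stand-in for a streaming chat completion."""
    yield {"choices": [{"delta": {"content": "Hel"}}]}
    yield {"choices": [{"delta": {"content": "lo"}}]}
    yield {
        "choices": [],
        "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5},
    }


# Full consumption: output and metrics are both captured.
list(traced_stream(fake_stream()))

# Early close: only the first chunk is seen, but the span is still finalized.
partial = traced_stream(fake_stream())
next(partial)
partial.close()
```

In this sketch an early close records the partial output with empty metrics, since the usage chunk arrives last and is never seen.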