Sonos' neural-network inference engine.
tract loads ONNX and NNEF models, optimises them, and runs them anywhere — from embedded ARM CPUs to NVIDIA / Apple GPUs, in the browser via WebAssembly, or on a Linux / macOS / Windows workstation. It is used in production at Sonos for wake-word and streaming speech-recognition workloads, and also runs LLM, text-to-image, and classical CV models with a particular focus on the translate-once / ship-tiny-runtime story enabled by its NNEF-based intermediate format (tract-OPL).
From examples/onnx-mobilenet-v2:

    use tract::prelude::*;
    tract::impl_ndarray_interop!();

    let model = tract::onnx()?
        .load("mobilenetv2-7.onnx")?
        .into_model()?
        .into_runnable()?;
    let result = model.run([input.tract()?])?;

The tract crate (api/rs/src/lib.rs) is the authoritative public API. The
internal crates (tract-core, tract-nnef, tract-onnx, ...) are not
stable surface and shouldn't be depended on directly.
For Python, see the tract package on PyPI.
examples/ has runnable demos covering the workloads tract
targets today:
| Example | What |
|---|---|
| onnx-mobilenet-v2 | Minimal CV starter |
| tflite-mobilenet-v3 | TFLite import path |
| causal_llm | Transformer text generation |
| nemo-parakeet-asr / nemo-nemotron-streaming-asr | Speech recognition, including streaming via pulsification |
| stable-diffusion / stable-diffusion-3 / stable-diffusion-xl | Text-to-image |
| face_detection_yolov8onnx_example / face_similarity_arcface_onnx | Modern object detection / face recognition |
| wasm-model-bench | Running tract in the browser |
Technical documentation lives under doc/ (start at doc/intro.md);
the doc/cli-recipe.md page collects practical CLI recipes.
The Sonos engineering blog
has a long-form post on tract internals.
tract is also available as the tract package on PyPI,
built on top of the same Rust core:

    pip install tract

The API mirrors the Rust pipeline: load a model, set input facts, optimise, then run.
Documentation: sonos.github.io/tract. Source lives in api/py/.
| Backend | Crate | Notes |
|---|---|---|
| CPU (x86, ARMv6/7/8, ARM SVE) | tract-linalg | Default. Hand-rolled SIMD micro-kernels. |
| Apple Metal | tract-metal | Apple GPUs. |
| NVIDIA CUDA | tract-cuda | NVIDIA GPUs. |
| WebAssembly | via standard wasm32 targets | Browser / WASI deployment. |
All backends share the TypedModel IR and the same loaders, so a model
optimised on one platform can be moved to another.
tract has first-class support for pulsified inference: a network that operates on full sequences during training is translated into one that processes a fixed-size pulse along its streaming axis at each step. This lets the same model serve both batch evaluation and low-latency real-time inference (wake-word, streaming ASR, ...).
The translate-time logic lives in tract-pulse; runtime ships only the
small tract-pulse-opl crate. See
AGENTS.md § Streaming and pulsification
for the engineering view, and
examples/nemo-nemotron-streaming-asr
for a working demo.
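
To make the pulse idea concrete, here is a minimal sketch in plain Rust that does not use tract at all. It assumes the "network" is a single causal FIR filter; `run_batch`, `PulsedFilter`, and `run_pulse` are hypothetical names invented for this illustration and are not part of tract's API. The batch form sees the whole sequence at once, while the pulsed form consumes fixed-size chunks and carries just enough input history between calls to produce the same outputs.

```rust
/// Batch form: causal FIR filter over a whole sequence,
/// y[t] = sum_k taps[k] * x[t - k], with samples before t = 0 treated as zero.
fn run_batch(taps: &[f32], input: &[f32]) -> Vec<f32> {
    (0..input.len())
        .map(|t| {
            taps.iter()
                .enumerate()
                .filter(|(k, _)| *k <= t)
                .map(|(k, w)| w * input[t - k])
                .sum::<f32>()
        })
        .collect()
}

/// Pulsed form (illustrative only): same filter, fed one pulse at a time.
struct PulsedFilter {
    taps: Vec<f32>,
    /// The last `taps.len() - 1` input samples, zero-initialised.
    history: Vec<f32>,
}

impl PulsedFilter {
    fn new(taps: &[f32]) -> Self {
        PulsedFilter {
            taps: taps.to_vec(),
            history: vec![0.0; taps.len().saturating_sub(1)],
        }
    }

    /// Process one pulse and return exactly `pulse.len()` outputs.
    fn run_pulse(&mut self, pulse: &[f32]) -> Vec<f32> {
        let h = self.history.len();
        // Prepend the carried history so each output sees its full receptive field.
        let mut window = self.history.clone();
        window.extend_from_slice(pulse);
        let out: Vec<f32> = (0..pulse.len())
            .map(|t| {
                self.taps
                    .iter()
                    .enumerate()
                    .map(|(k, w)| w * window[h + t - k])
                    .sum::<f32>()
            })
            .collect();
        // Keep the most recent `h` samples as state for the next pulse.
        self.history = window[window.len() - h..].to_vec();
        out
    }
}
```

Feeding a sequence through `run_pulse` in chunks (for example `input.chunks(2)`) and concatenating the results gives the same values as one `run_batch` call over the full sequence. The only state that crosses pulse boundaries is the short input history, which is what keeps per-step latency and memory bounded in this toy setting.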
| Format | Load | Save |
|---|---|---|
| ONNX | ✓ | — |
| NNEF (+ tract-OPL extensions) | ✓ | ✓ |
| TensorFlow Lite (legacy) | ✓ | ✓ |
| TensorFlow 1 frozen graph (legacy) | ✓ | — |
PyTorch models can be exported directly to NNEF using torch-to-nnef, an open-source PyTorch-to-NNEF converter maintained alongside tract. This is useful when you want to skip the detour through ONNX.
tract-OPL is an NNEF-compatible intermediate representation that extends NNEF with the operators needed to express a full tract-core model. The recommended deployment workflow is:
- Once, at build time: convert from ONNX / TF / TFLite to NNEF using the tract CLI: `tract model.onnx dump --nnef model.nnef.tgz`
- At runtime: ship only tract-core + tract-nnef, plus tract-onnx-opl if the model uses ONNX-only operators, and tract-pulse-opl if it is pulsified.
This keeps the runtime footprint small (no protobuf, no training-framework
loaders). See doc/intro.md for the full design rationale.
NNEF parts are tied to the NNEF specification and very stable. tract-OPL extensions are a bit more in flux, but we observe the rule:
A model serialised with tract 0.x.y should work with tract 0.x.z where z >= y.
Models embed a tract_nnef_ser_version property identifying the generating
tract version; tract itself does not enforce a version check, so it is up
to the application to do so if needed. See CHANGELOG.md
for the running list of notable serialisation-format changes.
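
Since tract does not enforce the rule itself, an application that wants the check could do something like the sketch below. It is plain Rust with no tract dependency, and it rests on two assumptions: that the value of tract_nnef_ser_version is a `major.minor.patch` string (any `-` or `+` suffix is simply dropped here), and that the application has already obtained that string and the runtime's version by its own means. `parse_version` and `is_loadable` are hypothetical helper names, not tract functions.

```rust
/// Parse "major.minor.patch", dropping any "-..." or "+..." suffix.
/// Returns None for anything that does not have exactly three numeric parts.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let core = s.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Apply the stated rule: a model written by 0.x.y is accepted by a
/// runtime 0.x.z only when z >= y. Unparseable versions are rejected.
fn is_loadable(model_version: &str, runtime_version: &str) -> bool {
    match (parse_version(model_version), parse_version(runtime_version)) {
        (Some((m_major, m_minor, m_patch)), Some((r_major, r_minor, r_patch))) => {
            m_major == r_major && m_minor == r_minor && r_patch >= m_patch
        }
        _ => false,
    }
}
```

Rejecting unparseable versions is a conservative choice made for this sketch; an application might prefer to log a warning and attempt the load anyway.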
tract still loads TF1 frozen graphs and supports the operator set needed for the classical CV and wake-word models that originally drove its design (Inception v3, Snips wake words, ...). TensorFlow 2 is not directly supported — convert to ONNX first.
Files in tensorflow/protos are copied from the
TensorFlow project and files in
onnx/protos from the ONNX project; neither
is covered by the licence statement below.
All original work is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any Contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
