A Query.Farm VGI worker for DuckDB.
vgi-audio · a Query.Farm VGI worker · powered by librosa
A VGI worker that brings audio feature extraction into
DuckDB/SQL. It runs librosa over audio you reference by
file path or pass as raw bytes (BLOB) — pulling out duration, sample
rate, tempo, MFCCs, spectral features, a heuristic musical key, and beat onset
times — as plain SQL functions.
INSTALL vgi FROM community; LOAD vgi;
ATTACH 'audio' (TYPE vgi, LOCATION 'uv run audio_worker.py');
SELECT audio.duration('/tmp/tone.wav'); -- 2.0 (seconds)
SELECT audio.sample_rate('/tmp/tone.wav'); -- 22050 (Hz)
SELECT audio.channels('/tmp/tone.wav'); -- 1
SELECT audio.tempo('/tmp/click.wav'); -- ~120.0 (estimated BPM)
SELECT audio.rms_energy('/tmp/tone.wav'); -- mean RMS energy
SELECT audio.zero_crossing_rate('/tmp/tone.wav'); -- mean ZCR
SELECT audio.spectral_centroid('/tmp/tone.wav'); -- Hz (spectrum centre of mass)
SELECT audio.spectral_bandwidth('/tmp/tone.wav'); -- Hz
SELECT audio.mfcc('/tmp/tone.wav'); -- DOUBLE[13]
SELECT audio.mfcc('/tmp/tone.wav', 20); -- DOUBLE[20]
SELECT audio.estimated_key('/tmp/tone.wav'); -- 'C major' (heuristic!)
SELECT * FROM audio.beats('/tmp/click.wav'); -- (seq, time) beat onsets
SELECT * FROM audio.audio_info('/tmp/tone.wav'); -- (duration, sample_rate, channels)Every function accepts the audio two ways:
- a
VARCHARpath — the worker opens the file from its own filesystem; or - a
BLOB— raw audio bytes that travel over Arrow (e.g. fromread_blob(...)), decoded in-memory.
-- bytes straight from DuckDB's read_blob:
SELECT file, audio.duration(content)
FROM read_blob('recordings/*.wav');Under the hood each feature is registered as two same-name overloads — one
whose first argument is VARCHAR, one BLOB — and DuckDB resolves which to call
by the column type. (mfcc additionally has an optional coefficient-count
argument, so it has path/bytes × default/n overloads.)
-
Scalars take positional arguments only and resolve overloads by type/arity (DuckDB's
name := valuesyntax is a table-function feature, not a scalar one). Every per-row answer is a scalar, so it works inline in any projection or predicate:duration,sample_rate,channels,tempo,rms_energy,zero_crossing_rate,spectral_centroid,spectral_bandwidth,mfcc,estimated_key. -
Table functions return rows:
beats(audio)→(seq BIGINT, time DOUBLE)beat onset times;audio_info(audio)→ a single(duration DOUBLE, sample_rate INT, channels INT)row.
A SQL worker is fed whatever is in the column. Malformed, truncated, absurdly large, or maliciously crafted audio must never crash the worker process, so:
- every decode/analysis is wrapped per row; failures return NULL (scalars) or no rows (table functions) instead of raising;
NULLinput →NULLoutput / no rows;- decoded audio is bounded — at most 30 minutes (
MAX_DURATION_SECONDS) is materialised, so a corrupt header claiming billions of frames can't exhaust memory; longer inputs are analysed on a truncated prefix; - analysis runs at a fixed 22.05 kHz mono internally, bounding per-frame work.
A garbage blob next to a valid one yields NULL for the bad row and the correct
value for the good row — the process keeps serving.
- WAV / FLAC / OGG decode natively via
soundfile(libsndfile, bundled in the wheel) — no external tools required. - Compressed formats (mp3, m4a, ...) require
ffmpeg(oraudioread) installed on the worker host. If neither is present, those decodes returnNULLrather than crashing.
Python dependencies and their licenses:
| Library | Purpose | License |
|---|---|---|
librosa |
feature extraction | ISC |
soundfile |
WAV/FLAC/OGG decode | BSD-3-Clause |
numpy |
array math | BSD-3-Clause |
This worker itself is MIT licensed (see LICENSE).
estimated_key is a heuristic chroma-based (Krumhansl-Schmuckler) estimate —
useful as a hint, not ground truth.
audio_worker.py repo-root stdio entry point; PEP 723 inline deps; main()
vgi_audio/
features.py pure feature logic over librosa/soundfile/numpy; no Arrow/VGI; total
scalars.py per-row scalars (path/bytes overloads, mfcc n overload)
tables.py table functions: beats, audio_info (path/bytes overloads)
schema_utils.py pa.Field comment / column-doc helper
tests/ pytest: test_features (pure), test_tables (in-proc), test_scalars (Client RPC)
test/sql/*.test haybarn-unittest sqllogictest — authoritative E2E
Makefile test / test-unit / test-sql / lint
uv sync --extra dev
uv run pytest -q # unit + integration
make test-sql # end-to-end SQL via haybarn-unittest
uv run ruff check . && uv run mypy vgi_audio/Test audio is generated deterministically in-process (numpy sine tones / click
tracks written to in-memory WAV via soundfile); a couple of tiny committed WAV
fixtures under test/sql/data/ drive the SQL E2E.
Written by Query.Farm.
Copyright 2026 Query Farm LLC - https://query.farm
