HyperQuant

HyperQuant is an open-source, memory-first compression stack for teams that are hitting a real resident-memory limit in vector, long-context, replay, or session-state workloads.

It is not positioned as a universal “better quantizer.” The product goal is narrower and more useful:

reduce resident memory in a way that can be measured, reproduced, and rolled out safely.

What problem it solves

Most infrastructure teams do not need another synthetic compression demo. They need a clear answer to one of these questions:

  • Why does resident state cap session density per node?
  • How many sessions fit into the current RAM budget if only a hot window stays resident?
  • Can long-context artifacts be tiered out of RAM without hiding quality loss?
  • Can route selection stay explicit instead of silently over-claiming wins?

Routes

HyperQuant ships four explicit routes.

  • conservative_codebook: trained codebook route for arbitrary numeric tensors.
  • vector_codec: training-free rotated-scalar route for vector-like data, with a residual rescue side-channel for the hardest coefficients.
  • context_codec: page-aware route for structured long-context data, with an explicit fail-closed contract.
  • resident_tier: resident-memory planning and tiered artifact store for bounded hot-window serving.

These routes are deliberately separate. The repository is easier to trust when it says where each path fits.

Why test it

The main reason to evaluate HyperQuant is not a headline compression ratio. It is operating leverage:

  • fewer resident bytes per session,
  • higher session density under a fixed RAM budget,
  • reproducible route-level trade-offs,
  • offline tiered artifacts that can be reopened, verified, and read lazily by slice.

The flagship path is resident_tier. The other routes exist to make that path measurable and explainable.

Quick start

python -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
pytest -q

Generate a live-like dataset and project resident memory:

python -m hyperquant generate-live-data \
  --scenario online \
  --output /tmp/online.npy \
  --n-vectors 16384 \
  --dim 128
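
As a quick sanity check before planning, the generated array can be loaded with plain NumPy (this snippet is not part of the HyperQuant CLI; the expected shape simply mirrors the flags above):

import numpy as np

# Load the dataset produced by generate-live-data above.
data = np.load("/tmp/online.npy")

# Expect 16384 vectors of dimension 128, matching --n-vectors and --dim.
print(data.shape, data.dtype)
assert data.shape == (16384, 128)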

python -m hyperquant resident-plan \
  --input /tmp/online.npy \
  --page-size 64 \
  --group-size 128 \
  --residual-topk 1 \
  --hot-pages 8 \
  --active-window-tokens 256 \
  --concurrent-sessions 8 \
  --runtime-value-bytes 2
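
For intuition only, here is a back-of-envelope version of the capacity question the planner answers. This is a simplified model, not resident-plan's actual accounting: it assumes the hot window dominates resident cost and ignores codec metadata and the residual side-channel.

# Hypothetical back-of-envelope capacity model (NOT resident-plan's real math).
dim = 128                 # --dim
page_size = 64            # --page-size, values per page row
hot_pages = 8             # --hot-pages kept resident per session
runtime_value_bytes = 2   # --runtime-value-bytes per stored value

# Resident bytes per session if only the hot window stays in RAM.
hot_window_values = hot_pages * page_size * dim
resident_bytes_per_session = hot_window_values * runtime_value_bytes
print(f"~{resident_bytes_per_session / 1024:.0f} KiB resident per session")

# Sessions that fit in a 4 GiB RAM budget under this simplified model.
ram_budget = 4 * 1024**3
print(ram_budget // resident_bytes_per_session, "sessions")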

Build, verify, and read from a tiered resident store:

python -m hyperquant build-resident-store \
  --input /tmp/online.npy \
  --output /tmp/resident_store \
  --page-size 64 \
  --group-size 128 \
  --residual-topk 1 \
  --hot-pages 8

python -m hyperquant verify-resident-store --store /tmp/resident_store

python -m hyperquant read-resident-slice \
  --store /tmp/resident_store \
  --start 0 \
  --end 256 \
  --output /tmp/window.npy
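
The exported slice is a plain .npy file, so a NumPy check confirms the requested window came back (assuming --start and --end are row indices into the original array; that reading is an assumption, not documented schema):

import numpy as np

# read-resident-slice above exported rows [0, 256) of the store.
window = np.load("/tmp/window.npy")
print(window.shape)  # expected under the assumption above: (256, 128)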

Compare the vector route against the dense rotation baseline:

python -m hyperquant vector-benchmark \
  --input /tmp/online.npy \
  --bits 3 \
  --group-size 128 \
  --residual-topk 1 \
  --iterations 5 \
  --warmup 1

python -m hyperquant dense-baseline-benchmark \
  --input /tmp/online.npy \
  --bits 3 \
  --group-size 128 \
  --residual-topk 1 \
  --iterations 5 \
  --warmup 1
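
For orientation, the nominal payload ratio of a 3-bit route against float32 input is easy to work out by hand. This sketch deliberately ignores group scales, metadata, and the residual side-channel, all of which shrink the real ratio, so treat it as an upper bound rather than a claim about HyperQuant's measured numbers:

# Idealized payload ratio for the flags above (ignores all codec overhead).
input_bits_per_value = 32     # float32 source vectors
quantized_bits_per_value = 3  # --bits 3

print(input_bits_per_value / quantized_bits_per_value)  # ~10.7x at best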

Evaluate the structured context route with a fail-closed contract:

python -m hyperquant context-benchmark --input /tmp/online.npy --with-guarantee --fail-closed
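
Fail-closed means the route refuses to emit an artifact rather than silently degrade when its guarantee cannot be met. The repository defines the actual contract; the snippet below only illustrates the pattern, and compress_context, encode_pages, and GuaranteeViolation are hypothetical names, not HyperQuant's real API.

# Illustration of a fail-closed contract; all names here are hypothetical.
class GuaranteeViolation(Exception):
    pass

def encode_pages(pages):
    # Stand-in encoder: pretend to compress and report an observed error.
    return b"artifact-bytes", 0.01

def compress_context(pages, max_error):
    artifact, observed_error = encode_pages(pages)
    if observed_error > max_error:
        # Fail closed: surface the violation instead of shipping a
        # silently degraded artifact.
        raise GuaranteeViolation(f"observed {observed_error} > budget {max_error}")
    return artifact

print(len(compress_context(pages=[], max_error=0.05)), "bytes")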

Regenerate the checked-in evidence pack:

python scripts/build_proof_pack.py

What to test first

Resident memory is the bottleneck

Start with:

  • resident-plan
  • build-resident-store
  • verify-resident-store
  • read-resident-slice
  • resident-benchmark

Vector compression throughput is the bottleneck

Start with:

  • vector-benchmark
  • dense-baseline-benchmark
  • route-benchmark

Structured long-context state is the bottleneck

Start with:

  • context-benchmark --with-guarantee --fail-closed
  • route-benchmark
  • context-audit-input

Evidence

The repository ships checked-in evidence under docs/evidence/ and a script that regenerates it from source.

Start with:

  • docs/evidence/route-benchmark.md
  • docs/evidence/resident-benchmark.md
  • docs/evidence/capacity-example.md

API surface

  • GET /healthz
  • GET /metrics
  • POST /v1/codebook/compress
  • POST /v1/codebook/decompress
  • POST /v1/vector/compress
  • POST /v1/vector/decompress
  • POST /v1/context/compress
  • POST /v1/context/decompress
  • POST /v1/resident/plan
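
A minimal client sketch against the endpoints listed above, using only the Python standard library. The host, port, and request-body fields are assumptions for illustration; consult the repository's API documentation for the real schema.

import json
import urllib.request

BASE = "http://localhost:8000"  # hypothetical host and port

# Liveness check against the documented healthz endpoint.
with urllib.request.urlopen(f"{BASE}/healthz") as resp:
    print(resp.status)

# Resident-memory plan request; the field names below mirror the CLI
# flags and are an assumption, not a documented schema.
body = json.dumps({
    "page_size": 64,
    "hot_pages": 8,
    "concurrent_sessions": 8,
}).encode()
req = urllib.request.Request(
    f"{BASE}/v1/resident/plan",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())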

What this repository proves

  • the four routes work end to end;
  • resident-memory planning is reproducible from source;
  • the resident store can be built, reopened, verified, and read lazily by slice;
  • route trade-offs can be measured with the included harness;
  • the context route can run with a fail-closed contract.

What it does not claim

  • universal superiority across every model or runtime;
  • end-task quality on your production traces without a pilot;
  • vendor-runtime integration such as auth, admission control, or multi-node cache coherence;
  • a drop-in replacement for serving engines or vector indexes.

That boundary is deliberate. It is more credible than pretending to be a universal winner.

Read next

  • docs/problem-solution-profit.md
  • docs/architecture.md
  • docs/evaluation.md
  • docs/guarantees.md
  • docs/pilot-playbook.md

License

Licensed under the Apache License, Version 2.0.

Copyright © 2026 Сацук Артём Венедиктович (Satsuk Artem).

Synchatica is the project brand. See LICENSE and NOTICE.
