ColorGPT

A language model's vocabulary, rendered as color and sound.

ColorGPT takes the input-token embedding matrix of a small language model — Qwen 2.5 0.5B, ~152k tokens × 896 dimensions — projects it to three dimensions with UMAP, and uses those coordinates as both a perceptual color space (OKLab → sRGB) and a sonic one (Web Audio). Each of the model's tokens gets a fixed color and a fixed timbre. Generation, reading, and dialogue then become chromatic and acoustic events, played out in real time.

The piece is an attempt to render the model's mental geography of language — not what words mean to us, but how a network with no eyes, no ears, and no body has learned to organize symbols among themselves.

What this is — and what it is not

This is not visual synesthesia. The token blue is not blue. scarlet is not red.

What you see is the geometry of the model's input-embedding space, projected into three dimensions and assigned to color and sound. Tokens that are close in that space are close in color and timbre. So blue / azure / navy will cluster — not because they describe blue things, but because the model has learned they appear in similar contexts. The same is true for Monday / Tuesday / Wednesday, for inflections of a single verb, and for the BPE fragments ing, ed, ly.

A second commitment: ColorGPT renders subword tokens, not words. The model does not see "scarlet"; it sees scar followed by let, with two unrelated colors. Word-averaging would smooth this away and lie about how the model perceives text. The fragmentation is the point.

How it works

Qwen 2.5 0.5B input embeddings (vocab × 896)
    ↓ UMAP (cosine metric, 3D)
3D coordinates per token
    ↓ axis 0  → OKLab L                ↓ all three axes
    ↓ axis 1  → OKLab a   →  sRGB      → normalized [0,255]³
    ↓ axis 2  → OKLab b                  driving Web Audio synth
                                         (pitch, timbre, pan)

The same UMAP coordinates drive both modalities. A token's color and its sound are coupled — they are two readings of the same vector. Dialogue mode hard-pans speaker A to the left channel and speaker B to the right; otherwise the audio is a function of the token alone, not who said it.
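As a rough illustration of the "normalized [0,255]³" step in the diagram, the sketch below min-max scales each UMAP axis independently to integers in 0..255. Per-axis min-max scaling and the name `normalize_axes` are assumptions made for illustration; the project's actual normalization, and how the three values are assigned to pitch, timbre, and pan, may differ.

```python
def normalize_axes(coords):
    """Min-max scale each of the three axes to integers in [0, 255].

    `coords` is a list of (x, y, z) tuples, one per token.
    This is a hypothetical helper, not the project's code.
    """
    lows = [min(c[i] for c in coords) for i in range(3)]
    highs = [max(c[i] for c in coords) for i in range(3)]
    out = []
    for c in coords:
        triple = []
        for i in range(3):
            span = highs[i] - lows[i]
            # A constant axis carries no information; park it at mid-range.
            t = 0.5 if span == 0 else (c[i] - lows[i]) / span
            triple.append(round(t * 255))
        out.append(tuple(triple))
    return out
```

Because the same triple would feed both modalities, two tokens that are neighbors in the projection end up with neighboring values on every audio parameter as well as in color.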

OKLab is used because it is designed to be approximately perceptually uniform: equal Euclidean distances in OKLab correspond to roughly equal perceived color differences. Small movements in the projected space therefore produce small movements in apparent color, and large movements produce large ones. The color differences reflect the projection itself rather than sRGB's well-known non-uniformities.
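The OKLab → sRGB step can be sketched as follows, using the inverse matrices from Björn Ottosson's published OKLab definition and the standard sRGB transfer function. How the UMAP axes are scaled into valid L, a, b ranges is not shown, and the naive clamp used here for out-of-gamut colors is an assumption; build_lut.py may handle both differently.

```python
def oklab_to_srgb8(L, a, b):
    """Convert one OKLab color to an 8-bit sRGB triple (clamped to gamut)."""
    # OKLab -> cube-root LMS (Ottosson's inverse matrix).
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    # Undo the cube-root nonlinearity.
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    # LMS -> linear sRGB.
    linear = (
        4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
        -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
        -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
    )

    def encode(x):
        # Assumption: out-of-gamut values are simply clamped.
        x = min(max(x, 0.0), 1.0)
        if x <= 0.0031308:
            v = 12.92 * x
        else:
            v = 1.055 * x ** (1 / 2.4) - 0.055
        return round(v * 255)

    return tuple(encode(x) for x in linear)
```

With a = b = 0 the three channels come out equal, so axis 0 alone moves a token along the gray ramp, while axes 1 and 2 pull it toward green/red and blue/yellow.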

Three live modes

All three are streamed over Server-Sent Events. The browser receives {id, text, rgb, u, source, hold_ms} per token and paints / sounds it.
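A minimal sketch of how one such token event could be serialized for Server-Sent Events. The field names come from the list above; the field types (for example `rgb` and `u` as three-element lists) and the function name `sse_event` are assumptions, not taken from server.py.

```python
import json


def sse_event(token_id, text, rgb, u, source, hold_ms):
    """Serialize one token event as a single SSE message.

    An SSE message is one or more `data:` lines terminated by a blank line.
    json.dumps escapes newlines inside strings, so the payload stays on one line
    even when the token text itself is a line break.
    """
    payload = {
        "id": token_id,
        "text": text,
        "rgb": list(rgb),
        "u": list(u),
        "source": source,
        "hold_ms": hold_ms,
    }
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
```

On the browser side, an `EventSource` handler would `JSON.parse` the `data` field of each message and paint and sound the token.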

speaker

A single instance of Qwen autoregresses from a prompt. A literary-prose prefix is prepended to nudge the model away from QA-style continuations. Non-Latin tokens are suppressed via bad_words_ids — we want the script the LUT was tuned for.
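One way the suppression list could be built is sketched below. Hugging Face `generate` accepts `bad_words_ids` as a list of token-id lists; the Latin test based on Unicode character names, the helper names, and the toy vocabulary are all assumptions for illustration and may not match how the project selects tokens.

```python
import unicodedata


def is_latin_or_neutral(text):
    """True if every alphabetic character in `text` is a Latin-script letter.

    Digits, punctuation, and whitespace are treated as script-neutral.
    """
    for ch in text:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return False
    return True


def non_latin_bad_words(vocab):
    """Build a bad_words_ids-style list from {token_id: decoded_text}.

    Each banned token is wrapped in its own single-element list.
    """
    return [
        [token_id]
        for token_id, text in sorted(vocab.items())
        if not is_latin_or_neutral(text)
    ]


# Toy vocabulary standing in for the real tokenizer's decoded tokens.
toy_vocab = {0: " the", 1: "你好", 2: "é", 3: "привет", 4: "."}
banned = non_latin_bad_words(toy_vocab)
```

Here `banned` contains the CJK and Cyrillic tokens but keeps the accented Latin letter and the punctuation mark.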

reader

A corpus file is tokenized and emitted in order. No generation, no sampling — pure transcription, the corpus passed through the model's vocabulary as colored cells. Provided corpora are public-domain English: the King James Bible (Genesis 1, 1 Corinthians 13, Ecclesiastes 3, John 1) and Conrad's Heart of Darkness.

dialogue

Two Qwen instances pass the rolling transcript back and forth. Even turns are speaker A, odd turns are speaker B. Both use the same weights, with separate KV state per turn. In the audio, A is panned hard left and B hard right, so the room becomes a stereo conversation.
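The turn-taking can be sketched as a small loop. The zero-based turn index, the pan values of -1.0 and +1.0 (the range used by Web Audio's stereo panner), and the `generate` callable standing in for the model are assumptions for illustration.

```python
def speaker_for_turn(turn_index):
    """Return (speaker, pan) for a zero-based turn index.

    Even turns belong to A (hard left), odd turns to B (hard right).
    """
    if turn_index % 2 == 0:
        return "A", -1.0
    return "B", 1.0


def run_dialogue(generate, opening, n_turns):
    """Alternate speakers over a shared, growing transcript.

    `generate` is a hypothetical callable: context string in, reply string out.
    """
    transcript = [opening]
    events = []
    for turn in range(n_turns):
        speaker, pan = speaker_for_turn(turn)
        reply = generate("".join(transcript))
        transcript.append(reply)
        events.append((speaker, pan, reply))
    return events
```

Each speaker sees everything said so far, including its own earlier turns, which is what lets two copies of the same weights drift into a conversation.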

Pacing — punctuation as heartbeat

Pacing lives in pacing.py, downstream of the streams. Every token gets a base hold of 1 / base_tps seconds (500 ms by default, which corresponds to base_tps = 2). On top of that, tokens ending in punctuation receive a tiered additional pause:

punctuation	pause (ms)	role
`. ! ?` `．` `。`	700	sentence-end
`…`	900	trailing-off
`;`	400	clause
`:`	350	clause
`, — –` `，`	220	phrase
`\n`	450	line break
closers `) ] " ' ” ’`	140–180	release

A . is always rendered as rgb(0, 143, 87) and always carries the 700 ms sentence-end pause. The piece has a heartbeat: punctuation marks become visual and sonic punctuation, identical across modes.
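The timing rule can be sketched as base hold plus a tiered pause. The pause values are copied from the table; the default `base_tps` of 2 is inferred from the 500 ms default, the closers tier is left out because the table gives only a range for it, and the names are hypothetical rather than those in pacing.py.

```python
# (endings, extra pause in ms), taken from the punctuation table.
PAUSES_MS = [
    ((".", "!", "?", "．", "。"), 700),  # sentence-end
    (("…",), 900),                      # trailing-off
    ((";",), 400),                      # clause
    ((":",), 350),                      # clause
    ((",", "—", "–", "，"), 220),        # phrase
    (("\n",), 450),                     # line break
]


def hold_ms(text, base_tps=2.0):
    """Return how long a token is held, in milliseconds.

    Base hold is 1 / base_tps seconds; a token ending in punctuation
    gets the matching tier's pause added on top.
    """
    base = 1000.0 / base_tps
    for endings, pause in PAUSES_MS:
        if text.endswith(endings):
            return base + pause
    return base
```

Under these assumptions a plain token is held for 500 ms and a token ending in a period for 1200 ms in total.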

Static printable artifacts

histogram.py renders four image types from any text. All four are PNGs intended for print or wall display.

render	what it shows
transcript	Every content token of a corpus packed into a perfect ⌈√N⌉ × ⌈√N⌉ grid, with a hue-sorted palette strip below. Verse structure is deliberately dropped — the corpus collapses into a square of color.
palette	Frequency-weighted bars sorted by hue. The corpus's chromatic fingerprint as a Pantone fan.
reading-map	A per-corpus grid of unique tokens, each cell labeled with its decoded text, sized for printing as a handout (25 mm cells) or wall card (15 mm cells). The Rosetta stone for a corpus.
vocab atlas	The whole 151,936-token vocabulary as a √V × √V chromatic atlas. Sortable by token ID (training-derived structure: common merges first, then rarer / CJK-byte-fallback tokens) or by hue (chromatic distribution).
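The grid arithmetic behind the transcript and atlas renders can be sketched as below. Row-major packing and sorting by HSV hue are assumptions for illustration; histogram.py may pack cells or define hue differently (for example in an OKLab-derived space).

```python
import colorsys
import math


def grid_side(n_tokens):
    """Smallest side s with s * s >= n_tokens, i.e. ceil(sqrt(N))."""
    s = math.isqrt(n_tokens)
    return s if s * s == n_tokens else s + 1


def cell_position(index, side):
    """Row-major (row, col) of the index-th token in the square grid."""
    return divmod(index, side)


def sort_by_hue(colors):
    """Sort 8-bit RGB triples by HSV hue, as a stand-in for the palette strip."""
    return sorted(
        colors,
        key=lambda rgb: colorsys.rgb_to_hsv(*(c / 255 for c in rgb))[0],
    )
```

For the full 151,936-token vocabulary this gives a 390 × 390 grid, with the last cells of the final row left empty.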

transcript · John 1 (KJV)

Every content token of the passage as a colored cell, packed into a square. The hue-sorted palette underneath is the corpus's chromatic fingerprint.

reading-map · Heart of Darkness

The whole vocabulary of Conrad's novella as a labeled grid — the Rosetta stone for a corpus, sized for a wall card.

vocab atlas · corpus filter

The atlas restricted to the tokens present in a single corpus — a chromatic fingerprint at the vocabulary level. John 1 (146 unique tokens) on the left, Genesis 1 (187) on the right.

Generate your own with the snippet under Run below.

Demo

demo.mp4

Run

Requires Python 3.11+. First-time setup, in PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python build_lut.py     # ~6 minutes, one-time precompute
python server.py        # http://127.0.0.1:5000

build_lut.py produces three files: lut.bin (the token → RGB lookup, ~445 KB), lut_meta.json (vocab size, model id, parameters), and umap_3d.npy (cached 3D coords, so OKLab tuning can be re-rendered without re-running UMAP). All three are regenerable and gitignored.
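The stated size is consistent with three bytes per token: 151,936 × 3 = 455,808 bytes, or about 445 KiB. A lookup under that reading can be sketched as follows; the flat layout of uint8 RGB triples in token-id order is an assumption inferred from the file size, not a documented format, and `lut_rgb` is a hypothetical name.

```python
def lut_rgb(lut, token_id):
    """Look up a token's color in a flat RGB byte table.

    Assumes `lut` is a bytes object holding one (r, g, b) triple per token,
    in token-id order.
    """
    offset = 3 * token_id
    if token_id < 0 or offset + 3 > len(lut):
        raise IndexError(f"token id {token_id} is outside the LUT")
    return tuple(lut[offset:offset + 3])


# Toy table with two tokens; the real file would be read with
# open("lut.bin", "rb").read().
toy_lut = bytes([10, 20, 30, 40, 50, 60])
second_color = lut_rgb(toy_lut, 1)
```

A table this small can be held in memory and, if the layout matches, shipped to the browser as a single binary blob.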

Press F for fullscreen, or open ?fullscreen=1 directly. Audio starts on first user interaction (browser policy).

For static renders without the server:

python visualize.py "the quick brown fox"        # one-off colored strip → out.html
python -c "from histogram import render_transcript; render_transcript(open('corpus/john1.txt').read()).save('john1.png')"

File map

file	role
`build_lut.py`	One-time LUT precompute (embeddings → UMAP → OKLab → sRGB).
`engine.py`	Lazy-loaded shared state: model, tokenizer, color LUT, audio LUT.
`streams.py`	The three modes. Each yields token events; bounded `Queue(maxsize=1)` for backpressure.
`pacing.py`	Base TPS + tiered punctuation pauses. Single source of truth for timing.
`server.py`	Flask + SSE. Serves the UI, three streams, corpus uploads, static renders, filter bitmaps.
`histogram.py`	Static PNG renders (transcript, palette, reading-map, vocab atlas).
`visualize.py`	Standalone colored-strip HTML for arbitrary text.
`templates/index.html`	UI + Web Audio synth + client-side vocab canvas.
`corpus/`	Public-domain text (KJV passages, Heart of Darkness).
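The bounded `Queue(maxsize=1)` mentioned for streams.py gives backpressure: the producer cannot run ahead of the consumer by more than one token. A minimal sketch of that pattern using only the standard library follows; the function name and the sentinel convention are assumptions, not the project's code.

```python
import queue
import threading


def stream_with_backpressure(tokens, consume):
    """Feed `tokens` to `consume` through a one-slot queue.

    The producer thread blocks on put() whenever the slot is full,
    so generation is paced by how fast tokens are consumed.
    """
    q = queue.Queue(maxsize=1)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for tok in tokens:
            q.put(tok)  # blocks while the single slot is occupied
        q.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is done:
            break
        consume(item)
    worker.join()


received = []
stream_with_backpressure(range(5), received.append)
```

In a live mode the consumer would be the SSE loop, which sleeps for each token's hold before asking for the next one.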

Physical installation

A physical instantiation of ColorGPT is in development.

Acknowledgements

ColorGPT uses Qwen 2.5 0.5B (Apache 2.0) for both the embedding source and the live generation. UMAP is umap-learn. OKLab is Björn Ottosson's perceptual color space (2020). Provided corpora are public domain.

Citation

See CITATION.cff — GitHub renders a "Cite this repository" widget from it. If you write about the work, the dual license below applies.

License

Dual-licensed:

Code (everything except README.md, CITATION.cff, and any future docs/) — Apache License 2.0. See also NOTICE.
Writeup (README.md and any future docs/*.md) — CC BY 4.0.

The split exists because the code is meant to be reused and the prose is meant to be cited. Apache 2.0 carries an explicit patent grant and a retaliation clause — defensive cover for a project working in a domain where patent activity is increasing. CC BY 4.0 preserves the conceptual claim: anyone may quote, adapt, or translate the framing of ColorGPT, but must credit the original.
