ScratchForge

Make Scratch projects in code. ScratchForge is a Python compiler that emits real .sb3 files — so you can generate the kind of thing that's miserable to wire by hand in the block editor. Two headline demos:

A solid, shaded, spinning 3D cube: backface culling, flat lighting, pen scanline fill. Scratch has no 3D; we software-render it. (The image above is the actual pen output captured from the Scratch VM.)
A real transformer LLM running in Scratch — Karpathy's stories260K, plus a fine-tuned chatbot, generated as .sb3 and verified bit-for-bit against a NumPy reference. See llm/.

Everything is verified by running it headless in the real scratch-vm (and, for the renderers, by capturing the pen output to a PNG).

Try it instantly: the demos/ folder has ready-to-play .sb3 files — drag one onto turbowarp.org and press the green flag: chat_4.6M.sb3 (the chatbot), cube_solid.sb3 (the 3D cube), story_generator.sb3, catch.sb3.

Quickstart

git clone <this-repo> && cd ScratchForge
npm install                       # scratch-vm, for headless verification

python scripts/demo_cube_solid.py # solid shaded cube   -> out/cube_solid.sb3
python scripts/demo_catch.py      # a playable game     -> out/catch.sb3

# LLM / chatbot in Scratch (see llm/README.md)
python llm/download_model.py      # fetch the tiny model + tokenizer
python llm/gen_scratch_llm.py chat   # -> out/llm_chat.sb3

Drag any .sb3 onto https://turbowarp.org (or open in the Scratch desktop editor) and press the green flag. TurboWarp compiles the project to JavaScript, which is what makes the heavy math run fast.

A real LLM running in Scratch

The headline demo: a full transformer language model — Karpathy's stories260K, plus chatbots fine-tuned on TinyChat — generated as a .sb3 and verified bit-for-bit against a NumPy reference. Nothing faked: a ~1.5k-block interpreter runs the matmuls, RMSNorm, RoPE, grouped-query attention and SwiGLU in runtime loops, with the weights stored as int8-quantized lists.

How it chats: you type → a BPE tokenizer built in Scratch blocks encodes it → a forward pass runs per token → the reply streams into a chat-log list.
Two chatbots (demos/): chat_1.2M.sb3 (fast) and chat_4.6M.sb3 (smarter — roughly CraftGPT scale). Trained on TinyChat, so they make simple small-talk: "how are you today" → "I am feeling very happy today, thank you for asking."
Verified, not vibes: compare.py confirms the Scratch logits match the NumPy reference (argmax-exact under int8); the tokenizer output is checked token for token.
Honest limits: these are deliberately tiny models (the size that fits in a Scratch runtime), so they chat about everyday things but aren't real assistants. And the projects exceed scratch.mit.edu's upload cap, so they run in TurboWarp (which is also what makes them fast).
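The int8 weight storage and the "argmax-exact" check can be sketched in plain Python. This is a minimal illustration, assuming symmetric per-row quantization with one float scale per row; the scheme actually used in llm/ may differ, and the names `quantize_row`, `matvec_int8`, and `argmax` are hypothetical.

```python
def quantize_row(row):
    """Map floats to integers in [-127, 127] plus one float scale (assumed scheme)."""
    peak = max(abs(v) for v in row)
    scale = peak / 127.0 if peak > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def matvec_int8(q_rows, scales, x):
    """Multiply a quantized matrix by a float vector, rescaling each row's sum."""
    out = []
    for q, scale in zip(q_rows, scales):
        acc = 0.0
        for w, xi in zip(q, x):
            acc += w * xi
        out.append(acc * scale)
    return out


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy 2x3 weight matrix and input vector, made up for illustration.
weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
x = [1.0, 2.0, 3.0]

quantized = [quantize_row(r) for r in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

exact = [sum(w * xi for w, xi in zip(r, x)) for r in weights]   # float reference
approx = matvec_int8(q_rows, scales, x)                          # int8 path

same_choice = argmax(exact) == argmax(approx)
```

The quantized result differs slightly from the float one, but the winning index is the same, which is the property an argmax-exact comparison looks for.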

The full train → generate → verify pipeline, model table, and the inference-engine internals are documented in llm/.

What a Scratch project actually is

A .sb3 file is just a ZIP archive:

mygame.sb3
├── project.json          the entire project as data (sprites, scripts, vars)
├── <md5>.svg             costume/sound assets, named by their md5 hash
└── <md5>.wav

project.json has targets (the Stage + each sprite). Every target carries its costumes, sounds, variables, and a blocks dictionary — and that blocks dict is the whole ballgame.
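Because the container is a plain ZIP, the standard library is enough to look inside one. The sketch below builds a tiny archive in memory and reads it back; it is not a complete, loadable Scratch project (fields such as `meta` are left out), and the helper names `build_minimal_sb3` and `read_targets` are made up for illustration.

```python
import hashlib
import io
import json
import zipfile


def build_minimal_sb3(svg_bytes):
    """Return the bytes of a tiny .sb3-style ZIP: project.json plus one asset."""
    asset_id = hashlib.md5(svg_bytes).hexdigest()   # assets are named by md5 hash
    project = {
        "targets": [
            {
                "isStage": True,
                "name": "Stage",
                "variables": {},
                "blocks": {},
                "costumes": [{
                    "name": "backdrop1",
                    "assetId": asset_id,
                    "md5ext": asset_id + ".svg",
                    "dataFormat": "svg",
                }],
                "sounds": [],
            }
        ],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("project.json", json.dumps(project))
        zf.writestr(asset_id + ".svg", svg_bytes)
    return buf.getvalue()


def read_targets(sb3_bytes):
    """Open the archive and return (targets, file names)."""
    with zipfile.ZipFile(io.BytesIO(sb3_bytes)) as zf:
        project = json.loads(zf.read("project.json"))
        names = zf.namelist()
    return project["targets"], names


svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="2" height="2"/>'
targets, names = read_targets(build_minimal_sb3(svg))
```

For a file on disk, passing the path to `zipfile.ZipFile` instead of a `BytesIO` works the same way.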

The blocks format (what we learned from a real project)

Each block is keyed by a unique id and stored flat. Scripts are linked lists: a block points to the one below it via next, and back up via parent. Hat blocks (e.g. event_whenflagclicked) carry topLevel: true plus x/y.

"b2": {
  "opcode": "motion_movesteps",
  "next": "b3", "parent": "b1",
  "inputs": { "STEPS": [1, [4, "10"]] },
  "fields": {},
  "topLevel": false
}
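Following a script is then a linked-list walk over the flat dictionary. A minimal sketch, with hypothetical helper names `top_level_ids` and `walk_script` and a hand-written three-block example:

```python
def top_level_ids(blocks):
    """Ids of blocks marked topLevel (the start of each script)."""
    return [bid for bid, b in blocks.items() if b.get("topLevel")]


def walk_script(blocks, top_id):
    """Yield (block_id, opcode) from the top block down the `next` chain."""
    seen = set()
    block_id = top_id
    while block_id is not None:
        if block_id in seen:                 # guard against a malformed cycle
            raise ValueError(f"cycle at {block_id}")
        seen.add(block_id)
        block = blocks[block_id]
        yield block_id, block["opcode"]
        block_id = block["next"]


blocks = {
    "b1": {"opcode": "event_whenflagclicked", "next": "b2", "parent": None,
           "inputs": {}, "fields": {}, "topLevel": True, "x": 0, "y": 0},
    "b2": {"opcode": "motion_movesteps", "next": "b3", "parent": "b1",
           "inputs": {"STEPS": [1, [4, "10"]]}, "fields": {}, "topLevel": False},
    "b3": {"opcode": "motion_turnright", "next": None, "parent": "b2",
           "inputs": {"DEGREES": [1, [4, "15"]]}, "fields": {}, "topLevel": False},
}

script = list(walk_script(blocks, top_level_ids(blocks)[0]))
```

This only follows the main stack; bodies of control blocks hang off inputs rather than `next`, so a full walker would also descend into those.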

Inputs use shaped arrays, where the leading number is the "mode":

| Shape | Meaning |
| --- | --- |
| `[1, [4, "10"]]` | a literal shadow; the inner pair is `[type, value]`, where type `4` = number and type `10` = string |
| `[1, "menuId"]` | a dropdown menu (itself a shadow sub-block) |
| `[2, "blockId"]` | a block plugged in with no shadow; used for booleans & substacks |
| `[3, "blockId", [4, "0"]]` | a reporter obscuring a shadow |
| `[3, [12, "name", "id"]]` | a variable, inlined as a primitive (`12`=var, `13`=list, `11`=broadcast, `9`=color) |
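The shapes in the table above can be decoded mechanically. A small sketch, assuming only the modes and primitive type codes listed there (the format has a few more numeric type codes that are not covered); `describe_input` is a hypothetical helper, not part of ScratchForge.

```python
# Primitive type codes named in this README; other codes exist in the format.
PRIMITIVE_NAMES = {
    4: "number",
    9: "color",
    10: "string",
    11: "broadcast",
    12: "variable",
    13: "list",
}


def describe_input(value):
    """Return a short human-readable description of one input array."""
    mode, payload = value[0], value[1]
    if isinstance(payload, list):
        kind = PRIMITIVE_NAMES.get(payload[0], f"primitive {payload[0]}")
        what = f"{kind} {payload[1]!r}"
    else:
        what = f"block {payload!r}"          # payload is a block id string
    if mode == 1:
        return f"shadow: {what}"
    if mode == 2:
        return f"no shadow: {what}"
    if mode == 3:
        return f"{what} obscuring a shadow"
    raise ValueError(f"unknown input mode {mode}")


literal = describe_input([1, [4, "10"]])
plugged = describe_input([2, "blockId"])
variable = describe_input([3, [12, "score", "id1"]])
```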

That's the entire trick. ScratchForge's engine (scratchforge/sb3.py) hides all of it: you build a tree of blocks, it assigns ids, wires next/parent, and emits the correct shapes.
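To make the id-assignment and wiring step concrete, here is a toy version for a single straight-line stack. It is not the code in scratchforge/sb3.py; `link_stack` and its sequential `b1`, `b2`, ... id scheme are assumptions made for the example.

```python
import itertools


def link_stack(specs, id_prefix="b"):
    """Turn an ordered list of block specs into a flat, linked blocks dict."""
    counter = itertools.count(1)
    ids = [f"{id_prefix}{next(counter)}" for _ in specs]
    linked = {}
    for i, (block_id, spec) in enumerate(zip(ids, specs)):
        is_top = i == 0
        entry = {
            "opcode": spec["opcode"],
            "next": ids[i + 1] if i + 1 < len(ids) else None,
            "parent": None if is_top else ids[i - 1],
            "inputs": spec.get("inputs", {}),
            "fields": spec.get("fields", {}),
            "shadow": False,
            "topLevel": is_top,
        }
        if is_top:
            entry["x"], entry["y"] = 0, 0    # only top-level blocks carry a position
        linked[block_id] = entry
    return linked


stack = link_stack([
    {"opcode": "event_whenflagclicked"},
    {"opcode": "motion_movesteps", "inputs": {"STEPS": [1, [4, "10"]]}},
])
```

A real engine also has to recurse into nested reporters and substacks, which each get their own ids and a `parent` pointing at the enclosing block.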

Writing a game

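from scratchforge import Project
from scratchforge.blocks import *

p = Project()
score = p.stage.add_variable("score", 0)   # variable on the Stage, initial value 0

player = p.sprite("Player", x=0, y=-140)    # sprite starting near the bottom of the stage
player.add_script(
    when_flag_clicked(),                    # hat block that starts the script
    forever(
        # arrow keys shift the sprite 7 units per loop iteration
        if_(key_pressed("right arrow"), change_x(7)),
        if_(key_pressed("left arrow"),  change_x(-7)),
    ),
)

p.save("out/game.sb3")                      # write the .sb3 archive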

Stack blocks (move, say, change_x) chain top-to-bottom.
Control blocks (forever, if_, repeat, repeat_until) take their body as trailing arguments: forever(move(10), if_on_edge_bounce()).
Reporters/booleans nest as arguments: move(mul(speed, 2)), if_(and_(key_pressed("space"), touching("Target")), ...).
Anything numeric accepts a literal, a variable ref, or a reporter block.

The DSL (scratchforge/blocks.py) covers events, motion, looks, control, sensing, operators, variables/lists, and pen. Adding a new block is one small function.
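The "literal, variable ref, or reporter" rule maps onto the input shapes described earlier. A sketch of how a numeric argument could be encoded; the classes `VarRef` and `Reporter` and the function `encode_number_input` are hypothetical stand-ins, not the types used in scratchforge/blocks.py.

```python
class VarRef:
    """Hypothetical handle to a variable: display name plus its id."""
    def __init__(self, name, var_id):
        self.name = name
        self.var_id = var_id


class Reporter:
    """Hypothetical handle to an already-emitted reporter block."""
    def __init__(self, block_id):
        self.block_id = block_id


def encode_number_input(value):
    """Encode a numeric argument as an sb3 input array."""
    if isinstance(value, VarRef):
        # variable inlined as a primitive, with a number shadow underneath
        return [3, [12, value.name, value.var_id], [4, "0"]]
    if isinstance(value, Reporter):
        # reporter block obscuring a number shadow
        return [3, value.block_id, [4, "0"]]
    if isinstance(value, bool):
        raise TypeError("booleans are not numeric literals here")
    if isinstance(value, (int, float)):
        # plain literal: number shadow holding the value as a string
        return [1, [4, str(value)]]
    raise TypeError(f"unsupported numeric argument: {value!r}")


literal = encode_number_input(7)
from_var = encode_number_input(VarRef("speed", "var1"))
from_block = encode_number_input(Reporter("b9"))
```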

Repo layout

scratchforge/
  sb3.py        core engine: Project / Target / serializer / .sb3 writer
  blocks.py     the block DSL (one function per Scratch block)
scripts/
  download_project.py   pull any shared project off scratch.mit.edu as .sb3
  inspect_project.py    pretty-print a project.json to study its scripts
  verify_sb3.js         load a generated .sb3 in the real Scratch VM (validation)
  capture_pen.js        run in the VM with a mock renderer; record every pen line
  render_pen.py         draw captured pen lines to a PNG so you can see the output
  demo_catch.py         playable demo game
  demo_cube3d.py        spinning 3D wireframe cube (the 3D proof-of-concept)
examples/       real projects downloaded for study (gitignored)
out/            generated .sb3 files (gitignored)

Learning from real projects

python scripts/download_project.py 1215331604   # -> examples/<id>.sb3
python scripts/inspect_project.py examples/1215331604.project.json

The downloader handles Scratch's per-project token flow and pulls every asset. inspect_project.py walks each script so you can see exactly how the blocks of a working game are wired.
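For orientation, the token flow looks roughly like the sketch below. The endpoint URLs reflect my understanding of Scratch's public API rather than anything stated in this repo, so treat them as assumptions that may change; the function names are illustrative and this is not the code in download_project.py.

```python
import json
from urllib.request import urlopen

# Assumed endpoints (not taken from this repo).
API_URL = "https://api.scratch.mit.edu/projects/{pid}"
PROJECT_URL = "https://projects.scratch.mit.edu/{pid}?token={token}"
ASSET_URL = "https://assets.scratch.mit.edu/internalapi/asset/{md5ext}/get/"


def fetch_project_json(pid):
    """Two-step fetch: read the project token, then the project.json itself."""
    with urlopen(API_URL.format(pid=pid)) as resp:
        token = json.load(resp)["project_token"]
    with urlopen(PROJECT_URL.format(pid=pid, token=token)) as resp:
        return json.load(resp)


def asset_names(project):
    """Collect every costume and sound file name (md5 + extension), deduplicated."""
    names = set()
    for target in project["targets"]:
        for item in target.get("costumes", []) + target.get("sounds", []):
            names.add(item["md5ext"])
    return sorted(names)


def asset_urls(project):
    """URLs a downloader would fetch to complete the .sb3 archive."""
    return [ASSET_URL.format(md5ext=name) for name in asset_names(project)]


# Offline example: a hand-written project fragment, no network access needed.
sample = {"targets": [
    {"costumes": [{"md5ext": "aaa.svg"}], "sounds": [{"md5ext": "bbb.wav"}]},
    {"costumes": [{"md5ext": "aaa.svg"}], "sounds": []},
]}
urls = asset_urls(sample)
```

Only `fetch_project_json` touches the network; the asset helpers work on any parsed project.json.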

Validation

scripts/verify_sb3.js loads a generated .sb3 in the actual scratch-vm (installed by the npm install step in the Quickstart), runs the green flag, and reports active threads. Because scratch-parser validates project.json during load, a clean load is strong evidence the file is valid Scratch, not just well-formed JSON.

node scripts/verify_sb3.js out/cube3d.sb3   # -> VERIFY: PASS
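A cheap structural pre-check can catch wiring mistakes before the VM is even started. The sketch below is a hypothetical addition, not part of the repo and not a substitute for loading in scratch-vm: it only confirms that every `next`/`parent` reference points at a block that exists and that the two directions agree.

```python
def dangling_links(blocks):
    """Return a list of (block_id, problem, referenced_id) tuples."""
    problems = []
    for bid, block in blocks.items():
        if not isinstance(block, dict):      # skip top-level primitives stored as arrays
            continue
        for key in ("next", "parent"):
            ref = block.get(key)
            if ref is not None and ref not in blocks:
                problems.append((bid, f"missing {key}", ref))
        nxt = block.get("next")
        if nxt in blocks and blocks[nxt].get("parent") != bid:
            problems.append((bid, "next/parent mismatch", nxt))
    return problems


good = {
    "b1": {"opcode": "event_whenflagclicked", "next": "b2", "parent": None},
    "b2": {"opcode": "motion_movesteps", "next": None, "parent": "b1"},
}
broken = {
    "b1": {"opcode": "event_whenflagclicked", "next": "b7", "parent": None},
}

good_report = dangling_links(good)
broken_report = dangling_links(broken)
```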

The road to 3D

Scratch is a 2D engine — there are no 3D blocks. "3D in Scratch" means software rendering: hold 3D points in variables/lists, rotate + perspective-project them to 2D each frame with the math operators, and draw the result with pen lines (or stamped sprites). demo_cube3d.py already does this for a wireframe cube — it compiles to ~330 blocks from ~40 lines of Python, which is the whole argument for generating Scratch instead of clicking it together.
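The rotate-then-project step is short enough to prototype in plain Python before emitting it as blocks. A minimal sketch, assuming a rotation about the y axis and a simple pinhole projection; the `focal` and `camera_z` values are placeholders, not the constants used in demo_cube3d.py.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)


def project(point, focal=200.0, camera_z=4.0):
    """Perspective-project a 3D point to 2D stage coordinates."""
    x, y, z = point
    depth = z + camera_z                     # distance in front of the camera
    return (focal * x / depth, focal * y / depth)


# Unit cube: 8 corners, and the 12 edges joining corners that differ in one axis.
CUBE_VERTICES = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
CUBE_EDGES = [
    (a, b)
    for a in range(8)
    for b in range(a + 1, 8)
    if sum(CUBE_VERTICES[a][i] != CUBE_VERTICES[b][i] for i in range(3)) == 1
]


def frame_lines(angle):
    """2D line segments for one frame of the spinning wireframe cube."""
    pts = [project(rotate_y(v, angle)) for v in CUBE_VERTICES]
    return [(pts[a], pts[b]) for a, b in CUBE_EDGES]


lines = frame_lines(0.3)
```

Each segment in `lines` corresponds to one pen stroke on the stage: move to the first point, pen down, go to the second.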

Speed: "run without screen refresh." A plain repeat loop yields to the screen after every iteration, so you literally watch a 12-edge cube draw edge by edge. The fix is to wrap the render in a custom block with the warp flag (sprite.define_proc("draw cube", warp=True)), so the entire frame draws in one tick. capture_pen.js proves it: it records every penLine the VM emits and shows all 24 draw-calls landing in a single step instead of dribbling out. render_pen.py then turns those captured lines into a PNG — that's how the cube above was verified without opening a browser.
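In the .sb3 data, "run without screen refresh" is a string flag on the mutation of the custom block's prototype. The sketch below shows that mutation as a Python dict for a block with no arguments; the helper name is made up, and the exact output of define_proc may differ in details.

```python
import json


def procedure_prototype_mutation(proccode, warp):
    """Mutation dict for a no-argument custom block prototype."""
    return {
        "tagName": "mutation",
        "children": [],
        "proccode": proccode,                   # the block's label text
        "argumentids": json.dumps([]),          # JSON-encoded strings, not lists
        "argumentnames": json.dumps([]),
        "argumentdefaults": json.dumps([]),
        "warp": "true" if warp else "false",    # stored as a string
    }


mutation = procedure_prototype_mutation("draw cube", warp=True)
```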

Natural next steps from here:

Filled triangles — painter's-algorithm sort by depth, stamp/pen-fill faces for a solid (not wireframe) look.
A camera + scene graph in the DSL — author Mesh, Camera, transform in Python; emit the per-vertex projection blocks automatically.
Reusable custom blocks (procedures) — done: define_proc(...) emits procedures_definition with the warp flag. Next, give the projection math its own project(i) block so it's defined once and called, shrinking block count.
Raycaster path — a Wolfenstein-style first-person renderer (cast rays over columns, draw vertical pen strips) for actual playable 3D games.
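The depth sort behind the filled-triangles idea is easy to check in Python first. A minimal sketch of the painter's algorithm, assuming larger z means farther from the camera and using the average vertex depth of each face as the sort key; `face_depth` and `painter_order` are illustrative names.

```python
def face_depth(face, vertices):
    """Average z of a face's vertices (face is a tuple of vertex indices)."""
    return sum(vertices[i][2] for i in face) / len(face)


def painter_order(faces, vertices):
    """Faces sorted far-to-near, so nearer faces are drawn over farther ones."""
    return sorted(faces, key=lambda f: face_depth(f, vertices), reverse=True)


vertices = [
    (-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),   # near square (z = -1)
    (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1),       # far square (z = +1)
]
near_face = (0, 1, 2, 3)
far_face = (4, 5, 6, 7)

order = painter_order([near_face, far_face], vertices)
```

Average depth is a common shortcut that works for convex shapes like a cube; intersecting or very long faces can still sort incorrectly.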

The pattern throughout: prototype the algorithm in plain Python first (you can see/plot the numbers), then emit the identical logic as Scratch blocks. That's how demo_cube3d.py's projection was checked before trusting it on the stage.

Credits

stories260K tiny LLM and the tok512 tokenizer — Andrej Karpathy's llama2.c (tinyllamas), MIT licensed.
TinyChat dataset (chatbot fine-tuning) — starhopp3r/TinyChat, the same dataset behind CraftGPT.
scratch-vm — used to run generated projects headless for verification.
TurboWarp — compiles .sb3 to JS for fast playback.

License: MIT.
