Reading the recursive-reasoning thread critically — HRM, TRM, and what their results actually show.
A small class of models has been making an outsized claim: that you can get strong abstract-reasoning results from a tiny network that iterates, rather than a giant one that's scaled. TRM (Tiny Recursive Models, ~7M parameters) reportedly reaches ~45% on ARC-AGI-1 — and won the ARC Prize 2025 Paper Award. That's worth understanding properly, and worth reading critically: later analyses argue a good share of the performance comes from efficiency, task-specific conditioning and heavy test-time compute rather than "deep reasoning."
This repo is my literature map and mechanism explainer for that thread — what the idea is, how TRM actually works, and how to read its numbers without over-claiming.
Round 1: analysis & explainer. This first pass is public-information analysis (literature map, mechanism explainer, the official TRM architecture diagram, and a critical reading). A later round will add hands-on reproductions (small, honest, dated runs using the official MIT code); see the Roadmap.
Attribution discipline. Every reported number below is the source's, not mine, and is cited as such. Dates and links are in Sources, re-verified 2026-06-20.
Deeper notes live in notes/: a literature map
(HRM → TRM → critiques/variants) and a mechanism explainer.
Most large language models reason autoregressively — one token at a time, with "thinking" externalised as more generated text. Recursive-reasoning models make a different bet: keep the network tiny and instead iterate internally. The model holds a latent "scratchpad" and a current answer, and repeatedly refines both over many steps before committing. The slogan version: test-time compute as effective depth — depth you get by looping a small network, not by stacking layers or scaling parameters.
It's a draft-then-revise loop rather than a left-to-right decode: propose an answer, critique it against the problem, revise, repeat. The appeal is efficiency — if it works, you get strong reasoning from a model small enough to run inside a sandbox with no internet and no hosted API.
TRM — "Less is More: Recursive Reasoning with Tiny Networks" (Alexia Jolicoeur-Martineau, Samsung SAIL Montréal, arXiv:2510.04871):
- Size: a single shared ~7M-parameter, 2-layer network — recursing, not deep.
- Mechanism: it recursively updates a latent state z from the question, the current answer and the current latent, then updates an answer embedding y from the current answer and latent, progressively improving the answer over a fixed budget of supervision steps. Trained from scratch with heavy augmentation (not few-shot prompting). Full walk-through in notes/how-trm-works.md.
- Reported results (the paper's, not mine): ~45% on ARC-AGI-1, ~8% on ARC-AGI-2, Sudoku-Extreme 87.4%, Maze-Hard 85.3%. These are reported to beat much larger models (DeepSeek-R1, o3-mini-high, Gemini 2.5 Pro) on those public evals with <0.01% of their parameters.
- Lineage: builds on HRM (Hierarchical Reasoning Model, arXiv:2506.21734); TRM removes HRM's two-module hierarchy and fixed-point math, and is simpler for it.
- Recognition: ARC Prize 2025 Paper Award (1st place) (ARC Prize 2025 Technical Report).
- Code: SamsungSAILMontreal/TinyRecursiveModels (MIT licensed, so it's safe to reproduce and build on).
Figure: the TRM architecture, from the official implementation (SamsungSAILMontreal/TinyRecursiveModels) — © 2025 Samsung Electronics Co., Ltd., reused under the MIT License. See CREDITS.
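The update loop described above can be sketched in plain Python. This is a control-flow sketch under stated assumptions, not the official implementation: latent_net and answer_net are hypothetical scalar toys standing in for TRM's single shared network (which, as I read the paper, does both updates, with the answer update simply leaving out the question), and n=6, T=3 and n_sup=16 are assumed settings to check against the paper and repo config. Plain floats carry no gradients, so the no-grad/backprop split and the learned halting head appear only as comments and a hypothetical tolerance rule.

```python
# Control-flow sketch of TRM-style recursion with toy scalar stand-ins (not the real model).
# HYPOTHETICAL: latent_net/answer_net replace the single shared 2-layer network;
# n=6, T=3, n_sup=16 are assumed settings, check them against the paper/repo config.

calls = 0  # counts network applications, to make "effective depth" visible


def latent_net(x: float, y: float, z: float) -> float:
    """Toy latent update: move the scratchpad z halfway toward the residual x - y."""
    global calls
    calls += 1
    return 0.5 * z + 0.5 * (x - y)


def answer_net(y: float, z: float) -> float:
    """Toy answer update: apply half of the latent correction (no direct access to x)."""
    global calls
    calls += 1
    return y + 0.5 * z


def latent_recursion(x: float, y: float, z: float, n: int = 6):
    for _ in range(n):          # n latent "reasoning" updates
        z = latent_net(x, y, z)
    y = answer_net(y, z)        # then one answer update
    return y, z


def deep_recursion(x: float, y: float, z: float, n: int = 6, T: int = 3):
    # In TRM the first T-1 passes run without gradients and only the last
    # pass is backpropagated; plain floats carry no gradients here.
    for _ in range(T - 1):
        y, z = latent_recursion(x, y, z, n)
    return latent_recursion(x, y, z, n)


def deep_supervision(x: float, n_sup: int = 16, n: int = 6, T: int = 3, tol: float = 1e-6):
    y, z = 0.0, 0.0                           # stand-ins for learned initial states
    step = 0
    for step in range(1, n_sup + 1):
        y, z = deep_recursion(x, y, z, n, T)  # (y, z) is carried into the next step
        # training would compute a loss on y here and take an optimizer step
        if abs(z) < tol:                      # hypothetical halting rule; TRM learns a halting head
            break
    return y, step


y, steps = deep_supervision(10.0)
print(f"answer={y:.6f} after {steps} supervision steps, {calls // steps} net calls per step")
```

Each supervision step makes T(n+1) = 21 network calls; with a 2-layer network that is 42 layers of sequential computation per step from only 2 layers' worth of parameters, which is what "test-time compute as effective depth" means here (under the assumed settings).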
The headline — "7M params beats frontier models" — is the kind of claim that should invite scrutiny rather than applause, so here's the balancing view.
A follow-up analysis, "Tiny Recursive Models on ARC-AGI-1: Inductive Biases, Identity Conditioning, and Test-Time Compute" (arXiv:2512.11847), argues TRM's ARC-AGI-1 result "arises from an interaction between efficiency, task-specific conditioning, and aggressive test-time compute rather than deep internal reasoning." Specifically:
- Test-time compute does heavy lifting. A voting pipeline with ~1000 samples adds roughly 11 percentage points over single-pass inference.
- Puzzle-ID (identity) conditioning is load-bearing. Replacing the puzzle identifier with blank or random tokens drops accuracy to zero — the model leans hard on task-specific identifiers, not purely on the grids.
- The recursion is shallow in practice. Most of the accuracy appears at the first recursion step and plateaus after a few latent updates — so "deep iterative reasoning" overstates what's happening.
- The efficiency is real, though. TRM genuinely uses far less memory and runs at higher throughput than fine-tuned multi-billion-parameter baselines.
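The voting effect in the first point can be sketched as test-time augmentation plus a majority vote. This is a toy under stated assumptions, not the critique's or TRM's pipeline: grids are tuples of tuples; only the 8 dihedral views (rotations and a mirror) are used, whereas the pipeline discussed uses ~1000 samples (augmentation in this line of work reportedly also permutes colours); predict, swap_1_2 and toy_predict are hypothetical stand-ins for a model; keeping the top two candidates mirrors ARC's two attempts per test output.

```python
from collections import Counter
from typing import Callable, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]


def rotate90(g: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def flip(g: Grid) -> Grid:
    """Mirror a grid left to right."""
    return tuple(row[::-1] for row in g)


def transform(g: Grid, k: int, flipped: bool) -> Grid:
    """One of the 8 dihedral views: optional mirror, then k clockwise rotations."""
    if flipped:
        g = flip(g)
    for _ in range(k):
        g = rotate90(g)
    return g


def invert(g: Grid, k: int, flipped: bool) -> Grid:
    """Undo transform(): rotate back to the original orientation, then un-mirror."""
    for _ in range((4 - k) % 4):
        g = rotate90(g)
    if flipped:
        g = flip(g)
    return g


def vote(test_input: Grid, predict: Callable[[Grid], Grid], top: int = 2) -> List[Grid]:
    """Predict on every view, map each answer back to the original frame, keep the most-voted grids."""
    votes: Counter = Counter()
    for k in range(4):
        for flipped in (False, True):
            answer = predict(transform(test_input, k, flipped))
            votes[invert(answer, k, flipped)] += 1
    return [grid for grid, _ in votes.most_common(top)]


def swap_1_2(g: Grid) -> Grid:
    """Toy target rule (hypothetical): swap colours 1 and 2."""
    return tuple(tuple({1: 2, 2: 1}.get(c, c) for c in row) for row in g)


def toy_predict(view: Grid) -> Grid:
    """HYPOTHETICAL flawed model: copies its input whenever the top-left cell is 1."""
    return view if view[0][0] == 1 else swap_1_2(view)


print(vote(((1, 0), (0, 2)), toy_predict))
# → [((2, 0), (0, 1)), ((1, 0), (0, 2))]
```

The flawed toy predictor is wrong on 2 of the 8 views, yet the vote still ranks the correct grid first. That is the kind of gain sampling buys without any change to the model, which is why the critique separates it from the network's own reasoning.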
The honest synthesis: TRM is a genuinely efficient architecture with a clever training recipe, and the "tiny beats giant" framing is partly an artefact of task-specific conditioning plus a lot of test-time sampling. Both things are true — and a public repo about it should say so. (Related variants, e.g. a Mamba-2 attention hybrid, arXiv:2602.12078, are tracked in the literature map.)
Recursive tiny models fit the ARC Prize 2026 sandbox unusually well: they're
small, self-contained, and need no internet or hosted API, which is exactly what the
no-internet, open-source rules
require; giant hosted LLMs are disallowed there. That makes this thread directly
relevant to the static track. Background and the broader solver lineage are in the
arc-agi hub and its
approaches tour;
the static track itself is arc-agi-2.
Planned for the next round:
recursive-reasoning-models/
└── reproductions/
├── README.md # what was run, on what hardware, what was found
└── probe_trm.ipynb # a small, tractable run/probe on the official MIT repo
The plan: run something small and honest (e.g. a Sudoku subset or a handful of ARC tasks) on available hardware using the official MIT code, and write up exactly what was run, the settings, and the observations — clearly separating my runs from the reported numbers. A clear partial reproduction with caveats beats an unsupported "I replicated SOTA."
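One way to keep that separation mechanical is to tag every number with its provenance. A minimal sketch, assuming a hypothetical Result record and render helper (nothing here comes from the TRM repo; values are in percent): own runs are rejected unless hardware, settings and date are filled in, and reported and own numbers are rendered in separate sections. The single entry shown is the paper's reported Sudoku-Extreme figure, not a run of mine.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass(frozen=True)
class Result:
    """One number in the write-up, tagged with where it came from (hypothetical schema)."""
    metric: str
    value: float                          # percent
    source: Literal["reported", "mine"]
    citation: str                         # arXiv id for reported numbers; run id for my runs
    hardware: str = ""
    settings: str = ""
    date: str = ""                        # ISO date of the run

    def __post_init__(self) -> None:
        # An own run is only reportable with hardware, settings and date recorded.
        if self.source == "mine" and not (self.hardware and self.settings and self.date):
            raise ValueError(f"run for {self.metric!r} is missing hardware/settings/date")


def render(results: List[Result]) -> List[str]:
    """Render reported and own numbers as separate sections so they never share a table."""
    lines: List[str] = []
    for source, title in (("reported", "Reported (cited, not mine)"), ("mine", "My runs")):
        lines.append(f"{title}:")
        for r in results:
            if r.source == source:
                extra = f", {r.hardware}, {r.settings}, {r.date}" if source == "mine" else ""
                lines.append(f"- {r.metric}: {r.value}% [{r.citation}{extra}]")
    return lines


reported = Result("Sudoku-Extreme accuracy", 87.4, "reported", "arXiv:2510.04871")
print("\n".join(render([reported])))
```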
Re-verified 2026-06-20. Reported metrics are the cited authors', not mine.
- Less is More: Recursive Reasoning with Tiny Networks (TRM) — https://arxiv.org/abs/2510.04871
- Hierarchical Reasoning Model (HRM) — https://arxiv.org/abs/2506.21734
- Tiny Recursive Models on ARC-AGI-1: Inductive Biases, Identity Conditioning, and Test-Time Compute — https://arxiv.org/abs/2512.11847
- Tiny Recursive Reasoning with Mamba-2 Attention Hybrid — https://arxiv.org/abs/2602.12078
- ARC Prize 2025: Technical Report — https://arxiv.org/abs/2601.10904
- Official code: SamsungSAILMontreal/TinyRecursiveModels (MIT), https://github.com/SamsungSAILMontreal/TinyRecursiveModels
Prose in this repo is © 2026 Antonio Rodriguez-Moral, licensed
CC BY 4.0; code is MIT. The
architecture figure (figures/TRM_fig.png) is © 2025 Samsung Electronics Co., Ltd.,
reused under the MIT License — see CREDITS.
🌐 arodmor.me · 💻 github.com/arodmor · ✉️ antonio.rodriguez.moral@pm.me
Part of a series: AI/ML Lab · voice-ai-landscape · arc-agi · recursive-reasoning-models
