MORPH is a training-free, self-organising coordination mechanism for multi-agent systems, inspired by neuroplasticity. It discovers task-semantic communication structure purely from task co-assignment statistics: no spatial sensors, no offline training, no hand-tuned graph topology required.
Most coordination methods either hard-wire communication topology (proximity-based heuristics) or learn it end-to-end with millions of RL training steps (CommFormer, TarMAC, DGN). MORPH occupies a different position: it self-organises online using four biologically grounded plasticity rules applied to a dynamic communication graph.
| Property | Proximity | Learned comm (CommFormer etc.) | MORPH |
|---|---|---|---|
| Requires spatial sensors | ✓ | ✗ | ✗ |
| Requires offline RL training | ✗ | ✓ | ✗ |
| Adapts to task distribution shifts | ✗ | partial | ✓ |
| Persistent coordination memory | ✗ | ✓ | ✓ |
| Interpretable links | ✗ | ✗ | ✓ |
| Works from step 0, any environment | ✓ | ✗ | ✓ |
| Mechanism | Biological basis | Role in MORPH |
|---|---|---|
| Hebbian + Homeostatic plasticity (v1) | Hebb's rule + synaptic scaling | Strengthen links for co-assigned pairs; maintain target degree |
| BCM Metaplasticity | BCM sliding threshold | Over-potentiated links decay faster; stale coordination forgotten |
| Reward-modulated Plasticity | Dopamine-gated LTP | Links that complete deliveries are strengthened, not just co-assignment |
| Neuromodulation | Dopamine/ACh arousal signal | Explore new links when delivery rate drops; consolidate when performing well |
| Predictive Formation | Anticipatory synaptogenesis | Links form before co-assignment via spatial hint matrix |
v1 behaviour is recovered exactly by setting all v2 mechanism gains to zero.
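The interplay of the v1 rules and the v2 gains can be sketched in a few lines. This is an illustrative reimplementation under our own assumptions, not the code in `morph/morph.py`; the function name and exact update order are ours, though the parameter names follow the configuration tables below.

```python
import numpy as np

def plasticity_step(W, coassigned, alpha=0.18, beta=0.04, decay=0.98,
                    target_deg_frac=0.35, bcm_gain=0.5, theta_bar=None):
    """One illustrative MORPH weight update (a sketch, not the reference code).

    W          : (N, N) symmetric link-weight matrix
    coassigned : (N, N) binary matrix, 1 where agents i, j share a task this step
    theta_bar  : (N, N) BCM sliding threshold; None disables metaplasticity
    """
    N = W.shape[0]
    # Hebbian potentiation: strengthen links for co-assigned pairs
    W = W + alpha * coassigned * (1.0 - W)
    # Base decay: links unused for long fade toward zero
    W = decay * W
    # BCM metaplasticity (v2): over-potentiated links decay faster.
    # Setting bcm_gain = 0 skips this branch and recovers v1 behaviour.
    if theta_bar is not None and bcm_gain > 0:
        over = np.clip(W - theta_bar, 0.0, None)
        W = W - bcm_gain * over * W
    # Homeostatic scaling: nudge each agent's total degree toward a target
    target = target_deg_frac * (N - 1)
    deg = W.sum(axis=1, keepdims=True)
    W = W + beta * (target - deg) / (N - 1) * (W > 0)
    # Keep the matrix symmetric and bounded
    return np.clip((W + W.T) / 2.0, 0.0, 1.0)
```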
| Condition | Tiny (N=8) | Small (N=12) | Medium (N=18) | Large (N=24) |
|---|---|---|---|---|
| MORPH | 62.4 ± 2.9 | 70.4 ± 1.3 | 91.0 ± 5.1 | 107.4 ± 3.4 |
| Proximity | 62.4 ± 4.0 | 75.2 ± 1.6 | 102.2 ± 2.2 | 113.2 ± 3.1 |
| TSG | 55.2 ± 5.8 | 70.9 ± 2.1 | 89.9 ± 1.8 | 97.6 ± 13.5 |
| Full-Graph | 57.8 ± 18.0 | 64.1 ± 13.7 | 88.8 ± 5.9 | 98.0 ± 7.7 |
MORPH outperforms Full-Graph at every scale: at N=24 it reaches 110% of Full-Graph throughput (107.4 vs. 98.0) while using only 21% of possible coordination links. MORPH ties Proximity at Tiny (62.4 each) and trails it by roughly 5–11 deliveries at larger scales; note that Proximity's radius was calibrated from MORPH's own converged link count, an advantage unavailable at real deployment time.
Delivery curves (top), throughput retention vs N (bottom-left), communication overhead vs N (bottom-right), and per-scale bar charts (bottom row).
MORPH sits in the upper-left ideal region: high throughput at low communication cost.
A key advantage of MORPH over proximity: resilience to non-stationary task patterns.
Experiment: Medium scale (N=18), 20 seeds. Six consecutive 400-step phases alternating LEFT/RIGHT demand (L–R–L–R–L–R). MORPH carries its learned pairwise weights across all phases; MORPH-reset clears weights at every phase boundary; Proximity recomputes spatial links each step.
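The phase schedule is simple to reproduce; the sketch below uses a helper of our own naming, not the API of `shift_experiment.py`.

```python
def phase_of(step, phase_len=400, phases=("LEFT", "RIGHT")):
    """Which demand regime is active at a given step (L-R-L-R-L-R alternation)."""
    return phases[(step // phase_len) % len(phases)]

# Six consecutive 400-step phases, as in the experiment
schedule = [phase_of(t) for t in range(2400)]
# Phase boundaries fall at t = 400, 800, 1200, 1600, 2000
```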
| Condition | P1 | P2 | P3 | P4 | P5 | P6 | Δ (P6−P3) |
|---|---|---|---|---|---|---|---|
| Proximity | 51.1 | 48.2 | 45.7 | 41.8 | 37.9 | 34.1 | −25.4% |
| TSG | 49.9 | 46.0 | 44.6 | 41.4 | 38.8 | 38.1 | −14.6% |
| MORPH (reset) | 50.0 | 48.9 | 48.2 | 46.5 | 43.9 | 40.8 | −15.6% |
| MORPH | 50.0 | 49.1 | 47.5 | 46.1 | 45.2 | 44.0 | −7.4% |
From Phase 3 to Phase 6, MORPH degrades 3.3× less than Proximity (−7.4% vs. −25.4%) and 2.1× less than MORPH-reset (−7.4% vs. −15.6%). Recovery time falls from 36.9 steps at the first return to a previously seen regime (r3) to 17.7 steps at r6, showing that retained preferences compound across repeated regime cycles; Proximity's recovery time grows to 96 steps by r5 because its spatial links cannot adapt to repeated shifts.
Post-hoc analyses on a single fully-recorded episode (T=1500, medium scale, N=18, seed=42).
Script: python scripts/emergent_analysis.py
Spearman ρ = −0.111 (p = 0.174) — not significant. MORPH's learned weights are driven by task co-assignment statistics, not physical distance. The system discovers a genuinely task-semantic coordination structure without any spatial sensors.
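The correlation test is easy to reproduce from any recorded W and agent positions. A sketch, assuming `positions` is an (N, 2) array of grid coordinates; the helper name is ours, not the script's API.

```python
import numpy as np
from scipy.stats import spearmanr

def w_distance_correlation(W, positions):
    """Spearman rho between learned weights and Manhattan distance,
    computed over unique agent pairs (upper triangle)."""
    iu = np.triu_indices(W.shape[0], k=1)
    # Pairwise Manhattan distances from (N, 2) grid coordinates
    dist = np.abs(positions[:, None, :] - positions[None, :, :]).sum(-1)
    rho, p = spearmanr(W[iu], dist[iu])
    return rho, p
```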
AGV–Picker pairs develop the strongest weights (they collaborate on every delivery); AGV–AGV and Picker–Picker pairs stay near zero (same-type agents rarely co-assign).
The coordination graph evolves from all-zeros to a structured, sparse pattern. By t=100–200 the AGV–Picker off-diagonal block is already taking shape; same-type blocks (AGV–AGV, Picker–Picker) remain near-zero throughout, consistent with the task structure.
η is mildly negative for most of the episode (exploration mode), with short consolidation bursts (η > 0) following delivery clusters. The system dynamically balances exploring new coordination links vs locking in those that drove recent deliveries — a direct analogue of dopaminergic arousal modulation.
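A minimal sketch of how such an η signal can be computed, using the `neuromod_ema` and `expected_delivery_rate` defaults from the configuration table; the class is illustrative, not the repo's implementation.

```python
class Neuromodulator:
    """Illustrative eta signal: EMA of per-step deliveries minus an expected baseline.
    eta > 0 -> consolidate (scale up effective alpha); eta < 0 -> explore (lower theta_form)."""

    def __init__(self, expected_rate=0.12, ema=0.03):
        self.expected = expected_rate
        self.ema = ema
        self.rate = expected_rate  # start at baseline so eta begins near 0

    def update(self, deliveries_this_step):
        # Exponential moving average of the observed delivery rate (~33-step window)
        self.rate += self.ema * (deliveries_this_step - self.rate)
        return self.rate - self.expected  # eta
```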
772 link events observed over 800 steps. Only 0.8% survived ≥ 400 steps (stable core); 99.2% were short-lived exploratory probes with a median lifetime of 21 steps. MORPH maintains a tiny set of highly-trusted long-term partnerships while continuously testing alternatives — analogous to the brain's balance between long-term potentiated synapses and ongoing synaptic turnover. Progressive sparsification is further confirmed by the Fiedler value λ₂ declining monotonically from 0.57 to 0.23 over 1500 steps as sliding-threshold and structural pruning eliminate low-value preferences.
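The Fiedler value quoted above is the second-smallest eigenvalue of the graph Laplacian (algebraic connectivity) and can be computed from any W snapshot. An illustrative helper, not the script's API:

```python
import numpy as np

def fiedler_value(W):
    """Second-smallest eigenvalue of the graph Laplacian L = D - W;
    it declines toward 0 as the graph sparsifies or fragments."""
    W = np.array(W, dtype=float)   # work on a copy
    W = (W + W.T) / 2.0            # enforce symmetry
    np.fill_diagonal(W, 0.0)       # no self-links
    L = np.diag(W.sum(axis=1)) - W
    eig = np.linalg.eigvalsh(L)    # ascending eigenvalues
    return eig[1]
```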
Each animation shows MORPH self-organising its communication graph in real time. Right-side panels display: active links, neuromodulation signal η, BCM threshold θ̄, and structural plasticity events (formations / prunings).
| Scale | Agents | Animation |
|---|---|---|
| Tiny | N=8 (5 AGV + 3 picker) | ![]() |
| Small | N=12 (8 AGV + 4 picker) | ![]() |
| Medium | N=18 (12 AGV + 6 picker) | ![]() |
| Large | N=24 (16 AGV + 8 picker) | ![]() |
MORPH is not competing with trained communication methods (CommFormer, TarMAC) in raw throughput — those methods train for millions of steps on the target distribution. MORPH's contributions are:
- No training required — works from the first episode in any environment
- No spatial prior — achieves Proximity-competitive throughput without position sensors; emergent analysis confirms learned W is not correlated with distance (ρ = −0.11, p = 0.17)
- Adaptive — self-rewires when task distribution shifts (proximity cannot)
- Interpretable — W matrix is a learned coordination history: W_ij reflects how much agents i and j have historically co-assigned on tasks
- Scalable — active-link fraction falls monotonically from 97% (N=8) to 21% (N=24) as coordination complexity grows, vs 100% for Full-Graph
- Biologically structured emergence — maintains a tiny stable coordination core (< 1% of link events) alongside a dynamic exploratory periphery, analogous to long-term potentiation vs ongoing synaptic turnover in biological neural circuits
```
morph_v2/
├── morph/
│   ├── morph.py               # MORPH coordinator (all four v2 mechanisms)
│   └── morph_env.py           # Abstract environment interface
│
├── experiments/
│   ├── tarware_experiment.py  # Scale study: 4 conditions × 4 scales × 5 seeds
│   └── shift_experiment.py    # Task distribution shift adaptability experiment
│
├── scripts/
│   ├── generate_figures.py    # Figures 1–3 from experiment_results.pkl
│   ├── statistical_tests.py   # Welch's t-tests → results/statistical_tests.csv
│   ├── animate.py             # Warehouse animations → figures/MORPH_v2_*.gif
│   └── emergent_analysis.py   # Emergent behaviour analyses → figures/emergent_*.png
│
├── figures/                   # Generated figures and GIFs
├── results/                   # PKL data and summary CSVs
├── requirements.txt
└── README.md
```
```bash
git clone <repo-url> morph_v2
cd morph_v2
pip install -r requirements.txt
```

```bash
python experiments/tarware_experiment.py
# → results/experiment_results.pkl
# → results/summary_table.csv
# → results/step_csv/  (per-step CSVs)
```

```bash
python experiments/shift_experiment.py
# → results/shift_results.pkl
# → figures/shift_adaptability.png
```

```bash
python scripts/generate_figures.py
# → figures/scale_comparison.png
# → figures/scale_bars.png
# → figures/pareto.png

python scripts/statistical_tests.py
# → results/statistical_tests.csv
```

```bash
python scripts/animate.py tiny   # one scale
python scripts/animate.py all    # all four scales (~30 min)
# → figures/MORPH_v2_{tiny,small,medium,large}.gif
```

```bash
python scripts/emergent_analysis.py
# → figures/emergent_w_vs_distance.png      (W vs Manhattan distance, Spearman ρ)
# → figures/emergent_w_evolution.png        (W matrix snapshots at t=0,100,200,400,600,800)
# → figures/emergent_neuromod.png           (η time series + delivery events)
# → figures/emergent_link_persistence.png   (link lifetime distribution)
```

| Parameter | Default | Description |
|---|---|---|
| `alpha` | 0.18 | Synaptic learning rate |
| `beta` | 0.04 | Homeostatic correction rate |
| `decay` | 0.98 | Base synaptic weight decay per step |
| `theta_form_start` | 0.75 | Initial MI threshold for link formation |
| `theta_form_end` | 0.45 | Final MI threshold (after annealing) |
| `theta_prune` | 0.008 | Weight threshold for pruning |
| `target_deg_frac` | 0.35 | Target degree as fraction of N−1 |
| `grace_steps` | 20 | Min link age before eligible for pruning or BCM penalty |
| `k_slow` | 3 | Structural update frequency (steps) |
| Parameter | Default | Description |
|---|---|---|
| `bcm_tau` | 0.95 | BCM threshold smoothing (higher = longer memory, ~20-step window) |
| `bcm_gain` | 0.5 | Extra decay multiplier for over-potentiated links |
| `reward_alpha` | 0.08 | Delivery-burst learning rate (only links with Jaccard > 0.25) |
| `neuromod_gain` | 0.4 | Scales effective α upward when η > 0 (performing above expectation) |
| `neuromod_explore` | 0.10 | Reduces θ_form when η < 0 (performing below expectation) |
| `neuromod_ema` | 0.03 | EMA smoothing for delivery rate (~33-step window) |
| `expected_delivery_rate` | 0.12 | Baseline deliveries/step for η signal (scale-dependent) |
| `pred_boost` | 0.4 | Anticipatory hint weight in structural MI score |
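For reference, the defaults above can be collected into a single config object. This dataclass is a hypothetical container mirroring the parameter tables, not the constructor signature of `morph/morph.py`; zeroing the v2 gains recovers v1 behaviour.

```python
from dataclasses import dataclass

@dataclass
class MorphConfig:
    # v1: Hebbian + homeostatic + structural plasticity
    alpha: float = 0.18
    beta: float = 0.04
    decay: float = 0.98
    theta_form_start: float = 0.75
    theta_form_end: float = 0.45
    theta_prune: float = 0.008
    target_deg_frac: float = 0.35
    grace_steps: int = 20
    k_slow: int = 3
    # v2 mechanisms (setting all gains to zero recovers v1)
    bcm_tau: float = 0.95
    bcm_gain: float = 0.5
    reward_alpha: float = 0.08
    neuromod_gain: float = 0.4
    neuromod_explore: float = 0.10
    neuromod_ema: float = 0.03
    expected_delivery_rate: float = 0.12
    pred_boost: float = 0.4

# v1 ablation: disable every v2 gain
v1_config = MorphConfig(bcm_gain=0.0, reward_alpha=0.0, neuromod_gain=0.0,
                        neuromod_explore=0.0, pred_boost=0.0)
```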
Welch's t-tests (5 seeds, two-sided), MORPH vs each condition per scale:
| Scale | vs Proximity | vs TSG | vs Full-Graph |
|---|---|---|---|
| Tiny | p = 1.00 (tied) | p = 0.07 | p = 0.64 |
| Small | p = 0.003 ★ | p = 0.71 | p = 0.42 |
| Medium | p = 0.008 ★ | p = 0.74 | p = 0.60 |
| Large | p = 0.038 ★ | p = 0.21 | p = 0.067 |
★ p < 0.05: Proximity is significantly better at Small/Medium/Large; MORPH ties Proximity at Tiny (62.4 each). MORPH outperforms Full-Graph at every scale, although none of those differences reaches significance — at N=24, MORPH delivers 107.4 vs. Full-Graph's 98.0 (+9.6%) while using only 21% of possible links.
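The tests are plain two-sided Welch's t-tests; a sketch with SciPy, using invented per-seed placeholder numbers (the real per-seed values live in `results/experiment_results.pkl`).

```python
import numpy as np
from scipy.stats import ttest_ind

# Placeholder per-seed delivery counts -- NOT the experiment's data,
# just five made-up values per condition to show the test call.
morph = np.array([107.4, 104.1, 110.2, 105.8, 109.5])
full_graph = np.array([98.0, 91.2, 104.6, 95.3, 101.1])

# equal_var=False selects Welch's t-test (unequal variances), two-sided by default
t, p = ttest_ind(morph, full_graph, equal_var=False)
```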
```bibtex
@inproceedings{morph2026,
  title     = {MORPH: Multi-agent Online Rewiring through Plasticity-guided Hierarchy},
  author    = {Didem Gurdur Broo},
  booktitle = {[Venue]},
  year      = {2026}
}
```

MIT license; see LICENSE.











