Update eval harness with richer scenarios and diagnostics by zac · Pull Request #1 · velos/CodeMode.swift

zac · 2026-04-30T03:56:49Z

Summary

- Expanded the built-in eval set with additional filesystem and catalog scenarios covering API shapes, recovery, and capability minimization.
- Added multi-step execute support plus clearer runtime/tool guidance so models use top-level returns and avoid unreturned async IIFEs.
- Tightened filesystem catalog metadata and added diagnostics/tests for missing return values and async IIFE misuse.
- Re-ran the core and failures r5 baselines and updated both after clean live runs.
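The diagnostics for missing return values and async IIFE misuse aren't shown in this description. What follows is a minimal sketch, in Swift, of one way such checks could work. It relies on regex heuristics rather than a JavaScript parser. `ScriptDiagnostic` and `diagnose(script:)` are hypothetical names for illustration and are not CodeMode.swift's actual API.

```swift
import Foundation

// Hypothetical diagnostic kinds for illustration; the PR does not show the real names.
enum ScriptDiagnostic: Equatable {
    case missingReturn        // the script never returns a value
    case unreturnedAsyncIIFE  // `(async () => { ... })()` whose promise is dropped
}

// Regex heuristics, not a JavaScript parser: any `return` keyword counts as a
// return, and an async IIFE only counts as handled when `return` or `await`
// directly precedes it.
func diagnose(script: String) -> [ScriptDiagnostic] {
    var diagnostics: [ScriptDiagnostic] = []
    let fullRange = NSRange(script.startIndex..., in: script)

    // Flag scripts that contain no `return` keyword at all.
    let returnRegex = try! NSRegularExpression(pattern: #"\breturn\b"#)
    if returnRegex.firstMatch(in: script, range: fullRange) == nil {
        diagnostics.append(.missingReturn)
    }

    // Group 1 captures a leading `return`/`await`. The negative lookbehind skips
    // call arguments such as `items.map(async (x) => ...)`, where `(` follows an
    // identifier, `.`, `)` or `]`.
    let iifeRegex = try! NSRegularExpression(
        pattern: #"(?:\b(return|await)\s+)?(?<![\w$.)\]])\(\s*async\b"#
    )
    let hasUnreturnedIIFE = iifeRegex.matches(in: script, range: fullRange)
        .contains { $0.range(at: 1).location == NSNotFound }
    if hasUnreturnedIIFE {
        diagnostics.append(.unreturnedAsyncIIFE)
    }
    return diagnostics
}
```

With this sketch, `(async () => { return 1 })();` on its own triggers `.unreturnedAsyncIIFE`, and `return (async () => { return 1 })();` produces no diagnostics. A parser-based checker would also catch returns that sit only inside nested callbacks, which this heuristic counts as top-level.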

zac added 11 commits April 28, 2026 23:21

- 7346075 Drop watchOS support and fix platform APIs
- 9bc8082 Add Wavelike eval suites and report metrics
- c645414 Add report output, comparison, and live eval retries
- 3e3b4a6 Polish eval CLI with planning and summary output
- 5d16c4e Add Markdown report output for eval runs
- 9103b5e Add retry diagnostics to LLM eval reports
- 4f6158b Update eval harness prompts, add richer filesystem scenarios, and roll r
- 49cf717 Lower package tools version for CI
- 7e230ce Stabilize eval package dependency identity
- 6189874 Split deterministic eval CLI for CI
- c9e85a2 Update README installation instructions

zac merged commit 66c4fdf into main Apr 30, 2026
2 checks passed

zac deleted the eval-cli branch April 30, 2026 04:58