
feat(GH-26): Agent Benchmark System — Lot B : Dashboard frontend #27

Open
simodev25 wants to merge 26 commits into feat/GH-24/agent-benchmark-system-lot-a from feat/GH-26/benchmark-dashboard-frontend

@simodev25
Owner

Summary

React frontend dashboard for the trading-agent benchmark system (Lot B). It relies on the REST API delivered by Lot A (GH-24, PR #25).

  • New /benchmark page integrated into the existing SPA, with a sidebar navigation entry
  • Fixture management (list, detail) and run launching with LLM model selection
  • Visualization of V1 scores per metric and a multi-model comparison view
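A comparison view like this needs, for each model, an aggregate of the V1 scores per metric. The following is a minimal sketch, assuming each attempt carries a model name and a map of metric scores; the `ScoredAttempt` shape and the `averageScoresByModel` helper are hypothetical and not taken from the PR.

```typescript
// Hypothetical shape of one scored attempt; the real types in
// frontend/src/types/benchmark.ts may differ.
interface ScoredAttempt {
  model: string;
  scores: Record<string, number>; // metric name -> V1 score
}

type Acc = { total: number; count: number };

// Average every metric per model: model -> metric -> mean score.
// A metric is averaged only over the attempts that report it.
function averageScoresByModel(
  attempts: ScoredAttempt[],
): Map<string, Record<string, number>> {
  // Accumulate sums and counts per (model, metric) pair.
  const sums = new Map<string, Record<string, Acc>>();
  for (const attempt of attempts) {
    const perMetric: Record<string, Acc> = sums.get(attempt.model) ?? {};
    for (const [metric, value] of Object.entries(attempt.scores)) {
      const acc = perMetric[metric] ?? { total: 0, count: 0 };
      acc.total += value;
      acc.count += 1;
      perMetric[metric] = acc;
    }
    sums.set(attempt.model, perMetric);
  }
  // Convert the accumulated sums into means.
  const result = new Map<string, Record<string, number>>();
  sums.forEach((perMetric, model) => {
    const means: Record<string, number> = {};
    for (const [metric, acc] of Object.entries(perMetric)) {
      means[metric] = acc.total / acc.count;
    }
    result.set(model, means);
  });
  return result;
}
```

Averaging only over attempts that report a metric avoids penalizing a model for a missing value; treating missing values as 0 would be the stricter alternative.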

Changes

  • frontend/src/types/benchmark.ts: TypeScript types (fixtures, runs, cases, attempts, V1 scores)
  • frontend/src/api/client.ts: 6 benchmark API methods added
  • frontend/src/pages/BenchmarkPage.tsx: complete page with 5 views (fixtures, launch, results, comparison, run detail)
  • frontend/src/components/Layout.tsx: BENCHMARK sidebar navigation entry
  • frontend/src/App.tsx: lazy-loaded /benchmark route
  • .samourai/docai/changes/: spec, plan, test plan, PM notes
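The entities listed for frontend/src/types/benchmark.ts nest naturally: a run groups cases, a case groups attempts, and an attempt carries V1 scores. Below is a minimal sketch of such types plus a helper that flattens a run into rows for a results table; every field name here is an assumption, not the PR's actual definitions.

```typescript
// Hypothetical benchmark entity shapes; field names are illustrative only.
interface BenchmarkAttempt {
  id: string;
  scores: Record<string, number>; // V1 metric name -> score
}

interface BenchmarkCase {
  fixtureId: string;
  attempts: BenchmarkAttempt[];
}

interface BenchmarkRun {
  id: string;
  model: string;
  cases: BenchmarkCase[];
}

// One row per attempt, as a results table might display it.
interface ResultRow {
  runId: string;
  model: string;
  fixtureId: string;
  attemptId: string;
  scores: Record<string, number>;
}

// Flatten run -> cases -> attempts into rows, preserving order.
function toResultRows(run: BenchmarkRun): ResultRow[] {
  const rows: ResultRow[] = [];
  for (const benchCase of run.cases) {
    for (const attempt of benchCase.attempts) {
      rows.push({
        runId: run.id,
        model: run.model,
        fixtureId: benchCase.fixtureId,
        attemptId: attempt.id,
        scores: attempt.scores,
      });
    }
  }
  return rows;
}
```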

Linked issue

Closes #26

Tests run

cd frontend && npx tsc --noEmit 2>&1 | grep benchmark
# Result: 0 errors in the GH-26 files

Note: pre-existing TS errors in BacktestsPage, ConnectorsPage, OrdersPage and RunDetailPage are out of scope.

Risks

Checklist

simodev25 added 26 commits May 11, 2026 16:23
Add a UI panel and page-state hooks to create benchmark fixtures, wire the
frontend API client (createBenchmarkFixture) and include the change plan.

This commit implements the form, state management, types and API call used
to POST /benchmark/fixtures from the frontend.
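The client call behind this panel boils down to a JSON POST to /benchmark/fixtures. A minimal sketch, assuming a fetch-style transport injected as a parameter (so it can be exercised without a server) and a hypothetical payload shape; the real createBenchmarkFixture in frontend/src/api/client.ts may use a different base URL, headers, or error handling.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical request body; the actual fields are defined by the Lot A API.
interface CreateFixturePayload {
  name: string;
  agent: string;
  inputs: Record<string, unknown>;
}

async function createBenchmarkFixture(
  fetchFn: FetchLike,
  baseUrl: string,
  payload: CreateFixturePayload,
): Promise<unknown> {
  const response = await fetchFn(`${baseUrl}/benchmark/fixtures`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  // Surface HTTP errors instead of returning an error body as data.
  if (!response.ok) {
    throw new Error(`createBenchmarkFixture failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

Injecting the transport keeps the function easy to test; in the app it would simply receive the global `fetch`.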
Add realistic, per-agent input presets used by the benchmark fixture creator so that tests and manual runs better reflect production agent inputs; update the benchmark page and the creation panel to use the new presets.

Verified: the frontend builds locally, and the fixture-creation UI interactions pass a manual smoke test.
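Per-agent presets can be kept as a typed map from agent name to default inputs, copied into the creation form when an agent is selected. A sketch under assumptions: the agent names and values below are hypothetical placeholders (only the context, news_context and portfolio_state keys appear elsewhere in this PR), and getPresetInputs is not the PR's function.

```typescript
// Hypothetical preset table: agent name -> default fixture inputs.
const AGENT_INPUT_PRESETS: Record<string, Record<string, unknown>> = {
  "news-analyst": {
    context: "Sample market context",
    news_context: "Sample headline",
  },
  "risk-manager": {
    context: "Sample market context",
    portfolio_state: { equity: 10000, open_positions: 1 },
  },
};

// Return a deep copy (presets are plain JSON data) so that editing the
// form never mutates the shared preset; unknown agents get an empty object.
function getPresetInputs(agent: string): Record<string, unknown> {
  const preset = AGENT_INPUT_PRESETS[agent];
  return preset ? (JSON.parse(JSON.stringify(preset)) as Record<string, unknown>) : {};
}
```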
AgentScope tracing expects a Msg with .role and .get_content_blocks(),
not a plain string. Build a rich Msg from fixture inputs (context,
news_context, portfolio_state, etc.) matching the real pipeline format.
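The key point is that the agent must receive a message object exposing a role and content blocks rather than a bare string. The backend does this in Python with AgentScope's Msg; the TypeScript sketch below only mirrors the idea with hypothetical Msg/TextBlock types and a hypothetical buildMsgFromFixture helper. It is not AgentScope's API, and the section layout is an assumption, not the real pipeline format.

```typescript
// Hypothetical stand-ins for a role-tagged message with content blocks.
interface TextBlock {
  type: "text";
  text: string;
}

interface Msg {
  name: string;
  role: "user" | "assistant" | "system";
  content: TextBlock[];
}

// Build one user message: each non-empty fixture input becomes a labeled
// section of a single text block (objects are pretty-printed as JSON).
function buildMsgFromFixture(inputs: Record<string, unknown>): Msg {
  const sections = Object.entries(inputs)
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([key, value]) =>
      `## ${key}\n${typeof value === "string" ? value : JSON.stringify(value, null, 2)}`,
    );
  return {
    name: "benchmark",
    role: "user",
    content: [{ type: "text", text: sections.join("\n\n") }],
  };
}
```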
Mirror the real pipeline's _msg_to_dict logic: check metadata first, then parse JSON from the text content, then try the content dict. The previous version returned an empty {'text': ''} when metadata was None or empty.
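The fallback order described here (metadata, then JSON parsed from the text content, then a dict-like content) can be expressed as a small chain. It is sketched in TypeScript with a hypothetical AgentReply shape; the actual _msg_to_dict is Python and its exact rules may differ.

```typescript
// Hypothetical reply shape: content is either plain text or an object.
interface AgentReply {
  metadata?: Record<string, unknown> | null;
  content: string | Record<string, unknown>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Resolve a reply to a dict, trying each source in priority order.
function replyToDict(reply: AgentReply): Record<string, unknown> {
  // 1. Non-empty metadata wins.
  if (reply.metadata && Object.keys(reply.metadata).length > 0) {
    return reply.metadata;
  }
  // 2. Text content that parses as a JSON object.
  if (typeof reply.content === "string") {
    try {
      const parsed: unknown = JSON.parse(reply.content);
      if (isPlainObject(parsed)) return parsed;
    } catch {
      // Not valid JSON: fall through to the raw-text fallback.
    }
    return { text: reply.content };
  }
  // 3. Content that is already a dict.
  return reply.content;
}
```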
Add more verbose logging to the benchmark engine and scenarios to make debugging and analyzing GH-24 results easier.

Verified locally: ran the benchmark tasks and observed the new level of detail in the log messages.
…de priority

Favor rendering system prompts from the prompts DB for benchmark agents, while
still allowing a fixture-level `system_prompt` to override when present. This
fix ensures benchmark runs use the authoritative prompt source (DB) and keeps
the previous behaviour of explicit fixture overrides.

Why: benchmark scenarios must reflect the same agent system prompts as the
production agentscope registry so results are consistent across runs.

What: use PromptTemplateService.render(...) when no fixture override exists;
add unit tests to verify DB-rendering and fixture override priority.

Tests: added unit tests covering DB rendering and override behaviour.

No breaking changes.
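The resolution order is: an explicit fixture-level system_prompt if present, otherwise the template rendered from the prompts DB. A minimal sketch in TypeScript; PromptRenderer is a hypothetical stand-in for PromptTemplateService.render(...), whose real Python signature is not shown in this PR, and treating a blank string as "no override" is an assumption.

```typescript
// Hypothetical stand-in for the DB-backed renderer (the real service is Python).
type PromptRenderer = (agentName: string, variables: Record<string, string>) => string;

interface FixtureLike {
  agentName: string;
  systemPrompt?: string | null; // optional fixture-level override
  variables: Record<string, string>;
}

// Fixture override first; otherwise render the authoritative DB template.
function resolveSystemPrompt(fixture: FixtureLike, render: PromptRenderer): string {
  if (fixture.systemPrompt && fixture.systemPrompt.trim() !== "") {
    return fixture.systemPrompt;
  }
  return render(fixture.agentName, fixture.variables);
}
```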
DeepSeek models return structured JSON inside ThinkingBlocks, not
TextBlocks. get_text_content() ignores thinking blocks, so the
extractor was getting empty text and scoring 0.00 everywhere.

The new _extract_all_text_from_msg() reads ALL block types (text + thinking) before attempting JSON parsing. Added 2 tests covering thinking-only and mixed-block scenarios.
Add non-blocking debug trace dumps for benchmark attempts and a run-level
summary JSON so runs can be inspected without interrupting execution.

Files:
- backend/app/core/config.py
- backend/app/services/benchmark/scenarios.py
- backend/app/services/benchmark/engine.py
- docker-compose.yml
- .samourai/docai/changes/2026-05/2026-05-11--GH-24--agent-benchmark-system-lot-a/chg-GH-24-plan.md
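Here "non-blocking" means a failed dump never interrupts the run. A minimal sketch of that best-effort pattern, written in TypeScript for Node although the backend is Python; dumpTraceSafely and its file layout are hypothetical, and the write itself is synchronous.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Best-effort trace dump: any I/O error is logged and swallowed so the
// benchmark run keeps going. Returns the written path, or null on failure.
function dumpTraceSafely(dir: string, fileName: string, payload: unknown): string | null {
  try {
    mkdirSync(dir, { recursive: true });
    const path = join(dir, fileName);
    writeFileSync(path, JSON.stringify(payload, null, 2), "utf8");
    return path;
  } catch (err) {
    console.warn(`trace dump skipped: ${String(err)}`);
    return null;
  }
}
```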