AgentV

Evaluate AI agents from the terminal. No server. No signup.

npm install -g agentv
agentv init
agentv eval evals/example.yaml

That's it. Results in seconds, not minutes.

What it does

AgentV runs evaluation cases against your AI agents and scores them with deterministic code graders and customizable LLM graders. Everything lives in Git: YAML eval files, Markdown judge prompts, and JSONL results.

# evals/math.yaml
description: Math problem solving
tests:
  - id: addition
    input: What is 15 + 27?
    expected_output: "42"
    assertions:
      - type: contains
        value: "42"

agentv eval evals/math.yaml
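
To make the `contains` assertion concrete, here is a minimal sketch of the kind of check it implies. This is not AgentV's implementation: the `checkContains` helper and the `ContainsAssertion` type are hypothetical, and the sketch assumes a plain case-sensitive substring test, which may differ from AgentV's actual matching rules.

```typescript
// Hypothetical sketch: a case-sensitive substring check standing in for a
// `contains` assertion. AgentV's real matching rules may differ.
interface ContainsAssertion {
  type: 'contains';
  value: string;
}

function checkContains(output: string, assertion: ContainsAssertion): boolean {
  return output.includes(assertion.value);
}

const assertion: ContainsAssertion = { type: 'contains', value: '42' };
console.log(checkContains('15 + 27 = 42', assertion)); // true
console.log(checkContains('The answer is forty-two', assertion)); // false
```

A deterministic check like this is cheap and repeatable, which is why it pairs well with an LLM grader for the parts of an answer that need judgment.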

Why AgentV?

Local-first — runs on your machine, no cloud accounts or API keys for eval infrastructure
Version-controlled — evals, judges, and results all live in Git
Hybrid graders — deterministic code checks + LLM-based subjective scoring
CI/CD native — exit codes, JSONL output, threshold flags for pipeline gating
Any agent — supports Claude, Codex, Copilot, VS Code, Pi, Azure OpenAI, or any CLI agent

Quick start

1. Install and initialize:

npm install -g agentv
agentv init

2. Configure targets in .agentv/targets.yaml — point to your agent or LLM provider.

3. Create an eval in evals/ (for example, evals/my-eval.yaml):

description: Code generation quality
tests:
  - id: fizzbuzz
    criteria: Write a correct FizzBuzz implementation
    input: Write FizzBuzz in Python
    assertions:
      - type: contains
        value: "fizz"
      - type: code-grader
        command: ./validators/check_syntax.py
      - type: llm-grader
        prompt: ./graders/correctness.md

4. Run it:

agentv eval evals/my-eval.yaml

5. Compare results across targets:

agentv compare .agentv/results/runs/<timestamp>/index.jsonl

Output formats

agentv eval evals/my-eval.yaml --output ./run   # writes ./run/index.jsonl
cat ./run/index.jsonl                         # JSONL results for scripts/CI
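
Because the results are JSONL (one JSON object per line), a script can consume them with a few lines of code. The sketch below parses JSONL text and counts records at or above a threshold. The `score` field name and the sample data are assumptions for illustration; check a real index.jsonl for the actual schema.

```typescript
// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Hypothetical field name: `score` is an assumption, not AgentV's documented schema.
function countAtOrAbove(records: Record<string, unknown>[], threshold: number): number {
  return records.filter((record) => {
    const score = record.score;
    return typeof score === 'number' && score >= threshold;
  }).length;
}

// Made-up sample data for illustration only.
const sample = '{"id":"a","score":1}\n{"id":"b","score":0.4}\n';
const records = parseJsonl(sample);
console.log(`${countAtOrAbove(records, 0.5)}/${records.length} at or above threshold`);
```

In a real script, the text would come from the results file, for example via Node's `fs.readFileSync('./run/index.jsonl', 'utf8')`.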

TypeScript SDK

Use AgentV programmatically:

import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
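
One way to use the returned summary in CI is to turn it into a pass/fail decision. The sketch below relies only on the `passed` and `total` fields shown above; the `meetsThreshold` helper and the `SummaryLike` type are hypothetical and not part of `@agentv/core`.

```typescript
// Shape taken from the `summary.passed` / `summary.total` fields used above;
// any other fields on the real summary object are not modeled here.
interface SummaryLike {
  passed: number;
  total: number;
}

// Hypothetical helper: true when the pass rate meets the minimum.
function meetsThreshold(summary: SummaryLike, minPassRate: number): boolean {
  if (summary.total === 0) return false; // treat an empty run as a failure
  return summary.passed / summary.total >= minPassRate;
}

const example: SummaryLike = { passed: 9, total: 10 };
console.log(meetsThreshold(example, 0.8)); // true
console.log(meetsThreshold(example, 0.95)); // false
```

In a Node script, setting `process.exitCode = 1` when the helper returns false is enough to fail a pipeline step.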

Documentation

Full docs at agentv.dev/docs.

Eval files — format and structure
Custom graders — code graders in any language
Rubrics — structured criteria scoring
Targets — configure agents and providers
Compare results — A/B testing and regression detection
Ecosystem — how AgentV fits with Agent Control and Langfuse

Development

git clone https://github.com/EntityProcess/agentv.git
cd agentv
bun install && bun run build
bun test

See AGENTS.md for development guidelines.

Docker Dashboard Deployment

To simulate a one-command production deployment of AgentV Dashboard with the AgentV examples project and a remote results repository:

AGENTV_RESULTS_REPO=EntityProcess/agentv-evalresults \
  scripts/setup-dashboard-deployment.sh

The script clones AgentV examples into ~/agentv-dashboard, clones the results repo, writes the Dashboard project registry under $AGENTV_HOME/config.yaml, builds the Docker image, and starts Dashboard at http://localhost:3117.

License

MIT
