AgentV

Evaluate AI agents from the terminal. No server. No signup.

npm install -g agentv
agentv init
agentv eval evals/example.yaml

That's it. Results in seconds, not minutes.

What it does

AgentV runs evaluation cases against your AI agents and scores them with deterministic code graders + customizable LLM graders. Everything lives in Git — YAML eval files, markdown judge prompts, JSONL results.

# evals/math.yaml
description: Math problem solving
tests:
  - id: addition
    input: What is 15 + 27?
    expected_output: "42"
    assertions:
      - type: contains
        value: "42"

agentv eval evals/math.yaml
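
To make the idea of a deterministic grader concrete, here is a minimal sketch of what a `contains` check amounts to. It is an illustration, not AgentV's actual implementation: the `checkContains` function, the `AssertionResult` shape, and the choice of a case-sensitive substring match are all assumptions made for this example.

```typescript
// Illustrative only: this is not AgentV's grader code.
// The assertion shape mirrors the YAML above (type + value).
interface ContainsAssertion {
  type: 'contains';
  value: string;
}

// Hypothetical result shape for this sketch.
interface AssertionResult {
  passed: boolean;
  message: string;
}

// Assumption: a plain, case-sensitive substring match.
function checkContains(output: string, assertion: ContainsAssertion): AssertionResult {
  const passed = output.includes(assertion.value);
  return {
    passed,
    message: passed
      ? `output contains "${assertion.value}"`
      : `output does not contain "${assertion.value}"`,
  };
}

const containsResult = checkContains('15 + 27 = 42', { type: 'contains', value: '42' });
console.log(containsResult.passed, containsResult.message);
```

The same input always produces the same verdict, which is what separates this kind of check from an LLM grader.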

Why AgentV?

  • Local-first — runs on your machine, no cloud accounts or API keys for eval infrastructure
  • Version-controlled — evals, judges, and results all live in Git
  • Hybrid graders — deterministic code checks + LLM-based subjective scoring
  • CI/CD native — exit codes, JSONL output, threshold flags for pipeline gating
  • Any agent — works with Claude, Codex, Copilot, VS Code, Pi, Azure OpenAI, and any other CLI agent

Quick start

1. Install and initialize:

npm install -g agentv
agentv init

2. Configure targets in .agentv/targets.yaml — point to your agent or LLM provider.

3. Create an eval in evals/:

description: Code generation quality
tests:
  - id: fizzbuzz
    criteria: Write a correct FizzBuzz implementation
    input: Write FizzBuzz in Python
    assertions:
      - type: contains
        value: "fizz"
      - type: code-grader
        command: ./validators/check_syntax.py
      - type: llm-grader
        prompt: ./graders/correctness.md

4. Run it:

agentv eval evals/my-eval.yaml

5. Compare results across targets:

agentv compare .agentv/results/runs/<timestamp>/index.jsonl

Output formats

agentv eval evals/my-eval.yaml --output ./run   # writes ./run/index.jsonl
cat ./run/index.jsonl                         # JSONL results for scripts/CI
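
Because the results are JSONL (one JSON object per line), a script can consume them with a few lines of code. The sketch below parses JSONL text and checks a mean score against a threshold. The record fields `id` and `score` are hypothetical placeholders; the real `index.jsonl` schema may differ, so check an actual results file before relying on them.

```typescript
// Hypothetical record shape: the real index.jsonl fields may differ.
interface ResultRecord {
  id: string;
  score: number; // assumed to be in the range [0, 1]
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): ResultRecord[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ResultRecord);
}

// Average score across all records; an empty run scores 0.
function meanScore(records: ResultRecord[]): number {
  if (records.length === 0) return 0;
  const total = records.reduce((sum, r) => sum + r.score, 0);
  return total / records.length;
}

// True when the run's mean score reaches the threshold.
function meetsThreshold(records: ResultRecord[], threshold: number): boolean {
  return meanScore(records) >= threshold;
}

// Inline sample standing in for the contents of ./run/index.jsonl.
const sampleJsonl = [
  '{"id":"addition","score":1}',
  '{"id":"fizzbuzz","score":0.5}',
].join('\n');

const parsedRecords = parseJsonl(sampleJsonl);
console.log(meanScore(parsedRecords), meetsThreshold(parsedRecords, 0.7));
```

In a real pipeline the sample string would be replaced by the file contents, for example read with Node's `fs.readFileSync`.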

TypeScript SDK

Use AgentV programmatically:

import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
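
When the SDK is called from a script that gates a pipeline, the summary can be turned into a process exit code. The helper below is a sketch that relies only on the `passed` and `total` fields used above; the `exitCodeFor` name and the decision to treat an empty run as a failure are assumptions of this example, not part of the SDK.

```typescript
// Only the two fields used in the example above; the real summary may carry more.
interface RunSummary {
  passed: number;
  total: number;
}

// Hypothetical helper: 0 when every test passed, 1 otherwise.
function exitCodeFor(summary: RunSummary): number {
  // Design choice of this sketch: a run with no tests counts as a failure.
  if (summary.total === 0) return 1;
  return summary.passed === summary.total ? 0 : 1;
}

console.log(exitCodeFor({ passed: 3, total: 3 }));
console.log(exitCodeFor({ passed: 2, total: 3 }));
```

In a Node script, assigning the result to `process.exitCode` lets the CI job fail without cutting off pending output.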

Documentation

Full docs at agentv.dev/docs.

Development

git clone https://github.com/EntityProcess/agentv.git
cd agentv
bun install && bun run build
bun test

See AGENTS.md for development guidelines.

Docker Dashboard Deployment

To simulate a one-command production deployment of the AgentV Dashboard, using the AgentV examples project and a remote results repository, run:

AGENTV_RESULTS_REPO=EntityProcess/agentv-evalresults \
  scripts/setup-dashboard-deployment.sh

The script clones the AgentV examples into ~/agentv-dashboard, clones the results repo, writes the Dashboard project registry to $AGENTV_HOME/config.yaml, builds the Docker image, and starts the Dashboard at http://localhost:3117.

License

MIT