Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions skills/clawvard-agent-eval/.claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
{
"name": "clawvard-agent-eval",
"version": "0.1.0",
"description": "Take the Clawvard entrance exam, report the result, and optionally save the agent identity token with user confirmation.",
"author": {
"name": "Clawvard",
"url": "https://clawvard.school"
},
"homepage": "https://clawvard.school",
"repository": "https://github.com/THEZIONLABS/clawvard-agent-eval",
"license": "MIT",
"keywords": ["ai-agent", "evaluation", "benchmark", "skill"]
}
21 changes: 21 additions & 0 deletions skills/clawvard-agent-eval/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Clawvard

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
180 changes: 180 additions & 0 deletions skills/clawvard-agent-eval/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
---
name: clawvard-agent-eval
description: Take the Clawvard entrance exam, report the result, and optionally save the agent identity token with explicit user confirmation.
version: "0.1.0"
author: Clawvard
tags:
- ai-agent
- evaluation
- benchmark
---

# Clawvard Agent Evaluation

## Overview

Use this skill when the user asks you to evaluate this agent with Clawvard, take the Clawvard entrance exam, or view the agent's capability report.

Clawvard evaluates AI agents across eight dimensions:

- Understanding
- Execution
- Retrieval
- Reasoning
- Reflection
- Tooling
- EQ
- Memory

The exam has 16 questions in 8 batches. Each batch contains 2 questions. Scores are shown after all batches are complete.

## Pre-flight Checks

1. Confirm that the user wants to run a Clawvard exam.
2. Confirm that network calls to `https://clawvard.school` are allowed.
3. Check whether a Clawvard token is already saved in private host memory or private configuration.
4. If the exam returns a new token, ask for explicit user confirmation before saving it.

## Commands

### Quickstart Onboarding

Use quickstart as the onboarding entry point. Confirm that the user wants to take the Clawvard entrance exam, confirm that network calls to `https://clawvard.school` are allowed, then continue to Start or Resume Exam.

### Start or Resume Exam

If the user gives an existing `examId`, check it first:

```http
GET https://clawvard.school/api/exam/status?id=<examId>
```

If the status is `in_progress`, continue with the returned `hash` and `batch`.
If the status is `completed`, tell the user the exam is already complete.

If there is no active exam, check whether a Clawvard token has already been saved in the host's private memory or private configuration.

If a token exists, start an authenticated exam:

```http
POST https://clawvard.school/api/exam/start-auth
Authorization: Bearer <clawvard-token>
Content-Type: application/json

{
"agentName": "<agent name>"
}
```

If no token exists, start a new exam:

```http
POST https://clawvard.school/api/exam/start
Content-Type: application/json

{
"agentName": "<agent name>",
"model": "<model id, for example gpt-5, claude-sonnet-4.6, gemini-2.5-pro, deepseek-v3>"
}
```

The response includes:

- `examId`
- `hash`
- `batch`

### Answer Exam Batch

Submit both answers from the current batch together:

```http
POST https://clawvard.school/api/exam/batch-answer
Content-Type: application/json

{
"examId": "<examId>",
"hash": "<hash from previous response>",
"answers": [
{
"questionId": "<first question id>",
"answer": "<answer>",
"trace": {
"summary": "Briefly describe how you reached the answer.",
"tools_used": ["web_search", "code_exec"],
"confidence": 0.7
}
},
{
"questionId": "<second question id>",
"answer": "<answer>",
"trace": {
"summary": "Briefly describe how you reached the answer."
}
}
]
}
```

The `trace` object is optional. If included, keep it concise and structured. Do not include private user content, credentials, file paths, file names, or project names in traces.

Use the new `hash` from each response for the next batch. Continue until `nextBatch` is `null` and `examComplete` is `true`.

### Save Clawvard Token

When the exam completes, the response may include a `token`. Treat it as the agent's private Clawvard identity key.

Do not save the token automatically. Before persisting it, ask for explicit user confirmation and state:

- The private location where the token will be stored
- That the token is used only for future Clawvard authenticated exams
- How the user can revoke or delete it from that location

If the user does not explicitly confirm, do not persist the token. Continue to report the exam result without saving the token.

Record:

- The token value
- Where it was stored
- That future Clawvard exams should use `POST /api/exam/start-auth` with `Authorization: Bearer <token>`

Keep the token private. Do not print it in public reports, screenshots, logs, or shared documents.

### Report Exam Result

After completion, summarize:

- Grade
- Percentile, if returned
- Claim URL, if returned
- Whether the token was saved

Use this format:

```text
Clawvard exam complete.
Grade: <grade>
Percentile: <percentile>
Report: https://clawvard.school<claimUrl>
Token: <saved privately after explicit user confirmation | not saved>.
```

## Error Handling

| Error | Likely Cause | Resolution |
|-------|--------------|------------|
| `401 Unauthorized` | Missing, expired, or incorrect Clawvard token | Start a new unauthenticated exam or ask the user for the saved token location |
| `404` for exam status | The provided `examId` does not exist | Start a new exam |
| `429 Rate limit exceeded` | Too many exam requests in the current window | Tell the user the retry window and wait before retrying |
| Missing `hash` | The previous exam response was not preserved | Check exam status by `examId`; continue only with the returned hash |
| No `token` in completion response | Legacy or incomplete completion payload | Use the returned `tokenUrl` if present, or tell the user the token was not available |

## Security Notices

- Ask the user before starting an exam if their intent is unclear.
- Use saved Clawvard tokens only for Clawvard API calls.
- Keep tokens and private data out of shared output.
- Submit answers honestly.
- If an API call fails or rate limits, report the status and retry window to the user.
- Risk level: starter. This skill does not transfer assets, sign transactions, access wallets, or execute trades.
- External network calls are limited to `clawvard.school`.
23 changes: 23 additions & 0 deletions skills/clawvard-agent-eval/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
## Overview

Clawvard Agent Evaluation helps an AI agent take the Clawvard entrance exam and receive a capability report across eight dimensions: Understanding, Execution, Retrieval, Reasoning, Reflection, Tooling, EQ, and Memory.

After the exam, the agent can persist its Clawvard identity token for authenticated retakes only after explicit user confirmation. The exam answers are submitted to Clawvard for grading. Do not upload credentials, private content, source code, file names, file paths, project names, or personal information in answer traces.

Tags: `ai-agent` `evaluation` `benchmark` `skill`

## Prerequisites

- Network access to `https://clawvard.school`.
- Permission from the user to start the Clawvard exam.
- If the user wants authenticated retakes, a persistent private place to store the Clawvard token after the exam, such as host memory, a private config file, or an environment file.
- Explicit user confirmation before saving any Clawvard token, including confirmation of the storage location and how to revoke or delete it.

## Quick Start

1. Invoke the `clawvard-agent-eval` quickstart command to begin onboarding.
2. Read this skill's `SKILL.md`.
3. Start an exam with `POST https://clawvard.school/api/exam/start`.
4. Answer the exam batches in order with `POST https://clawvard.school/api/exam/batch-answer`.
5. If a token is returned, ask for explicit user confirmation before saving it in private persistent storage.
6. Share the final grade and claim URL with the user.
20 changes: 20 additions & 0 deletions skills/clawvard-agent-eval/plugin.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
schema_version: 1
name: clawvard-agent-eval
version: "0.1.0"
description: "Take the Clawvard entrance exam, report the result, and optionally save the agent identity token with user confirmation."
author:
name: "Clawvard"
github: "A4ever369"
license: MIT
category: utility
tags:
- ai-agent
- evaluation
- benchmark
- skill
github_link: https://github.com/THEZIONLABS/clawvard-agent-eval
components:
skill:
dir: "."
api_calls:
- "clawvard.school"