# skill-evaluation

Here are 11 public repositories matching this topic...

oh-my-knowledge

Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CIs, Krippendorff's α, length debiasing, saturation curves.

  • Updated May 7, 2026
  • TypeScript
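The statistical toolkit named above is standard, so as a minimal sketch, here is how a percentile-bootstrap confidence interval over per-item scores can be computed. The function name, parameters, and usage below are illustrative assumptions, not oh-my-knowledge's actual API.

```typescript
// Minimal sketch: percentile-bootstrap CI over per-item scores.
// `bootstrapCI` and its signature are illustrative, not the repo's API.
function bootstrapCI(
  scores: number[],
  iterations = 10_000,
  alpha = 0.05
): { lo: number; hi: number } {
  const means: number[] = [];
  for (let i = 0; i < iterations; i++) {
    let sum = 0;
    for (let j = 0; j < scores.length; j++) {
      // Resample with replacement from the observed scores.
      sum += scores[Math.floor(Math.random() * scores.length)];
    }
    means.push(sum / scores.length);
  }
  means.sort((a, b) => a - b);
  return {
    lo: means[Math.floor((alpha / 2) * iterations)],
    hi: means[Math.floor((1 - alpha / 2) * iterations)],
  };
}

// Usage: 95% CI on a run of 0/1 pass scores for one prompt variant.
const { lo, hi } = bootstrapCI([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]);
console.log(`95% CI: [${lo.toFixed(2)}, ${hi.toFixed(2)}]`);
```

Resampling at the item level keeps the interval honest about run-to-run variance without assuming the scores are normally distributed.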

Binary-criteria evaluation harness for Claude skills, with planned extensions to plugins, agents, and MCP servers. Scores every change yes/no across 7 layers: package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, and rollout safety. No graded scales, ever.

  • Updated May 8, 2026
  • TypeScript
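The layer names above suggest a simple aggregation rule; the sketch below shows one plausible reading, in which a change passes only if every binary criterion passes. The types and the example verdict are assumptions, not the harness's real schema.

```typescript
// Sketch of binary-criteria scoring: every layer is a strict yes/no,
// and a change passes only if all layers pass. Layer names come from
// the description above; the types and example verdict are illustrative.
type Layer =
  | "packageIntegrity"
  | "triggerQuality"
  | "functionalQuality"
  | "regressionProtection"
  | "baselineValue"
  | "modelVariance"
  | "rolloutSafety";

type Verdict = Record<Layer, boolean>;

function passes(verdict: Verdict): boolean {
  // No partial credit: a single "no" fails the change.
  return Object.values(verdict).every(Boolean);
}

const verdict: Verdict = {
  packageIntegrity: true,
  triggerQuality: true,
  functionalQuality: true,
  regressionProtection: false, // one failed layer fails the change
  baselineValue: true,
  modelVariance: true,
  rolloutSafety: true,
};
console.log(passes(verdict)); // false
```

The all-or-nothing rule is what "never gradients" buys: any reviewer can reproduce a verdict, since there is no weighting or scale to argue over.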

Detect malicious code and security risks in AI skill files before installation to protect AI agents from hidden threats and obfuscation techniques.

  • Updated May 10, 2026
  • Python
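This repo is Python, but for consistency with the sketches above, here is a TypeScript illustration of the general idea: scan a skill file for obfuscation markers before installation. The patterns, names, and threshold are illustrative guesses, not the project's actual rule set.

```typescript
// Illustrative scanner: flag common obfuscation markers in a skill
// file before installation. Patterns are examples, not the repo's rules.
const suspiciousPatterns: { name: string; re: RegExp }[] = [
  { name: "base64 blob", re: /[A-Za-z0-9+\/]{120,}={0,2}/ },
  { name: "dynamic eval", re: /\beval\s*\(|\bexec\s*\(/ },
  { name: "hex escapes", re: /(\\x[0-9a-fA-F]{2}){8,}/ },
  { name: "remote fetch", re: /https?:\/\/[^\s"']+\.(sh|py|exe)\b/ },
];

function scanSkillFile(text: string): string[] {
  return suspiciousPatterns
    .filter(({ re }) => re.test(text))
    .map(({ name }) => name);
}

const findings = scanSkillFile('run("eval(atob(\'aW1wb3J0IG9z\'))")');
console.log(findings); // ["dynamic eval"]
```

A real scanner would parse rather than pattern-match, but static markers like these are a cheap first pass before any skill file reaches an agent.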
