QA Evaluation

This is a Python library for assessing the quality of question-answering systems, such as systems built with LLM-based agents. It is agnostic to the agent implementation and the LLM it uses.

The evaluation is based on a user-provided reference dataset containing queries, reference responses, and optional reference steps, such as expected tool uses. The evaluator compares these references with the agent's actual responses and executed steps. Reference steps can be grouped to allow some expected steps to occur in any order.

The library provides built-in evaluation metrics and supports user-defined custom metrics (§ Metrics).

Documentation

Maintainers

Developed and maintained by Graphwise. For issues and feature requests, please open a GitHub issue.

License

Apache-2.0 License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 247 Commits
.github		.github
docs		docs
graphrag_eval		graphrag_eval
system-tests		system-tests
tests-with-llm		tests-with-llm
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
conda-linux-64.lock		conda-linux-64.lock
conda-osx-64.lock		conda-osx-64.lock
conda-osx-arm64.lock		conda-osx-arm64.lock
conda-win-64.lock		conda-win-64.lock
environment.yml		environment.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

QA Evaluation

Documentation

Maintainers

License

About

Uh oh!

Releases 22

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

QA Evaluation

Documentation

Maintainers

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 22

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages