WKG-80: Allow the user to config the answer correctness prompt by nelly-hateva · Pull Request #82 · Ontotext-AD/graphrag-eval

nelly-hateva · 2026-06-02T16:25:54Z

Adds configurable answer-correctness prompt via:
```
answer_correctness:
  prompt: ...
```
Moves the answer-correctness console script implementation from graphrag_eval/answer_correctness.py to graphrag_eval/cli/answer_correctness.py.
Renames the prompt template variable from {candidate_answer} to {actual_answer}.
Changes answer correctness parsing to raise ValueError instead of returning an error string.
The answer correctness evaluator is no longer a singleton.
Changes GenerationConfig defaults:
- temperature now defaults to 0.0
- max_tokens is now optional and defaults to None
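The new defaults can be pictured with a minimal sketch. `GenerationConfigSketch` is a hypothetical stand-in written as a plain dataclass. The real `GenerationConfig` may be built differently and may have other fields. Only the two defaults listed above come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationConfigSketch:
    """Hypothetical stand-in for GenerationConfig; models only the two defaults named in this PR."""

    # Default stated in the PR description.
    temperature: float = 0.0
    # Optional with a default of None. How None is handled downstream is not stated in the PR.
    max_tokens: Optional[int] = None


# With no arguments, the new defaults apply.
default_cfg = GenerationConfigSketch()

# Both values can still be set explicitly.
custom_cfg = GenerationConfigSketch(temperature=0.7, max_tokens=512)
```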
Updates third-party libraries (aiohttp, langchain, pydantic-settings, langsmith) to their newest versions to address security vulnerabilities.
Updates docs and tests accordingly.
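As a rough illustration of the template variable rename and the new error behavior, here is a minimal sketch. Only the `{actual_answer}` placeholder and the use of `ValueError` come from this PR. The template text, the `{question}` and `{reference_answer}` placeholders, the function names, and the tab-separated response format are all assumptions made for the example, not the library's actual prompt or parser.

```python
# Hypothetical template: only {actual_answer} is named in the PR description.
HYPOTHETICAL_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference_answer}\n"
    "Actual answer: {actual_answer}\n"
)


def render_prompt(template: str, question: str, reference_answer: str, actual_answer: str) -> str:
    """Fill the template placeholders using str.format (an assumption about the mechanism)."""
    return template.format(
        question=question,
        reference_answer=reference_answer,
        actual_answer=actual_answer,
    )


def parse_counts(response: str) -> tuple[int, int, int]:
    """Parse an assumed response format of three tab-separated integers.

    Raises ValueError on malformed input instead of returning an error string.
    """
    parts = response.strip().split("\t")
    if len(parts) != 3:
        raise ValueError(f"Expected 3 tab-separated values, got {len(parts)}: {response!r}")
    try:
        first, second, third = (int(part) for part in parts)
    except ValueError as exc:
        raise ValueError(f"Non-integer value in response: {response!r}") from exc
    return first, second, third
```

Under these assumptions, a custom prompt that still uses `{candidate_answer}` would fail at formatting time with a `KeyError`, so existing custom templates would need the placeholder renamed.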

pgan002

Looks good. A couple of minor issues.

Co-authored-by: Philip Ganchev <philip.ganchev@graphwise.ai>

pgan002 · 2026-06-23T12:03:06Z

As we discussed in the TTYG meeting, I think we should revert this functionality if we add user configurability of metrics to compute. I think this is the better approach. We should keep the CLI code separation and other improvements.

nelly-hateva · 2026-06-23T12:18:55Z

> As we discussed in the TTYG meeting, I think we should revert this functionality if we add user configurability of metrics to compute.

But the two will not be 1:1. Custom metrics can't define how to parse the response and compute something from it. So if we want to define answer correctness as a custom metric, we will have to prompt the LLM to calculate precision, recall, and F1, whereas for the built-in metric we parse the response and calculate these in Python.
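To make the built-in path concrete, here is a minimal sketch of computing the three scores in Python once counts have been parsed from the LLM response. The function name and the three count parameters are assumptions for illustration. The PR does not show the actual inputs or the handling of zero counts.

```python
def precision_recall_f1(n_reference: int, n_actual: int, n_matching: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from hypothetical claim counts.

    n_reference: claims in the reference answer
    n_actual:    claims in the actual answer
    n_matching:  claims present in both
    Zero denominators yield 0.0 (a choice made for this sketch).
    """
    precision = n_matching / n_actual if n_actual else 0.0
    recall = n_matching / n_reference if n_reference else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Doing this arithmetic in code keeps the scores deterministic given the parsed counts, which a prompt-only custom metric cannot guarantee.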

pgan002 · 2026-06-24T08:59:50Z

OK, please re-request review after you address the comments.

github-code-quality Bot found potential problems Jun 2, 2026

Comment thread graphrag_eval/answer_correctness.py Fixed

nelly-hateva force-pushed the WKG-80 branch 2 times, most recently from bf8b6d2 to bc062b1 Compare June 3, 2026 12:00

WKG-80: Allow the user to config the answer correctness prompt

1948a8e

nelly-hateva force-pushed the WKG-80 branch from bc062b1 to 1948a8e Compare June 3, 2026 13:07

nelly-hateva requested a review from pgan002 June 15, 2026 11:01

nelly-hateva assigned pgan002 Jun 15, 2026

nelly-hateva force-pushed the WKG-80 branch 2 times, most recently from 71ec2c7 to 0459fff Compare June 19, 2026 14:57

Merge remote-tracking branch 'origin/main' into WKG-80

9043513

nelly-hateva force-pushed the WKG-80 branch from 0459fff to 9043513 Compare June 19, 2026 14:58

pgan002 requested changes Jun 22, 2026

Comment thread docs/config.md Outdated

Comment thread graphrag_eval/cli/answer_correctness.py Outdated

Comment thread graphrag_eval/cli/answer_correctness.py

Apply suggestion from @pgan002

9d06443

Co-authored-by: Philip Ganchev <philip.ganchev@graphwise.ai>

nelly-hateva force-pushed the WKG-80 branch from 466dad2 to 45e075b Compare June 23, 2026 11:18

This was referenced Jun 24, 2026

Bump aiohttp from 3.13.5 to 3.14.1 #84

Closed

Bump langchain from 1.3.1 to 1.3.9 #85

Closed

Bump langsmith from 0.8.5 to 0.8.18 #86

Closed

Bump pydantic-settings from 2.14.1 to 2.14.2 #87

Closed

nelly-hateva force-pushed the WKG-80 branch 4 times, most recently from d0b501d to 203794e Compare June 25, 2026 12:17

WKG-80: Improve docs

c55123a

nelly-hateva force-pushed the WKG-80 branch from 203794e to c55123a Compare June 25, 2026 12:40

nelly-hateva requested a review from pgan002 June 25, 2026 12:53

pgan002 approved these changes Jun 25, 2026

nelly-hateva merged commit aaa8fd3 into main Jun 25, 2026
1 check passed

nelly-hateva deleted the WKG-80 branch June 25, 2026 13:18

nelly-hateva merged 4 commits into main from WKG-80
