WKG-80: Allow the user to config the answer correctness prompt #82

Merged
nelly-hateva merged 4 commits into main from WKG-80 on Jun 25, 2026
Conversation

nelly-hateva commented Jun 2, 2026

Collaborator
  • Adds a configurable answer-correctness prompt via:

    answer_correctness:
      prompt: ...
  • Moves the answer-correctness console script implementation from graphrag_eval/answer_correctness.py to graphrag_eval/cli/answer_correctness.py.

  • Renames the prompt template variable from {candidate_answer} to {actual_answer}.

  • Changes answer correctness parsing to raise ValueError instead of returning an error string.

  • Makes the answer correctness evaluator no longer a singleton.

  • Changes GenerationConfig defaults:

    • temperature now defaults to 0.0
    • max_tokens is now optional and defaults to None
  • Updates third-party libraries (aiohttp, langchain, pydantic-settings, langsmith) to their newest versions to address security vulnerabilities.

  • Updates docs and tests accordingly.
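
A minimal sketch of how three of the changes above could look in code, not the actual graphrag_eval implementation. `GenerationConfigSketch`, `PROMPT_TEMPLATE`, `parse_counts`, the `{question}` and `{reference_answer}` variables, and the tab-separated response format are all assumptions made for illustration. Only the `{actual_answer}` variable name, the `ValueError` behavior, and the two defaults come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationConfigSketch:
    """Hypothetical stand-in for GenerationConfig; only the defaults are from the PR."""

    temperature: float = 0.0          # now defaults to 0.0
    max_tokens: Optional[int] = None  # now optional, defaults to None


# Hypothetical template. The PR only states that the variable is now
# {actual_answer} rather than {candidate_answer}.
PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference_answer}\n"
    "Actual answer: {actual_answer}\n"
)


def parse_counts(response: str) -> tuple[int, int, int]:
    """Parse an assumed 'reference<TAB>actual<TAB>matching' reply.

    Raises ValueError on malformed input instead of returning an error string.
    """
    parts = response.strip().split("\t")
    if len(parts) != 3:
        raise ValueError(
            f"Expected 3 tab-separated values, got {len(parts)}: {response!r}"
        )
    try:
        reference, actual, matching = (int(part) for part in parts)
    except ValueError as exc:
        raise ValueError(f"Non-integer value in response: {response!r}") from exc
    return reference, actual, matching
```

Raising an exception lets the caller decide how to handle a malformed reply, whereas a returned error string can be mistaken for a valid result further down the pipeline.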

Comment thread graphrag_eval/answer_correctness.py Fixed
nelly-hateva force-pushed the WKG-80 branch 2 times, most recently from bf8b6d2 to bc062b1, on June 3, 2026 12:00

pgan002 left a comment

Collaborator

Looks good. A couple of minor issues.

Comment thread docs/config.md Outdated
Comment thread graphrag_eval/cli/answer_correctness.py Outdated
Comment thread graphrag_eval/cli/answer_correctness.py
Co-authored-by: Philip Ganchev <philip.ganchev@graphwise.ai>
pgan002 commented Jun 23, 2026

Collaborator

As we discussed in the TTYG meeting, I think we should revert this functionality if we add user configurability of metrics to compute. I think this is the better approach. We should keep the CLI code separation and other improvements.

nelly-hateva commented Jun 23, 2026

Collaborator Author

> As we discussed in the TTYG meeting, I think we should revert this functionality if we add user configurability of metrics to compute.

But the two will not be 1:1. Custom metrics can't define how to parse the response and compute values from it, so if we define answer correctness as a custom metric, we will have to prompt the LLM to calculate precision, recall, and F1. For the built-in metric, we parse the response and calculate these in Python.
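
To illustrate the Python-side calculation, here is a minimal sketch. It assumes the parsed LLM response yields three claim counts (reference, actual, matching); the function name and that input shape are hypothetical, not the actual graphrag_eval code.

```python
def precision_recall_f1(
    reference_claims: int, actual_claims: int, matching_claims: int
) -> dict[str, float]:
    """Compute precision, recall, and F1 from claim counts (assumed inputs)."""
    # Precision: share of the actual answer's claims that match the reference.
    precision = matching_claims / actual_claims if actual_claims else 0.0
    # Recall: share of the reference claims that the actual answer covers.
    recall = matching_claims / reference_claims if reference_claims else 0.0
    # F1: harmonic mean of the two, defined as 0.0 when both are zero.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Doing this arithmetic in code keeps the scores deterministic, whereas a custom metric would depend on the LLM performing the division correctly.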

pgan002 commented Jun 24, 2026

Collaborator

OK, please re-request review after you address the comments.

nelly-hateva merged commit aaa8fd3 into main on Jun 25, 2026
1 check passed
nelly-hateva deleted the WKG-80 branch on June 25, 2026 13:18