WKG-80: Allow the user to config the answer correctness prompt #82
pgan002 left a comment
Looks good. A couple of minor issues.
Co-authored-by: Philip Ganchev <philip.ganchev@graphwise.ai>
As we discussed in the TTYG meeting, I think we should revert this functionality if we add user-configurable metrics, which I think is the better approach. We should keep the CLI code separation and the other improvements.
But the two will not be 1:1, because custom metrics can't define how to parse the response and compute values from it. If we want to define answer correctness as a custom metric, we will have to prompt the LLM to calculate precision, recall, and F1 itself, whereas the built-in metric parses the response and calculates these in Python.
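To make the distinction concrete, here is a minimal sketch of the kind of Python-side calculation the built-in metric can do once the response has been parsed. The function name and the three claim counts are assumptions for illustration, not the project's actual code or response format.

```python
from typing import Tuple


def precision_recall_f1(
    n_matching: int, n_actual: int, n_reference: int
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from claim counts.

    Hypothetical helper: the counts are assumed to have been parsed
    from the LLM response already. A zero denominator yields 0.0.
    """
    # Share of claims in the actual answer that match the reference.
    precision = n_matching / n_actual if n_actual else 0.0
    # Share of reference claims that the actual answer covers.
    recall = n_matching / n_reference if n_reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A custom metric defined only by a prompt has no such step, so the LLM would have to do this arithmetic itself.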
OK, please re-request review after you address the comments.
- Adds configurable answer-correctness prompt via:
- Moves the `answer-correctness` console script implementation from `graphrag_eval/answer_correctness.py` to `graphrag_eval/cli/answer_correctness.py`.
- Renames the prompt template variable from `{candidate_answer}` to `{actual_answer}`.
- Changes answer correctness parsing to raise `ValueError` instead of returning an error string.
- The answer correctness evaluator is no longer a singleton.
- Changes `GenerationConfig` defaults:
  - `temperature` now defaults to `0.0`
  - `max_tokens` is now optional and defaults to `None`
- Updates third-party libraries to their newest versions due to security vulnerabilities (aiohttp, langchain, pydantic-settings, langsmith).
- Updates docs and tests accordingly.
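A rough sketch of how some of the changes listed above might look in code. Everything here is a stand-in written for illustration: the dataclass mirrors only the two defaults named in the description, the template text and its `{reference_answer}` placeholder are invented, and the tab-separated response format is an assumption rather than the project's actual format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GenerationConfig:
    # Stand-in class, not the project's own; only these two defaults
    # are taken from the description above.
    temperature: float = 0.0
    max_tokens: Optional[int] = None


# Hypothetical template; only the {actual_answer} placeholder name
# comes from the description.
PROMPT_TEMPLATE = "Reference: {reference_answer}\nActual: {actual_answer}"


def parse_counts(response: str) -> Tuple[int, int, int]:
    """Parse three tab-separated integers from an LLM response.

    Raises ValueError on malformed input instead of returning an
    error string. The response format is assumed for this sketch.
    """
    fields = response.strip().split("\t")
    if len(fields) != 3:
        raise ValueError(
            f"Expected 3 tab-separated values, got {len(fields)}: {response!r}"
        )
    try:
        first, second, third = (int(field) for field in fields)
    except ValueError as exc:
        raise ValueError(f"Non-integer value in response: {response!r}") from exc
    return first, second, third
```

Raising an exception lets callers separate a failed parse from a valid result by control flow, rather than by inspecting the type or content of the return value.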