This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Add more than one acceptable answer in our Question-Answering dataset #617

@FrancescoCasalegno

Context

  • In #612 ("Question-Answering: collect example questions + run first analysis with pre-trained QA models") we started creating a Question-Answering dataset for evaluating and training Extractive QA models.
  • This was done following the style of popular datasets for Extractive QA like SQuAD.
  • However, for the sake of simplicity, we only annotated one ground-truth answer for each sample.
  • SQuAD annotates more than one ground-truth answer per question, and we should do the same. With a single reference answer, our evaluation (EM score in particular, but also F1) penalizes predictions that are correct but worded differently, so the scores are biased downward.
  • For instance:
    • Question: "What is the population size of Geneva?"
    • Context: "It is estimated that about 200,000 people live in Geneva."
    • Acceptable Answers: ["200,000", "about 200,000", "200,000 people", "about 200,000 people"]
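
To make the bias concrete, here is a minimal sketch of SQuAD-style EM and F1 in which a prediction is scored against every acceptable answer and the best score is kept. It is not this project's evaluation code: the function names (`normalize`, `exact_match`, `f1`, `best_score`) are hypothetical, and the normalization (lowercasing, dropping punctuation and English articles) is assumed to follow the usual SQuAD convention.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty is a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, references: list[str]) -> float:
    """Score against every acceptable answer and keep the best result."""
    return max(metric(prediction, ref) for ref in references)


# The Geneva example from above.
prediction = "about 200,000 people"
single = ["200,000"]
multiple = ["200,000", "about 200,000", "200,000 people", "about 200,000 people"]

em_single = best_score(exact_match, prediction, single)
f1_single = best_score(f1, prediction, single)
em_multiple = best_score(exact_match, prediction, multiple)
f1_multiple = best_score(f1, prediction, multiple)
```

With only "200,000" as the reference, the prediction "about 200,000 people" gets EM 0 and F1 of about 0.5, although it is a perfectly good answer. With the full list of acceptable answers, both scores become 1.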

Actions

  • Add to our QA dataset more than one acceptable answer for each sample.
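  • Re-run the evaluation and compare results. Provided each prediction is scored against its best-matching acceptable answer, no score can decrease, so we should expect equal or higher scores for all models.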
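
As a rough sketch of the first action, the snippet below builds one sample with several acceptable answers. The record layout (an `answers` field holding parallel `text` and `answer_start` lists) is an assumption borrowed from the common SQuAD-style layout; the actual schema of our dataset may differ. The helper `build_sample` is hypothetical.

```python
def build_sample(question: str, context: str, answer_texts: list[str]) -> dict:
    """Build a SQuAD-style record with several acceptable answers.

    Assumed layout: "answers" holds parallel "text" and "answer_start" lists.
    """
    answers = {"text": [], "answer_start": []}
    for text in answer_texts:
        # str.find returns the first occurrence only; a context that repeats
        # the answer string would need manual disambiguation.
        start = context.find(text)
        if start == -1:
            raise ValueError(f"Answer {text!r} is not a span of the context")
        answers["text"].append(text)
        answers["answer_start"].append(start)
    return {"question": question, "context": context, "answers": answers}


sample = build_sample(
    question="What is the population size of Geneva?",
    context="It is estimated that about 200,000 people live in Geneva.",
    answer_texts=[
        "200,000",
        "about 200,000",
        "200,000 people",
        "about 200,000 people",
    ],
)
```

Rejecting any answer that is not a literal span of the context keeps the dataset valid for Extractive QA, where every ground-truth answer must be extractable from the text.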
