This repository provides a tool for validating query simulation approaches in the context of (interactive) information retrieval.
We recommend using a virtual environment to run the code locally.
We provide instructions for creating a virtual environment using conda (recommended) or another method of your choice.
Conda environment (recommended)
To create a virtual environment using conda, run the following commands in your terminal:
- Make the script executable (if not already): `chmod +x scripts/create_environment.sh`
- Run the script to create the environment, then activate it: `./scripts/create_environment.sh` followed by `conda activate query_sim_validation`

Own method
To create a virtual environment using your own method, follow these steps:
- Create a virtual environment (e.g., using `venv` or `virtualenv`).
- Activate the virtual environment.
- Install the requirements: `pip install -r requirements.txt`
- Download the `nltk` data files: `python -c "import nltk; nltk.download('wordnet'); nltk.download('punkt_tab'); nltk.download('averaged_perceptron_tagger_eng')"`
- Download the `spacy` model: `python -m spacy download en_core_web_sm`

If you want to validate your own data, make sure it follows the correct structure. See the data description guide for more details on the expected data format, along with example data files.
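
Once the environment is set up, a quick check that the packages installed above can be located may save a failed run later. The following is a minimal sketch, not part of the repository: `missing_packages` and `REQUIRED_PACKAGES` are hypothetical names, and the package list is an assumption based only on the setup steps above (the full dependency list lives in `requirements.txt`).

```python
from importlib.util import find_spec

# Assumed subset of top-level packages, taken from the setup steps above.
# The authoritative list is requirements.txt.
REQUIRED_PACKAGES = ["nltk", "spacy", "en_core_web_sm"]


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print(f"Missing packages: {missing}")
    else:
        print("All listed packages were found.")
```

This only confirms that the packages are discoverable; it does not verify that the `nltk` data files were downloaded.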
To run the validation, use the following command:
`python -m query_sim_validation.main --config <path_to_config_file>`
Use `python -m query_sim_validation.main --help` to see all available options.
We provide a default configuration file at `config/config_default.yaml`. You can create a custom configuration file or specify the parameters directly on the command line. To run the validation with the default configuration, you first need to unzip the `original_sessions.zip` and `simulated_sessions.zip` files in the `data` directory.
The configuration used for a run is automatically saved to the output directory (`output` by default).
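
The unzip step required for the default configuration can also be scripted. The sketch below assumes that both archives sit directly inside the data directory and should be extracted in place; that layout is an assumption, and `unzip_session_archives` and `ARCHIVES` are hypothetical helper names, not part of the repository.

```python
import zipfile
from pathlib import Path

# Archive names as given in the instructions above.
ARCHIVES = ("original_sessions.zip", "simulated_sessions.zip")


def unzip_session_archives(data_dir="data"):
    """Extract both session archives into data_dir and return their names."""
    data_path = Path(data_dir)
    extracted = []
    for name in ARCHIVES:
        archive = data_path / name
        if not archive.is_file():
            raise FileNotFoundError(f"Expected archive not found: {archive}")
        # Assumption: the archive contents belong next to the archive itself.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)
        extracted.append(name)
    return extracted


if __name__ == "__main__":
    print(unzip_session_archives())
```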
⚠️ Note: Some measures may increase runtime (e.g., BERT Score Similarity), and traditional IR measures require qrels to be provided in the configuration file.
Currently, the following validation measures are implemented:
Measures that assess how similar two queries are in terms of language, structure, and observable behavior.
| Facet | Measure | Ref |
|---|---|---|
| Behavioural Similarity Measures | Flesch Kincaid Score | flesch_kincaid_scores |
| | Type-Token Ratio | ttr |
| Data Similarity Measures | Query Length | query_length |
| | Number of Query Terms | query_num_terms |
| | Number of Named Entity Query Terms | query_num_named_entities |
| Query Similarity Measures | Jaccard Similarity | jaccard_similarity |
| | Cosine Similarity | cosine_similarity |
| | BERT Score Similarity | bert_score |
| | WordNet Similarity | wordnet_similarity |
| | Rank Diversity Score | rank_diversity_score |
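
To give a feel for what two of the measures in the table compute, here is a minimal sketch of Jaccard similarity and type-token ratio over query terms. It is not the repository's implementation: the whitespace tokenizer, the handling of empty queries, and the function names are assumptions made for illustration.

```python
def tokenize(query):
    """Lowercase and split on whitespace (a deliberately simple stand-in tokenizer)."""
    return query.lower().split()


def jaccard_similarity(query_a, query_b):
    """Size of the shared term set divided by the size of the combined term set."""
    terms_a, terms_b = set(tokenize(query_a)), set(tokenize(query_b))
    union = terms_a | terms_b
    if not union:
        return 1.0  # Convention chosen here: two empty queries count as identical.
    return len(terms_a & terms_b) / len(union)


def type_token_ratio(query):
    """Number of distinct terms divided by the total number of terms."""
    tokens = tokenize(query)
    if not tokens:
        return 0.0  # Convention chosen here for an empty query.
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    print(jaccard_similarity("cheap flights berlin", "cheap hotels berlin"))  # 0.5
    print(type_token_ratio("new york new york hotels"))  # 0.6
```

In the first call the queries share two of four distinct terms; in the second, three distinct terms occur among five tokens.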
Measures that estimate how similarly two queries perform in retrieval settings.
| Facet | Measure | Ref |
|---|---|---|
| SERP Overlap Measures | SERP-based Jaccard Similarity | serp_jaccard |
| | Rank-Biased Overlap | rbo |
| Traditional IR Measures | Mean Average Precision (MAP) / MAP@K | map / map_cut@k |
| | Normalized Discounted Cumulative Gain (NDCG) / NDCG@K | ndcg / ndcg_cut@k |
| | Precision / Precision@K | P / P@k |
| | Recall@K | recall@k |
| | Reciprocal Rank | recip_rank |
| | Rank-biased Precision | rbp |
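
As an illustration of the retrieval-oriented measures, the sketch below computes a SERP-based Jaccard similarity between two rankings, plus Precision@K and Reciprocal Rank against a set of relevant documents. It is a textbook-style sketch under stated assumptions, not the framework's code: rankings are plain lists of document IDs, relevance is binary, and the function names and the default cutoff `k=10` are hypothetical.

```python
def serp_jaccard(ranking_a, ranking_b, k=10):
    """Jaccard similarity between the top-k document sets of two rankings."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    if not union:
        return 1.0  # Convention chosen here: two empty result lists are identical.
    return len(top_a & top_b) / len(union)


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k positions that hold a relevant document."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    original = ["d1", "d2", "d3"]
    simulated = ["d2", "d3", "d4"]
    print(serp_jaccard(original, simulated, k=3))  # 0.5
    print(precision_at_k(["d1", "d2", "d3", "d4"], {"d2", "d4"}, k=2))  # 0.5
    print(reciprocal_rank(original, {"d2"}))  # 0.5
```

The set of relevant documents plays the role of the qrels that the traditional IR measures need.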
If you plan to use the framework in your work, please cite it as follows:
@inproceedings{Kruff:2026:ECIR,
title={Validating Search Query Simulations: A Taxonomy of Measures},
booktitle={Proceedings of the 48th European Conference on Information Retrieval},
author={Kruff, Andreas Konstantin and Bernard, Nolwenn and Schaer, Philipp},
year={2026},
series={ECIR '26}
}