Misinformation Propagation in Benign Multi-Agent Systems

...and the MINT Dataset

This is the official repository for the paper "Misinformation Propagation in Benign Multi-Agent Systems".

What does MINT do?

MINT studies how misinformation affects large language models in single-agent and multi-agent debate settings. The repository provides:

the MINT dataset with task instances, false facts, and eight misinformation strategies
experiment scripts for single-agent baselines, multi-agent debates (via MALLM), and agent-composition sweeps
plotting utilities to reproduce paper figures

Install

Create an environment:

conda create --name mint python=3.11
conda activate mint
pip install torch transformers datasets numpy pandas matplotlib seaborn tqdm requests

Clone MALLM next to this repository (required for exp2.py, exp2_ablation.py, and exp3.py):

github/
  mint/          # this repo
  mallm/         # MALLM framework

Multi-agent experiments expect an OpenAI-compatible model endpoint (vLLM, SGLang, TGI, or similar). Slurm job scripts are provided for cluster runs.
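
Before launching a long run, it can help to confirm that the endpoint is reachable and serving the expected model. The following is a minimal sketch, not part of this repository: it assumes the server exposes the OpenAI-style model listing route (`GET <endpoint_url>/models`), which many OpenAI-compatible servers provide. The helper names `models_url`, `parse_model_ids`, and `list_served_models` are hypothetical.

```python
import json
import urllib.request


def models_url(endpoint_url: str) -> str:
    """Join a base URL such as http://127.0.0.1:8080/v1 with the models route."""
    return endpoint_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} payload."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(endpoint_url: str, timeout: float = 5.0) -> list[str]:
    """Query the endpoint and return the ids of the models it reports.

    Assumption: the server implements the model listing route.
    """
    with urllib.request.urlopen(models_url(endpoint_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling `list_served_models("http://127.0.0.1:8080/v1")` against a running server should return a list that contains the value passed as `--model_name`; a connection error usually means the server is not up yet.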

Dataset

The released benchmark is in MINT-dataset_v1.1/:

Dataset	Task type
`winogrande_misinformed.json`	Commonsense reasoning (multiple choice)
`ethics_commonsense_misinformed.json`	Moral judgment (multiple choice)
`complex_web_questions_misinformed.json`	Complex QA (free-form)

Each instance includes a false_fact, misinformation_by_strategy (clickbait, hoax, rumor, satire, propaganda, framing, conspiracy, other), and irrelevant_true_information as a control.
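
The fields above can be read as a set of conditions per task instance: one control text plus one misinformation text per strategy. The sketch below illustrates that reading. It assumes `misinformation_by_strategy` is a mapping from strategy name to text; the exact JSON layout is not documented here, so check a released file before relying on it. The function `context_variants` and the example instance are hypothetical.

```python
# Strategy names as listed in this README.
STRATEGIES = (
    "clickbait", "hoax", "rumor", "satire",
    "propaganda", "framing", "conspiracy", "other",
)


def context_variants(instance: dict) -> dict[str, str]:
    """Map each condition name to the extra text attached to the task.

    Assumes `misinformation_by_strategy` is a dict keyed by strategy name.
    """
    variants = {"control": instance["irrelevant_true_information"]}
    by_strategy = instance["misinformation_by_strategy"]
    for strategy in STRATEGIES:
        if strategy in by_strategy:
            variants[strategy] = by_strategy[strategy]
    return variants


# Hypothetical instance with placeholder text, not taken from the dataset.
example = {
    "false_fact": "<false fact>",
    "misinformation_by_strategy": {"rumor": "<rumor text>", "hoax": "<hoax text>"},
    "irrelevant_true_information": "<unrelated true statement>",
}

for condition, text in context_variants(example).items():
    print(f"{condition}: {text}")
```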

To regenerate or extend datasets:

python download_datasets.py --use_config_samples --datasets winogrande ethics_commonsense complex_web_questions

Run Experiments

Exp 1 — Single agent

Baseline vs. misinformed single-agent prompting (local HuggingFace or OpenAI-compatible API):

python exp1.py --model_name meta-llama/Llama-3.3-70B-Instruct --inference openai --endpoint_url http://127.0.0.1:8080/v1

exp1a.py runs the same setup with irrelevant true information instead of misinformation.

Exp 2 — Multi-agent debate

3-agent MALLM debates across datasets and misinformation strategies:

python exp2.py --endpoint_url http://127.0.0.1:8080/v1 --model_name meta-llama/Llama-3.3-70B-Instruct

exp2_ablation.py runs the same setup without misinformation.

Exp 3 — Agent composition

5-agent debates on WinoGrande, sweeping the number of misinformed agents (0–5):

python exp3.py --endpoint_url http://127.0.0.1:8080/v1 --model_name meta-llama/Llama-3.3-70B-Instruct
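
The sweep can be pictured as six settings of a 5-agent panel, from no misinformed agents to all five. The sketch below only enumerates those settings; it is an illustration under that assumption and does not reflect how exp3.py assigns agents or orders them in a debate. The function name `composition_sweep` is hypothetical.

```python
def composition_sweep(num_agents: int = 5) -> list[list[str]]:
    """Return one role list per setting, from 0 to num_agents misinformed agents."""
    sweep = []
    for num_misinformed in range(num_agents + 1):
        num_informed = num_agents - num_misinformed
        sweep.append(["misinformed"] * num_misinformed + ["informed"] * num_informed)
    return sweep


for roles in composition_sweep():
    print(roles.count("misinformed"), roles)
```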

Quick smoke test (no model server):

python exp2.py --mock --debug

Results are written to out/<model_name>/. Use --continue to resume unfinished runs.

Figures

Generate plots from saved results:

python exp1_figures.py
python exp2_figures.py
python exp3_figures.py
python exp1_exp2_comparison.py

Code Structure

Component	Description
`download_datasets.py`	Download, sample, and generate misinformed datasets
`shared_utils.py`	Prompts, loading, evaluation helpers
`exp1.py` / `exp1a.py`	Single-agent experiments
`exp2.py` / `exp2_ablation.py`	Multi-agent debate experiments (MALLM)
`exp3.py`	Misinformed vs. informed agent composition
`exp*_figures.py`	Figure generation
`*.slurm`	Cluster job templates (model server + experiment)
`MINT-dataset_v1.1/`	Released benchmark data

Citation

If you use this repository, please cite the paper and the MALLM framework:

@misc{becker2026,
  author={Becker, Jonas and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela},
  title={Misinformation Propagation in Benign Multi-Agent Systems},
  year={2026},
  month={06}
}

@inproceedings{becker-etal-2025-mallm,
    title = "{MALLM}: Multi-Agent Large Language Models Framework",
    author = "Becker, Jonas and Kaesberg, Lars Benedikt and Bauer, Niklas and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    year = "2025",
    url = "https://aclanthology.org/2025.emnlp-demos.29/"
}
