From b89ffe440eff609b357df4b9fd4ff5c694e2388e Mon Sep 17 00:00:00 2001
From: Yuxuan Zhang
Date: Wed, 20 May 2026 17:05:21 -0700
Subject: [PATCH] Add ClawBench (arXiv:2604.08523) to Benchmark

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 4eeda39..75691ed 100644
--- a/README.md
+++ b/README.md
@@ -176,6 +176,7 @@ You are more than welcome to update this list! If you find a paper about agentic
 
 ## Benchmark
 
+- [ClawBench: Can AI Agents Complete Everyday Online Tasks?](https://arxiv.org/abs/2604.08523) by Zhang, Yuxuan, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen. 2026
 - [Safearena: Evaluating the safety of autonomous web agents](https://arxiv.org/pdf/2503.04957) by Tur, Ada Defne, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, and Siva Reddy. 2025
 - [An Illusion of Progress? Assessing the Current State of Web Agents](https://arxiv.org/pdf/2504.01382) by Xue, Tianci, Weijian Qi, Tianneng Shi, Chan Hee Song, Boyu Gou, Dawn Song, Huan Sun, and Yu Su. 2025
 - [Workarena: How capable are web agents at solving common knowledge work tasks?](https://arxiv.org/pdf/2403.07718) by Drouin, Alexandre, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom Marty, Léo Boisvert et al. 2024