Large Language Models (LLMs) are increasingly used to answer factual, information-seeking questions (ISQs). While prior work often focuses on false, misleading information, little attention has been paid to true but strategically persuasive content that can derail a model’s reasoning. To address this gap, we introduce a new evaluation dataset, TRUTH-TRAP, in two languages, i.e., English and Farsi, on Iran-related ISQs, each paired with a correct explanation and a true-yet-misleading hint. We then evaluate nine diverse LLMs (spanning proprietary and open-source systems) via factuality classification and multiple-choice QA tasks, finding that accuracy drops by 25%, on average, when models encounter these misleading yet factual hints. Also, the models’ predictions match the hint-aligned options up to 77% of the time. Notably, models often misjudge such hints in isolation yet still integrate them into final answers. Our results highlight a significant limitation in LLM outputs, underscoring the importance of robust fact-verification and emphasizing real-world risks posed by partial truths in domains like social media, education, and policy-making. Our dataset is openly available at https://github.com/Mamin78/truthtrap_with_code.

Shafiei, M., Saffari, H., Pilehvar, M., Raganato, A. (2026). TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering. In Findings of the Association for Computational Linguistics: EACL 2026 (pp.2966-2987). Association for Computational Linguistics (ACL) [10.18653/v1/2026.findings-eacl.155].

TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering

Raganato A.
2026

paper
NLP
English
19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026 - 24 March 2026 - 29 March 2026
2026
Findings of the Association for Computational Linguistics: EACL 2026
9798891763869
2026
2966
2987
none

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611401