Bicocca Open Archive

TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering

Shafiei M.; Saffari H.; Pilehvar M. T.; Raganato A.

2026

Abstract

Large Language Models (LLMs) are increasingly used to answer factual, information-seeking questions (ISQs). While prior work often focuses on false, misleading information, little attention has been paid to true but strategically persuasive content that can derail a model’s reasoning. To address this gap, we introduce a new evaluation dataset, TRUTH-TRAP, in two languages, i.e., English and Farsi, on Iran-related ISQs, each paired with a correct explanation and a true-yet-misleading hint. We then evaluate nine diverse LLMs (spanning proprietary and open-source systems) via factuality classification and multiple-choice QA tasks, finding that accuracy drops by 25%, on average, when models encounter these misleading yet factual hints. Moreover, the models’ predictions match the hint-aligned options up to 77% of the time. Notably, models often misjudge such hints in isolation yet still integrate them into their final answers. Our results highlight a significant limitation in LLM outputs, underscoring the importance of robust fact-verification and emphasizing the real-world risks posed by partial truths in domains like social media, education, and policy-making. Our dataset is openly available at https://github.com/Mamin78/truthtrap_with_code.
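The two headline measures in the abstract (the accuracy drop when a misleading hint is present, and how often predictions match the hint-aligned option) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: it assumes each multiple-choice item records the gold option, the option the hint points to, and the model's answers with and without the hint. The `Item` fields, the `evaluate` function, and the toy data are hypothetical. Because the abstract does not say whether the 25% figure is absolute or relative, the sketch reports the absolute drop.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """One multiple-choice question (hypothetical record layout)."""
    gold: str         # label of the correct option
    hint_option: str  # label of the option the true-yet-misleading hint points to
    pred_plain: str   # model's answer without the hint
    pred_hinted: str  # model's answer with the hint added to the prompt


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Fraction of predictions equal to the gold label."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def evaluate(items: List[Item]) -> Dict[str, float]:
    golds = [it.gold for it in items]
    acc_plain = accuracy([it.pred_plain for it in items], golds)
    acc_hinted = accuracy([it.pred_hinted for it in items], golds)
    # Share of hinted answers that pick the option the hint points to.
    hint_match = sum(it.pred_hinted == it.hint_option for it in items) / len(items)
    return {
        "acc_plain": acc_plain,
        "acc_hinted": acc_hinted,
        "abs_drop": acc_plain - acc_hinted,  # absolute drop, as a fraction
        "hint_match_rate": hint_match,
    }


# Toy data, invented for illustration only.
demo = [
    Item(gold="A", hint_option="B", pred_plain="A", pred_hinted="B"),
    Item(gold="C", hint_option="D", pred_plain="C", pred_hinted="C"),
    Item(gold="B", hint_option="A", pred_plain="B", pred_hinted="A"),
    Item(gold="D", hint_option="C", pred_plain="A", pred_hinted="A"),
]
print(evaluate(demo))
# → {'acc_plain': 0.75, 'acc_hinted': 0.25, 'abs_drop': 0.5, 'hint_match_rate': 0.5}
```

A relative drop would instead divide `abs_drop` by `acc_plain`. The sketch deliberately keeps the two conventions separate rather than guessing which one the reported figure uses.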

Type of contribution: paper
Keywords: NLP
Content language: English
Conference name: 19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026 - 24 March 2026 - 29 March 2026
Conference year: 2026
Proceedings title: Findings of the Association for Computational Linguistics: EACL 2026
Proceedings volume ISBN: 9798891763869
Publication date: 2026
First page: 2966
Last page: 2987
DOI of the contribution: https://dx.doi.org/10.18653/v1/2026.findings-eacl.155
Full text: none
Citation: Shafiei, M., Saffari, H., Pilehvar, M., Raganato, A. (2026). TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering. In Findings of the Association for Computational Linguistics: EACL 2026 (pp.2966-2987). Association for Computational Linguistics (ACL) [10.18653/v1/2026.findings-eacl.155].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611401
