
BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models

Magazzù Giuseppe; Rizzi Giulia; Pulerà Francesca; Scalena Daniel; Fersini Elisabetta
2025

Abstract

Large Language Models (LLMs) have achieved remarkable success in generating human-like text and are increasingly integrated into real-world applications. However, their deployment raises significant safety concerns, including the risk of generating harmful, biased, or culturally inappropriate content. While several safety benchmarks exist for English, non-English contexts—such as Italian—remain critically underexplored, despite the growing demand for localized and culturally sensitive AI technologies. In this paper, we introduce BeaverTails-IT, the first Italian safety benchmark for LLMs, created through the machine translation of the original English BeaverTails dataset. We employ five state-of-the-art translation models, evaluate translation quality using automated metrics and human judgments, and provide guidelines for selecting high-quality safety prompts. Our benchmark enables the preliminary evaluation of Italian LLMs across key safety dimensions such as toxicity, bias, and ethical compliance. Beyond presenting the translated dataset, we offer a detailed analysis of its limitations, highlighting the challenges of using translated content as a proxy for native benchmarks. Our findings demonstrate the need for a dedicated, culturally grounded Italian safety benchmark to ensure effective and contextually appropriate evaluations.
Type: paper
Keywords: Italian Benchmark; Large Language Models (LLMs); Machine Translation; Safety Evaluation
Language: English
Conference: Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), September 24-26, 2025
Year: 2025
Editors: Bosco, C.; Jezek, E.; Polignano, M.; Sanguinetti, M.
Proceedings: Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
ISBN: 9791224305873
Volume: 4112 (CEUR Workshop Proceedings)
Pages: 625-635
URL: https://ceur-ws.org/Vol-4112/
Access: open
Magazzù, G., Sormani, A., Rizzi, G., Pulerà, F., Scalena, D., Cariddi, S., et al. (2025). BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models. In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) (pp.625-635). CEUR Workshop Proceedings.
Files in this record:

Magazzù et al-2025-CLiC-it-CEUR-VoR.pdf
Access: open access
Description: CEUR Workshop Proceedings
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 1.37 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/591547
Citations
  • Scopus: 0