
BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models

Magazzù Giuseppe; Rizzi Giulia; Pulerà Francesca; Scalena Daniel; Fersini Elisabetta
2025

Abstract

Large Language Models (LLMs) have achieved remarkable success in generating human-like text and are increasingly integrated into real-world applications. However, their deployment raises significant safety concerns, including the risk of generating harmful, biased, or culturally inappropriate content. While several safety benchmarks exist for English, non-English contexts—such as Italian—remain critically underexplored, despite the growing demand for localized and culturally sensitive AI technologies. In this paper, we introduce BeaverTails-IT, the first Italian safety benchmark for LLMs, created through the machine translation of the original English BeaverTails dataset. We employ five state-of-the-art translation models, evaluate translation quality using automated metrics and human judgments, and provide guidelines for selecting high-quality safety prompts. Our benchmark enables the preliminary evaluation of Italian LLMs across key safety dimensions such as toxicity, bias, and ethical compliance. Beyond presenting the translated dataset, we offer a detailed analysis of its limitations, highlighting the challenges of using translated content as a proxy for native benchmarks. Our findings demonstrate the need for a dedicated, culturally grounded Italian safety benchmark to ensure effective and contextually appropriate evaluations.
Type: paper
Keywords: Italian Benchmark; Large Language Models (LLMs); Machine Translation; Safety Evaluation
Language: English
Conference: Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), September 24-26, 2025
Year: 2025
Editors: Bosco, C.; Jezek, E.; Polignano, M.; Sanguinetti, M.
Proceedings: Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
ISBN: 9791224305873
Volume: 4112 (CEUR Workshop Proceedings)
Pages: 625-635
URL: https://ceur-ws.org/Vol-4112/
Access: open
Magazzù, G., Sormani, A., Rizzi, G., Pulerà, F., Scalena, D., Cariddi, S., et al. (2025). BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models. In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) (pp.625-635). CEUR Workshop Proceedings.
Files in this record:

Magazzù et al-2025-CLiC-it-CEUR-VoR.pdf
Access: open access
Description: CEUR Workshop Proceedings
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 1.37 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/591547
Citations
  • Scopus: 0