Magazzù, G., Sormani, A., Rizzi, G., Pulerà, F., Scalena, D., Cariddi, S., et al. (2025). BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models. In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) (pp. 625-635). CEUR Workshop Proceedings.
BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models
Giuseppe Magazzù, Giulia Rizzi, Francesca Pulerà, Daniel Scalena, Elisabetta Fersini
2025
Abstract
Large Language Models (LLMs) have achieved remarkable success in generating human-like text and are increasingly integrated into real-world applications. However, their deployment raises significant safety concerns, including the risk of generating harmful, biased, or culturally inappropriate content. While several safety benchmarks exist for English, non-English contexts—such as Italian—remain critically underexplored, despite the growing demand for localized and culturally sensitive AI technologies. In this paper, we introduce BeaverTails-IT, the first Italian safety benchmark for LLMs, created through the machine translation of the original English BeaverTails dataset. We employ five state-of-the-art translation models, evaluate translation quality using automated metrics and human judgments, and provide guidelines for selecting high-quality safety prompts. Our benchmark enables the preliminary evaluation of Italian LLMs across key safety dimensions such as toxicity, bias, and ethical compliance. Beyond presenting the translated dataset, we offer a detailed analysis of its limitations, highlighting the challenges of using translated content as a proxy for native benchmarks. Our findings demonstrate the need for a dedicated, culturally grounded Italian safety benchmark to ensure effective and contextually appropriate evaluations.
| File | Size | Format |
|---|---|---|
| Magazzù et al-2025-CLiC-it-CEUR-VoR.pdf (open access; Publisher's Version, VoR; Creative Commons license; CEUR Workshop Proceedings) | 1.37 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


