Novacco, A., Gasparini, F., Rizzi, G., Saibene, A. (2026). Decoding Emotions: Multimodal Integration of Deep Embeddings, Lyrics and Music-Aware Cues. In Artificial Intelligence in Music, Sound, Art and Design - 15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026, Toulouse, France, April 8–10, 2026, Proceedings (pp. 367-382). Springer [10.1007/978-3-032-24350-8_24].

Decoding Emotions: Multimodal Integration of Deep Embeddings, Lyrics and Music-Aware Cues

Gasparini, Francesca; Rizzi, Giulia; Saibene, Aurora (last author)
2026

Abstract

Music Emotion Recognition (MER) is a challenging task, given the nuances involved in defining emotions. While unimodal models provide a good baseline for MER, multimodal models are becoming fundamental to providing an in-depth description of emotions. Leveraging the multimodal MERGE dataset, we investigate the power of audio-related deep embeddings, lyrics-informed features, and music-aware cues in providing an informative set of features for computationally low-impact learning models. Results confirm that multimodal fusion outperforms unimodal approaches. Moreover, different experiments highlight the positive contribution of genre metadata and the potential use of harmonic features for real-time, computationally low-impact applications. These findings confirm the importance of multimodal integration for robust and interpretable emotion recognition systems, while opening up future directions, including advanced feature fusion, user-specific model adaptation (user-tuning), and multi-label emotion representation.
Presentation type: poster + paper
Keywords: Music Emotion Recognition (MER); Multimodal Learning; Emotion-aware System
Language: English
Conference: 15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026 - April 8–10, 2026
Conference year: 2026
Editors: Machado, P; Romero, JJ; Rebelo, SM
Book title: Artificial Intelligence in Music, Sound, Art and Design - 15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026, Toulouse, France, April 8–10, 2026, Proceedings
ISBN: 9783032243492
Publication year: 2026
First page: 367
Last page: 382
Access: reserved
Files in this item:
Novacco et al-2026-EvoMUSART-VoR.pdf

Access: archive administrators only

Description: Original article
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 2.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611041