Decoding Emotions: Multimodal Integration of Deep Embeddings, Lyrics and Music-Aware Cues

Novacco, A; Gasparini, F; Rizzi, G; Saibene, A

doi:10.1007/978-3-032-24350-8_24

Music Emotion Recognition (MER) is a challenging task, given the nuances involved in defining emotions. While unimodal models provide a good baseline for MER, multimodal models are becoming fundamental to describing emotions in depth. Leveraging the multimodal MERGE dataset, we investigate the power of audio-related deep embeddings, lyrics-informed features, and music-aware cues to provide an informative feature set for computationally low-impact learning models. Results confirm that multimodal fusion outperforms unimodal approaches. Moreover, different experiments highlight the positive contribution of genre metadata and the potential of harmonic features for real-time, computationally low-impact applications. These findings confirm the importance of multimodal integration for robust and interpretable emotion recognition systems, while opening up future directions, including advanced feature fusion, user-specific model adaptation (user-tuning), and multi-label emotion representation.

Novacco, A., Gasparini, F., Rizzi, G., Saibene, A. (2026). Decoding Emotions: Multimodal Integration of Deep Embeddings, Lyrics and Music-Aware Cues. In Artificial Intelligence in Music, Sound, Art and Design - 15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026, Toulouse, France, April 8–10, 2026, Proceedings (pp.367-382). Springer [10.1007/978-3-032-24350-8_24].

Novacco, Alessia; Gasparini, Francesca; Rizzi, Giulia; Saibene, Aurora

Contribution type	poster + paper
Keywords	Music Emotion Recognition (MER); Multimodal Learning; Emotion-aware System
Content language	English
Conference name	15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026 - April 8–10, 2026
Conference year	2026
Volume editors	Machado, P; Romero, JJ; Rebelo, SM
Proceedings title	Artificial Intelligence in Music, Sound, Art and Design - 15th International Conference, EvoMUSART 2026, Held as Part of EvoStar 2026, Toulouse, France, April 8–10, 2026, Proceedings
Proceedings volume ISBN	9783032243492
Series	Lecture Notes in Computer Science
Publication date	2026
First page	367
Last page	382
Contribution DOI	https://dx.doi.org/10.1007/978-3-032-24350-8_24
Full text	reserved
Appears in types	02 - Conference contribution

Files in this item:

File	Access	Description	Attachment type	License	Size	Format
Novacco et al-2026-EvoMUSART-VoR.pdf	Archive managers only	Original article	Publisher’s Version (Version of Record, VoR)	All rights reserved	2.18 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611041

Bicocca Open Archive
