
Unlabeled Multimodal Datasets for Robust Emotion Recognition

Gianini, Gabriele (second author)

2025

Abstract

Despite the vast literature on emotion recognition, intra- and inter-subject variability and cultural differences in emotional expression remain outstanding challenges that limit state-of-the-art models' generalization ability and robustness to out-of-training-distribution data. We argue that a potential solution to these problems lies in the use of large-scale unlabeled datasets available online, in particular the growing number that provide multi-modal streams. The aim of this work is to explore the use of large multi-modal datasets, with both EEG and eye-tracking data streams, to increase the robustness of a downstream emotion recognition task. Three datasets of different scales (117, 47, and 16 subjects), collected for different pretext tasks (gaze estimation, attention type recognition, and emotion recognition), were used for self-supervised pretraining of a deep learning model, and the results were compared with the performance obtained under fully supervised training on a small emotion recognition dataset, SEED-IV (15 subjects). The use of unlabeled multimodal datasets shows promising results for improving the robustness of emotion recognition from eye-related data, although further research is needed to fully benefit from the unprecedented amount of data that will become available in the near future.
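To make the pretrain-then-fine-tune setup described in the abstract concrete, the following minimal PyTorch sketch shows the general idea: a shared encoder is first trained on a pretext task (here, gaze estimation from eye-tracking features) and then reused, with a new head, for SEED-IV-style emotion classification. Feature dimensions, layer sizes, dataset stand-ins, and function names are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of self-supervised pretraining followed by emotion fine-tuning.
import torch
import torch.nn as nn

FEAT_DIM = 31          # assumed eye-tracking feature dimension per time window
EMOTION_CLASSES = 4    # SEED-IV uses four emotion classes

class Encoder(nn.Module):
    def __init__(self, in_dim=FEAT_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

def pretrain_gaze(encoder, x, gaze_xy, epochs=5):
    # Pretext task: regress 2-D gaze coordinates from eye features (no emotion labels needed).
    head = nn.Linear(64, 2)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(encoder(x)), gaze_xy)
        loss.backward()
        opt.step()
    return encoder

def finetune_emotion(encoder, x, labels, epochs=5):
    # Downstream task: classify emotions, reusing the pretrained encoder weights.
    clf = nn.Linear(64, EMOTION_CLASSES)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(clf.parameters()), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(encoder(x)), labels)
        loss.backward()
        opt.step()
    return encoder, clf

if __name__ == "__main__":
    enc = Encoder()
    # Synthetic tensors stand in for the large pretext dataset and for SEED-IV.
    enc = pretrain_gaze(enc, torch.randn(256, FEAT_DIM), torch.randn(256, 2))
    enc, clf = finetune_emotion(enc, torch.randn(64, FEAT_DIM),
                                torch.randint(0, EMOTION_CLASSES, (64,)))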
Type: Book chapter or essay
Keywords: Multi-modal Emotion Recognition; Self-supervised learning; Unlabeled datasets
Language: English
Book title: Management of Digital EcoSystems: 16th International Conference, MEDES 2024, Naples, Italy, November 18–20, 2024, Proceedings
Editors: Chbeir, R; Damiani, E; Dustdar, S; Manolopoulos, Y; Masciari, E; Pitoura, E; Rinaldi, A
Year: 2025
ISBN: 9783031935978
Volume: 2518
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131–144
Ramos, I., Gianini, G., Damiani, E. (2025). Unlabeled Multimodal Datasets for Robust Emotion Recognition. In R. Chbeir, E. Damiani, S. Dustdar, Y. Manolopoulos, E. Masciari, E. Pitoura, et al. (Eds.), Management of Digital EcoSystems: 16th International Conference, MEDES 2024, Naples, Italy, November 18–20, 2024, Proceedings (pp. 131-144). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-93598-5_10].
Access: partially open
Files in this item:

File: Ramos-2025-MEDES 2024_Unlabeled Multimodal Datasets-VoR.pdf
Access: Restricted (repository administrators only); a copy can be requested
Attachment type: Publisher's Version (Version of Record, VoR)
Licence: All rights reserved
Size: 1.2 MB
Format: Adobe PDF

File: Ramos-2025-MEDES 2024_Unlabeled Multimodal Datasets-preprint.pdf
Access: Open access
Attachment type: Submitted Version (Pre-print)
Licence: Other
Size: 918.54 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/563762
Citations
  • Scopus: 0
  • Web of Science: 0