Bicocca Open Archive

Supervised Ensemble-based Causal DAG Selection

Mio, Corrado; Lin, Jianyi; Damiani, Ernesto; Gianini, Gabriele

2025

Abstract

Causal Discovery (CD) identifies cause-and-effect relationships from data using statistical learning. Several CD algorithms have been proposed, each relying on different assumptions, e.g. about the statistical relations among variables. However, which assumptions actually hold for a specific case study is not known a priori. Given a dataset obtained by sampling the joint distribution of all variables of a generative causal model, each algorithm may in general reconstruct a different Directed Acyclic Graph (DAG): some will be closer to the ground truth (GT) DAG than others, depending also on how well the respective assumptions apply to the case study. As a consequence, given a collection of heterogeneous case studies, a hypothetical GT-aware oracle, able to select the best DAG out of the set of reconstructed DAGs, will outperform the average performance of the individual algorithms of the ensemble. In this work, we propose a supervised approach, relying on multilabel classification, to select the DAGs closest to GT by comparing only the topologies of the reconstructed DAGs. We carried out the study on a large synthetic dataset of causal models, sampling DAG topologies with up to ten vertices and using a representative set of linear and non-linear statistical dependencies. Whereas the best individual CD algorithm yields, on average, a distance from GT three times larger than that of the oracle, our algorithm achieves an average distance from GT only about 10% larger than that of the oracle.
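
As an illustration of the GT-aware oracle mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes DAGs are given as binary adjacency matrices and uses the structural Hamming distance (one of the listed keywords) as the distance from GT. The function names `shd` and `oracle_select`, the algorithm names, and the toy graphs are hypothetical. Marking every tied-best algorithm is one plausible way to obtain multilabel targets; the abstract does not confirm this detail.

```python
import numpy as np


def shd(a, b):
    """Structural Hamming distance between two DAG adjacency matrices.

    Counts missing edges, extra edges and reversed edges; a reversed
    edge is counted once (an assumption; conventions differ).
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    differing = a != b
    # i->j in a and j->i in b: this produces two differing entries,
    # so subtract one per reversal to count it a single time.
    reversals = a & b.T & ~b & ~a.T
    return int(differing.sum() - reversals.sum())


def oracle_select(candidates, ground_truth):
    """Return the names of the candidate DAGs closest to the ground truth,
    together with all distances. Ties yield several names."""
    distances = {name: shd(g, ground_truth) for name, g in candidates.items()}
    best = min(distances.values())
    winners = [name for name, d in distances.items() if d == best]
    return winners, distances


# Toy ground truth: X -> Y -> Z
gt = [[0, 1, 0],
      [0, 0, 1],
      [0, 0, 0]]

# Hypothetical outputs of three CD algorithms on the same dataset
candidates = {
    "algo_a": [[0, 1, 0], [0, 0, 0], [0, 1, 0]],  # Y -> Z reversed
    "algo_b": [[0, 1, 1], [0, 0, 1], [0, 0, 0]],  # extra edge X -> Z
    "algo_c": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # empty graph
}

winners, distances = oracle_select(candidates, gt)
# Multilabel target: 1 for every algorithm that attains the minimum distance
labels = [int(name in winners) for name in candidates]
print(winners, distances, labels)
```

In this toy case two candidates tie at the minimum distance, which shows why the selection task is naturally multilabel rather than single-label.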

Contribution type: paper
Keywords: causal discovery; D-separation based distance; ensemble approach; model selection; multi-label classification; structural hamming distance; structural intervention distance
Content language: English
Conference name: 40th ACM/SIGAPP Symposium on Applied Computing, 31 March 2025 - 4 April 2025
Conference year: 2025
Proceedings title: SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
Proceedings ISBN: 9798400706295
Publication date: 2025
First page: 622
Last page: 629
DOI: https://dx.doi.org/10.1145/3672608.3707709
Alternative URL: https://dl.acm.org/doi/abs/10.1145/3672608.3707709
Full text: open
Citation: Mio, C., Lin, J., Damiani, E., Gianini, G. (2025). Supervised Ensemble-based Causal DAG Selection. In SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing (pp. 622-629). Association for Computing Machinery [10.1145/3672608.3707709].
Appears in categories: 02 - Conference contribution

Files in this item:

File	Size	Format
Mio-2025-ACM/SIGAPP Symposium on Applied Computing-VoR.pdf	1.73 MB	Adobe PDF

Description: Supervised Ensemble-based Causal DAG Selection
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Access: open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/552841
