Show simple item record

dc.contributor.author  Liao, Zhenyu
dc.description.abstract  A Bayesian network (BN) is a probabilistic graphical model with applications in knowledge discovery and prediction. Its structure can be learned from data using the well-known score-and-search approach, in which a scoring function evaluates the fit of a proposed BN to the data in an unsupervised manner and the space of directed acyclic graphs is searched for the best-scoring BNs. However, selecting a single model (i.e., the best-scoring BN) is often not the best choice. When learning a BN from limited data, committing to a single model can be misleading: many other BNs may have scores close to optimal, and the posterior probability of even the best-scoring BN is often close to zero. A preferable alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging. A widely used data analysis methodology is to (i) learn a set of plausible networks that fit the data, (ii) perform model averaging to obtain a confidence measure for each edge, and (iii) select a threshold and report all edges with confidence higher than the threshold. In this manner, a representative network can be constructed from the edges deemed significant, which can then be examined for probabilistic dependencies and possible cause-effect relations. This thesis presents several improvements to Bayesian network structure learning that benefit this data analysis methodology. We propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, it considers only credible models, in that they are optimal or near-optimal in score. Second, it is more efficient and scales to significantly larger Bayesian networks than existing approaches. We also empirically study a selection of widely used and recently proposed scoring functions.
We address design limitations of previous empirical studies by scaling our experiments to larger BNs, comparing on an extensive set of both ground-truth BNs and real-world datasets, considering alternative performance metrics, and comparing scoring functions on two model averaging frameworks: the bootstrap and the credible set. Contrary to previous recommendations based on finding a single structure, we find that for model averaging the BDeu scoring function is the preferred choice in most scenarios for the bootstrap framework, while the recently proposed qNML score is the preferred choice for the credible set framework. We identify an important shortcoming in a widely used threshold selection method. We then propose a simple transfer learning approach for maximizing target metrics and selecting a threshold that generalizes from proxy datasets to the target dataset, and show on an extensive set of benchmarks that it can perform significantly better than previous approaches. Finally, we demonstrate via ensemble methods that combining results from multiple scores significantly improves both the bootstrap and the credible set approaches on various metrics, and that combining all scores from both approaches yields still better results.
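Steps (ii) and (iii) of the methodology described in the abstract — aggregating a set of learned networks into per-edge confidence measures, then thresholding to obtain a representative network — can be sketched as follows. This is a minimal illustration, not code from the thesis; the function names and the toy bootstrap replicates are invented for the example.

```python
from collections import Counter
from itertools import chain

def edge_confidence(dags):
    """Given a list of learned DAGs (each a set of directed edges),
    return the fraction of DAGs containing each edge."""
    counts = Counter(chain.from_iterable(dags))
    n = len(dags)
    return {edge: c / n for edge, c in counts.items()}

def representative_network(dags, threshold):
    """Report all edges whose confidence exceeds the threshold."""
    return {e for e, c in edge_confidence(dags).items() if c > threshold}

# Toy example: four bootstrap replicates over variables A, B, C,
# each "learned" network represented as a set of directed edges.
dags = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
]
print(edge_confidence(dags)[("A", "B")])              # 1.0
print(sorted(representative_network(dags, 0.4)))      # [('A', 'B'), ('B', 'C')]
```

The same aggregation applies whether the input networks come from bootstrap resampling or from a credible set of optimal and near-optimal structures; only the way the DAGs are generated differs.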
dc.publisher  University of Waterloo
dc.subject  Bayesian network
dc.subject  structure learning
dc.subject  machine learning
dc.title  Improved Bayesian Network Structure Learning in the Model Averaging Paradigm
dc.type  Doctoral Thesis
dc.pending  false
uws-etd.degree.department  David R. Cheriton School of Computer Science
uws-etd.degree.discipline  Computer Science
uws-etd.degree.grantor  University of Waterloo
uws-etd.degree  Doctor of Philosophy
uws.contributor.advisor  van Beek, Peter
uws.contributor.affiliation1  Faculty of Mathematics


