Pattern Discovery and Disentanglement for Clinical Data Analysis

Zhou, Pei-Yuan

dc.contributor.author	Zhou, Pei-Yuan
dc.date.accessioned	2020-09-10 15:38:39 (GMT)
dc.date.available	2020-09-10 15:38:39 (GMT)
dc.date.issued	2020-09-10
dc.date.submitted	2020-08-27
dc.identifier.uri	http://hdl.handle.net/10012/16276
dc.description.abstract	In recent years, machine learning approaches have achieved important empirical successes in analysing data such as images, signals, text and speech, with applications in biomedical and clinical areas. However, from a modelling perspective, many machine learning methods still face crucial problems such as a lack of transparency and interpretability. Frequent Pattern Mining and Association Mining methods aim to address interpretability, but they have serious problems of their own, such as requiring exhaustive search and producing overwhelming numbers of patterns. From a data analysis perspective, they do not deliver high prediction accuracy, particularly for data with low volume, rare or imbalanced groups, rare cases, or biases arising from subtle overlapping or entanglement of the statistical and functional associations at the data source level. Hence, Professor Andrew K.C. Wong and I have developed a novel Pattern Discovery and Disentanglement (PDD) method to discover explicit patterns and unveil knowledge from relational datasets, even those containing imbalanced groups, biases and anomalies. Statistically significant high-order patterns, pattern clusters and rare patterns are discovered in the disentangled Attribute Value Association (AVA) spaces. Such patterns may be embedded in a relational dataset yet overlap or be entangled with one another, so that they are masked or obscured at the data level. The patterns discovered from the disentangled association sources can be used to interpret the original data explicitly, predict functional groups/classes and detect anomalies and/or outliers. When class labels are not given, pattern/entity clusters can be discovered more effectively from the disentangled AVA space than from the original records.
The objective of this Master's thesis is to develop PDD and validate its efficacy for genomic and clinical data analysis using a) protein sequence data, b) public clinical records from the UCI repository and c) a clinical dataset obtained from the School of Public Health and Health Systems at the University of Waterloo. The experimental results, which show superior performance over existing methods in both unsupervised and supervised learning, are presented in interpretable knowledge representation frameworks that interlink the disentangled AVA sources, patterns, pattern/entity clusters and individual entities. In the clinical cases, PDD reveals the symptomatic patterns of individual patients, disease complexes/groups and subtle etiological sources. Hence it is expected to have an impact on machine learning for genomic and clinical data, with broad applications.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	pattern discovery	en
dc.subject	clinical data analysis	en
dc.subject	machine learning	en
dc.subject	data mining	en
dc.title	Pattern Discovery and Disentanglement for Clinical Data Analysis	en
dc.type	Master Thesis	en
dc.pending	false
uws-etd.degree.department	Systems Design Engineering	en
uws-etd.degree.discipline	Systems Design Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Master of Applied Science	en
uws.contributor.advisor	Wong, Andrew
uws.contributor.affiliation1	Faculty of Engineering	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name: Zhou_Peiyuan.pdf
Size: 3.871Mb
Format: PDF
