Show simple item record

dc.contributor.author Chang, Bo Yuan 20:02:14 (GMT) 20:02:14 (GMT)
dc.description.abstract One of the goals of Artificial Intelligence is to achieve human-like intelligence. Of the solutions proposed over the decades, causal structure discovery has emerged as a viable tool for enabling human-like reasoning. It can be treated as two stages: causal discovery, which examines the cause-effect relationships between variables, and causal parameter inference, which uses those relationships to perform causal inference through counterfactual, logic-like reasoning similar to how human beings approach a problem. Generally speaking, there are two types of causal discovery algorithms: those that work with random variables and those that work with time series data. The focus of this thesis is on the latter. Performing causal studies on real-world time series datasets is very challenging because missing values are prevalent. Currently, all existing causal algorithms require evenly sampled time series data, which unfortunately are not always available. In this thesis I propose a system that addresses this difficulty, which hinders causal learning on real-world datasets. The proposed system performs causal discovery using time series data with missing entries (i.e., sparsely sampled data at varying intervals). The solution consists of two parts: data filling with Gaussian Process Regression (GPR), and causal learning using either the traditional Vector Autoregressive Model or a Machine Learning based approach. For the first part, experiments have shown that GPR outperformed all the benchmark filling techniques, such as K Nearest Neighbour regression, Parametric Linear filling, and random variable filling. The Root Mean Square Error (RMSE) of GPR filling was the smallest across all filling percentages, beating the benchmark algorithms by RMSE margins ranging from 0.05 to 1.5.
As for the second part, an Echo State Network is used for causal learning because of its fast running time and higher prediction capability compared with other available causal learning algorithms such as Structural Expectation Maximization (SEM) and the Subsampled Linear Auto-Regression Absolute Coefficients algorithm (SLARAC). With 10 percent missing entries, the proposed system obtains a Matthews correlation coefficient (MCC) score of 0.31 on a -1 to +1 scale, where +1 represents perfect prediction, 0 is no better than random guessing, and -1 represents total disagreement between prediction and ground truth. The MCC score of the proposed system significantly outperformed other methods such as SEM and SLARAC. To showcase the ability of the proposed system to learn causal relationships in real-world engineering applications, the experiment was conducted using a chemical refinery dataset called the Tennessee Eastman (TE) dataset. en
dc.publisher University of Waterloo en
dc.subject time series en
dc.subject data filling en
dc.subject Granger Causality en
dc.subject machine learning en
dc.subject causal inference en
dc.subject.lcsh Time-series analysis--Computer programs en
dc.subject.lcsh Machine learning en
dc.title Multivariate Time Series Data Causal Discovery en
dc.type Master Thesis en
dc.pending false Design Engineering en Design Engineering en of Waterloo en
uws-etd.degree Master of Applied Science en
uws.contributor.advisor Zelek, John
uws.contributor.affiliation1 Faculty of Engineering en
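
The data-filling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the squared-exponential kernel, the hyperparameters (`length_scale`, `noise`), the toy signal, and the helper names `rbf_kernel`, `gp_fill`, and `rmse` are all assumptions chosen for illustration. It fills removed entries of a series with the Gaussian Process posterior mean and scores the result with RMSE, next to a simple linear-interpolation baseline.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of time stamps."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_fill(t_obs, y_obs, t_query, length_scale=1.0, noise=1e-4):
    """GP posterior mean at t_query; the constant mean is the sample mean."""
    mean = y_obs.mean()
    # Covariance of the observed points plus a small noise term on the diagonal
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    K_star = rbf_kernel(t_query, t_obs, length_scale)
    alpha = np.linalg.solve(K, y_obs - mean)
    return mean + K_star @ alpha

def rmse(y_true, y_pred):
    """Root mean square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy series (assumed, not from the thesis): smooth signal on an even grid
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
y = np.sin(t) + 0.5 * np.cos(2.0 * t)

# Remove 10 percent of the entries; the two endpoints are always kept
missing = np.zeros(t.size, dtype=bool)
missing[rng.choice(np.arange(1, t.size - 1), size=20, replace=False)] = True

y_gp = gp_fill(t[~missing], y[~missing], t[missing], length_scale=1.0)
y_lin = np.interp(t[missing], t[~missing], y[~missing])  # linear baseline

rmse_gp = rmse(y[missing], y_gp)
rmse_lin = rmse(y[missing], y_lin)
print(f"RMSE GP fill: {rmse_gp:.5f}, RMSE linear fill: {rmse_lin:.5f}")
```

On real, noisy, multivariate data the kernel and its hyperparameters would need to be fitted rather than fixed by hand, so the numbers printed here say nothing about the margins reported in the abstract.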

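The MCC score quoted in the abstract can be computed from a predicted and a reference causal graph as sketched below. This is an illustrative sketch only: the function name `mcc`, the flattened 3-variable adjacency matrices, and the convention of returning 0.0 when the denominator vanishes are assumptions, not details taken from the thesis.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for two equal-length binary sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical 3-variable example: entry (i, j) = 1 means "i causes j",
# stored row by row as a flat list.
true_edges = [0, 1, 0,
              0, 0, 1,
              0, 0, 0]
pred_edges = [0, 1, 0,
              0, 0, 0,
              1, 0, 0]

score = mcc(true_edges, pred_edges)
print(f"MCC = {score:.3f}")
```

Here one edge is recovered, one is missed, and one spurious edge is predicted, so the score is 5/14. A prediction identical to the reference graph scores +1, and a prediction that inverts every entry scores -1.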


University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
