Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion

Zarkoob, Hadi

dc.contributor.author	Zarkoob, Hadi
dc.date.accessioned	2010-05-25 13:19:13 (GMT)
dc.date.available	2010-05-25 13:19:13 (GMT)
dc.date.issued	2010-05-25T13:19:13Z
dc.date.submitted	2010-05-21
dc.identifier.uri	http://hdl.handle.net/10012/5247
dc.description.abstract	DNA microarrays are capable of measuring the expression levels of thousands of genes, even the whole genome, in a single experiment. As a result, they have been widely used to extend studies of cancerous tissues to the genomic level. One of the main goals in DNA microarray experiments is to identify a set of relevant genes such that the desired outputs of the experiment depend mostly on this set, to the exclusion of the rest of the genes. This is motivated by the fact that biological processes in a cell typically involve only a subset of genes, and not the whole genome. The task of selecting a subset of relevant genes is called feature (gene) selection. Herein, we propose a feature selection algorithm for gene expression data. It is based on the Hilbert-Schmidt independence criterion (HSIC), and partly motivated by Rank-One Downdate (R1D) and the Singular Value Decomposition (SVD). The algorithm is computationally very fast and scalable to large data sets, and can be applied to response variables of arbitrary type (categorical and continuous). Experimental results of the proposed technique are presented on synthetic data and on well-known microarray data sets. We then discuss the capability of HSIC to provide a general framework that encapsulates many widely used techniques for dimensionality reduction, clustering and metric learning. We use this framework to explain two metric learning algorithms, namely Fisher discriminant analysis (FDA) and closed form metric learning (CFML). Building on this framework, we propose a new metric learning method. The proposed technique uses concepts from normalized cut spectral clustering and is associated with an underlying convex optimization problem.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	Feature selection	en
dc.subject	Hilbert-Schmidt Independence Criterion	en
dc.subject	Gene expression data	en
dc.title	Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion	en
dc.type	Master Thesis	en
dc.pending	false	en
dc.subject.program	Statistics	en
uws-etd.degree.department	Statistics and Actuarial Science	en
uws-etd.degree	Master of Mathematics	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
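
For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2, used to score each gene against a categorical response. It is not the algorithm proposed in the thesis (which the abstract describes as also drawing on R1D and the SVD); the function names, the Gaussian kernel on expression values, the delta kernel on labels, and the bandwidth `sigma` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for n samples (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def delta_kernel(y):
    """Kernel on class labels: 1 if two samples share a label, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def rank_genes(X, y, sigma=1.0):
    """Score each column (gene) of X by its HSIC with y; highest score first.

    Hypothetical helper: a simple per-gene ranking, not the thesis method.
    """
    L = delta_kernel(y)
    scores = np.array(
        [hsic(rbf_kernel(X[:, j], sigma), L) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]
    return order, scores
```

Here `X` is assumed to be a samples-by-genes array and `y` a vector of class labels; for a continuous response, the delta kernel would be replaced by a kernel suited to real values. Scoring genes one at a time ignores interactions between genes, which is one reason a practical method may differ from this sketch.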

Files in this item

Name: Zarkoob_Hadi.pdf
Size: 1.312Mb
Format: PDF
