
dc.contributor.author: Pal, David
dc.date.accessioned: 2009-05-22 14:03:10 (GMT)
dc.date.available: 2009-05-22 14:03:10 (GMT)
dc.date.issued: 2009-05-22T14:03:10Z
dc.date.submitted: 2009-05-21
dc.identifier.uri: http://hdl.handle.net/10012/4445
dc.description.abstract [en]:
This thesis studies two problems in theoretical machine learning. The first part investigates the statistical stability of clustering algorithms; the second studies the relative advantage of having unlabeled data in classification problems.

Clustering stability has been proposed and used as a model selection method for clustering tasks. The idea is to draw two independent samples from a given data set and cluster each sample separately with the same clustering algorithm and the same parameter settings. If the two resulting clusterings are close in some metric, one concludes that the algorithm and its parameter settings match the data set and that the clusterings obtained are meaningful. We study asymptotic properties of this method for certain types of cost-minimizing clustering algorithms and relate their asymptotic stability to the number of optimal solutions of the underlying optimization problem.

In classification problems, labeled data are often expensive to obtain, whereas unlabeled data are often plentiful and cheap. We study how access to unlabeled data can decrease the amount of labeled data needed in the worst-case sense. We propose an extension of the probably approximately correct (PAC) model in which this question can be studied naturally, and we show that for certain basic tasks, access to unlabeled data might, at best, halve the amount of labeled data needed.

dc.language.iso [en]: en
dc.publisher [en]: University of Waterloo
dc.subject [en]: machine learning
dc.subject [en]: statistics
dc.subject [en]: unsupervised learning
dc.subject [en]: semi-supervised learning
dc.subject [en]: learning theory
dc.title [en]: Contributions to Unsupervised and Semi-Supervised Learning
dc.type [en]: Doctoral Thesis
dc.pending [en]: false
dc.subject.program [en]: Computer Science
uws-etd.degree.department [en]: School of Computer Science
uws-etd.degree [en]: Doctor of Philosophy
uws.typeOfResource [en]: Text
uws.peerReviewStatus [en]: Unreviewed
uws.scholarLevel [en]: Graduate
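The stability procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction analysed in the thesis: the clustering algorithm is assumed to be Lloyd's k-means with deterministic farthest-first initialization, each sample's clustering is extended to the full data set by nearest-center assignment, and "close in some metric" is taken to mean a small pair-counting disagreement (the fraction of point pairs that one clustering puts together and the other separates). All function names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm, a heuristic for the cost-minimizing k-means objective."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to the mean of its members; keep it if the cluster is empty.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers


def pair_disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs on which two labelings disagree about
    'same cluster'; invariant to renaming the cluster labels."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    return sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs) / len(pairs)


def instability(data: Sequence[Point], k: int, sample_size: int,
                trials: int = 10, seed: int = 0) -> float:
    """Average disagreement between the clusterings of two independent samples,
    each extended to the full data set. Lower means more stable."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s1 = rng.sample(list(data), sample_size)  # two independent samples
        s2 = rng.sample(list(data), sample_size)
        total += pair_disagreement(assign(data, kmeans(s1, k)), assign(data, kmeans(s2, k)))
    return total / trials


def make_toy_data() -> List[Point]:
    """Hypothetical example data: three well-separated 5x5 grids of points."""
    offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
    return [(cx + dx, cy + dy)
            for cx, cy in [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
            for dx in offsets for dy in offsets]


if __name__ == "__main__":
    data = make_toy_data()
    for k in (2, 3, 4):
        print(k, instability(data, k, sample_size=30))
```

Because the initialization is deterministic here, randomness enters only through the two samples, which mirrors the description in the abstract. On this toy data, k = 3 should give zero disagreement, since every sample almost surely contains points from all three grids; the values for other k depend on the samples drawn, and a different clustering algorithm or comparison metric would give different numbers.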




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
