Show simple item record

dc.contributor.authorWytsma, Alexandra
dc.date.accessioned2017-09-21 17:47:51 (GMT)
dc.date.available2017-09-21 17:47:51 (GMT)
dc.date.issued2017-09-21
dc.date.submitted2017-08-22
dc.identifier.urihttp://hdl.handle.net/10012/12423
dc.description.abstractThe naive Bayes model is a simple model that has been used for many decades, often as a baseline, for both supervised and unsupervised learning. With a latent class variable it is one of the simplest latent variable models, and is often used for clustering. The estimation of its parameters by maximum likelihood (e.g. using gradient ascent, expectation maximization) is subject to local optima since the objective is non-concave. However, the conditions under which global optimality can be guaranteed are currently unknown. I provide a first characterization of the optima of the na ̈ıve Bayes model. For problems with up to three features, I describe comprehensive conditions that ensure global optimality. For more than three features, I show that all stationary points exhibit marginal distributions with respect to the features that match those of the training data. In a second line of work, I consider the naive Bayes model with an observed class variable, which is often used for classification. Well known results provide some upper bounds on order of the sample complexity for agnostic PAC learning, however exact bounds are unknown. These bounds would show exactly how much data is needed for model training using a particular algorithm. I detail the framework for determining an exact tight bound on sample complexity, and prove some of the sub-theorems that this framework rests on. I also provide some insight into the nature of the distributions that are hardest to model within specified accuracy parameters.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectmachine learningen
dc.subjectoptimizationen
dc.subjectsample complexityen
dc.titleNaive Bayes Data Complexity and Characterization of Optima of the Unsupervised Expected Likelihooden
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Mathematicsen
uws.contributor.advisorPoupart, Pascal
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages