UWSpace will be migrating to a new version of its software from July 29th to August 1st. UWSpace will be offline for all UW community members during this time.
A Theoretical Study of Clusterability and Clustering Quality
Abstract
Clustering is a widely used technique, with applications ranging
from data mining, bioinformatics and image analysis to marketing,
psychology, and city planning. Despite the practical importance of
clustering, there is very limited theoretical analysis of the topic.
We make a step towards building theoretical foundations for
clustering by carrying out an abstract analysis of two central
concepts in clustering; clusterability and clustering quality.
We compare a number of notions of clusterability found in the
literature. While all these notions attempt to measure the same
property, and all appear to be reasonable, we show that they are
pairwise inconsistent. In addition, we give the first computational
complexity analysis of a few notions of clusterability.
In the second part of the thesis, we discuss how the quality of a
given clustering can be defined (and measured). Users often need to
compare the quality of clusterings obtained by different methods.
Perhaps more importantly, users need to determine whether a given
clustering is sufficiently good for being used in further data
mining analysis. We analyze what a measure of clustering quality
should look like. We do that by introducing a set of requirements
(`axioms') of clustering quality measures. We propose a number of
clustering quality measures that satisfy these requirements.
Collections
Cite this version of the work
Margareta Ackerman
(2008).
A Theoretical Study of Clusterability and Clustering Quality. UWSpace.
http://hdl.handle.net/10012/3478
Other formats
Related items
Showing items related by title, author, creator and subject.
-
Theoretical foundations for efficient clustering
Kushagra, Shrinu (University of Waterloo, 2019-06-07)Clustering aims to group together data instances which are similar while simultaneously separating the dissimilar instances. The task of clustering is challenging due to many factors. The most well-studied is the high ... -
Evaluating Clusterings by Estimating Clarity
Whissell, John (University of Waterloo, 2012-10-12)In this thesis I examine clustering evaluation, with a subfocus on text clusterings specifically. The principal work of this thesis is the development, analysis, and testing of a new internal clustering quality measure ... -
Approximation Algorithms for Clustering and Facility Location Problems
Ahmadian, Sara (University of Waterloo, 2017-04-06)Facility location problems arise in a wide range of applications such as plant or warehouse location problems, cache placement problems, and network design problems, and have been widely studied in Computer Science and ...