dc.contributor.author: Mahmoud, Hatem
dc.description.abstract: A data integration system offers a single interface to multiple structured data sources. Many application contexts (e.g., searching structured data on the web) involve the integration of large numbers of structured data sources. At web scale, it is impractical to use manual or semi-automatic data integration methods, so a pay-as-you-go approach is more appropriate. A pay-as-you-go approach entails using a fully automatic approximate data integration technique to provide an initial data integration system (i.e., an initial mediated schema, and initial mappings from source schemas to the mediated schema), and then refining the system as it gets used. Previous research has investigated automatic approximate data integration techniques, but all existing techniques require the schemas being integrated to belong to the same conceptual domain. At web scale, it is impractical to classify schemas into domains manually or semi-automatically, which limits the applicability of these techniques. In this thesis, I present an approach for clustering schemas into domains without any human intervention and based only on the names of attributes in the schemas. My clustering approach deals with uncertainty in assigning schemas to domains using a probabilistic model. I also propose a query classifier that determines, for a given keyword query, the most relevant domains to this query. I experimentally demonstrate the effectiveness of my schema clustering and query classification techniques. (en)
dc.publisher: University of Waterloo (en)
dc.title: Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems (en)
dc.type: Master Thesis (en)
dc.subject.program: Computer Science (en)
uws-etd.degree: Master of Mathematics (en)

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
