Show simple item record

dc.contributor.authorWanxin, Li
dc.date.accessioned2022-05-19 12:28:46 (GMT)
dc.date.available2022-05-19 12:28:46 (GMT)
dc.date.issued2022-05-19
dc.date.submitted2022-05-12
dc.identifier.urihttp://hdl.handle.net/10012/18296
dc.description.abstractWe propose MT-MAG, a novel machine learning-based software tool for the complete or partial hierarchically-structured taxonomic classification of metagenome-assembled genomes (MAGs). MT-MAG is capable of classifying large and diverse metagenomic datasets: a total of 245.68 Gbp in the training sets, and 9.6 Gbp in the test sets analyzed in this study. MT-MAG is, to the best of our knowledge, the first machine learning method for taxonomic assignment of metagenomic data that offers a “partial classification” option, whereby a classification at a higher taxonomic level is provided for MAGs that cannot be classified to the Species level. MT-MAG outputs complete or partial classification paths, and interpretable numerical classification confidences of its classifications, at all taxonomic ranks. To assess the performance of MT-MAG, we define a “weighted classification accuracy,” with a weighting scheme reflecting the fact that partial classifications at different ranks are not equally informative. For the two benchmarking datasets analyzed (genomes from human gut microbiome species, and bacterial and archaeal genomes assembled from cow rumen metagenomic sequences), MT-MAG achieves an average of 80.13% in weighted classification accuracy. At the Species level, MT-MAG outperforms DeepMicrobes, the only other comparable software tool, by an average of 35.75% in weighted classification accuracy. In addition, MT-MAG is able to completely classify an average of 67.7% of the sequences at the Species level, compared with DeepMicrobes which only classifies 47.45%. Moreover, MT-MAG provides additional information for sequences that it could not classify at the Species level, resulting in the partial or complete classification of 95.15%, of the genomes in the datasets analyzed. Lastly, unlike other taxonomic assignment tools (e.g., GDTB-Tk), MT-MAG is an alignment-free and genetic marker-free tool, able to provide additional bioinformatics analysis to confirm existing or tentative taxonomic assignments.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.relation.urihttps://github.com/wxli0/MT-MAGen
dc.relation.urihttps://github.com/wxli0/MLDSPen
dc.subjectmetagenomicsen
dc.subjectmetagenome-assembled genomesen
dc.subjectMAGsen
dc.subjecttaxonomic assignmentsen
dc.subjectmachine learningen
dc.subjectinterpretable machine learningen
dc.subjectbioinformaticsen
dc.subjectpartial taxonomic classificationen
dc.subjectweighted classification accuracyen
dc.titleMT-MAG: Accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomesen
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Mathematicsen
uws-etd.embargo.terms0en
uws.contributor.advisorKari, Lila
uws.contributor.advisorYu, Yaoliang
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages