Show simple item record

dc.contributor.author: Bergstrand, Lee
dc.date.accessioned: 2020-05-07 16:57:10 (GMT)
dc.date.available: 2020-05-07 16:57:10 (GMT)
dc.date.issued: 2020-05-07
dc.date.submitted: 2020-05-05
dc.identifier.uri: http://hdl.handle.net/10012/15828
dc.description.abstract: Understanding the distribution of biochemical pathways across microorganisms is critical to understanding these organisms' evolution, ecology, and industrial applicability. Advances in genome sequencing and pathway databases have made genomic prediction of an organism's pathways a common technique. Researchers are now scaling such analyses to compare the presence and absence of pathways across multiple microbes from the same environment or lineage. However, performing such analyses at scale is currently bottlenecked by the sheer number of pathways per organism and the lack of powerful tools to facilitate such comparisons. This thesis presents a new set of tools, called Micromeda, that will assist users in performing comparative genomic analyses. Micromeda consists of three core components: Micromeda-Client, which generates interactive heat maps that allow users to perform visual pathway comparisons; Micromeda-Server, which provides data to Micromeda-Client; and Pygenprop, which allows users to perform programmatic comparisons of multiple organisms' pathways. Micromeda uses the Genome Properties database as its pathway information source. This database differs from other pathway databases in that it maps directly between protein domains and pathway steps. The domains that the database uses are those from the InterPro consortium of protein databases. With Micromeda, the process of discovering an organism's pathways begins with the domain annotation of the organism's proteins by InterProScan. Afterwards, Pygenprop combines these annotations with information from the Genome Properties database to predict biochemical pathways. This prediction of pathways from domain data results in the creation of a Micromeda file. This novel file type carries both the pathway annotations for multiple organisms and the sequences of the proteins that support these annotations.
In the context of the Genome Properties database, such pathways are referred to as genome properties, and pathway annotations are referred to as property assignments. The newly created Micromeda file can later be uploaded to Micromeda-Client and Micromeda-Server for heat map-based visualization. Pygenprop uses object-oriented programming techniques to represent the Genome Properties database as a series of in-memory objects. These objects are used extensively within Pygenprop's property assignment process and within Micromeda as a whole. Pygenprop is written in Python. The library's tight integration with the Python data science ecosystem makes it compatible with many emerging data science and machine learning tools and lays the foundation for it to become the backbone of a new generation of automated pathway analysis tools. Micromeda-Server is a Python web server application that provides data from uploaded Micromeda files to Micromeda-Client. Micromeda-Server makes data accessible via a web application programming interface (API). The API provides clients, such as Micromeda-Client, with access to the property assignments and protein sequences found within uploaded Micromeda files. The API can also provide information about individual pathways and the overall structure of the Genome Properties database. Micromeda-Client is a web client application whose purpose is to provide interactive pathway analysis heat maps to users. These heat maps are used to compare pathways across the organisms within a dataset. The interactivity of these heat maps allows pathway annotations to be aggregated into summaries of multiple pathways or disaggregated down to the level of individual pathway steps. At the step level, users can see differences in the presence of pathway steps. Individual pathways of interest can also be looked up via text search. The heat map interface also allows users to download the protein sequences that support individual pathway steps across multiple organisms.
Rather than spending time reviewing spreadsheets of pathway annotations or using existing, ineffectual pathway annotation visualization software, researchers can now perform their analyses using Micromeda's streamlined and efficient heat maps. For large datasets, Pygenprop can be used to compare the predicted pathways of multiple organisms programmatically. Micromeda has the potential to shape the way that future researchers perform pathway analysis. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/Micromeda [en]
dc.subject: biology [en]
dc.subject: bioinformatics [en]
dc.subject: pathway analysis [en]
dc.subject: Genome Properties [en]
dc.subject: visualization [en]
dc.subject: data science [en]
dc.subject: web [en]
dc.subject: informatics [en]
dc.subject: pathway database [en]
dc.subject: data visualization [en]
dc.subject: genomics [en]
dc.subject: metabolism [en]
dc.subject: graph [en]
dc.subject: directed acyclic graph [en]
dc.subject: python [en]
dc.subject.lcsh: Comparative genomics [en]
dc.subject.lcsh: Bioinformatics [en]
dc.title: Micromeda: a genome property prediction pipeline and web visualization tool [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: Biology [en]
uws-etd.degree.discipline: Biology [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Science [en]
uws.contributor.advisor: Neufeld, Josh
uws.contributor.advisor: Doxey, Andrew
uws.contributor.affiliation1: Faculty of Science [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
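
The abstract describes predicting pathways by mapping protein domain annotations onto pathway steps and then comparing the resulting property assignments across organisms. The following is a minimal sketch of that idea in plain Python. It does not use Pygenprop's actual API: the class names, the placeholder identifiers (PROP_A, DOM_1, org_1, and so on), and the simplified YES/PARTIAL/NO rule are all assumptions made for illustration, and the real Genome Properties assignment rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pathway step, supported by any one of a set of domain identifiers."""
    number: int
    domain_ids: frozenset


@dataclass(frozen=True)
class GenomeProperty:
    """A pathway (genome property) made of steps. Hypothetical structure."""
    identifier: str
    name: str
    steps: tuple


def assign_steps(prop, organism_domains):
    """Map each step number to True if the organism has a supporting domain."""
    return {step.number: bool(step.domain_ids & organism_domains)
            for step in prop.steps}


def assign_property(prop, organism_domains):
    """Simplified rule (an assumption, not the database's actual logic):
    YES if every step is supported, NO if none is, PARTIAL otherwise."""
    present = list(assign_steps(prop, organism_domains).values())
    if present and all(present):
        return "YES"
    if any(present):
        return "PARTIAL"
    return "NO"


def assignment_table(properties, organisms):
    """Build {property identifier: {organism: assignment}}, i.e. the kind of
    property-by-organism matrix a heat map could be drawn from."""
    return {prop.identifier: {name: assign_property(prop, domains)
                              for name, domains in organisms.items()}
            for prop in properties}


# Placeholder data: these identifiers are invented, not real database entries.
example_property = GenomeProperty(
    identifier="PROP_A",
    name="example pathway",
    steps=(
        Step(1, frozenset({"DOM_1"})),
        Step(2, frozenset({"DOM_2", "DOM_4"})),
    ),
)

# Each organism is represented by the set of domains found in its proteins,
# standing in for the output of a domain annotation tool.
example_organisms = {
    "org_1": {"DOM_1", "DOM_2", "DOM_3"},
    "org_2": {"DOM_1"},
    "org_3": set(),
}

table = assignment_table([example_property], example_organisms)
for organism, assignment in table["PROP_A"].items():
    print(organism, assignment)
```

Keeping step-level results (assign_steps) separate from property-level results (assign_property) mirrors the abstract's distinction between viewing a pathway as a summary and disaggregating it down to individual steps.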

