Show simple item record

dc.contributor.authorLiang, Chengzhien
dc.date.accessioned2006-08-22 14:21:14 (GMT)
dc.date.available2006-08-22 14:21:14 (GMT)
dc.date.issued2001en
dc.date.submitted2001en
dc.identifier.urihttp://hdl.handle.net/10012/1050
dc.description.abstractConsensus pattern problem (CPP) aims at finding conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently,a new scoring scheme and a randomized algorithm based on substitution matrix are proposed here. Any practical solutions to a bioinformatics problem must observe twoprinciples: (1) the problem that it solves accurately describes the real problem; in CPP, this requires the scoring scheme be able to distinguisha real motif from background; (2) it provides an efficient algorithmto solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem in EM algorithms to solve CPP is how to find good startingpoints to reach the global optimum. These two questions were both well addressed under this scoring scheme,which made the randomized algorithm both fast and accurate in practice. A software, COPIA (COnsensus Pattern Identification and Analysis),has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF)familyshowed that it worked well on finding multiple motifs and repeats. COPIA's ability to find repeats makes it also useful in illustrating the internal structures of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than the commonly used motif-finding programs.en
dc.formatapplication/pdfen
dc.format.extent439052 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2001, Liang, Chengzhi. All rights reserved.en
dc.subjectComputer Scienceen
dc.subjectbioinformatics softwareen
dc.subjectmultiple alignmenten
dc.subjectmotif-findingen
dc.subjectconsensus pattern problemen
dc.titleCOPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequencesen
dc.typeMaster Thesisen
dc.pendingfalseen
uws-etd.degree.departmentSchool of Computer Scienceen
uws-etd.degreeMaster of Mathematicsen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages