dc.contributor.author: Elshamy, Abdallah
dc.date.accessioned: 2023-12-22 15:10:34 (GMT)
dc.date.issued: 2023-12-22
dc.date.submitted: 2023-12-18
dc.identifier.uri: http://hdl.handle.net/10012/20195
dc.description.abstract: Scientific workflows are typically data-intensive. They consist of many stages, each of which may contain hundreds or even thousands of tasks. Traditionally, scientific workflows have been executed using the serverful computing model. Serverless computing presents an attractive alternative, as it frees developers from managing and provisioning resources and offers a fine-grained billing model. In this work, we study the viability and evaluate the trade-offs of using the serverless paradigm to run scientific workflows. We follow an empirical approach to evaluate the performance and cost benefits of this paradigm and to study the suitability of the current serverless software stack for supporting complex data-intensive scientific workflows. Specifically, we discuss, implement, and evaluate three orchestration approaches for executing scientific workflows: serverful-centralized, serverless-centralized, and serverless-decentralized. This work is the first to implement and evaluate a purely serverless orchestration approach for scientific workflows that does not require deploying a dedicated workflow manager. Our evaluation shows that serverless orchestration approaches incur a noticeable performance overhead for some workflow patterns (e.g., reduce stages) because they access a large amount of remote data. We propose two optimizations (prefetching file privileges and container placement) that exploit data locality to mitigate that impact. Our evaluation with three scientific workflows (Montage, 1000Genomes, and SRA Search) shows that the serverless-centralized and serverless-decentralized approaches achieve performance comparable to the serverful approach. Our results also show that, for the Montage application, the prefetching file privileges and container placement optimizations improve performance by 32.6% and 44%, respectively, compared to an unoptimized version.
We also introduce a cost model to estimate which approach is cheaper for a given application and cloud provider. Our cost analysis shows that the answer depends on the characteristics of the workflow and the pricing of the cloud provider. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Serverless Computing [en]
dc.subject: Cloud Computing [en]
dc.subject: Scientific Workflows [en]
dc.title: An Empirical Evaluation of the Viability of the Serverless Paradigm for Scientific Workflows [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.embargo.terms: 1 year [en]
uws.contributor.advisor: Al-Kiswany, Samer
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws-etd.embargo: 2024-12-21T15:10:34Z
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
