On Improving Distributed Pregel-like Graph Processing Systems

Han, Minyang

dc.contributor.author	Han, Minyang
dc.date.accessioned	2015-07-24 18:22:29 (GMT)
dc.date.issued	2015-07-24
dc.date.submitted	2015
dc.identifier.uri	http://hdl.handle.net/10012/9484
dc.description.abstract	The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from two major issues: (1) poor performance due to frequent global synchronization barriers and limited scalability; and (2) lack of support for graph algorithms that require serializability, the guarantee that parallel executions of an algorithm produce the same results as some serial execution of that algorithm. Many graph processing systems use the bulk synchronous parallel (BSP) model, which allows graph algorithms to be easily implemented and reasoned about. However, BSP suffers from poor performance due to stale messages and frequent global synchronization barriers. While asynchronous models have been proposed to alleviate these overheads, existing systems that implement such models have limited scalability or retain frequent global barriers and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that overcomes the limitations of existing asynchronous models by reducing both message staleness and global synchronization while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate it at scale to demonstrate that BAP provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously. Secondly, very few systems provide serializability, despite the fact that many graph algorithms require it for accuracy, correctness, or termination. To address this deficiency, we provide a complete solution that can be implemented on top of existing graph processing systems to provide serializability. 
Our solution formalizes the notion of serializability and the conditions under which it can be provided for graph processing systems. We propose a partition-based synchronization technique that enforces these conditions efficiently to provide serializability. We implement this technique into Giraph and GiraphUC to demonstrate that it is configurable, transparent to algorithm developers, and more performant than existing techniques.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo
dc.subject	graph processing	en
dc.subject	BSP	en
dc.subject	BAP	en
dc.subject	GiraphUC	en
dc.subject	serializability	en
dc.subject	partition-based distributed locking	en
dc.subject	Pregel-like	en
dc.title	On Improving Distributed Pregel-like Graph Processing Systems	en
dc.type	Master Thesis	en
dc.pending	false
dc.subject.program	Computer Science	en
dc.description.embargoterms	4 months	en
dc.date.embargountil	2015-11-21T18:22:29Z
uws-etd.degree.department	Computer Science (David R. Cheriton School of)	en
uws-etd.degree	Master of Mathematics	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name: Han_Minyang.pdf
Size: 1.738Mb
Format: PDF
