Topics in Modeling and Analysis of Low-Latency Systems

Ghanbarian, Samira

dc.contributor.author	Ghanbarian, Samira
dc.date.accessioned	2023-12-08 16:36:07 (GMT)
dc.date.available	2023-12-08 16:36:07 (GMT)
dc.date.issued	2023-12-08
dc.date.submitted	2023-12-08
dc.identifier.uri	http://hdl.handle.net/10012/20145
dc.description.abstract	Cloud-based architectures have become integral elements of modern networking infrastructure and are characterized by a large number of servers operating in parallel. Optimizing performance in these systems, with a particular focus on specific metrics such as system response time and the probability of loss, is critical to ensure user satisfaction. To address this challenge, this thesis analyzes load balancing policies that are designed to efficiently assign incoming user requests to the servers such that the system performance is optimized. In particular, the thesis focuses on a specialized category known as "randomized dynamic load balancing policies". These policies optimize system performance by dynamically adapting assignment decisions based on the current state of the system while interacting with a randomly selected subset of servers. Given the complex interdependencies among servers and the large size of these systems, an exact analysis of these systems is intractable. Consequently, the thesis studies these systems in the system size limit. It employs relevant limit theorems, including mean-field techniques and Stein's approach, as crucial mathematical tools. Furthermore, the thesis evaluates the accuracy of these limits when applied to systems of finite size, providing valuable insights into the practical applicability of the proposed load balancing policies. Motivated by different types of user requests or jobs, the thesis focuses on two main job categories: single-server jobs which can only run on a single server to represent non-parallelizable requests, and multiserver jobs, which can run on multiple servers simultaneously modeling parallelizable requests. The first part of the thesis studies single-server jobs in a system comprising a large number of processor sharing servers operating in parallel, where servers have different processing speeds and unlimited queueing buffers. The objective is to design randomized load balancing policies that minimize the average response time of jobs. A novel policy is introduced that allocates incoming jobs to servers based on predefined thresholds, state information from a randomly sampled subset of servers, and their processing speeds. The policy subsumes a broad class of other load balancing policies by adjusting the threshold levels, offering a unified framework for concurrent analysis of multiple load balancing policies. It is shown that under this policy, the system achieves the maximal stability region. Moreover, it is shown that as the system size approaches infinity, the transient and stationary stochastic occupancy measure of the system converges to a deterministic mean-field limit and the unique fixed point of this mean-field limit, respectively. As a result, the study of the asymptotic average response time of jobs becomes feasible through the fixed point of the mean-field limit. The analysis continues by studying error estimation related to asymptotic values in finite-sized systems. It is shown that when the mean delay of the finite-size system is approximated by its asymptotic value, the error is proportional to the inverse square root of the system size. Subsequently, the thesis analyzes adaptive multiserver jobs in loss systems, where they can be parallelized across a variable number of servers, up to a maximum degree of parallelization. In loss systems, each server can process only a finite number of jobs simultaneously and blocks any additional jobs beyond this capacity. Therefore, the goal is to devise randomized job assignment schemes that optimize the average response time of accepted jobs and the blocking probability while interacting with a sampled subset of servers. A load balancing policy is proposed, where the number of allocated servers for processing each job depends on the state information of a randomly sampled subset of servers and the maximum degree of parallelization. Employing Stein's method, it is shown that, provided that the sampling size grows at an appropriate rate, the difference between the steady-state system and a suitable deterministic system that exhibits optimality, decreases to zero as the system size increases. Thus, as the system size approaches infinity, the steady-state system achieves a zero blocking probability and optimal average response time for accepted jobs. Additionally, the thesis analyzes error estimation for these asymptotic values in finite-sized systems and establishes the error bounds as a function of the number of servers in the system.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	load balancing	en
dc.subject	performance evaluation	en
dc.subject	stochastic systems	en
dc.subject	mean-field	en
dc.subject	Stein's method	en
dc.title	Topics in Modeling and Analysis of Low-Latency Systems	en
dc.type	Doctoral Thesis	en
dc.pending	false
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Mazumdar, Ravi
uws.contributor.affiliation1	Faculty of Engineering	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name:: Ghanbarian_Samira.pdf
Size:: 1.162Mb
Format:: PDF

View/ Open

This item appears in the following Collection(s)

Show simple item record