On Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithms

Zhang, Mengyao

dc.contributor.author	Zhang, Mengyao
dc.date.accessioned	2024-05-08 18:37:05 (GMT)
dc.date.available	2024-05-08 18:37:05 (GMT)
dc.date.issued	2024-05-08
dc.date.submitted	2024-05-03
dc.identifier.uri	http://hdl.handle.net/10012/20545
dc.description.abstract	Optimization is a fundamental mathematical discipline focused on finding the best solution from a set of feasible choices. It is vital in various applications, including engineering, economics, data science, and beyond. Stochastic optimization and distributed optimization are crucial paradigms in the optimization field. Stochastic optimization deals with uncertainty and variability in problem parameters, providing a framework for decision-making under probabilistic conditions. On the other hand, distributed optimization tackles large-scale problems by harnessing the collective power of multiple agents or nodes, each with local information and local communication capabilities. This thesis aims to modify and analyze the existing stochastic methods and develop the algorithms and theory to solve the unconstrained distributed optimization problem. For stochastic adaptive gradient-based methods, including RMSprop, Adadelta, Adam, AdaGrad, Nadam, and AMSgrad, which are popular stochastic optimization methods commonly used in machine learning and deep learning, the Chapter 2 provides a concise and rigorous proof of the almost sure convergence guarantees towards a critical point in the context of smooth and non-convex objective functions. To the best of our knowledge, this work offers the first almost sure convergence rates for these stochastic adaptive gradient-based methods. For non-convex objective functions, we show that a weighted average of the squared gradient norms in each aforementioned method achieves a unified convergence rate of $o(1/{t^{\frac{1}{2}-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. Moreover, for strongly convex objective functions, the convergence rates for RMSprop and Adadelta can be further improved to $o(1/{t^{1-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. {These rates are arbitrarily close to their optimal convergence rates possible.} As a locking-free parallel stochastic gradient descent algorithm, Hogwild! algorithm is commonly used for training large-scale machine learning models. In Chapter 3, we will provide an almost sure convergence rates analysis for Hogwild! algorithm under different assumptions on the loss function. We first prove its almost sure convergence rate on strongly convex function, which matches the optimal convergence rate of the classic stochastic gradient descent (SGD) method to an arbitrarily small factor. For non-convex loss function, a weighted average of the squared gradient, as well as the last iterations of the algorithm converges to zero almost surely. We also provide a last-iterate almost sure convergence rate analysis for this method on general convex smooth functions. Another aspect of the research addresses the convergence rate analysis of the gradient-based distributed optimization algorithms, which have been shown to achieve computational efficiency and rapid convergence while requiring weaker assumptions. We first propose a novel gradient-based proportional-integral (PI) algorithm in Chapter 4, and prove that its convergence rate matches that of the centralized gradient descent method under the strong convexity assumption. We then relax this assumption and discuss the local linear convergence of its virtual state for strictly convex cost functions. In Chapter 5, we propose the powered proportional-integral (PI) algorithm and prove its convergence in finite time under the assumption of strict convexity. Then, we discuss the fixed-time convergence of its virtual state for strongly convex cost functions. Finally, we demonstrate the practicality of the distributed algorithms proposed in this thesis through simulation results.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	Optimization	en
dc.subject	Distributed Optimization	en
dc.subject	Stochastic Optimization	en
dc.subject	Adaptive Methods	en
dc.title	On Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithms	en
dc.type	Doctoral Thesis	en
dc.pending	false
uws-etd.degree.department	Applied Mathematics	en
uws-etd.degree.discipline	Applied Mathematics	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Zhang, Mengyao
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name:: Zhang_Mengyao.pdf
Size:: 8.884Mb
Format:: PDF

View/ Open

This item appears in the following Collection(s)

Show simple item record