Show simple item record

dc.contributor.authorZhang, Mengyao
dc.date.accessioned2024-05-08 18:37:05 (GMT)
dc.date.available2024-05-08 18:37:05 (GMT)
dc.date.issued2024-05-08
dc.date.submitted2024-05-03
dc.identifier.urihttp://hdl.handle.net/10012/20545
dc.description.abstractOptimization is a fundamental mathematical discipline focused on finding the best solution from a set of feasible choices. It is vital in various applications, including engineering, economics, data science, and beyond. Stochastic optimization and distributed optimization are crucial paradigms in the optimization field. Stochastic optimization deals with uncertainty and variability in problem parameters, providing a framework for decision-making under probabilistic conditions. On the other hand, distributed optimization tackles large-scale problems by harnessing the collective power of multiple agents or nodes, each with local information and local communication capabilities. This thesis aims to modify and analyze the existing stochastic methods and develop the algorithms and theory to solve the unconstrained distributed optimization problem. For stochastic adaptive gradient-based methods, including RMSprop, Adadelta, Adam, AdaGrad, Nadam, and AMSgrad, which are popular stochastic optimization methods commonly used in machine learning and deep learning, the Chapter 2 provides a concise and rigorous proof of the almost sure convergence guarantees towards a critical point in the context of smooth and non-convex objective functions. To the best of our knowledge, this work offers the first almost sure convergence rates for these stochastic adaptive gradient-based methods. For non-convex objective functions, we show that a weighted average of the squared gradient norms in each aforementioned method achieves a unified convergence rate of $o(1/{t^{\frac{1}{2}-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. Moreover, for strongly convex objective functions, the convergence rates for RMSprop and Adadelta can be further improved to $o(1/{t^{1-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. {These rates are arbitrarily close to their optimal convergence rates possible.} As a locking-free parallel stochastic gradient descent algorithm, Hogwild! algorithm is commonly used for training large-scale machine learning models. In Chapter 3, we will provide an almost sure convergence rates analysis for Hogwild! algorithm under different assumptions on the loss function. We first prove its almost sure convergence rate on strongly convex function, which matches the optimal convergence rate of the classic stochastic gradient descent (SGD) method to an arbitrarily small factor. For non-convex loss function, a weighted average of the squared gradient, as well as the last iterations of the algorithm converges to zero almost surely. We also provide a last-iterate almost sure convergence rate analysis for this method on general convex smooth functions. Another aspect of the research addresses the convergence rate analysis of the gradient-based distributed optimization algorithms, which have been shown to achieve computational efficiency and rapid convergence while requiring weaker assumptions. We first propose a novel gradient-based proportional-integral (PI) algorithm in Chapter 4, and prove that its convergence rate matches that of the centralized gradient descent method under the strong convexity assumption. We then relax this assumption and discuss the local linear convergence of its virtual state for strictly convex cost functions. In Chapter 5, we propose the powered proportional-integral (PI) algorithm and prove its convergence in finite time under the assumption of strict convexity. Then, we discuss the fixed-time convergence of its virtual state for strongly convex cost functions. Finally, we demonstrate the practicality of the distributed algorithms proposed in this thesis through simulation results.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectOptimizationen
dc.subjectDistributed Optimizationen
dc.subjectStochastic Optimizationen
dc.subjectAdaptive Methodsen
dc.titleOn Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithmsen
dc.typeDoctoral Thesisen
dc.pendingfalse
uws-etd.degree.departmentApplied Mathematicsen
uws-etd.degree.disciplineApplied Mathematicsen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeDoctor of Philosophyen
uws-etd.embargo.terms0en
uws.contributor.advisorZhang, Mengyao
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages