Show simple item record

dc.contributor.author: Lu, Zhengyang
dc.date.accessioned: 2023-06-19 14:44:00 (GMT)
dc.date.available: 2023-06-19 14:44:00 (GMT)
dc.date.issued: 2023-06-19
dc.date.submitted: 2023-06-14
dc.identifier.uri: http://hdl.handle.net/10012/19562
dc.description.abstract: Satisfiability Modulo Theories (SMT) solvers are programs that decide whether a first-order logic formula is satisfiable. Over the last two decades, these solvers have become central to many methods and tools in fields as diverse as software engineering, verification, security, and Artificial Intelligence. Most modern SMT solvers provide user-controllable strategies, i.e., users can define a strategy that customizes the solving algorithm for their own problems by combining individual tactics as building blocks. A tactic is a well-defined reasoning step implemented by the SMT solver, which simplifies, transforms, or solves the given input SMT formula. The flexibility to customize a strategy for a specialized class of formulas is important, since no existing strategy is optimal for all instances. However, finding a good customized strategy is challenging even for experts. In this thesis, we present a novel class of reinforcement-learning (RL) guided methods, implemented in the Z3 SMT solver, which we refer to as AlphaSMT and which adaptively construct the expected best strategy for any given input SMT formula. Briefly, the AlphaSMT RL framework combines deep Monte-Carlo Tree Search (MCTS) and logical reasoning in a unique way, enabling the RL agent to learn the best combination of tactics for a given class of formulas. In more detail, a deep neural network serves as both the value function and the policy, evaluating state-action pairs and deciding which tactic to choose at each step. The neural network is trained toward the optimal policy by learning from sampled solving runs generated through self-exploration, with MCTS used as a lookahead planning step for each decision made during these runs. We evaluated the performance of AlphaSMT on benchmark sets from three SMT logics, namely quantifier-free non-linear real arithmetic (QF_NRA), quantifier-free non-linear integer arithmetic (QF_NIA), and quantifier-free bit-vectors (QF_BV). In all three logics, AlphaSMT outperforms its base solver, Z3, solving up to 80.5% more instances on a test set. The evaluation also shows that a moderately longer tactic timeout helps solve more instances and that a pre-solver contributes significantly to the speedup.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: automated reasoning
dc.subject: reinforcement learning
dc.subject: SMT solver
dc.subject: artificial intelligence
dc.title: AlphaSMT: A Reinforcement Learning Guided SMT Solver
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws-etd.embargo.terms: 0
uws.contributor.advisor: Ganesh, Vijay
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
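
The abstract above describes strategies built by combining Z3 tactics. For readers unfamiliar with that mechanism, below is a minimal, illustrative sketch using the Z3 Python API (the z3-solver package). The tactic names are standard Z3 tactics, but this particular hand-written combination is only an example of the building-block mechanism; it is not a strategy learned by AlphaSMT.

    from z3 import Goal, Reals, Tactic, Then

    # A strategy is a combination of tactics; each tactic simplifies,
    # transforms, or solves the goal it is applied to.
    x, y = Reals('x y')
    goal = Goal()
    goal.add(x * x + y * y < 1, x + y > 1)

    # Hand-picked illustrative strategy (not one learned by AlphaSMT):
    strategy = Then(Tactic('simplify'),      # rewrite/normalize the formula
                    Tactic('solve-eqs'),     # eliminate variables via equations
                    Tactic('qfnra-nlsat'))   # decide QF_NRA with the nlsat engine

    print(strategy(goal))             # sub-goals produced by applying the strategy
    solver = strategy.solver()        # turn the strategy into a solver
    solver.add(x * x + y * y < 1, x + y > 1)
    print(solver.check())             # sat / unsat / unknown

AlphaSMT's contribution, per the abstract, is to choose such a sequence of tactics adaptively for each input formula rather than relying on a fixed, hand-written strategy.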

