
dc.contributor.author: Ding, Zishuo
dc.date.accessioned: 2024-01-25 14:29:37 (GMT)
dc.date.available: 2024-01-25 14:29:37 (GMT)
dc.date.issued: 2024-01-25
dc.date.submitted: 2024-01-24
dc.identifier.uri: http://hdl.handle.net/10012/20285
dc.description.abstract: Machine learning-based approaches have been widely used to address natural language processing (NLP) problems. Given the similarities between natural language text and source code, researchers have been applying NLP techniques to code-related tasks. However, source code and natural language differ in nature. For example, source code is highly structured and executable, whereas NLP techniques may not capture the structure of source code. As a result, applying NLP techniques directly may not yield optimal results, and effectively adapting these techniques to software engineering tasks remains a significant challenge. To tackle this challenge, this thesis focuses on two important intersections between source code and natural language text: (1) learning and evaluating distributed code representations (i.e., code embeddings), which play a fundamental role in numerous software engineering tasks, especially in the era of deep learning, and (2) improving the textual information in logging statements (i.e., logging texts), which record useful information (i.e., logs) to support various software engineering activities.
For distributed code representations, we first conduct a comprehensive survey of existing code embedding techniques. This survey covers techniques borrowed from NLP as well as those specifically tailored for source code. We also identify six downstream software engineering tasks to evaluate the effectiveness of the learned code embeddings. Moreover, based on our analysis of existing code embedding techniques, we propose a novel approach to learn more generalizable code embeddings in a task-agnostic manner. This approach represents source code as graphs and leverages Graph Convolutional Networks to learn the embeddings.
For the textual information in logging statements, we propose to improve current logging practices from two aspects: (1) proactively suggesting new logging texts: we propose automated deep learning-based approaches that generate logging texts by translating the related source code into short textual descriptions; (2) retroactively analyzing existing logging texts: we make the first attempt to comprehensively study the temporal relations between logging and its corresponding source code, relations that are later used successfully to detect anti-patterns in existing logging statements. Based on the experimental results on the subject systems, we anticipate that our work can offer valuable suggestions and support to developers, helping them use NLP techniques effectively for software engineering tasks. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: code structure [en]
dc.subject: software engineering [en]
dc.subject: natural language processing [en]
dc.subject: logging [en]
dc.subject: code embedding [en]
dc.title: Beyond Natural Language Processing: Advancing Software Engineering Tasks through Code Structure [en]
dc.type: Doctoral Thesis [en]
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree.discipline: Electrical and Computer Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Doctor of Philosophy [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Weiyi, Shang
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]



