Reducing Health Misinformation in Search Results

Zhang, Dake

dc.contributor.author	Zhang, Dake
dc.date.accessioned	2022-08-22 19:14:31 (GMT)
dc.date.available	2022-08-22 19:14:31 (GMT)
dc.date.issued	2022-08-22
dc.date.submitted	2022-08-08
dc.identifier.uri	http://hdl.handle.net/10012/18602
dc.description.abstract	People commonly search the web for answers to health-related questions. With health information being added to the Internet every day, misinformation proliferates and spreads widely. Previous work has shown that when health misinformation appears in search results, people can make incorrect decisions that may negatively affect their lives. To reduce health misinformation in search results, we need to be able to find web documents that contain correct information and promote them above documents that contain misinformation. In this thesis, we describe our efforts to reduce health misinformation in search results. First, we describe our participation in the TREC 2021 Health Misinformation Track, which provides a framework for evaluating ranking approaches to reducing health misinformation in search results. The track's primary evaluation metric is Compatibility Difference, which measures a ranking approach's ability to rank correct and credible documents before incorrect and non-credible documents. In the 2021 track, runs that used the provided correct answers were classified as manual runs. By using the known answers and reranking with a stance detection model, our manual method achieved a Compatibility Difference score of 0.176, a large improvement over the BM25 baseline score of -0.022. Second, as an extension of this work, we present a pipeline that automatically derives correct answers by learning trustworthy web sources and then uses those answers to reduce health misinformation in search engine results. Determining the correct answer has been a difficult hurdle for participants in the TREC Health Misinformation Track, and in the 2021 track, automatic runs were not allowed to use the known answer to a topic’s health question.
By exploiting an existing set of health questions and their known answers, we show that it is possible to learn which web hosts are trustworthy and to use them to predict the correct answers to the 2021 health questions with an accuracy of 76%. Using the predicted answers, we promote documents that we predict contain the correct answer and achieve a Compatibility Difference score of 0.129, three times the score of the previous best automatic method (0.043). In summary, when evaluated on the TREC 2021 Health Misinformation Track, our final pipeline achieves new state-of-the-art performance among automatic runs.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.title	Reducing Health Misinformation in Search Results	en
dc.type	Master Thesis	en
dc.pending	false
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Master of Mathematics	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Smucker, Mark
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
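The abstract describes Compatibility Difference as measuring how well a ranking places correct and credible documents before incorrect and non-credible ones. The Python sketch below illustrates one way such a score could be computed. It assumes, without confirmation from this record, that compatibility is a truncated rank-biased overlap (RBO) with persistence p = 0.95, normalized by the ideal ranking's score against itself, and that the difference is compatibility with a helpful ideal ranking minus compatibility with a harmful ideal ranking. It is not the official TREC evaluation code, and the function names and example document ids are hypothetical.

```python
# Simplified Compatibility Difference sketch. Assumptions: truncated RBO,
# p = 0.95, normalization by the ideal ranking's self-score, unique doc ids.


def rbo_at_depth(run, ideal, p=0.95, depth=1000):
    """Truncated rank-biased overlap of two ranked lists of doc ids."""
    seen_run, seen_ideal = set(), set()
    overlap, total = 0, 0.0  # overlap = |run[:d] & ideal[:d]|
    for d in range(1, depth + 1):
        if d <= len(run) and run[d - 1] not in seen_run:
            doc = run[d - 1]
            seen_run.add(doc)
            if doc in seen_ideal:
                overlap += 1
        if d <= len(ideal) and ideal[d - 1] not in seen_ideal:
            doc = ideal[d - 1]
            seen_ideal.add(doc)
            if doc in seen_run:
                overlap += 1
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total


def compatibility(run, ideal, p=0.95, depth=1000):
    """RBO of run vs. ideal, scaled so the ideal ranking itself scores 1."""
    best = rbo_at_depth(ideal, ideal, p, depth)
    return rbo_at_depth(run, ideal, p, depth) / best if best > 0 else 0.0


def compatibility_difference(run, helpful, harmful, p=0.95, depth=1000):
    """Compatibility with the helpful ideal minus that with the harmful ideal."""
    return (compatibility(run, helpful, p, depth)
            - compatibility(run, harmful, p, depth))


helpful = ["d1", "d2"]  # correct, credible documents, best first
harmful = ["d3"]        # incorrect or non-credible documents
good_run = ["d1", "d2", "d3"]
bad_run = ["d3", "d2", "d1"]
print(compatibility_difference(good_run, helpful, harmful) > 0)  # True
print(compatibility_difference(bad_run, helpful, harmful) < 0)   # True
```

Because each compatibility lies between 0 and 1 under these assumptions, the difference lies between -1 and 1: a run that leads with helpful documents scores above 0, and one that leads with harmful documents scores below 0.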
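The manual runs described in the abstract rerank BM25 results using the known answer and a stance detection model. As an illustration only, the sketch below assumes a hypothetical detector that gives each document a probability of taking a "yes" stance (`p_yes`) and a "no" stance (`p_no`), and reorders the top BM25 results by how strongly each document's stance agrees with the answer. The `Doc` type, the agreement rule, and the `depth` cutoff are assumptions, not the thesis's configuration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    docid: str
    bm25: float   # first-stage BM25 score
    p_yes: float  # hypothetical stance-detector probability of a "yes" stance
    p_no: float   # hypothetical stance-detector probability of a "no" stance


def rerank_by_stance(docs, answer, depth=100):
    """Reorder the top-`depth` BM25 results so documents whose stance agrees
    with `answer` ("yes" or "no") come first; lower results keep BM25 order."""
    ranked = sorted(docs, key=lambda d: d.bm25, reverse=True)
    head, tail = ranked[:depth], ranked[depth:]

    def agreement(d):
        # Positive when the document leans toward the given answer.
        return d.p_yes - d.p_no if answer == "yes" else d.p_no - d.p_yes

    # list.sort is stable, so equal agreement scores keep their BM25 order.
    head.sort(key=agreement, reverse=True)
    return head + tail


docs = [Doc("a", 12.0, 0.1, 0.8), Doc("b", 10.0, 0.9, 0.05), Doc("c", 8.0, 0.3, 0.3)]
print([d.docid for d in rerank_by_stance(docs, "yes")])  # ['b', 'c', 'a']
```

Python's sort remains stable with `reverse=True`, so documents with equal agreement keep their original BM25 order.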
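The automatic pipeline in the abstract learns which web hosts are trustworthy from questions with known answers and then predicts answers to new questions. One simple way to realize that idea, sketched here under stated assumptions, is to score each host by the smoothed fraction of its document stances that matched known answers, then predict a new answer with a log-odds-weighted vote over the hosts of the retrieved documents. The input format, the Laplace smoothing parameter `alpha`, the voting rule, and the example hosts are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def learn_host_trust(judged, known_answers, alpha=1.0):
    """Estimate host trust from training questions with known answers.

    judged: iterable of (question_id, host, stance), stance in {"yes", "no"}.
    Returns host -> Laplace-smoothed fraction of stances that matched the
    known answer (alpha > 0 keeps every value strictly between 0 and 1).
    """
    agree, total = defaultdict(int), defaultdict(int)
    for qid, host, stance in judged:
        if qid not in known_answers:
            continue  # skip questions without a known answer
        total[host] += 1
        if stance == known_answers[qid]:
            agree[host] += 1
    return {h: (agree[h] + alpha) / (total[h] + 2 * alpha) for h in total}


def predict_answer(doc_stances, trust):
    """Trust-weighted vote over (host, stance) pairs for one new question."""
    score = 0.0
    for host, stance in doc_stances:
        t = trust.get(host, 0.5)        # unseen hosts are neutral
        weight = math.log(t / (1 - t))  # log-odds; negative if t < 0.5
        score += weight if stance == "yes" else -weight
    return "yes" if score >= 0 else "no"  # ties default to "yes"


judged = [("q1", "good.org", "yes"), ("q1", "bad.com", "no"),
          ("q2", "good.org", "no"), ("q2", "bad.com", "yes")]
known = {"q1": "yes", "q2": "no"}
trust = learn_host_trust(judged, known)
print(trust)  # {'good.org': 0.75, 'bad.com': 0.25}
print(predict_answer([("good.org", "no"), ("bad.com", "yes")], trust))  # no
```

With log-odds weighting, a host whose trust falls below 0.5 effectively votes against its own stance, and hosts never seen in training contribute nothing. Breaking ties toward "yes" is an arbitrary choice in this sketch.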

Files in this item

Name: Zhang_Dake.pdf
Size: 1.227Mb
Format: PDF
