Truth Discovery in Streaming Data and Crowdsourcing Applications
[NOTE: Chapter 6 is embargoed until June 16, 2021]
With the development of the Internet and cellular networks, it has become much easier for people to receive information from multiple data sources. However, data from different sources describing the same entity or object is often conflicting and erroneous. It is therefore important to assess data veracity, resolve the conflicts, and extract trustworthy information from the multi-source data for downstream applications.
In this thesis, I focus on truth discovery models for assessing data veracity. Truth discovery is an emerging technique that estimates the most trustworthy information (also known as the truth) about each object from multi-source data. Specifically, a truth discovery model is usually an unsupervised learning model that learns the unknown source reliability from the observed multi-source data in order to better estimate object truths. This thesis advances truth discovery in settings where data is collected from data streams and crowdsourcing applications. For streaming data, it studies how to use object correlation in truth discovery and how to improve the accuracy and efficiency of truth discovery. For crowdsourcing applications, the thesis presents two truth discovery models that better model human behaviors in the truth discovery steps. Because most truth discovery methods are unsupervised learning models in which the ground truths of objects are unknown, the thesis also discusses how to use a small set of ground truths to guide source reliability estimation and develops a semi-supervised truth discovery model to better discover object truths.
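To make the general idea concrete, the following is a minimal sketch of a generic iterative truth discovery loop for numeric claims. It is not one of the models developed in this thesis; it only illustrates the common pattern of alternating between estimating object truths and estimating source reliability. The function name `discover_truths`, the squared-error loss, the log-ratio weight update, and the toy data are all assumptions chosen for illustration.

```python
import math


def discover_truths(claims, n_iters=10, eps=1e-9):
    """Illustrative sketch of unsupervised truth discovery (hypothetical helper).

    claims: dict mapping source -> dict mapping object -> numeric claimed value.
    Returns (truths, weights): estimated value per object, reliability per source.
    """
    sources = list(claims)
    objects = sorted({obj for src in sources for obj in claims[src]})
    weights = {src: 1.0 for src in sources}  # start with equal reliability
    truths = {}

    for _ in range(n_iters):
        # Truth step: each truth is the reliability-weighted mean of its claims.
        for obj in objects:
            voters = [src for src in sources if obj in claims[src]]
            numerator = sum(weights[src] * claims[src][obj] for src in voters)
            denominator = sum(weights[src] for src in voters)
            truths[obj] = numerator / denominator

        # Reliability step: sources whose claims lie far from the current
        # truths (large squared error) receive smaller weights.
        errors = {
            src: sum((claims[src][obj] - truths[obj]) ** 2 for obj in claims[src]) + eps
            for src in sources
        }
        total_error = sum(errors.values())
        # Assumed update rule: negative log of each source's share of the error,
        # floored at eps so that weights stay strictly positive.
        weights = {
            src: max(-math.log(errors[src] / total_error), eps) for src in sources
        }

    return truths, weights


# Toy data (made up for illustration): sources "a" and "b" roughly agree,
# while source "c" reports values that are far from the other two.
toy_claims = {
    "a": {"x": 10.0, "y": 20.0},
    "b": {"x": 10.2, "y": 19.8},
    "c": {"x": 15.0, "y": 30.0},
}
toy_truths, toy_weights = discover_truths(toy_claims)
```

On this toy input, the source that disagrees with the other two ends up with the lowest weight, so the estimated truths move away from the plain average and toward the values reported by the two agreeing sources. No ground truths are used anywhere in the loop, which is what makes the procedure unsupervised.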