A Lightweight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

Date
2023-10-26
Authors
Yang, Yi
Zhao, Zhong-qiu
Wu, Gongqing
Zhuo, Xingrui
Liu, Qing
Bai, Quan
Li, Weihua
Item type
Journal Article
Publisher
Association for Computing Machinery (ACM)
Abstract

Due to the presence of noise in crowdsourced labels, label aggregation (LA) has become a standard procedure for post-processing these labels. LA methods estimate true labels from crowdsourced labels by modeling worker quality. However, most existing LA methods are iterative in nature. They require multiple passes through all crowdsourced labels, jointly and iteratively updating true labels and worker qualities until a termination condition is met. As a result, these methods are burdened with high space and time complexities, which restrict their applicability in scenarios where scalability and online aggregation are essential. Furthermore, defining a suitable termination condition for iterative algorithms can be challenging. In this paper, we view LA as a dynamic system and represent it as a Dynamic Bayesian Network. From this dynamic model, we derive two lightweight and scalable algorithms: LAonepass and LAtwopass. These algorithms can efficiently and effectively estimate worker qualities and true labels by traversing all labels at most twice, thereby eliminating the need for explicit termination conditions and multiple traversals over the crowdsourced labels. Due to their dynamic nature, the proposed algorithms are also capable of performing label aggregation online. We provide a theoretical proof of the convergence property of the proposed algorithms and bound the error of the estimated worker qualities. Furthermore, we analyze the space and time complexities of our proposed algorithms, demonstrating their equivalence to those of majority voting. Through experiments conducted on 20 real-world datasets, we demonstrate that our proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings, even though they traverse all labels at most twice. The code is available at https://github.com/yyang318/LA_onepass.
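
To make the contrast in the abstract concrete, the following is a minimal Python sketch of majority voting next to a generic single-pass aggregator that weights each vote by a running estimate of worker quality. It is not the authors' LAonepass or LAtwopass, and it does not implement the Dynamic Bayesian Network described above. The class name `OnePassAggregator`, the smoothed accuracy prior, and the log-odds weighting are all assumptions chosen for illustration. The authors' implementation is in the repository linked above.

```python
from collections import Counter, defaultdict
from math import log


def majority_vote(task_labels):
    """Return the most frequent label for each task.

    task_labels maps a task id to a list of (worker_id, label) pairs.
    """
    return {
        task: Counter(label for _, label in votes).most_common(1)[0][0]
        for task, votes in task_labels.items()
    }


class OnePassAggregator:
    """Hypothetical single-pass aggregator (illustration only, not LAonepass).

    Each task is visited once. Votes are weighted by the log-odds of a
    smoothed per-worker accuracy, and worker statistics are updated right
    after the task's label is estimated, so no iteration is needed.
    """

    def __init__(self):
        self.agree = defaultdict(int)  # times a worker matched the estimate
        self.seen = defaultdict(int)   # labels seen from each worker

    def quality(self, worker):
        # Assumed prior: an unseen worker is treated as 2/3 accurate.
        return (self.agree[worker] + 2) / (self.seen[worker] + 3)

    def aggregate(self, votes):
        """Estimate one task's label, then update worker statistics."""
        scores = defaultdict(float)
        for worker, label in votes:
            q = self.quality(worker)
            scores[label] += log(q / (1 - q))
        estimate = max(scores, key=scores.get)
        for worker, label in votes:
            self.seen[worker] += 1
            self.agree[worker] += int(label == estimate)
        return estimate


# Toy data (made up for this sketch): worker "c" disagrees on early tasks.
tasks = {
    "t1": [("a", "cat"), ("b", "cat"), ("c", "dog")],
    "t2": [("a", "dog"), ("b", "dog"), ("c", "cat")],
    "t3": [("a", "cat"), ("c", "dog"), ("d", "dog")],
}

mv_result = majority_vote(tasks)
aggregator = OnePassAggregator()
online_result = {task: aggregator.aggregate(votes) for task, votes in tasks.items()}
```

On this toy data the two methods agree on `t1` and `t2` but differ on `t3`: majority voting returns `dog`, while the single-pass sketch returns `cat` because worker `a` has built up a higher quality estimate than workers `c` and `d` by that point. Both approaches touch each label a bounded number of times and store only per-worker counters, which is the kind of cost profile the abstract compares against majority voting.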

Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 0801 Artificial Intelligence and Image Processing, 0806 Information Systems, Artificial Intelligence & Image Processing, 4604 Cybersecurity and privacy, 4605 Data management and data science, 4606 Distributed computing and systems software
Source
ACM Transactions on Knowledge Discovery from Data, ISSN: 1556-4681 (Print); 1556-472X (Online), Association for Computing Machinery (ACM). doi: 10.1145/3630102
Rights statement
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.