A Lightweight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

aut.relation.journal: ACM Transactions on Knowledge Discovery from Data
dc.contributor.author: Yang, Yi
dc.contributor.author: Zhao, Zhong-qiu
dc.contributor.author: Wu, Gongqing
dc.contributor.author: Zhuo, Xingrui
dc.contributor.author: Liu, Qing
dc.contributor.author: Bai, Quan
dc.contributor.author: Li, Weihua
dc.date.accessioned: 2023-11-23T21:55:22Z
dc.date.available: 2023-11-23T21:55:22Z
dc.date.issued: 2023-10-26
dc.description.abstract: Due to the presence of noise in crowdsourced labels, label aggregation (LA) has become a standard procedure for post-processing these labels. LA methods estimate true labels from crowdsourced labels by modeling worker quality. However, most existing LA methods are iterative in nature. They require multiple passes through all crowdsourced labels, jointly and iteratively updating true labels and worker qualities until a termination condition is met. As a result, these methods are burdened with high space and time complexities, which restrict their applicability in scenarios where scalability and online aggregation are essential. Furthermore, defining a suitable termination condition for iterative algorithms can be challenging. In this paper, we view LA as a dynamic system and represent it as a Dynamic Bayesian Network. From this dynamic model, we derive two lightweight and scalable algorithms: LAonepass and LAtwopass. These algorithms can efficiently and effectively estimate worker qualities and true labels by traversing all labels at most twice, thereby eliminating the need for explicit termination conditions and multiple traversals over the crowdsourced labels. Due to their dynamic nature, the proposed algorithms are also capable of performing label aggregation online. We provide theoretical proof of the convergence property of the proposed algorithms and bound the error of the estimated worker qualities. Furthermore, we analyze the space and time complexities of our proposed algorithms, demonstrating their equivalence to those of majority voting. Through experiments conducted on 20 real-world datasets, we demonstrate that our proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings, even though they traverse all labels at most twice. The code is on https://github.com/yyang318/LA_onepass.
dc.identifier.citation: ACM Transactions on Knowledge Discovery from Data, ISSN: 1556-4681 (Print); 1556-472X (Online), Association for Computing Machinery (ACM). doi: 10.1145/3630102
dc.identifier.doi: 10.1145/3630102
dc.identifier.issn: 1556-4681
dc.identifier.issn: 1556-472X
dc.identifier.uri: http://hdl.handle.net/10292/16997
dc.language: en
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.uri: https://dl.acm.org/doi/10.1145/3630102
dc.rights: Copyright © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
dc.rights.accessrights: OpenAccess
dc.subject: 46 Information and Computing Sciences
dc.subject: 4603 Computer Vision and Multimedia Computation
dc.subject: 0801 Artificial Intelligence and Image Processing
dc.subject: 0806 Information Systems
dc.subject: Artificial Intelligence & Image Processing
dc.subject: 4604 Cybersecurity and privacy
dc.subject: 4605 Data management and data science
dc.subject: 4606 Distributed computing and systems software
dc.title: A Lightweight, Effective and Efficient Model for Label Aggregation in Crowdsourcing
dc.type: Journal Article
pubs.elements-id: 527968

Files

Original bundle

Name: Yang et al_2023_.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format
Description: Journal article