A Geometric Approach to Textual Augmented Data Filtering

aut.relation.conference: 8th International Conference on Data Mining, Communications and Information Technology
aut.relation.endpage: 012007
aut.relation.issue: 1
aut.relation.startpage: 012007
aut.relation.volume: 2833
dc.contributor.author: Feng, SJH
dc.contributor.author: Lai, EMK
dc.contributor.author: Li, W
dc.date.accessioned: 2024-10-17T00:10:48Z
dc.date.available: 2024-10-17T00:10:48Z
dc.date.issued: 2024-09-09
dc.description.abstract: Data augmentation is necessary when the amount of training data is insufficient for supervised learning. For natural language processing tasks, obtaining good-quality augmented data is not easy. This paper introduces GATFilter, a novel method for filtering out inappropriate augmented textual data for text classification (TC). Using geometric concepts, specifically principal component analysis and convex hull analysis, the method preserves the semantic integrity of words within augmented texts. GATFilter is versatile and applicable across various types of textual augmentation methods. In experiments with several datasets and augmentation strategies, classifiers trained on GATFilter-filtered augmented datasets improved in key performance metrics, including accuracy, precision, recall, and F1 score. The method’s efficacy is notably influenced by the quality of the underlying augmentation techniques, indicating its potential to complement and refine various text augmentation strategies. Furthermore, our analysis showed that GATFilter is particularly effective at amplifying methods that generate good-quality augmented data. GATFilter is openly available online on GitHub and as a Python package.
dc.identifier.doi: 10.1088/1742-6596/2833/1/012007
dc.identifier.issn: 1742-6588
dc.identifier.issn: 1742-6596
dc.identifier.uri: http://hdl.handle.net/10292/18138
dc.publisher: IOP Publishing
dc.relation.uri: https://iopscience.iop.org/article/10.1088/1742-6596/2833/1/012007
dc.rights: Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
dc.rights.accessrights: OpenAccess
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: 51 Physical Sciences
dc.subject: 0202 Atomic, Molecular, Nuclear, Particle and Plasma Physics
dc.subject: 0204 Condensed Matter Physics
dc.subject: 0299 Other Physical Sciences
dc.subject: 51 Physical sciences
dc.title: A Geometric Approach to Textual Augmented Data Filtering
dc.type: Conference Contribution
pubs.elements-id: 569954
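
The abstract names principal component analysis and convex hull analysis but does not spell out the algorithm. The sketch below shows one plausible reading only, and it is not the GATFilter package API: embeddings of the original texts are projected onto their first two principal components, a convex hull is built around them, and an augmented sample is kept when its projection falls inside that hull. All function names, the use of one vector per text, and the choice of two components are assumptions made for illustration.

```python
import numpy as np


def fit_pca(points, n_components=2):
    """Return the mean and top principal axes of `points` (rows are samples)."""
    mean = points.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(points, mean, components):
    """Project samples onto the principal axes."""
    return (points - mean) @ components.T


def cross(o, a, b):
    """2D cross product of OA and OB; positive when O->A->B turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points_2d):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points_2d)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(point, hull, tol=1e-9):
    """A point is inside a counter-clockwise hull if it is never right of an edge."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= -tol for i in range(n))


def filter_augmented(original_vecs, augmented_vecs):
    """Return indices of augmented samples whose projection lies in the hull
    of the projected original samples (hypothetical filtering rule)."""
    mean, comps = fit_pca(original_vecs)
    hull = convex_hull(project(original_vecs, mean, comps))
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear original points")
    aug_2d = project(augmented_vecs, mean, comps)
    return [i for i, p in enumerate(aug_2d) if inside_hull(p, hull)]
```

In this sketch the hull is fitted on the original data only, so an augmented sample that drifts far from the original distribution in the projected space is discarded, while one that stays within it is retained.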
Files
Original bundle
Name: Feng et al_2024_A geometric approach to textual augmented data filtering.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format
Description: Journal article