A Geometric Approach to Textual Augmented Data Filtering

Feng, SJH; Lai, EMK; Li, W

Files

Journal article (1.28 MB)

Date

2024-09-09

Authors

Feng, SJH

Lai, EMK

Li, W

Item type

Conference Contribution

Publisher

IOP Publishing

Abstract

Data augmentation is necessary when the amount of training data is insufficient for supervised learning. For natural language processing tasks, obtaining good-quality augmented data is not easy. This paper introduces GATFilter, a novel method for filtering out inappropriate augmented textual data for text classification (TC). Using geometric concepts, specifically principal component analysis and convex hull analysis, the method preserves the semantic integrity of words within augmented texts. GATFilter is versatile and applicable across various types of textual augmentation methods. In experiments with several datasets and augmentation strategies, classifiers trained on GATFilter-filtered augmented data sets improved in key performance metrics, including accuracy, precision, recall, and F1 score. The method’s efficacy is notably influenced by the quality of the underlying augmentation techniques, indicating its potential to complement and refine various text augmentation strategies. Furthermore, our analysis showed that GATFilter is particularly able to amplify the effectiveness of methods that generate good-quality augmented data. GATFilter is openly available online on GitHub and as a Python package.
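The abstract names the two geometric ingredients but does not give the algorithm, so the following is only a minimal sketch of the general idea and not GATFilter's actual implementation or API. It assumes that each text is already represented by an embedding vector, that the original data are projected onto their first two principal components, and that an augmented sample is kept only if its projection falls inside the convex hull of the original projections. All function names (`fit_pca_2d`, `convex_hull`, `inside_hull`, `filter_augmented`) are hypothetical.

```python
import numpy as np


def fit_pca_2d(vectors):
    """Fit PCA on `vectors`; return the mean and the top-2 principal axes."""
    mean = vectors.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:2]


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points_2d):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(map(tuple, points_2d)))
    if len(pts) < 3:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would create a right turn or a straight line.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def inside_hull(hull, q, eps=1e-9):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull: nothing counts as inside
    return all(cross(hull[i], hull[(i + 1) % n], q) >= -eps for i in range(n))


def filter_augmented(original_vecs, augmented_vecs):
    """Boolean mask: which augmented vectors fall inside the original hull."""
    original_vecs = np.asarray(original_vecs, dtype=float)
    augmented_vecs = np.asarray(augmented_vecs, dtype=float)
    mean, axes = fit_pca_2d(original_vecs)
    hull = convex_hull((original_vecs - mean) @ axes.T)
    projected = (augmented_vecs - mean) @ axes.T
    return np.array([inside_hull(hull, q) for q in projected])


# Toy data standing in for text embeddings (assumed, not from the paper).
original = [[0, 0, 0], [6, 0, 0], [6, 2, 0], [0, 2, 0], [3, 1, 0]]
augmented = [[1, 1, 0], [10, 10, 0]]
print(filter_augmented(original, augmented))
```

Projecting to two dimensions keeps the hull test cheap and easy to inspect; whether GATFilter uses two components, or applies the test per word rather than per text, is not stated in the abstract.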

Keywords

51 Physical Sciences; 0202 Atomic, Molecular, Nuclear, Particle and Plasma Physics; 0204 Condensed Matter Physics; 0299 Other Physical Sciences

DOI

10.1088/1742-6596/2833/1/012007

Publisher's version

https://iopscience.iop.org/article/10.1088/1742-6596/2833/1/012007

Rights statement

Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Permanent link

http://hdl.handle.net/10292/18138

Collections

School of Engineering, Computer and Mathematical Sciences - Te Kura Mātai Pūhanga, Rorohiko, Pāngarau
