Anomaly Detection in Text Data Sets Using Character-Level Representation

Date
2021-04-28
Authors
Mohaghegh, Mahsa
Abdurakhmanov, Amantay
Item type
Journal Article
Publisher
Institute of Physics (IoP)
Abstract

This paper proposes a character-level representation of unlabeled text data sets for unsupervised anomaly detection. The representation was examined empirically to show that it separates outlying records from normal ones when combined with an ensemble of classic numerical anomaly classifiers. Experimental results on two different data sets confirmed that the unsupervised model can detect outlying instances in various real-world scenarios, making it possible to quickly assess large amounts of textual data for information consistency and conformity without knowledge of the data content itself.
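
The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the detectors, so the character-class features, the two simple detectors (z-score and median absolute deviation), and the rank-averaging ensemble below are all assumptions chosen for illustration, as are the function names.

```python
import statistics
import string


def char_features(text):
    """Describe a record by character classes only; its meaning is never read.

    The feature set here is a hypothetical choice, not the paper's.
    """
    n = max(len(text), 1)  # avoid division by zero for empty records
    return [
        float(len(text)),
        sum(c.isalpha() for c in text) / n,
        sum(c.isdigit() for c in text) / n,
        sum(c.isspace() for c in text) / n,
        sum(c in string.punctuation for c in text) / n,
        sum(c.isupper() for c in text) / n,
    ]


def zscore_detector(rows):
    """Score each row by its largest absolute z-score over all features."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column -> 1.0
    return [max(abs(v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def mad_detector(rows):
    """Score each row by its largest deviation from the median, scaled by MAD."""
    cols = list(zip(*rows))
    meds = [statistics.median(c) for c in cols]
    mads = [
        statistics.median([abs(v - m) for v in c]) or 1.0  # zero MAD -> 1.0
        for c, m in zip(cols, meds)
    ]
    return [max(abs(v - m) / d for v, m, d in zip(r, meds, mads)) for r in rows]


def rank_normalize(scores):
    """Map scores to ranks in [0, 1] so detectors on different scales can be averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / max(len(scores) - 1, 1)
    return ranks


def ensemble_scores(texts):
    """Average the rank-normalized scores of both detectors; higher means more anomalous."""
    rows = [char_features(t) for t in texts]
    per_detector = [rank_normalize(d(rows)) for d in (zscore_detector, mad_detector)]
    return [statistics.fmean(col) for col in zip(*per_detector)]
```

Given a column of date-like strings containing one free-text entry, `ensemble_scores` would be expected to rank the free-text entry as the most anomalous, since its length and character mix differ from the rest. Rank averaging is used here only because it lets detectors with different score scales be combined without tuning; other combination rules would work as well.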

Keywords
0202 Atomic, Molecular, Nuclear, Particle and Plasma Physics; 0204 Condensed Matter Physics; 0299 Other Physical Sciences; 51 Physical sciences
Source
Journal of Physics: Conference Series, ISSN: 1742-6588 (Print), Institute of Physics (IoP). doi: 10.1088/1742-6596/1880/1/012028