An evaluation of POS tagging for tweets using HMM modeling

aut.researcher: Nand, Parma
dc.contributor.author: Nand, P
dc.contributor.author: Perera, R
dc.date.accessioned: 2015-03-05T03:25:09Z
dc.date.available: 2015-03-05T03:25:09Z
dc.date.copyright: 2015-01
dc.date.issued: 2015-01
dc.description.abstract: Recently there has been an increased demand for natural language processing tools that work well on unstructured and noisy texts such as Twitter messages. It has been shown that tools developed for structured texts do not work well on unstructured texts, and hence require considerable customization and re-training to achieve the same accuracy. This paper presents the results of testing an HMM (Hidden Markov Model) based POS (Part-Of-Speech) tagger customized for unstructured texts. The tagger was trained on existing publicly available Twitter data and customized for abbreviations and named entities common in Tweets. We evaluated the tagger first by training and testing on the same source corpus, and then by cross-validation testing, training on one Twitter corpus and testing on a different Twitter corpus. We also ran similar experiments on the same datasets using a state-of-the-art CRF (Conditional Random Field) based POS tagger customized for Tweet messages. The results show that the CRF-based POS tagger from GATE performed slightly better than the HMM model at the token level; at the sentence level, however, the performances were approximately the same. An even more intriguing result was that, in the cross-validation experiments, both taggers' results deteriorated by approximately 25% at the token level and a massive 80% at the sentence level. This suggests vast differences between the two Tweet corpora used and emphasizes the importance of recall values for NLP systems. A detailed analysis of this deterioration is presented, and the HMM-trained model, together with the data, has been made available for research purposes.
dc.identifier.citation: Published in: Proceedings of the 38th Australasian Computer Science Conference.
dc.identifier.uri: https://hdl.handle.net/10292/8450
dc.publisher: Australian Computer Society (ACS)
dc.relation.uri: http://crpit.com/
dc.rights: Copyright © 2015, Australian Computer Society, Inc. This paper appeared at the Thirty-Eighth Australasian Computer Science Conference, ACSC2015, Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 159, David Parry, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
dc.rights.accessrights: OpenAccess
dc.subject: Social Media
dc.subject: HMM POS Tagger
dc.subject: Twitter
dc.subject: Machine Learning
dc.subject: POS Tagging
dc.title: An evaluation of POS tagging for tweets using HMM modeling
dc.type: Conference Contribution
pubs.elements-id: 179625
pubs.organisational-data: /AUT
pubs.organisational-data: /AUT/Design & Creative Technologies
Files
Original bundle
Name: Nand, Perera - 2015 - An Evaluation of POS tagging for Tweets Using HMM Modeling.pdf
Size: 499.79 KB
Format: Adobe Portable Document Format