An evaluation of POS tagging for tweets using HMM modeling

Nand, P; Perera, R

aut.researcher	Nand, Parma
dc.contributor.author	Nand, P
dc.contributor.author	Perera, R
dc.date.accessioned	2015-03-05T03:25:09Z
dc.date.available	2015-03-05T03:25:09Z
dc.date.copyright	2015-01
dc.date.issued	2015-01
dc.description.abstract	Recently there has been an increased demand for natural language processing tools that work well on unstructured and noisy texts such as Twitter messages. It has been shown that tools developed for structured texts do not work well on unstructured texts, and hence require considerable customization and re-training to achieve the same accuracy on unstructured texts. This paper presents the results of testing an HMM (Hidden Markov Model) based POS (Part-Of-Speech) tagger customized for unstructured texts. The tagger was trained on Twitter messages from existing publicly available data and customized for abbreviations and named entities common in Tweets. We first evaluated the tagger by training and testing on the same source corpus, and then did cross-validation testing by training on one Twitter corpus and testing on a different Twitter corpus. We also ran similar experiments on the same datasets using a state-of-the-art CRF (Conditional Random Field) based POS tagger customized for Tweet messages. The results show that the CRF-based POS tagger from GATE performed slightly better than the HMM model at the token level; however, at the sentence level the performances were approximately the same. An even more intriguing result was that, in the cross-validation experiments, both taggers' results deteriorated by approximately 25% at the token level and a massive 80% at the sentence level. This suggests vast differences between the two Tweet corpora used and emphasizes the importance of recall values for NLP systems. A detailed analysis of this deterioration is presented, and the HMM trained model together with the data has also been made available for research purposes.
dc.identifier.citation	Published in: Proceedings of the 38th Australasian Computer Science Conference.
dc.identifier.uri	https://hdl.handle.net/10292/8450
dc.publisher	Australian Computer Society (ACS)
dc.relation.uri	http://crpit.com/
dc.rights	Copyright © 2015, Australian Computer Society, Inc. This paper appeared at the Thirty-Eighth Australasian Computer Science Conference, ACSC2015, Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 159, David Parry, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
dc.rights.accessrights	OpenAccess
dc.subject	Social Media
dc.subject	HMM POS Tagger
dc.subject	Twitter
dc.subject	Machine Learning
dc.subject	POS Tagging
dc.title	An evaluation of POS tagging for tweets using HMM modeling
dc.type	Conference Contribution
pubs.elements-id	179625
pubs.organisational-data	/AUT
pubs.organisational-data	/AUT/Design & Creative Technologies

Files

Original bundle

Name: Nand, Perera - 2015 - An Evaluation of POS tagging for Tweets Using HMM Modeling.pdf
Size: 499.79 KB
Format: Adobe Portable Document Format

License bundle

Name: RE4.10 Grant of Licence.docx
Size: 14.05 KB
Format: Microsoft Word 2007+

Collections

School of Engineering, Computer and Mathematical Sciences - Te Kura Mātai Pūhanga, Rorohiko, Pāngarau