dc.contributor.author: Nand, P
dc.contributor.author: Perera, R
dc.date.accessioned: 2015-03-05T03:25:09Z
dc.date.available: 2015-03-05T03:25:09Z
dc.date.copyright: 2015-01
dc.identifier.citation: Published in: Proceedings of the 38th Australasian Computer Science Conference.
dc.identifier.uri: http://hdl.handle.net/10292/8450
dc.description.abstract: Recently there has been an increased demand for natural language processing tools that work well on unstructured and noisy texts such as Twitter messages. It has been shown that tools developed for structured texts do not work well on unstructured texts, and that considerable customization and re-training are needed for them to achieve the same accuracy. This paper presents the results of testing an HMM (Hidden Markov Model) based POS (Part-Of-Speech) tagger customized for unstructured texts. The tagger was trained on Twitter messages from existing publicly available data and customized for abbreviations and named entities common in Tweets. We evaluated the tagger first by training and testing on the same source corpus, and then by cross-validation testing, training on one Twitter corpus and testing on a different Twitter corpus. We ran similar experiments on the same datasets using a state-of-the-art CRF (Conditional Random Field) based POS tagger customized for Tweets. The results show that the CRF-based POS tagger from GATE performed slightly better than the HMM model at the token level, whereas at the sentence level the performances were approximately the same. An even more intriguing result was that, in the cross-validation experiments, both taggers' results deteriorated by approximately 25% at the token level and by a massive 80% at the sentence level. This suggests vast differences between the two Tweet corpora used and emphasizes the importance of recall values for NLP systems. A detailed analysis of this deterioration is presented, and the HMM-trained model together with the data has been made available for research purposes.
dc.publisher: Australian Computer Society (ACS)
dc.relation.uri: http://crpit.com/
dc.rights: Copyright © 2015, Australian Computer Society, Inc. This paper appeared at the Thirty-Eighth Australasian Computer Science Conference, ACSC2015, Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 159, David Parry, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
dc.subject: Social Media
dc.subject: HMM POS Tagger
dc.subject: Twitter
dc.subject: Machine Learning
dc.subject: POS Tagging
dc.title: An evaluation of POS tagging for tweets using HMM modeling
dc.type: Conference Contribution
dc.rights.accessrights: OpenAccess
pubs.elements-id: 179625

