An evaluation of POS tagging for tweets using HMM modeling
aut.researcher | Nand, Parma | |
dc.contributor.author | Nand, P | |
dc.contributor.author | Perera, R | |
dc.date.accessioned | 2015-03-05T03:25:09Z | |
dc.date.available | 2015-03-05T03:25:09Z | |
dc.date.copyright | 2015-01 | |
dc.date.issued | 2015-01 | |
dc.description.abstract | Recently there has been an increased demand for natural language processing tools that work well on unstructured and noisy texts such as Twitter messages. It has been shown that tools developed for structured texts do not work well on unstructured texts, which necessitates considerable customization and re-training before the tools can achieve the same accuracy on unstructured texts. This paper presents the results of testing an HMM (Hidden Markov Model) based POS (Part-Of-Speech) tagger customized for unstructured texts. The tagger was trained on Twitter messages from existing publicly available data and customized for abbreviations and named entities common in Tweets. We evaluated the tagger first by training and testing on the same source corpus, and then by cross-validation testing, training on one Twitter corpus and testing on a different Twitter corpus. We also ran similar experiments on the same datasets using a state-of-the-art CRF (Conditional Random Fields) based POS tagger customized for Tweet messages. The results show that the CRF-based POS tagger from GATE performed slightly better than the HMM model at the token level; at the sentence level, however, the performances were approximately the same. An even more intriguing result was that, in the cross-validation experiments, both taggers' results deteriorated by approximately 25% at the token level and by a massive 80% at the sentence level. This suggests vast differences between the two Tweet corpora used and emphasizes the importance of recall values for NLP systems. A detailed analysis of this deterioration is presented, and the trained HMM model together with the data has been made available for research purposes. | |
dc.identifier.citation | Published in: Proceedings of the 38th Australasian Computer Science Conference. | |
dc.identifier.uri | https://hdl.handle.net/10292/8450 | |
dc.publisher | Australian Computer Society (ACS) | |
dc.relation.uri | http://crpit.com/ | |
dc.rights | Copyright © 2015, Australian Computer Society, Inc. This paper appeared at the Thirty-Eighth Australasian Computer Science Conference, ACSC2015, Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 159, David Parry, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included. | |
dc.rights.accessrights | OpenAccess | |
dc.subject | Social Media | |
dc.subject | HMM POS Tagger | |
dc.subject | Machine Learning | |
dc.subject | POS Tagging | |
dc.title | An evaluation of POS tagging for tweets using HMM modeling | |
dc.type | Conference Contribution | |
pubs.elements-id | 179625 | |
pubs.organisational-data | /AUT | |
pubs.organisational-data | /AUT/Design & Creative Technologies |
Files
Original bundle
- Name: Nand, Perera - 2015 - An Evaluation of POS tagging for Tweets Using HMM Modeling.pdf
- Size: 499.79 KB
- Format: Adobe Portable Document Format
License bundle
- Name: RE4.10 Grant of Licence.docx
- Size: 14.05 KB
- Format: Microsoft Word 2007+