A HMM POS Tagger for Micro-blogging Type Texts
The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised the unstructured text genre. The available text processing tools developed on structured texts has been shown to deteriorate significantly when used on unstructured, micro-blogging type texts. In this paper, we present the results of testing a HMM based POS (Part-Of-Speech) tagging model customized for unstructured texts. We also evaluated the tagger against published CRF based state-of-the-art POS tagging models customized for Tweet messages using three publicly available Tweet corpora. Finally, we did cross-validation tests with both the taggers by training them on one Tweet corpus and testing them on another one.