Framework for Sentiment Classification for Morphologically Rich Languages: A Case Study for Sinhala
Medagoda, Nishantha Priyanka Kumara
This thesis presents a framework for sentiment analysis of morphologically rich languages. Sentiment analysis is the task of analysing and extracting the emotions, feelings, attitudes and experiences that people express in text, especially in digital media such as web blogs and customer reviews. The primary obstacle to applying contemporary sentiment classification techniques to morphologically rich languages is the unavailability of lexical resources: these techniques are highly resource-intensive, and the required lexical resources are not freely available for such languages. In addition, the methods adapt poorly to the linguistic complexities of morphologically rich languages. The thesis and the related publications represent the first attempt at sentiment analysis for the Sinhala language, which is regarded as highly morphologically rich. The thesis proposes novel approaches for generating the lexical resources for sentiment classification from limited resources. The first approach examines cross-linguistic sentiment lexicon generation, using a sentiment lexicon for English and a basic dictionary of the target morphologically rich language. The second generates a sentiment lexicon with a novel approach that incorporates morphological features, namely affixes (prefixes and suffixes). The third is a graph-based method that compiles a lexical resource for sentiment classification with polarity scores. The thesis then investigates classical text classification techniques for Sinhala and identifies the best classification algorithm for Sinhala along with the dominant linguistic features. Finally, an extensive set of experiments explores language-specific classification features for Sinhala: part of speech, negation, intensifiers and shifters.
The thesis introduces and discusses rule-based approaches for incorporating negations and intensifiers. The research contributes to sentiment classification for morphologically rich languages by proposing a framework that uses limited resources to build the lexical resources and efficient algorithms to classify opinions. The classification accuracies achieved confirm the feasibility of sentiment classification for morphologically rich languages such as Sinhala. In addition, these accuracies can serve as benchmarks for sentiment classification in Sinhala and in other morphologically rich languages. Given the promising outcomes and the simplicity of the approach, the proposed framework can be applied to any morphologically rich language.
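The rule-based handling of negations and intensifiers mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the lexicon entries, the negation and intensifier word lists, the multipliers, and the function names `score` and `classify` are all hypothetical, and English placeholder tokens stand in for Sinhala words. The sketch also assumes that a negation or intensifier precedes the sentiment word it modifies, which may not match Sinhala word order.

```python
# Hypothetical toy resources; English placeholders stand in for Sinhala tokens.
LEXICON = {"good": 1.0, "bad": -1.0, "happy": 0.8, "sad": -0.8}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def score(tokens):
    """Sum lexicon polarities, applying pending negation/intensifier rules.

    Assumption: modifiers precede the sentiment word they apply to.
    """
    total = 0.0
    negate = False   # pending negation for the next sentiment word
    weight = 1.0     # pending intensifier multiplier for the next sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            negate = not negate
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            polarity = LEXICON[tok] * weight
            total += -polarity if negate else polarity
            # Modifiers are consumed by the sentiment word they attach to.
            negate, weight = False, 1.0
    return total


def classify(tokens):
    """Map the aggregate score to a coarse sentiment label."""
    s = score(tokens)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For example, `classify(["not", "very", "good"])` yields `"negative"` under these toy rules, because the intensifier scales the polarity of "good" before the negation flips its sign.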