Framework for Sentiment Classification for Morphologically Rich Languages: A Case Study for Sinhala

Medagoda, Nishantha Priyanka Kumara

aut.embargo	No	en_NZ
aut.thirdpc.contains	Yes	en_NZ
aut.thirdpc.permission	No	en_NZ
aut.thirdpc.removed	Yes	en_NZ
dc.contributor.advisor	Pears, Russel
dc.contributor.advisor	Whalley, Jacqueline
dc.contributor.advisor	Sallis, Philp
dc.contributor.author	Medagoda, Nishantha Priyanka Kumara
dc.date.accessioned	2017-06-13T02:45:42Z
dc.date.available	2017-06-13T02:45:42Z
dc.date.copyright	2017
dc.date.created	2017
dc.date.issued	2017
dc.date.updated	2017-06-13T01:55:35Z
dc.description.abstract	This thesis presents a framework for sentiment analysis for morphologically rich languages. Sentiment analysis is the field concerned with analysing and extracting people’s emotions, feelings, expressions, attitudes and experiences expressed in text, especially in digital media such as web blogs and customer reviews. The primary obstacle to applying contemporary sentiment classification techniques to morphologically rich languages is the unavailability of lexical resources. That is, these techniques are highly resource-intensive, and the required lexical resources are not freely available for such languages. In addition, the methods adapt poorly to the linguistic complexities of morphologically rich languages. The thesis and the related publications represent the first-ever attempt at sentiment analysis for the Sinhala language, which is regarded as highly morphologically rich. The thesis proposed novel approaches for generating lexical resources for sentiment classification using limited resources. The first approach examined cross-linguistic sentiment lexicon generation using a sentiment lexicon for English and a basic dictionary of the target morphologically rich language. In the second, a sentiment lexicon was generated using a novel approach that incorporates morphological features, namely affixes (prefixes and suffixes). Thirdly, a graph-based method was proposed to compile a lexical resource for sentiment classification with polarity scores. The researcher also investigated classical text classification techniques for Sinhala and identified the best classification algorithm for Sinhala along with the dominant linguistic features. Finally, an extensive set of experiments explored language-specific classification features for Sinhala. These language-specific features include part of speech, negation, intensifiers and shifters.
Rule-based approaches for incorporating negations and intensifiers are introduced and discussed. The research contributes to sentiment classification for morphologically rich languages by proposing a framework that uses limited resources to build the lexical resources, together with efficient algorithms to classify opinions. The classification accuracies achieved confirm the feasibility of sentiment classification for morphologically rich languages such as Sinhala. In addition, these accuracies can serve as benchmarks for sentiment classification for Sinhala as well as for other morphologically rich languages. Given the promising outcomes and its simplicity, the proposed framework can be applied to any morphologically rich language.	en_NZ
dc.identifier.uri	https://hdl.handle.net/10292/10544
dc.language.iso	en	en_NZ
dc.publisher	Auckland University of Technology
dc.rights.accessrights	OpenAccess
dc.subject	Sentiment Analysis	en_NZ
dc.subject	Opinion Mining	en_NZ
dc.subject	Natural Language Processing	en_NZ
dc.subject	Machine Learning	en_NZ
dc.title	Framework for Sentiment Classification for Morphologically Rich Languages: A Case Study for Sinhala	en_NZ
dc.type	Thesis
thesis.degree.grantor	Auckland University of Technology
thesis.degree.level	Doctoral Theses
thesis.degree.name	Doctor of Philosophy	en_NZ

Files

Original bundle

Name: MedagodaN2.pdf.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Doctoral Theses