Framework for Sentiment Classification for Morphologically Rich Languages: A Case Study for Sinhala

aut.embargo: No [en_NZ]
aut.thirdpc.contains: Yes [en_NZ]
aut.thirdpc.permission: No [en_NZ]
aut.thirdpc.removed: Yes [en_NZ]
dc.contributor.advisor: Pears, Russel
dc.contributor.advisor: Whalley, Jacqueline
dc.contributor.advisor: Sallis, Philip
dc.contributor.author: Medagoda, Nishantha Priyanka Kumara
dc.date.accessioned: 2017-06-13T02:45:42Z
dc.date.available: 2017-06-13T02:45:42Z
dc.date.copyright: 2017
dc.date.created: 2017
dc.date.issued: 2017
dc.date.updated: 2017-06-13T01:55:35Z
dc.description.abstract: This thesis presents a framework for sentiment analysis in morphologically rich languages. Sentiment analysis is the field of analysing and extracting people’s emotions, feelings, attitudes and experiences as expressed in text, especially in digital media such as web blogs and customer reviews. The primary obstacle to applying contemporary sentiment classification techniques to morphologically rich languages is the unavailability of lexical resources: these techniques are highly resource-intensive, and the required lexical resources are not freely available for such languages. In addition, the methods adapt poorly to the linguistic complexities of morphologically rich languages. The thesis and its related publications represent the first attempt at sentiment analysis for the Sinhala language, which is considered highly morphologically rich. The thesis proposes novel approaches for generating lexical resources for sentiment classification from limited resources. The first approach examines cross-linguistic sentiment lexicon generation using a sentiment lexicon for English and a basic dictionary of the target morphologically rich language. The second generates a sentiment lexicon using a novel approach that incorporates morphological features, namely affixes (prefixes and suffixes). The third is a graph-based method for compiling a lexical resource with polarity scores for sentiment classification. The thesis also investigates classical text classification techniques for Sinhala and identifies the best classification algorithm for Sinhala along with the dominant linguistic features. Finally, an extensive set of experiments explores language-specific classification features for Sinhala, including part of speech, negation, intensifiers and shifters.
The thesis introduces and discusses rule-based approaches for incorporating negations and intensifiers. The research contributes to sentiment classification for morphologically rich languages by proposing a framework that uses limited resources to build the lexical resources and efficient algorithms to classify opinions. The classification accuracies achieved confirm the feasibility of sentiment classification for morphologically rich languages such as Sinhala. These accuracies can also serve as benchmarks for sentiment classification for Sinhala and other morphologically rich languages. Given its promising outcomes and simplicity, the proposed framework can be applied to any morphologically rich language. [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10292/10544
dc.language.iso: en [en_NZ]
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.subject: Sentiment Analysis [en_NZ]
dc.subject: Opinion Mining [en_NZ]
dc.subject: Natural Language Processing [en_NZ]
dc.subject: Machine Learning [en_NZ]
dc.title: Framework for Sentiment Classification for Morphologically Rich Languages: A Case Study for Sinhala [en_NZ]
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.level: Doctoral Theses
thesis.degree.name: Doctor of Philosophy [en_NZ]
Files
Original bundle
Name: MedagodaN2.pdf.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format
Description: Thesis
Collections