Performing Sentiment Analysis on Large Email Text Data

aut.embargo: No [en_NZ]
aut.thirdpc.contains: No [en_NZ]
dc.contributor.advisor: Nand, Parma
dc.contributor.author: Sreehari, Sandeep
dc.date.accessioned: 2020-07-27T21:15:32Z
dc.date.available: 2020-07-27T21:15:32Z
dc.date.copyright: 2020
dc.date.issued: 2020
dc.date.updated: 2020-07-27T09:50:35Z
dc.description.abstract: Companies and organizations all over the world aim to progress and prosper, and anyone pursuing that goal is expected to know how the company is currently progressing, which can be learned from live data. Email is one such source of live data. By analysing email data, which comprises chains of conversation between the company's employees and its clients, one can judge how well the company is progressing. Analysing such a large volume of data manually, however, is tiresome, time-consuming and prone to error. Sentiment analysis, a domain within Natural Language Processing, can address this issue: using it, we can make such a judgment about the progress of the company or organization. The purpose of this thesis is to identify the most efficient algorithm for performing sentiment analysis, with the best precision, on a large data set of email data. The thesis first explains the basic concepts of sentiment analysis and then presents a model that performs sentiment analysis on an email data set. The drawbacks of the current model are observed, and either the model is improved or a new model is developed to address them. Every new model introduces something new, either in how it handles the data or in its use of better classification algorithms, and gives better performance values than the previous model. Performance is measured in terms of precision, recall and accuracy. The thesis first demonstrates an algorithm that shows how sentiment analysis is performed using supervised learning. The next model is built on this one and uses a larger email data set. The first model uses a simple K-nearest neighbours classifier to produce the performance measures. The next few models are built to improve these values by using different classifiers and new features such as Named Entity Recognition and vectorization.
To achieve higher values, a model was implemented using Artificial Neural Networks and derivatives such as LSTM. Finally, a domain-agnostic model built on a bidirectional LSTM gave the best values, and this is the model presented as the best. The model also implements a few features, such as Word2vec embedding and Dask, to improve run-time efficiency. The literature survey section shows how studying work conducted by others in the same domain enabled me to come up with the models. The thesis follows an experimental, quantitative approach in which models are experimented with and a better model is prepared to improve the performance measures. A section also explains the various concepts, algorithms and formulas used. The thesis concludes by presenting the best model for performing sentiment analysis on the large data set and explaining why it is the best. The advantages and strengths of the model are discussed. [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10292/13557
dc.language.iso: en [en_NZ]
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.subject: Sentiment Analysis [en_NZ]
dc.subject: Algorithm [en_NZ]
dc.subject: Classification [en_NZ]
dc.subject: Dataset [en_NZ]
dc.subject: Artificial Neural Networks [en_NZ]
dc.title: Performing Sentiment Analysis on Large Email Text Data [en_NZ]
dc.type: Thesis [en_NZ]
thesis.degree.grantor: Auckland University of Technology
thesis.degree.level: Masters Theses
thesis.degree.name: Master of Computer and Information Sciences [en_NZ]
Files
Original bundle
Name: SreehariS.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format
Description: Thesis