dc.contributor.advisorNand, Parma
dc.contributor.advisorTegginmath, Shoba
dc.contributor.authorKasture, Abhijeet Sudhir
dc.date.accessioned2015-11-26T00:09:39Z
dc.date.available2015-11-26T00:09:39Z
dc.date.copyright2015
dc.date.created2015
dc.identifier.urihttp://hdl.handle.net/10292/9277
dc.description.abstractCyberbullying is prevalent in most countries across the globe. The aim of this research was to develop a predictive model to identify the occurrence of cyberbullying tweets on Twitter. The paradigm shift in the Internet of Things was observed a decade ago, which resulted in enormous growth in the number of active Internet users. Today, this number has exceeded three billion. Social networking websites are classic examples of Internet applications that have large numbers of active users. Twitter, for instance, is one of the most famous social networking portals, with more than 300 million active users at any given time. Unfortunately, it is also a stage for users who engage in unethical use of the Internet, such as cyberbullying. With such a staggering number of active users on the Internet, cyberbullying has become a widespread global phenomenon. It has extremely adverse effects on its victims. In some cases, victims have committed suicide in response to the shame and hatred associated with cyberbullying. In this research, 1313 unique tweets were collected from Twitter. In the first phase, 376 tweets were manually tagged as cyberbullying tweets, guided by psychological studies of individual behaviour and of the use of dialects pertaining to verbal aggressiveness. In the next phase, every word in a tweet was individually categorised based on the pragmatics of language. To achieve this, tweets were categorised using Linguistic Inquiry and Word Count (LIWC), a psychometric evaluation tool that categorises text based on Linguistic Processes, Psychological Processes, Personal Concerns and Spoken Categories. Collectively, these add up to 67 sub-word-categories. In the next step of the psychometric evaluation, LIWC calculated the degree to which different word-categories were used by people in cyberbullying.
Psychometric evaluation therefore aided in effective text categorisation and in quantifying the degree of word usage, which was observed to be a gap in previous studies. As a result, tweets were converted to a multi-dimensional attribute relational numeric dataset that was very rich in the information it carried. This dataset was then used to train machine learning classifiers in Weka to develop a predictive model to detect cyberbullying. The data was randomly split, with 66% used for training the predictive model and 34% for testing it. The Random Forest classifier built the predictive model with a precision value of 0.97, indicating that binary classifiers outperformed the multiclass classifiers in detecting cyberbullying tweets.en_NZ
dc.language.isoenen_NZ
dc.publisherAuckland University of Technology
dc.subjectCyberbullyingen_NZ
dc.subjectPredictive analyticsen_NZ
dc.subjectPsychometric evaluationen_NZ
dc.subjectPragmatics of languageen_NZ
dc.subjectTwitteren_NZ
dc.subjectTweetsen_NZ
dc.subjectText classification techniquesen_NZ
dc.subjectPsychometric analysisen_NZ
dc.subjectNatural Language Processingen_NZ
dc.subjectLinguistic Inquiry and Word Count (LIWC)en_NZ
dc.subjectTAGS archiving toolen_NZ
dc.subjectWEKAen_NZ
dc.subjectRandom forest; Multilayer perceptron; Support vector machines; Decision treesen_NZ
dc.subjectVerbal aggressionen_NZ
dc.titleA predictive model to detect online cyberbullyingen_NZ
dc.typeThesis
thesis.degree.grantorAuckland University of Technology
thesis.degree.levelMasters Theses
thesis.degree.nameMaster of Computer and Information Sciencesen_NZ
thesis.degree.discipline
dc.rights.accessrightsOpenAccess
dc.date.updated2015-11-25T22:35:59Z
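
The abstract describes three steps that can be illustrated in code: turning each tweet into per-category word-usage percentages, randomly splitting the data 66% / 34%, and scoring a classifier by precision. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the thesis used LIWC and Weka, whereas this uses plain Python, and the `TOY_CATEGORIES` dictionary, the function names, and the sample labels are all hypothetical stand-ins (the actual LIWC dictionary is far larger and is not reproduced here).

```python
import random

# Hypothetical mini-dictionary standing in for LIWC word categories.
# The category names and word lists here are illustrative assumptions only.
TOY_CATEGORIES = {
    "swear": {"idiot", "stupid", "loser"},
    "negemo": {"hate", "ugly", "stupid"},
    "you": {"you", "your"},
}


def category_percentages(tweet):
    """Return, per category, the percentage of the tweet's words in that category."""
    words = [w.strip(".,!?").lower() for w in tweet.split()]
    words = [w for w in words if w]
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        cat: 100.0 * sum(w in vocab for w in words) / total
        for cat, vocab in TOY_CATEGORIES.items()
    }


def split_66_34(rows, seed=0):
    """Shuffle a copy of rows and split it into 66% training and 34% test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.66 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision(y_true, y_pred, positive=1):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

In this sketch, each tweet becomes a row of numeric attributes (one percentage per category), which is the kind of numeric dataset a classifier such as Random Forest can be trained on. Precision is computed for the positive (cyberbullying) class only, so a value close to 1 means that few non-bullying tweets were flagged.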

