Author Identification in Free Texts

aut.embargo: No [en_NZ]
aut.thirdpc.contains: No [en_NZ]
dc.contributor.advisor: Nand, Parma
dc.contributor.author: Wang, Yahui (Kay)
dc.date.accessioned: 2020-07-01T22:28:21Z
dc.date.available: 2020-07-01T22:28:21Z
dc.date.copyright: 2020
dc.date.issued: 2020
dc.date.updated: 2020-07-01T22:25:35Z
dc.description.abstract: Information extraction is a popular topic in natural language processing. This thesis focuses on author identification in free text. The study divides the author identification task into two subtasks: quotation extraction and speaker attribution. The system has two parts: a rule-based model for quotation extraction and a machine learning model for speaker attribution. The source domain used in this thesis is literary narrative, and a generalisation test is also run on the news domain. The experimental results show that the rule-based model achieves an F-score of 0.88 on quotation extraction, and the best machine learning model reaches 85.7% accuracy. The overall test on the entire system returns 77.9% accuracy on the literary source domain and 73.6% on the news domain. [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10292/13480
dc.language.iso: en [en_NZ]
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.subject: Natural Language Processing (NLP) [en_NZ]
dc.subject: Author Identification [en_NZ]
dc.subject: Quotation Extraction [en_NZ]
dc.subject: Speaker Attribution [en_NZ]
dc.subject: Conditional Random Field (CRF) [en_NZ]
dc.subject: Support Vector Machine (SVM) [en_NZ]
dc.title: Author Identification in Free Texts [en_NZ]
dc.type: Thesis [en_NZ]
thesis.degree.grantor: Auckland University of Technology
thesis.degree.level: Masters Theses
thesis.degree.name: Master of Computer and Information Sciences [en_NZ]
Files
Original bundle
Name:
WangK.pdf
Size:
453.3 KB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Name:
license.txt
Size:
897 B
Format:
Item-specific license agreed upon to submission