Author Identification in Free Texts

Wang, Yahui (Kay)

Files

Thesis (453.3 KB)

Date

2020

Authors

Wang, Yahui (Kay)

Supervisor

Nand, Parma

Item type

Thesis

Degree name

Master of Computer and Information Sciences

Publisher

Auckland University of Technology

Abstract

Information Extraction is a popular topic in Natural Language Processing. This thesis focuses on author identification in free text. The study divides the author identification task into two subtasks: quotation extraction and speaker attribution. The system has two parts: a rule-based model for quotation extraction and a machine learning model for speaker attribution. The source domain used in this thesis is literary narrative, with a generalisation test on the news domain. Experimental results show that the rule-based model achieves a 0.88 F-score on quotation extraction, and the best machine learning model reaches 85.7% accuracy. The overall test of the entire system yields 77.9% accuracy on the literary source domain and 73.6% on the news domain.

Keywords

Natural Language Processing (NLP), Author Identification, Quotation Extraction, Speaker Attribution, Conditional Random Field (CRF), Support Vector Machine (SVM)

Permanent link

https://hdl.handle.net/10292/13480

Collections

Masters Theses