Information Mining from New Zealand published annual reports relating to Māori affairs
This thesis designs and implements an information system that mines online annual reports from New Zealand organizations that have made efforts to enhance Mātauranga Māori in line with the New Zealand government's long-term strategic objectives. The proposed system traverses a large number of annual reports and extracts information on the presence and extent of activities related to Mātauranga Māori across institutions. The extracted information includes examples of initiatives in education, health, and housing, as well as data on the success rate of such initiatives. The data source comprises 216 annual reports published between 2008 and 2015 by 48 organizations, including governmental, non-governmental, private, and trust organizations. The system uses natural language processing (NLP), Semantic Web, ontology, and Resource Description Framework (RDF) technologies to extract, encode, and present the information. Four sets of relations were developed for four sectors: Health, Education, Finance, and Language and Culture. This resulted in the identification of 330 triples (subject-predicate-object) that encode pertinent organizational information concerning Māori and Pacific people. A tool was developed and implemented to convert plain text into ontologies for analysis, using Apache OpenNLP, Protégé from Stanford University, OWLGrEd, and Visual Web Data. The resulting ontologies were analyzed using XML and graphical analysis, which shows how natural-language text can be converted into relational ontologies with an RDF representation.
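As a minimal sketch of what a subject-predicate-object triple looks like once encoded in RDF, the following Python fragment serializes one triple in the N-Triples syntax. It is illustrative only and is not the tool described in this thesis: the namespace `http://example.org/report#`, the names `Triple`, `escape_literal`, and `to_ntriples`, and the sample values are all hypothetical placeholders, not data drawn from the annual reports.

```python
from typing import NamedTuple

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/report#"


class Triple(NamedTuple):
    """A subject-predicate-object statement with a plain-text object."""
    subject: str    # local name of the subject, assumed to be IRI-safe
    predicate: str  # local name of the relation, assumed to be IRI-safe
    obj: str        # literal value (free text)


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    return f'<{EX}{t.subject}> <{EX}{t.predicate}> "{escape_literal(t.obj)}" .'


if __name__ == "__main__":
    # Placeholder values, not taken from any report in the data set.
    example = Triple("OrganizationA", "offersProgramme", "Te reo Māori language course")
    print(to_ntriples(example))
```

Each extracted statement becomes one such line, so a collection of triples can be written to a file and loaded into an ontology editor or an RDF store for further analysis.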