Information extraction from free text comments in questionnaires
The last 15 years have seen a tremendous explosion in the amount of information available, encoded both in structured forms such as databases and XML files as well as free, naturally occurring forms such as HTML pages and word documents. This availability of free texts has created a need for automated text processing tools so that information can be extracted in a timely and effective manner. This research investigated the extraction of information from free text responses to open-ended questions in questionnaires. The research undertook to develop a framework for analyzing open question responses to extract structured information which can then be conflated with the closed question responses in order to produce a more informative report from the survey, in particular to determine the sentiment expressed in the response. Specifically, this research will help in understanding the positive or negative nature of the respondent’s answers through the creation of software tools using Natural Language Toolkit (NLTK) and data mining and Natural Language Processing techniques and will help surveyors (Health centers, doctors, data analysts) obtain additional information from surveys. There is also a discussion of existing sentiment analysis solutions as well as the different components and ways of analyzing sentiment and creating a Natural Language Processing tool which would be interesting to future developers of such systems. This research was successfully able to classify free text responses as positive or negative. While we appreciate that more time to fine tune the application and perform more training and testing would have been useful, the results obtained are promising. We have successfully developed a platform which can be used for generating a custom corpus and provide interested developers a starting framework to develop sentiment analysis tools.