Extracting Data From Line Charts in Scanned Medical Documents
Silva de Azevedo, Kathleen
Hand-drawn charts contained in printed forms are used to summarize data in a format that can be quickly processed and understood by humans. They differ from computer-generated charts in two ways: firstly, hand-drawn charts are less predictable than computer-generated charts due to the inherent unpredictability of human behaviour; secondly, they present higher levels of noise, because they must be scanned prior to processing and the scanning interferes with the signal. Much past research has explored the recognition of machine-generated charts, but without a focus on hand-drawn charts in a noisy medium. Therefore, this research develops methods for the recognition of line charts in scanned medical documents. The approach uses geometrical and positional relationships between the elements of the chart to determine the values of its markers, with no human intervention. The experiments were conducted using two distinct datasets: one with 200 machine-generated charts and another with 478 scanned medical form sheets. Experimental evaluation showed a high level of accuracy for the method devised to process the machine-generated dataset. The method applied to the medical form sheets extracted the markers with a low level of error. As future work, the rate of extraction may be improved by making the procedure that detects the region containing the data lines more precise.
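To illustrate what determining a marker's value from positional relationships can look like, here is a minimal sketch. It is not the method developed in this work. It assumes that two reference ticks on the vertical axis have already been located in the image and that the axis scale is linear. The names `AxisCalibration` and `marker_value` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Two reference ticks on a vertical axis (hypothetical helper).

    Each tick pairs a pixel row in the scanned image with the value
    printed next to it on the form.
    """
    pixel_low: float
    value_low: float
    pixel_high: float
    value_high: float


def marker_value(marker_pixel_y: float, axis: AxisCalibration) -> float:
    """Map a marker's pixel row to a chart value by linear interpolation.

    Assumes a linear axis; a logarithmic axis would need a different mapping.
    """
    span = axis.pixel_high - axis.pixel_low
    if span == 0:
        raise ValueError("reference ticks must lie on different pixel rows")
    # Fraction of the distance from the low tick to the high tick.
    fraction = (marker_pixel_y - axis.pixel_low) / span
    return axis.value_low + fraction * (axis.value_high - axis.value_low)


# Image rows grow downward, so the tick for the larger value has the smaller row.
axis = AxisCalibration(pixel_low=400, value_low=0, pixel_high=100, value_high=60)
print(marker_value(250, axis))
```

In this sketch a marker halfway between the two ticks is assigned the midpoint of their values. Locating the axis, the ticks, and the markers in a noisy scan is the harder part of the problem and is not covered here.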