Clinical SOAP Notes Completeness Checking Using Machine Learning
Item type
Journal Article
Publisher
AME Publishing Company
Abstract
Background: Subjective, Objective, Assessment, Plan (SOAP) notes are a critical component of medical documentation, and their completeness and accuracy are essential in healthcare settings. This study evaluates the effectiveness of machine learning methods for analyzing SOAP notes to determine whether each note contains all four required elements.
Methods: Using a dataset of 889 SOAP notes from primary care physician progress notes, we compared multinomial Naive Bayes, logistic regression, random forest, and multilayer perceptron neural networks. We applied standard text preprocessing, including lowercasing, tokenization, and vectorization with Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Performance was evaluated using accuracy, precision, recall, and F1 score through five-fold cross-validation and final testing.
Results: The Naive Bayes classifier outperformed the other models, achieving 80.39% accuracy, 81.76% precision, 80.39% recall, and 78.49% F1 score on the test set. Closer examination revealed that adjusting the probability threshold parameter allows balancing false positives (sections incorrectly identified as present) against false negatives (present sections that are missed). The superior performance of Naive Bayes suggests that simpler models may be sufficient for SOAP note classification tasks in resource-constrained healthcare settings. The adjustable probability threshold provides flexibility for implementation across different clinical specialties, where documentation practices and tolerance for false positives versus false negatives may vary. Integration into electronic health record systems could enable real-time documentation feedback.
Conclusions: Our findings demonstrate that even simple machine learning approaches can effectively identify missing SOAP elements, potentially improving documentation quality, reducing medical errors, and enhancing communication among healthcare providers. The adaptive algorithm’s flexibility allows implementation across different clinical settings with minimal computational resources.
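The threshold trade-off described in the Results can be illustrated with a minimal sketch. The probabilities and labels below are hypothetical values chosen for illustration, not data from the study, and the function name confusion_at_threshold is an assumption rather than part of the authors' implementation. A section counts as predicted present when its probability meets the threshold.

```python
from typing import List, Tuple


def confusion_at_threshold(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for a given threshold.

    probs  : predicted probability that a SOAP section is present
    labels : 1 if the section is truly present, 0 if it is absent
    """
    fp = fn = 0
    for p, y in zip(probs, labels):
        predicted_present = p >= threshold
        if predicted_present and y == 0:
            fp += 1  # section reported as present although it is absent
        elif not predicted_present and y == 1:
            fn += 1  # section reported as absent although it is present
    return fp, fn


# Hypothetical classifier outputs for eight sections (illustrative only).
probs = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for t in (0.25, 0.50, 0.75):
    fp, fn = confusion_at_threshold(probs, labels, t)
    print(f"threshold={t:.2f}  false positives={fp}  false negatives={fn}")
```

On this toy data, raising the threshold reduces false positives and increases false negatives, which is the balance a clinical setting would tune to its own tolerance.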
Source
Journal of Medical Artificial Intelligence, ISSN: 2617-2496 (Print); 2617-2496 (Online), AME Publishing Company, 0(0), 0-0. doi: 10.21037/jmai-24-370
Rights statement
This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license).
See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
