
Clinical SOAP Notes Completeness Checking Using Machine Learning

Item type

Journal Article

Publisher

AME Publishing Company

Abstract

Background: Subjective, Objective, Assessment, Plan (SOAP) notes are a critical component of medical documentation, and their completeness and accuracy are essential in healthcare settings. This study evaluates the effectiveness of machine learning methods for analyzing SOAP notes to determine whether each note contains all four required elements.

Methods: Using a dataset of 889 SOAP notes from primary care physician progress notes, we compared multinomial Naive Bayes, logistic regression, random forest, and multilayer perceptron neural networks. We applied standard text preprocessing, including lowercasing, tokenization, and vectorization with Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Performance was evaluated using accuracy, precision, recall, and F1 score through five-fold cross-validation and final testing.

Results: The Naive Bayes classifier outperformed the other models, achieving 80.39% accuracy, 81.76% precision, 80.39% recall, and 78.49% F1 score on the test set. Closer examination revealed that adjusting the probability threshold parameter allows balancing false positives (sections incorrectly identified as present) against false negatives (present sections that are missed). The superior performance of Naive Bayes suggests that simpler models may be sufficient for SOAP note classification tasks in resource-constrained healthcare settings. The adjustable probability threshold provides flexibility for implementation across different clinical specialties, where documentation practices and tolerance for false positives versus false negatives may vary. Integration into electronic health record systems could enable real-time documentation feedback.

Conclusions: Our findings demonstrate that even simple machine learning approaches can effectively identify missing SOAP elements, potentially improving documentation quality, reducing medical errors, and enhancing communication among healthcare providers. The adaptive algorithm's flexibility allows implementation across different clinical settings with minimal computational resources.
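The threshold trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a classifier has already produced, for each note, a probability that a given SOAP section is present. The probabilities, labels, and function names below are hypothetical and are not taken from the study's data or code.

```python
def confusion_counts(probs, labels, threshold):
    """Count (TP, FP, FN, TN) when 'present' is predicted for prob >= threshold."""
    tp = fp = fn = tn = 0
    for prob, label in zip(probs, labels):
        predicted_present = prob >= threshold
        if predicted_present and label:
            tp += 1
        elif predicted_present and not label:
            fp += 1  # section flagged as present although it is absent
        elif not predicted_present and label:
            fn += 1  # present section that was missed
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical predicted probabilities that a section is present,
# with hypothetical ground-truth labels (1 = present, 0 = absent).
probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.35, 0.50, 0.70):
    tp, fp, fn, tn = confusion_counts(probs, labels, threshold)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold:.2f} FP={fp} FN={fn} "
          f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

On this toy data, lowering the threshold removes false negatives at the cost of keeping a false positive, while raising it removes false positives but misses more present sections. A deployment would pick the threshold according to which error type is less tolerable in a given specialty.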

Source

Journal of Medical Artificial Intelligence, ISSN: 2617-2496 (Print); 2617-2496 (Online), AME Publishing Company, 0(0), 0-0. doi: 10.21037/jmai-24-370

Rights statement

This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.