Clinical SOAP Notes Completeness Checking Using Machine Learning

Feng, Sherry JH; Joseph, Jithu; Lai, Edmund M-K

Files

Journal article (348.06 KB)

Date

2025-08-20

Authors

Feng, Sherry JH

Joseph, Jithu

Lai, Edmund M-K

Item type

Journal Article

Publisher

AME Publishing Company

Abstract

Background: Subjective, Objective, Assessment, Plan (SOAP) notes are a core component of medical documentation, and their completeness and accuracy are critical in healthcare settings. This study evaluates the effectiveness of machine learning methods for analyzing SOAP notes to determine whether each note contains all four required elements.

Methods: Using a dataset of 889 SOAP notes from primary care physician progress notes, we compared multinomial Naive Bayes, logistic regression, random forest, and multilayer perceptron neural networks. We applied standard text preprocessing, including lowercasing, tokenization, and vectorization with Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Performance was evaluated using accuracy, precision, recall, and F1 score through five-fold cross-validation and final testing.

Results: The Naive Bayes classifier outperformed the other models, achieving 80.39% accuracy, 81.76% precision, 80.39% recall, and 78.49% F1 score on the test set. Closer examination revealed that adjusting the probability threshold parameter allows false positives (sections incorrectly identified as present) to be balanced against false negatives (present sections that are missed). The superior performance of Naive Bayes suggests that simpler models may be sufficient for SOAP note classification tasks in resource-constrained healthcare settings. The adjustable probability threshold provides flexibility for implementation across different clinical specialties, where documentation practices and tolerance for false positives versus false negatives may vary. Integration into electronic health record systems could enable real-time documentation feedback.

Conclusions: Our findings demonstrate that even simple machine learning approaches can effectively identify missing SOAP elements, potentially improving documentation quality, reducing medical errors, and enhancing communication among healthcare providers. The adaptive algorithm’s flexibility allows implementation across different clinical settings with minimal computational resources.
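
The threshold trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a classifier that already outputs a probability for each SOAP section being present, and the names `SECTIONS`, `predict_present`, `count_errors`, and the two example notes in `NOTES` are hypothetical, made up for illustration only.

```python
from typing import Dict, List, Tuple

# The four SOAP sections named in the abstract.
SECTIONS = ("subjective", "objective", "assessment", "plan")

# One note = (predicted probability per section, true presence per section).
Note = Tuple[Dict[str, float], Dict[str, bool]]


def predict_present(probs: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Mark a section as present when its probability meets the threshold."""
    return {s: probs[s] >= threshold for s in SECTIONS}


def count_errors(notes: List[Note], threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) summed over all sections."""
    fp = fn = 0
    for probs, truth in notes:
        pred = predict_present(probs, threshold)
        for s in SECTIONS:
            if pred[s] and not truth[s]:
                fp += 1  # section flagged as present but actually absent
            elif truth[s] and not pred[s]:
                fn += 1  # section actually present but missed
    return fp, fn


# Hypothetical probabilities and labels, invented for this illustration.
NOTES: List[Note] = [
    (
        {"subjective": 0.90, "objective": 0.60, "assessment": 0.35, "plan": 0.80},
        {"subjective": True, "objective": True, "assessment": False, "plan": True},
    ),
    (
        {"subjective": 0.70, "objective": 0.20, "assessment": 0.55, "plan": 0.45},
        {"subjective": True, "objective": False, "assessment": True, "plan": True},
    ),
]

if __name__ == "__main__":
    for t in (0.3, 0.5, 0.75):
        fp, fn = count_errors(NOTES, t)
        print(f"threshold={t:.2f}  false_positives={fp}  false_negatives={fn}")
```

On this toy data, a low threshold accepts an absent section as present, while a high threshold misses sections that are actually there, which is the kind of balance a deployment would need to tune to its own tolerance for each error type.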

Keywords

46 Information and Computing Sciences, 31 Biological Sciences, 4611 Machine Learning, Machine Learning and Artificial Intelligence, Networking and Information Technology R&D (NITRD)

Source

Journal of Medical Artificial Intelligence, ISSN: 2617-2496 (Print); 2617-2496 (Online), AME Publishing Company, 0(0), 0-0. doi: 10.21037/jmai-24-370

DOI

10.21037/jmai-24-370

Publisher's version

https://jmai.amegroups.org/article/view/10223/html

Rights statement

This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

Permanent link

http://hdl.handle.net/10292/19909

Collections

School of Engineering, Computer and Mathematical Sciences - Te Kura Mātai Pūhanga, Rorohiko, Pāngarau
