Automated Risk Assessment of Opioid Use: Analysis Using Pre-trained Transformers on Social Media Data

aut.relation.endpage: e77783
aut.relation.journal: JMIR Infodemiology
aut.relation.startpage: e77783
aut.relation.volume: 6
dc.contributor.author: Ahmad, Muhammad
dc.contributor.author: Orji, Rita
dc.contributor.author: Amjad, Maaz
dc.contributor.author: Siddique, Abubakar
dc.contributor.author: Kubysheva, Nailya
dc.contributor.author: Batyrshin, Ildar
dc.contributor.author: Sidorov, Grigori
dc.date.accessioned: 2026-06-08T20:38:19Z
dc.date.available: 2026-06-08T20:38:19Z
dc.date.issued: 2025-05-20
dc.description.abstract: BACKGROUND: The illegal use of opioids has emerged as a major global public health concern, contributing to widespread addiction and a growing number of overdose-related deaths. In response, the US federal government has invested billions of dollars in combating the opioid epidemic through treatment, prevention, and law enforcement initiatives. Despite these efforts, there remains an urgent need for automated tools capable of detecting overdose cases and assessing the risk levels of substances; such tools can enable faster, more effective responses with less reliance on human intervention. Social media, particularly Reddit, has become a valuable source of self-reported data on opioid misuse, offering rich insights into user experiences and symptoms. OBJECTIVE: This research aimed to develop an automated tool for detecting opioid overdose risks and classifying substances into high-risk and low-risk categories by analyzing social media posts. METHODS: A multistage methodology was used. First, a new dataset was constructed from Reddit posts and manually annotated. Each post was labeled according to the risk level of the mentioned substance, using contextual indicators and user-reported experiences as the basis for classification. To ensure reliability and annotator consistency, detailed annotation guidelines were developed and applied throughout the labeling process. Second, a classification framework based on bidirectional encoder representations from transformers for biomedical text mining (BioBERT) was implemented and enhanced with a custom attention mechanism to capture relevant semantic information for more accurate predictions. Third, the model's performance was evaluated using 5-fold cross-validation and compared against several baseline approaches, including traditional supervised learning, deep learning, and transfer learning methods. In total, 14 experiments were conducted to evaluate comparative effectiveness. To further assess the contribution of the attention layer, the best-performing model was also evaluated against a version incorporating the standard self-attention mechanism, using a train-test split. Finally, a paired t test was conducted to statistically assess the performance difference between the BioBERT-based model and the strongest baseline, extreme gradient boosting (XGBoost), providing validation of the observed improvements. RESULTS: The proposed BioBERT model with custom attention achieved an F1-score of 0.99 in cross-validation, outperforming the best baseline, XGBoost (F1-score=0.97), with a relative improvement of 2.06%. A paired t test conducted across the 5 folds (n=5) confirmed that the performance gain was statistically significant (P=.003), providing strong evidence that the improvement reflects genuine advances in overdose risk detection. CONCLUSIONS: This paper demonstrates the potential of leveraging social media data and advanced natural language processing models to build reliable systems for opioid overdose risk detection. The BioBERT model with custom attention shows state-of-the-art performance and robustness, offering a powerful tool to support timely intervention and harm reduction strategies in the ongoing opioid crisis.
dc.identifier.citation: JMIR Infodemiology, ISSN: 2564-1891 (Print); 2564-1891 (Online), JMIR Publications Inc., 6, e77783-e77783. doi: 10.2196/77783
dc.identifier.doi: 10.2196/77783
dc.identifier.issn: 2564-1891
dc.identifier.uri: http://hdl.handle.net/10292/21338
dc.language: eng
dc.publisher: JMIR Publications Inc.
dc.relation.uri: https://infodemiology.jmir.org/2026/1/e77783
dc.rights: © Muhammad Ahmad, Rita Orji, Maaz Amjad, Abubakar Siddique, Nailya Kubysheva, Ildar Batyrshin, Grigori Sidorov. Originally published in JMIR Infodemiology (https://infodemiology.jmir.org), 19.Feb.2026. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Infodemiology, is properly cited. The complete bibliographic information, a link to the original publication on https://infodemiology.jmir.org/, as well as this copyright and license information must be included.
dc.rights.accessrights: OpenAccess
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: AI
dc.subject: BERT
dc.subject: Reddit
dc.subject: artificial intelligence
dc.subject: chronic pain
dc.subject: data mining
dc.subject: deep learning
dc.subject: drug abuse
dc.subject: opioid overdose
dc.subject: social media
dc.subject: transformer
dc.subject: 4203 Health Services and Systems
dc.subject: 42 Health Sciences
dc.subject: Drug Abuse (NIDA only)
dc.subject: Opioids
dc.subject: Brain Disorders
dc.subject: Opioid Misuse and Addiction
dc.subject: Networking and Information Technology R&D (NITRD)
dc.subject: Behavioral and Social Science
dc.subject: Machine Learning and Artificial Intelligence
dc.subject: Physical Injury - Accidents and Adverse Effects
dc.subject: Prevention
dc.subject: Substance Misuse
dc.subject: Data Science
dc.subject: Mental health
dc.subject: 3 Good Health and Well Being
dc.subject.mesh: Analgesics, Opioid
dc.subject.mesh: Data Mining
dc.subject.mesh: Humans
dc.subject.mesh: Opioid-Related Disorders
dc.subject.mesh: Reproducibility of Results
dc.subject.mesh: Risk Assessment
dc.subject.mesh: Social Media
dc.subject.mesh: United States
dc.title: Automated Risk Assessment of Opioid Use: Analysis Using Pre-trained Transformers on Social Media Data
dc.type: Journal Article
pubs.elements-id: 754724

Files

Original bundle

Name: infodemiology-2026-1-e77783.pdf
Size: 498.39 KB
Format: Adobe Portable Document Format
Description: Journal article

License bundle

Name: license.txt
Size: 1.37 KB
Format: Plain Text
Description: