Automated Risk Assessment of Opioid Use: Analysis Using Pre-trained Transformers on Social Media Data

Authors

Ahmad, Muhammad
Orji, Rita
Amjad, Maaz
Siddique, Abubakar
Kubysheva, Nailya
Batyrshin, Ildar
Sidorov, Grigori

Item type

Journal Article

Publisher

JMIR Publications Inc.

Abstract

BACKGROUND: The illegal use of opioids has emerged as a major global public health concern, contributing to widespread addiction and a growing number of overdose-related deaths. In response, the US federal government has invested billions of dollars in combating the opioid epidemic through treatment, prevention, and law enforcement initiatives. Despite these efforts, there remains an urgent need for automated tools capable of detecting overdose cases and assessing the risk levels of substances; such tools can enable faster, more effective responses with less reliance on human intervention. Social media, particularly Reddit, has become a valuable source of self-reported data on opioid misuse, offering rich insights into user experiences and symptoms.

OBJECTIVE: This research aimed to develop an advanced automated tool for detecting opioid overdose risks and classifying substances into high-risk and low-risk categories by analyzing social media posts.

METHODS: A multistage methodology was used to achieve the objectives of this work. First, a new dataset was constructed from Reddit posts and manually annotated. Each post was labeled according to the risk level of the mentioned substance, using contextual indicators and user-reported experiences as the basis for classification. To ensure reliability and annotator consistency, detailed annotation guidelines were developed and applied throughout the labeling process. Second, a bidirectional encoder representations from transformers for biomedical text mining (BioBERT)-based classification framework was implemented and enhanced with a custom attention mechanism to capture relevant semantic information for more accurate predictions. Third, the model's performance was evaluated using 5-fold cross-validation and compared against several baseline approaches, including traditional supervised learning, deep learning, and transfer learning methods. In total, 14 experiments were conducted to evaluate comparative effectiveness. To further assess the contribution of the attention layer, the best-performing model was also evaluated against a version incorporating the standard self-attention mechanism, using a train-test split. Finally, a paired t test was conducted to statistically assess the performance difference between the BioBERT-based model and the strongest baseline, extreme gradient boosting (XGBoost), providing validation of the observed improvements.

RESULTS: The proposed BioBERT model with custom attention achieved an F1-score of 0.99 in cross-validation, outperforming the best baseline, XGBoost (F1-score=0.97), with a relative improvement of 2.06%. A paired t test conducted across the 5 folds (n=5) confirmed that the performance gain was statistically significant (P=.003), providing strong evidence that the improvement reflects genuine advances in overdose risk detection.

CONCLUSIONS: This paper demonstrates the potential of leveraging social media data and advanced natural language processing models to build reliable systems for opioid overdose risk detection. The BioBERT model with custom attention shows state-of-the-art performance and robustness, offering a powerful tool to support timely intervention and harm reduction strategies in the ongoing opioid crisis.
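The arithmetic behind the reported comparison can be sketched in a few lines of Python. The relative improvement uses the two F1-scores stated in the abstract. The per-fold scores, by contrast, are hypothetical placeholders, since the abstract does not report them, so the resulting t statistic is illustrative only and is not expected to reproduce the published P value. The function names are likewise assumptions made for this sketch and do not come from the authors' code.

```python
import math
from statistics import mean, stdev


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """t statistic of a paired t test on matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# F1-scores as stated in the abstract.
print(round(relative_improvement(0.99, 0.97), 2))  # 2.06

# HYPOTHETICAL per-fold F1-scores (not from the paper), one pair per fold.
biobert_f1 = [0.99, 0.98, 0.99, 1.00, 0.99]
xgboost_f1 = [0.97, 0.97, 0.96, 0.98, 0.97]

t_stat = paired_t_statistic(biobert_f1, xgboost_f1)

# Two-sided critical value of Student's t for alpha = 0.05 with
# df = n - 1 = 4, which matches a comparison across 5 folds.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

With only 5 folds the test has 4 degrees of freedom, so the critical value is comparatively large; an exact P value could be obtained with a statistics library such as SciPy (`scipy.stats.ttest_rel`) instead of the table lookup used here.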

Keywords

AI, BERT, Reddit, artificial intelligence, chronic pain, data mining, deep learning, drug abuse, opioid overdose, social media, transformer, 4203 Health Services and Systems, 42 Health Sciences, Drug Abuse (NIDA only), Opioids, Brain Disorders, Opioid Misuse and Addiction, Networking and Information Technology R&D (NITRD), Behavioral and Social Science, Machine Learning and Artificial Intelligence, Physical Injury - Accidents and Adverse Effects, Prevention, Substance Misuse, Data Science, Mental health, 3 Good Health and Well Being

Source

JMIR Infodemiology, ISSN: 2564-1891 (Print); 2564-1891 (Online), JMIR Publications Inc., 6, e77783-e77783. doi: 10.2196/77783

Rights statement

© Muhammad Ahmad, Rita Orji, Maaz Amjad, Abubakar Siddique, Nailya Kubysheva, Ildar Batyrshin, Grigori Sidorov. Originally published in JMIR Infodemiology (https://infodemiology.jmir.org), 19.Feb.2026. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Infodemiology, is properly cited. The complete bibliographic information, a link to the original publication on https://infodemiology.jmir.org/, as well as this copyright and license information must be included.