
Sign Language Recognition from Digital Videos Using Feature Pyramid Network with Detection Transformer

aut.relation.journal: Multimedia Tools and Applications
dc.contributor.author: Liu, Yu
dc.contributor.author: Nand, Parma
dc.contributor.author: Hossain, Md Akbar
dc.contributor.author: Nguyen, Minh
dc.contributor.author: Yan, Wei Qi
dc.date.accessioned: 2023-03-06T03:15:04Z
dc.date.available: 2023-03-06T03:15:04Z
dc.date.copyright: 2023-02-28
dc.description.abstract: Sign language recognition is one of the fundamental ways to help deaf people communicate with others. An accurate vision-based sign language recognition system using deep learning is a fundamental goal for many researchers. Deep convolutional neural networks have been extensively considered in the last few years, and a slew of architectures have been proposed. Recently, Vision Transformer and other Transformers have shown apparent advantages in object recognition compared to traditional computer vision models such as Faster R-CNN, YOLO, SSD, and other deep learning models. In this paper, we propose a Vision Transformer-based sign language recognition method called DETR (Detection Transformer), aiming to improve the current state-of-the-art sign language recognition accuracy. The DETR method proposed in this paper is able to recognize sign language from digital videos with high accuracy using a new deep learning model, ResNet152 + FPN (i.e., Feature Pyramid Network), which is based on Detection Transformer. Our experiments show that the method has excellent potential for improving sign language recognition accuracy. For instance, our newly proposed net ResNet152 + FPN improves the detection accuracy by up to 1.70% on the sign language test dataset compared to the standard Detection Transformer models. In addition, the proposed method attained an overall accuracy of 96.45%.
dc.identifier.citation: Multimedia Tools and Applications, ISSN: 1380-7501 (Print); 1573-7721 (Online), Springer Science and Business Media LLC. doi: 10.1007/s11042-023-14646-0
dc.identifier.doi: 10.1007/s11042-023-14646-0
dc.identifier.issn: 1380-7501
dc.identifier.issn: 1573-7721
dc.identifier.uri: https://hdl.handle.net/10292/15941
dc.language: en
dc.publisher: Springer Science and Business Media LLC
dc.relation.uri: https://link.springer.com/article/10.1007/s11042-023-14646-0
dc.rights.accessrights: OpenAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: 0801 Artificial Intelligence and Image Processing
dc.subject: 0803 Computer Software
dc.subject: 0805 Distributed Computing
dc.subject: 0806 Information Systems
dc.subject: Artificial Intelligence & Image Processing
dc.subject: Software Engineering
dc.subject: 4009 Electronics, sensors and digital hardware
dc.subject: 4603 Computer vision and multimedia computation
dc.subject: 4605 Data management and data science
dc.subject: 4606 Distributed computing and systems software
dc.title: Sign Language Recognition from Digital Videos Using Feature Pyramid Network with Detection Transformer
dc.type: Journal Article
pubs.elements-id: 495260

Files

Original bundle

Name: s11042-023-14646-0.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
Description: Journal article