Open Research
About
Tuwhera Open Access Research Outputs provides free access to full texts of scholarly works from AUT's Schools, Research Institutes and Centres.
AUT's research is built on a foundation of innovation and excellence, with the aim that its discoveries and applications are shared in ways that enhance wellbeing and prosperity.
Adding your outputs
AUT staff research outputs are added to this collection via Research Elements. All items submitted to this collection are checked to ensure material does not breach publisher copyright and is suitable for archiving prior to being made open access.
Find out more about making your work open access
For help with Research Elements contact the Research and Innovation Office.
Browse
Browsing Open Research by Subject "0803 Computer Software"
Now showing 1 - 7 of 7
- A Privacy-Preserving Word Embedding Text Classification Model Based on Privacy Boundary Constructed by Deep Belief Network (Springer Science and Business Media LLC, 2023-09-15). Ma, Bo; Lai, Edmund; Yan, Wei Qi; Wu, Jinsong.
  To effectively extract and classify information from reports or documents while protecting the privacy of the extracted results, we propose a privacy-preserving classification model named Word Embedding Combination Privacy-Preserving Support Vector Machine (WECPPSVM) to classify text. This paper also proposes the Privacy-Preserving Distribution and Independent Frequent Subsequence Extraction Algorithm (PPDIFSEA), which calculates the degree of independence of the training data input to the classification model by training a Deep Belief Network (DBN), from which it obtains the Privacy Boundary (PB). The PB is an indispensable condition for both data sampling and privacy noise generation. The model protects privacy by injecting privacy noise into the classification result, which interferes with background-knowledge-based privacy attacks. Our quantitative analysis shows that WECPPSVM approaches mainstream text classification algorithms in classification accuracy while preserving privacy and without increasing computational complexity. The fusion study and privacy threat evaluation further verify that PPDIFSEA combined with WECPPSVM achieves an acceptable level of classification accuracy and privacy protection.
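The abstract above describes injecting privacy noise into the classification result so that background-knowledge attacks cannot invert the exact output. A minimal sketch of what such a step could look like, assuming a standard Laplace mechanism over per-class scores (the function names, parameters, and the choice of the Laplace mechanism are illustrative assumptions, not the authors' implementation):

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_classify(scores, epsilon=1.0, sensitivity=1.0, seed=0):
    """Perturb per-class scores with Laplace noise before releasing the
    predicted label, so the exact classifier output is never exposed."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon -> more noise
    noisy = {label: s + laplace_noise(scale, rng) for label, s in scores.items()}
    return max(noisy, key=noisy.get)
```

With a large privacy budget `epsilon` the noise is negligible and the original prediction survives; smaller budgets trade accuracy for stronger protection.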
- Apple Ripeness Identification from Digital Images Using Transformers (Springer Science and Business Media LLC, 2023-06-10). Xiao, Bingjie; Nguyen, Minh; Yan, Wei Qi.
  We describe a non-destructive test of apple ripeness using digital images of multiple types of apples. Fruit images are treated as data samples, and artificial intelligence models are employed to classify the fruits and identify their maturity levels. To obtain ripeness classifications, we make use of deep learning models to conduct our experiments and evaluate the test results of our proposed models. To ensure the accuracy of our experimental results, we created our own dataset and obtained the best fruit classification accuracy by comparing the Transformer and YOLO models, thereby attaining the best accuracy of fruit maturity recognition. We also combined the YOLO model with an attention module, and the improved YOLO model delivers fast object detection.
- CISO: Co-iteration Semi-supervised Learning for Visual Object Detection (Springer Science and Business Media LLC, 2023-09-19). Qi, Jianchun; Nguyen, Minh; Yan, Wei Qi.
  Semi-supervised learning offers a solution to the high cost and limited availability of manually labeled samples in supervised learning. In semi-supervised visual object detection, unlabeled data can significantly enhance the performance of deep learning models. In this paper, we introduce an end-to-end framework named CISO (Co-Iteration Semi-Supervised Learning for Object Detection), which integrates a knowledge distillation approach with a collaborative, iterative semi-supervised learning strategy. To maximize the utilization of pseudo-label data and address its scarcity under high threshold settings, we propose a mean iteration approach in which all unlabeled data is applied in each training iteration. Pseudo-label data with high confidence is extracted using an ever-changing threshold: the average intersection over union of all pseudo-labeled data. This strategy ensures the accuracy of the pseudo-labels while optimizing the use of unlabeled data. We then apply a weak-strong data augmentation strategy to update the model. Finally, we evaluate CISO using the Swin Transformer model and conduct comprehensive experiments on MS-COCO. Our framework showcases impressive results, outperforming state-of-the-art methods by 2.16 mAP and 1.54 mAP with 10% and 5% labeled data, respectively.
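The adaptive threshold described above — keeping only pseudo-labels whose confidence clears the average over all candidates — can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code; `candidates` and the `(box, iou)` pairing are assumed names:

```python
def filter_pseudo_labels(candidates):
    """Keep pseudo-labels whose IoU-based confidence is at or above the
    mean over all candidates. Because the mean moves as pseudo-label
    quality improves across iterations, the threshold adapts instead of
    being fixed too high (which would starve training of pseudo-labels)."""
    if not candidates:
        return []
    threshold = sum(iou for _, iou in candidates) / len(candidates)
    return [(box, iou) for box, iou in candidates if iou >= threshold]
```

In early iterations the mean is low and many boxes pass; as the model improves, the bar rises with it, which matches the "ever-changing threshold" idea in the abstract.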
- Fruit Ripeness Identification Using YOLOv8 Model (Springer Science and Business Media LLC, 2023-08-31). Xiao, Bingjie; Nguyen, Minh; Yan, Wei Qi.
  Deep learning-based visual object detection is a fundamental aspect of computer vision: such models locate and classify multiple objects within an image by predicting bounding boxes. The focus of this paper is classifying fruits as ripe or overripe using digital images. Our proposed model extracts visual features from fruit images and analyzes fruit peel characteristics to predict the fruit's class. We utilize our own datasets to train two "anchor-free" models, YOLOv8 and CenterNet, aiming to produce accurate predictions. The CenterNet network primarily incorporates ResNet-50 and employs the deconvolution module DeConv for feature map upsampling; its final three convolutional branches predict the heatmap. The YOLOv8 model leverages CSP and C2f modules for lightweight processing. After analyzing and comparing the two models, we found that the C2f module of YOLOv8 significantly enhances classification results, achieving an impressive accuracy rate of 99.5%.
- NUNI-Waste: Novel Semi-supervised Semantic Segmentation Waste Classification with Non-uniform Data Augmentation (Springer Science and Business Media LLC, 2024-01-25). Qi, Jianchun; Nguyen, Minh; Yan, Wei Qi.
  Waste categorization and recycling are critical for converting waste into valuable, functional materials, thereby aiding land preservation, reducing pollution, and optimizing resource usage. However, real-world classification and identification of recyclable waste face substantial hurdles due to the intricate and unpredictable nature of waste and the limited availability of comprehensive waste datasets; these factors limit the efficacy of existing research in waste management. In this paper, we apply semantic segmentation at the individual pixel level and introduce a semi-supervised method for authentic waste classification scenarios, leveraging the ZeroWaste dataset. We devise a non-standard data augmentation strategy that mimics the ever-changing conditions of real-world waste environments. Additionally, we introduce an adaptive weighted loss function and dynamically adjust the ratio of positive to negative samples through a masking method, ensuring the model learns from relevant samples. Lastly, to maintain consistency between predictions made on data-augmented images and their original counterparts, we remove input perturbations. Our method proves effective, as verified by an array of standard experiments and ablation studies, achieving an accuracy improvement of 3.74% over the ZeroWaste baseline.
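The abstract mentions adjusting the ratio of positive to negative samples through a masking method so the rare foreground class is not drowned out. One common way to do this (a sketch under that assumption — the function, the hard-negative selection, and the `neg_pos_ratio` cap are illustrative, not the paper's exact loss):

```python
def balanced_pixel_loss(losses, is_positive, neg_pos_ratio=3):
    """Average per-pixel losses while keeping every positive pixel and
    only the hardest negatives, capped at neg_pos_ratio times the
    positive count. Masking out the easy negatives keeps the
    positive/negative ratio bounded so the model learns from relevant
    samples."""
    pos = [l for l, p in zip(losses, is_positive) if p]
    # Sort negatives by loss, descending: hardest negatives first.
    neg = sorted((l for l, p in zip(losses, is_positive) if not p), reverse=True)
    # Keep at least one negative even when no positives are present.
    kept_neg = neg[: max(1, neg_pos_ratio * len(pos))]
    kept = pos + kept_neg
    return sum(kept) / len(kept)
```

Without the cap, an image that is mostly background would let millions of easy negative pixels dominate the average and flatten the gradient signal from the few positives.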
- Pose Estimation for Swimmers in Video Surveillance (Springer Science and Business Media LLC, 2023-09-01). Cao, Xiaowen; Yan, Wei Qi.
  Traditional models for pose estimation in video surveillance are based on graph structures. In this paper, we propose a method that breaks the limitation of template matching within a range of pose changes to obtain robust results, implementing swimmer pose estimation with deep learning. We make use of the High-Resolution Net (HRNet) to extract and fuse visual features of the object and complete detection using the key points of human joints. With appropriate training, the proposed model can be applied to all kinds of swimming styles. Compared with methods that require multi-model combinations and training, the proposed method directly achieves end-to-end prediction, which is easy to implement and deploy. In addition, a cross-fusion module is added between the parallel networks, which helps the network exploit features at multiple resolutions. The proposed network achieved ideal results in swimmer pose estimation in a comparison of HRNet-W32 and HRNet-W48. We also contribute an annotated key-point dataset of swimmers captured from an underwater view; compared with a side view, the swimmers' torsos captured underwater are more suitable for a broad spectrum of machine vision tasks.
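HRNet-style pose estimators like the one described above predict one heatmap per joint and read the keypoint off the peak activation. A minimal sketch of that final decoding step (illustrative only; real pipelines add sub-pixel refinement and rescaling back to image coordinates):

```python
def decode_heatmaps(heatmaps):
    """For each joint's 2D heatmap (a list of rows), return the
    (row, col) of its peak activation -- the keypoint location at
    heatmap resolution."""
    keypoints = []
    for hm in heatmaps:
        best = max(
            ((r, c) for r in range(len(hm)) for c in range(len(hm[0]))),
            key=lambda rc: hm[rc[0]][rc[1]],
        )
        keypoints.append(best)
    return keypoints
```

Each decoded coordinate is then scaled by the downsampling factor between the heatmap and the input frame to recover the joint position in the original video.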
- Sign Language Recognition from Digital Videos Using Feature Pyramid Network with Detection Transformer (Springer Science and Business Media LLC, ). Liu, Yu; Nand, Parma; Hossain, Md Akbar; Nguyen, Minh; Yan, Wei Qi.
  Sign language recognition is one of the fundamental ways to assist deaf people in communicating with others, and an accurate vision-based sign language recognition system using deep learning is a goal for many researchers. Deep convolutional neural networks have been extensively studied in the last few years, and a slew of architectures have been proposed. Recently, Vision Transformer and other Transformers have shown clear advantages in object recognition over traditional computer vision models such as Faster R-CNN, YOLO, and SSD. In this paper, we propose a Transformer-based sign language recognition method built on DETR (Detection Transformer), aiming to improve the current state-of-the-art sign language recognition accuracy. Our method recognizes sign language from digital videos with high accuracy using a new deep learning model, ResNet152 + FPN (Feature Pyramid Network), based on the Detection Transformer. Our experiments show that the method has excellent potential for improving sign language recognition accuracy: the proposed ResNet152 + FPN enhances detection accuracy by up to 1.70% on the sign language test dataset compared with standard Detection Transformer models, and an overall accuracy of 96.45% was attained.