Gong, Liang Yu; Li, Xue Jun; Chong, Peter Han Joo
2024-12-16; 2024-12-16; 2024-11-29
Sensors, ISSN: 1424-8220 (Print); 1424-8220 (Online), MDPI AG, 24(23), 7635-7635. doi: 10.3390/s24237635
1424-8220
http://hdl.handle.net/10292/18474

Spoofing attacks (also called presentation attacks) are easy to mount against facial recognition systems, which leaves online financial systems vulnerable. Given the high demand for spoofing attack detection, an anti-spoofing solution with strong generalization ability is urgently needed. Although multi-modality methods, such as combining depth images with RGB images, and feature fusion methods currently perform well on certain datasets, the cost of obtaining depth information and physiological signals, especially biological signals, is relatively high. This paper proposes a representation learning method with an Auto-Encoder structure based on Swin Transformer and ResNet, and supervises model training with cross-entropy loss, semi-hard triplet loss, and Smooth L1 pixel-wise loss. The architecture contains three parts: an Encoder, a Decoder, and an auxiliary classifier. The Encoder extracts features that capture correlations between patches, and the Decoder generates universal “Clue Maps” for further contrastive learning. Finally, the auxiliary classifier assists the model in making its decision, and its output is treated as a preliminary result. Extensive experiments evaluate Attack Presentation Classification Error Rate (APCER), Bonafide Presentation Classification Error Rate (BPCER), and Average Classification Error Rate (ACER) on popular spoofing databases (CelebA, OULU, and CASIA-MFSD) against several existing anti-spoofing models; our approach outperforms them, reaching 1.2% and 1.6% ACER in intra-dataset experiments.
In addition, the inter-dataset experiment on CASIA-MFSD (training set) and Replay-attack (testing set) reaches a new state-of-the-art performance with 23.8% Half Total Error Rate (HTER).

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

4603 Computer Vision and Multimedia Computation; 46 Information and Computing Sciences; 4605 Data Management and Data Science; 4611 Machine Learning; 0301 Analytical Chemistry; 0502 Environmental Science and Management; 0602 Ecology; 0805 Distributed Computing; 0906 Electrical and Electronic Engineering; Analytical Chemistry; 3103 Ecology; 4008 Electrical engineering; 4009 Electronics, sensors and digital hardware; 4104 Environmental management; 4606 Distributed computing and systems software

Facial Anti-Spoofing Using “Clue Maps”
Journal Article
OpenAccess
10.3390/s24237635
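
The error rates named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes binary labels (1 = bona fide, 0 = attack), already-thresholded predictions, and a single pooled attack type; the function name `pad_error_rates` is hypothetical. APCER is the fraction of attack presentations accepted as bona fide, BPCER is the fraction of bona fide presentations rejected as attacks, and ACER is their average. HTER is computed in the same averaging style, from the false acceptance and false rejection rates.

```python
def pad_error_rates(labels, predictions):
    """Return (APCER, BPCER, ACER) for binary presentation attack detection.

    Assumed convention for this sketch: 1 = bona fide, 0 = attack.
    All attack samples are pooled into one attack type.
    """
    # Predictions made on attack samples and on bona fide samples, respectively.
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    bona_fide_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: attacks wrongly classified as bona fide.
    apcer = sum(1 for p in attack_preds if p == 1) / len(attack_preds)
    # BPCER: bona fide presentations wrongly classified as attacks.
    bpcer = sum(1 for p in bona_fide_preds if p == 0) / len(bona_fide_preds)
    # ACER: mean of the two error rates.
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    predictions = [1, 0, 0, 0, 0, 0, 1, 1]
    print(pad_error_rates(labels, predictions))
```

In this toy input, one of four attacks is accepted and two of four bona fide samples are rejected, so the averaged rate sits midway between the two individual error rates.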