Authors: Chen, Rujia; Ghobakhlou, Akbar; Narayanan, Ajit
Dates: 2024-12-11; 2024-12-11; 2024-11-22
Citation: Applied Sciences, ISSN: 2076-3417 (Print); 2076-3417 (Online), MDPI AG, 14(23), 10837-10837. doi: 10.3390/app142310837
ISSN: 2076-3417
URI: http://hdl.handle.net/10292/18449
Abstract: Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
Rights: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Subjects: 46 Information and Computing Sciences; 4611 Machine Learning
Title: Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram
Type: Journal Article
Access: OpenAccess
DOI: 10.3390/app142310837