Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram
Authors
Chen, Rujia
Ghobakhlou, Akbar
Narayanan, Ajit
Item type
Journal Article
Publisher
MDPI AG
Abstract
Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
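The abstract does not spell out how the spectrograms are scaled and combined. As a minimal sketch, assuming each feature map is a (frequency x time) matrix and all maps share the same number of time frames, one way to build a single combined input is to min-max normalize each map, resize it along the frequency axis to an assigned height by linear interpolation, and stack the results vertically. The function names (`min_max`, `resize_rows`, `combine_features`), the interpolation scheme, the toy data, and the per-feature heights are hypothetical illustrations, not the authors' method or the "Magnified 1/4 Size" configuration; the attention network that would consume the combined input is not sketched.

```python
from typing import List, Sequence

# A feature map is a list of rows (frequency or coefficient bins), each a list of time frames.
Matrix = List[List[float]]


def min_max(feature: Matrix) -> Matrix:
    """Rescale all values of one feature map into [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in feature for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in feature]
    return [[(v - lo) / span for v in row] for row in feature]


def resize_rows(feature: Matrix, target_rows: int) -> Matrix:
    """Linearly interpolate a feature map to target_rows rows, leaving the time axis unchanged."""
    src_rows = len(feature)
    frames = len(feature[0])
    out: Matrix = []
    for i in range(target_rows):
        # Fractional source-row position for output row i (first and last rows map to each other).
        pos = 0.0 if target_rows == 1 else i * (src_rows - 1) / (target_rows - 1)
        lo = int(pos)
        hi = min(lo + 1, src_rows - 1)
        w = pos - lo
        out.append([(1 - w) * feature[lo][t] + w * feature[hi][t] for t in range(frames)])
    return out


def combine_features(features: Sequence[Matrix], heights: Sequence[int]) -> Matrix:
    """Normalize each map, resize it to its assigned height, and stack the maps vertically."""
    if len(features) != len(heights):
        raise ValueError("need one target height per feature map")
    frames = len(features[0][0])
    combined: Matrix = []
    for feature, height in zip(features, heights):
        if any(len(row) != frames for row in feature):
            raise ValueError("all feature maps must share the same number of time frames")
        combined.extend(resize_rows(min_max(feature), height))
    return combined


# Toy stand-ins for STFT, Log-Mel, MFCC, and CST maps (4 time frames each); not real audio features.
stft = [[float(f + t) for t in range(4)] for f in range(8)]
log_mel = [[float(f * t) for t in range(4)] for f in range(6)]
mfcc = [[float(t - f) for t in range(4)] for f in range(3)]
cst = [[1.0 for _ in range(4)] for _ in range(2)]

# Hypothetical equal heights, chosen for illustration only.
image = combine_features([stft, log_mel, mfcc, cst], heights=[4, 4, 4, 4])
print(len(image), len(image[0]))  # 16 4
```

Stacking along the frequency axis keeps every feature aligned frame by frame, so a 2-D convolutional or attention model sees all representations in one image; stacking equally sized maps as separate channels would be an alternative design under the same assumptions.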
Keywords
46 Information and Computing Sciences, 4611 Machine Learning
Source
Applied Sciences, ISSN: 2076-3417, MDPI AG, 14(23), Article 10837. doi: 10.3390/app142310837
Rights statement
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
