Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram
Date
Supervisor
Item type
Journal Article
Degree name
Journal Title
Journal ISSN
Volume Title
Publisher
MDPI AG
Abstract
Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.Description
Source
Applied Sciences, ISSN: 2076-3417 (Print); 2076-3417 (Online), MDPI AG, 14(23), 10837-10837. doi: 10.3390/app142310837
Publisher's version
Rights statement
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
