Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram

Item type

Journal Article

Publisher

MDPI AG

Abstract

Musical instrument recognition is a relatively unexplored area of machine learning because it requires analyzing complex spatial–temporal audio features. Traditional methods that rely on a single spectrogram type, such as STFT, Log-Mel, or MFCC, often fail to capture the full range of relevant features. Here, we propose a hierarchical residual attention network that uses a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral Contrast, and Tonnetz), to create a comprehensive sound representation. The model's attention mechanisms strengthen its focus on the most relevant parts of the spectrograms. Experimental results on the OpenMIC-2018 dataset show a significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
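The idea of a combined multi-spectrogram input can be illustrated with a small sketch. This is not the authors' implementation. It assumes each feature (STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, Tonnetz) has already been computed as a (bins, frames) matrix with a shared number of time frames. It then resizes every map to a common frequency height by linear interpolation, min-max scales each one, and stacks them as channels for a CNN. The helper names (`resize_freq_axis`, `min_max`, `stack_features`), the target height of 128, and the example row counts (librosa-style defaults) are hypothetical choices, and the paper's "Magnified 1/4 Size" scaling is not reproduced here.

```python
import numpy as np
from typing import Mapping

# Hypothetical helpers: a generic multi-spectrogram stacking sketch,
# not the paper's published pipeline.

def resize_freq_axis(feature: np.ndarray, target_bins: int) -> np.ndarray:
    """Linearly interpolate a (bins, frames) map to (target_bins, frames)."""
    bins, frames = feature.shape
    src = np.linspace(0.0, 1.0, bins)
    dst = np.linspace(0.0, 1.0, target_bins)
    out = np.empty((target_bins, frames), dtype=np.float64)
    for t in range(frames):
        # Interpolate each time frame independently along the frequency axis.
        out[:, t] = np.interp(dst, src, feature[:, t])
    return out


def min_max(feature: np.ndarray) -> np.ndarray:
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = float(feature.min()), float(feature.max())
    if hi - lo < 1e-12:
        return np.zeros_like(feature, dtype=np.float64)
    return (feature - lo) / (hi - lo)


def stack_features(features: Mapping[str, np.ndarray], target_bins: int) -> np.ndarray:
    """Resize, scale, and stack feature maps into (n_features, target_bins, frames)."""
    frame_counts = {f.shape[1] for f in features.values()}
    if len(frame_counts) != 1:
        raise ValueError("all features must share the same number of time frames")
    channels = [min_max(resize_freq_axis(f, target_bins)) for f in features.values()]
    return np.stack(channels, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = 431
    # Assumed row counts (librosa-style defaults): n_fft=2048 STFT, 128 mel bands,
    # 20 MFCCs, 12 chroma bins, 7 spectral-contrast bands, 6 Tonnetz dimensions.
    shapes = {"stft": 1025, "log_mel": 128, "mfcc": 20,
              "chroma": 12, "contrast": 7, "tonnetz": 6}
    feats = {name: rng.standard_normal((bins, frames)) for name, bins in shapes.items()}
    x = stack_features(feats, target_bins=128)
    print(x.shape)  # (6, 128, 431)
```

In practice the input matrices could come from librosa functions such as `librosa.stft`, `librosa.feature.melspectrogram`, `librosa.feature.mfcc`, `librosa.feature.chroma_stft`, `librosa.feature.spectral_contrast`, and `librosa.feature.tonnetz`. Per-channel min-max scaling keeps features with very different value ranges (for example dB-scaled STFT magnitudes versus Tonnetz coordinates) from dominating one another.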

Source

Applied Sciences, 14(23), Article 10837. MDPI AG. ISSN: 2076-3417 (Online). doi: 10.3390/app142310837

Rights statement

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).