Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms

Supervisor

Narayanan, Ajit
Ghobakhlou, Akbar

Item type

Thesis

Degree name

Doctor of Philosophy

Publisher

Auckland University of Technology

Abstract

This thesis presents a study of musical instrument recognition using Convolutional Neural Networks (CNNs) within a One-vs-All (OvA) classification framework. The primary goal is to accurately identify individual instruments in complex polyphonic music under varying noise conditions. The research introduces a novel approach that combines multiple spectrogram features with attention mechanisms, aiming to improve classification accuracy and robustness in real-world scenarios.

The study comprises nine interrelated, progressively developed experiments, beginning with binary classifiers for specific instruments and progressing to large-scale evaluations on the NSynth and OpenMIC datasets. Early experiments validated the feasibility of the OvA approach on small datasets, while later experiments expanded the model's scope to ten instrument families, demonstrating its scalability and adaptability.

Contributions of this work include a detailed analysis of various spectrogram techniques, which showed that a combined spectrogram approach captures a broader range of acoustic features than any single technique. This combination proved effective in challenging polyphonic contexts where individual spectrogram techniques often fell short. Integrating attention mechanisms further sharpened the model's focus on critical spectral and temporal patterns, improving instrument recognition in complex environments.

Key findings revealed that while the CNN-OvA models excelled at recognizing individual instruments and moderately complex mixes, they struggled with larger ensembles. The thesis also offers insights from feature-map and heatmap analyses, which improve the interpretability of the CNN model's responses to spectrogram inputs. These analyses elucidate the underlying decision-making processes and highlight areas for further refinement and optimization of the model design.
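In outline, the OvA framework trains one binary "is this instrument present?" classifier per class and, at prediction time, picks the class whose classifier is most confident. A minimal NumPy sketch of that decision rule follows; note that simple logistic-regression stubs stand in for the thesis's CNN binary classifiers, and all names, hyperparameters, and the toy data are illustrative, not taken from the thesis:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, epochs=200):
    """Train one binary 'instrument present?' classifier.

    A logistic-regression stand-in for a CNN binary classifier,
    trained by plain gradient descent on the logistic loss.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad = p - y                      # dLoss/dz for logistic loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def ova_predict(X, models):
    """One-vs-All: score every class's binary model, pick the most confident."""
    scores = np.stack([sigmoid(X @ w + b) for w, b in models], axis=1)
    return scores.argmax(axis=1)

# Toy demo: three 'instruments' as well-separated 2-D feature clusters.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 0], [0, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

# One binary model per class, then an OvA argmax over their scores.
models = [train_binary(X, (y == k).astype(float)) for k in range(3)]
pred = ova_predict(X, models)
accuracy = (pred == y).mean()
```

The key design property of OvA is that each binary model can be trained, tuned, or replaced independently, which matches the thesis's progression from single-instrument classifiers to a ten-family system.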
This research contributes to the field of Music Information Retrieval (MIR) by exploring a novel classification approach, conducting empirical analyses, and providing insights that could inform future developments. It underscores the potential of CNNs, enhanced by multi-spectrogram features and attention mechanisms, to tackle the complexities of polyphonic music recognition, paving the way for further advancements in the field.
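The multi-spectrogram idea can be sketched as computing several time-frequency representations of the same signal and stacking them as input channels for a CNN. The following is a minimal NumPy illustration under assumed details (a plain Hann-windowed STFT plus a log-compressed variant; the thesis's actual spectrogram types and parameters are not specified here):

```python
import numpy as np

def stft_mag(signal, n_fft=256, hop=128):
    """Magnitude STFT via a Hann-windowed sliding FFT (freq x time)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# Toy input: one second of a 440 Hz tone at an 8 kHz sampling rate.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

linear = stft_mag(tone)                     # linear-magnitude spectrogram
log = np.log1p(linear)                      # log-compressed variant
combined = np.stack([linear, log], axis=0)  # channels for a multi-input CNN
```

Stacking representations as channels lets the CNN's first convolution weigh complementary views of the same acoustic event, which is one plausible reading of why the combined approach outperformed single spectrograms in the experiments.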
