
Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms

aut.embargo: No
dc.contributor.advisor: Narayanan, Ajit
dc.contributor.advisor: Ghobakhlou, Akbar
dc.contributor.author: Chen, Rujia
dc.date.accessioned: 2025-03-17T01:53:25Z
dc.date.available: 2025-03-17T01:53:25Z
dc.date.issued: 2025
dc.description.abstract: This thesis presents a study on musical instrument recognition using Convolutional Neural Networks (CNNs) within a One-vs-All (OvA) classification framework. The primary goal is to accurately identify individual instruments in complex polyphonic music under varying noise conditions. The research introduces a novel approach that incorporates multiple spectrogram features and attention mechanisms to improve classification accuracy and robustness in real-world scenarios. The study comprises nine interrelated, progressively developed experiments, starting with binary classifiers for specific instruments and culminating in large-scale evaluations on the NSynth and Open-MIC datasets. Early experiments validated the feasibility of the OvA approach on small datasets, while later experiments expanded the model's scope to ten instrument families, demonstrating its scalability and adaptability. Contributions of this work include a detailed analysis of spectrogram techniques, which showed that a combined spectrogram approach captures diverse acoustic features more effectively than any single technique, particularly in challenging polyphonic contexts where single spectrograms often fell short. Integrating attention mechanisms further focused the model on critical spectral and temporal patterns, improving instrument recognition in complex environments. Key findings revealed that while the CNN-OvA models excelled at recognizing individual instruments and moderately complex mixes, they struggled with larger ensembles. Feature map and heatmap analyses shed light on the interpretability of the CNN's responses to spectrogram inputs, elucidating its decision-making and highlighting areas for further refinement and optimization of the model design. This research contributes to the field of Music Information Retrieval (MIR) by exploring a novel classification approach, conducting empirical analyses, and providing insights that could inform future developments. It underscores the potential of CNNs, enhanced by multi-spectrogram features and attention mechanisms, to tackle the complexities of polyphonic music recognition, paving the way for further advances in the field.
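The OvA framework the abstract describes can be sketched with a toy example. This is an illustrative sketch, not the thesis's code: the `BinaryScorer` and `OneVsAll` class names, the 0.5 decision threshold, and the logistic-regression stand-in for each per-instrument CNN are all assumptions made here for brevity. The key idea it demonstrates is that each instrument gets its own independent binary scorer, so a polyphonic clip can be labelled with every instrument whose score clears the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BinaryScorer:
    """Stand-in for one per-instrument CNN (here: logistic regression)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = 0.0

    def fit(self, X, y, lr=0.5, epochs=200):
        # Full-batch gradient descent on the logistic loss.
        for _ in range(epochs):
            p = sigmoid(X @ self.w + self.b)
            grad = p - y
            self.w -= lr * (X.T @ grad) / len(y)
            self.b -= lr * grad.mean()

    def score(self, X):
        return sigmoid(X @ self.w + self.b)

class OneVsAll:
    """One binary scorer per instrument label; multi-label prediction."""
    def __init__(self, labels, n_features):
        self.labels = labels
        self.models = {lab: BinaryScorer(n_features) for lab in labels}

    def fit(self, X, Y):
        # Y is multi-hot: Y[i, k] == 1 if instrument k is present in clip i.
        for k, lab in enumerate(self.labels):
            self.models[lab].fit(X, Y[:, k])

    def predict(self, X, threshold=0.5):
        # A clip is tagged with every instrument whose score clears threshold,
        # which is what lets OvA handle several instruments sounding at once.
        scores = np.column_stack([self.models[lab].score(X) for lab in self.labels])
        return scores >= threshold

# Toy multi-label data standing in for spectrogram features:
# two features, two "instruments", each keyed to one feature's sign.
X = rng.normal(size=(400, 2))
Y = np.column_stack([(X[:, 0] > 0).astype(float), (X[:, 1] > 0).astype(float)])

ova = OneVsAll(["piano", "violin"], n_features=2)
ova.fit(X, Y)
pred = ova.predict(X)  # boolean (400, 2): which instruments are "present"
```

In the thesis's setting, `X` would be the combined spectrogram features of each clip and each `BinaryScorer` a trained CNN, but the decision rule — independent per-instrument scores, thresholded per label — is the same.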
dc.identifier.uri: http://hdl.handle.net/10292/18867
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.title: Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: ChenR.pdf
Size: 9.28 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 895 B
Format: Item-specific license agreed to upon submission
Description:
