
Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms

aut.embargo: No
dc.contributor.advisor: Narayanan, Ajit
dc.contributor.advisor: Ghobakhlou, Akbar
dc.contributor.author: Chen, Rujia
dc.date.accessioned: 2025-03-17T01:53:25Z
dc.date.available: 2025-03-17T01:53:25Z
dc.date.issued: 2025
dc.description.abstract: This thesis presents a study on musical instrument recognition using Convolutional Neural Networks (CNNs) within a One-vs-All (OvA) classification framework. The primary goal is to accurately identify individual instruments in complex polyphonic music under varying noise conditions. The research introduces a novel approach that incorporates multiple spectrogram features and attention mechanisms to improve classification accuracy and robustness in real-world scenarios. The study comprises nine interrelated, progressively developed experiments, starting with binary classifiers for specific instruments and culminating in large-scale evaluations on the NSynth and Open-MIC datasets. Early experiments validated the feasibility of the OvA approach on small datasets, while later experiments expanded the model's scope to ten instrument families, demonstrating its scalability and adaptability. Contributions of this work include a detailed analysis of spectrogram techniques, which showed that a combined spectrogram approach captures diverse acoustic features more effectively than any single technique, particularly in challenging polyphonic contexts where single spectrograms often fell short. Integrating attention mechanisms further focused the model on critical spectral and temporal patterns, improving instrument recognition in complex environments. Key findings revealed that while the CNN-OvA models excelled at recognizing individual instruments and moderately complex mixes, they struggled with larger ensembles. Feature map and heatmap analyses shed light on the interpretability of the CNN's responses to spectrogram inputs, elucidating its decision-making and highlighting areas for further refinement and optimization of the model design. This research contributes to the field of Music Information Retrieval (MIR) by exploring a novel classification approach, conducting empirical analyses, and providing insights that could inform future developments. It underscores the potential of CNNs, enhanced by multi-spectrogram features and attention mechanisms, to tackle the complexities of polyphonic music recognition, paving the way for further advances in the field.
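The OvA framework the abstract describes can be sketched with a toy example. This is an illustrative sketch, not the thesis's code: the `BinaryScorer` and `OneVsAll` class names, the 0.5 decision threshold, and the logistic-regression stand-in for each per-instrument CNN are all assumptions made here for brevity. The key idea it demonstrates is that each instrument gets its own independent binary scorer, so a polyphonic clip can be labelled with every instrument whose score clears the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BinaryScorer:
    """Stand-in for one per-instrument CNN (here: logistic regression)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = 0.0

    def fit(self, X, y, lr=0.5, epochs=200):
        # Full-batch gradient descent on the logistic loss.
        for _ in range(epochs):
            p = sigmoid(X @ self.w + self.b)
            grad = p - y
            self.w -= lr * (X.T @ grad) / len(y)
            self.b -= lr * grad.mean()

    def score(self, X):
        return sigmoid(X @ self.w + self.b)

class OneVsAll:
    """One binary scorer per instrument label; multi-label prediction."""
    def __init__(self, labels, n_features):
        self.labels = labels
        self.models = {lab: BinaryScorer(n_features) for lab in labels}

    def fit(self, X, Y):
        # Y is multi-hot: Y[i, k] == 1 if instrument k is present in clip i.
        for k, lab in enumerate(self.labels):
            self.models[lab].fit(X, Y[:, k])

    def predict(self, X, threshold=0.5):
        # A clip is tagged with every instrument whose score clears threshold,
        # which is what lets OvA handle several instruments sounding at once.
        scores = np.column_stack([self.models[lab].score(X) for lab in self.labels])
        return scores >= threshold

# Toy multi-label data standing in for spectrogram features:
# two features, two "instruments", each keyed to one feature's sign.
X = rng.normal(size=(400, 2))
Y = np.column_stack([(X[:, 0] > 0).astype(float), (X[:, 1] > 0).astype(float)])

ova = OneVsAll(["piano", "violin"], n_features=2)
ova.fit(X, Y)
pred = ova.predict(X)  # boolean (400, 2): which instruments are "present"
```

In the thesis's setting, `X` would be the combined spectrogram features of each clip and each `BinaryScorer` a trained CNN, but the decision rule — independent per-instrument scores, thresholded per label — is the same.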
dc.identifier.uri: http://hdl.handle.net/10292/18867
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.title: Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: ChenR.pdf
Size: 9.28 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 895 B
Format: Item-specific license agreed to upon submission
Description:
