Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms
| Field | Value |
| --- | --- |
| aut.embargo | No |
| dc.contributor.advisor | Narayanan, Ajit |
| dc.contributor.advisor | Ghobakhlou, Akbar |
| dc.contributor.author | Chen, Rujia |
| dc.date.accessioned | 2025-03-17T01:53:25Z |
| dc.date.available | 2025-03-17T01:53:25Z |
| dc.date.issued | 2025 |
| dc.description.abstract | This thesis presents a study of musical instrument recognition using Convolutional Neural Networks (CNNs) within a One-vs-All (OvA) classification framework. The primary goal is to identify individual instruments in complex polyphonic music under varying noise conditions. The approach combines multiple spectrogram features with attention mechanisms to improve classification accuracy and robustness in real-world scenarios. The study comprises nine interrelated, progressively developed experiments, beginning with binary classifiers for specific instruments and culminating in large-scale evaluations on the NSynth and OpenMIC-2018 datasets. Early experiments validated the feasibility of the OvA approach on small datasets; later experiments extended the models to ten instrument families, demonstrating scalability and adaptability. Contributions include a detailed comparison of spectrogram techniques, which showed that a combined-spectrogram representation captures a more diverse set of acoustic features and handles challenging polyphonic contexts where single spectrogram techniques often fall short. Integrating attention mechanisms further focused the models on critical spectral and temporal patterns, improving recognition in complex environments. Key findings show that while the CNN-OvA models excelled at recognizing individual instruments and moderately complex mixes, they struggled with larger ensembles. The thesis also offers interpretability insights through feature-map and heatmap analyses, which clarify how the CNN responds to spectrogram inputs, illuminate its decision-making, and highlight areas for further refinement of the model design. This research contributes to Music Information Retrieval (MIR) by exploring a novel classification approach, conducting empirical analyses, and providing insights that can inform future work. It underscores the potential of CNNs, enhanced by multi-spectrogram features and attention mechanisms, to address the complexities of polyphonic music recognition. |
| dc.identifier.uri | http://hdl.handle.net/10292/18867 |
| dc.language.iso | en |
| dc.publisher | Auckland University of Technology |
| dc.rights.accessrights | OpenAccess |
| dc.title | Musical Instrument Recognition Using Convolutional Neural Networks and Spectrograms |
| dc.type | Thesis |
| thesis.degree.grantor | Auckland University of Technology |
| thesis.degree.name | Doctor of Philosophy |
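
The abstract describes a combined-spectrogram input that captures more diverse acoustic features than any single representation. A minimal sketch of how such an input might be assembled is below, using `librosa`; the abstract does not name the representations that were combined, so the choice of log-mel and constant-Q channels, and all parameter values, are illustrative assumptions.

```python
import numpy as np
import librosa

def combined_spectrogram(path, sr=22050, n_bins=128, hop_length=512):
    """Stack two time-frequency representations as channels of one CNN input.

    Log-mel + constant-Q is an assumed combination; the thesis combines
    multiple spectrogram features, but the exact set is not given in the
    abstract.
    """
    y, _ = librosa.load(path, sr=sr, mono=True)

    # Log-mel spectrogram: captures the broad spectral envelope.
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins,
                                       hop_length=hop_length))

    # Constant-Q transform: log-frequency bins aligned with musical pitch.
    # 24 bins/octave keeps 128 bins well below the Nyquist limit at 22.05 kHz.
    cqt = librosa.amplitude_to_db(
        np.abs(librosa.cqt(y=y, sr=sr, n_bins=n_bins, bins_per_octave=24,
                           hop_length=hop_length)))

    def standardize(s):
        return (s - s.mean()) / (s.std() + 1e-8)

    # Both transforms use the same hop_length, so frame counts match;
    # stack them as channels of a single image-like array.
    return np.stack([standardize(mel), standardize(cqt)], axis=-1)
```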

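The OvA framework reduces multi-class instrument recognition to one binary "this instrument vs. everything else" classifier per class. The sketch below shows one such binary CNN in Keras, with a squeeze-and-excitation-style channel-attention block standing in for the thesis's attention mechanism, whose exact form the abstract does not specify; layer sizes, the two-channel 128-bin input shape, and the instrument names are hypothetical.

```python
from tensorflow.keras import layers, models

def build_binary_cnn(input_shape=(128, 431, 2)):
    """One binary classifier; the OvA ensemble trains one model per class.

    Input shape assumes the 2-channel combined spectrogram sketched above,
    computed over a 10-second clip (431 frames at hop 512, sr 22050).
    """
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (16, 32, 64):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=2)(x)

    # Channel attention (squeeze-and-excitation style): reweight feature
    # maps so the network emphasizes the most informative channels.
    se = layers.GlobalAveragePooling2D()(x)
    se = layers.Dense(16, activation="relu")(se)
    se = layers.Dense(64, activation="sigmoid")(se)
    x = layers.Multiply()([x, layers.Reshape((1, 1, 64))(se)])

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # P(target instrument)

    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# OvA ensemble: one binary model per instrument family (names are examples).
instrument_families = ["guitar", "piano", "strings"]
ova_models = {name: build_binary_cnn() for name in instrument_families}
```

At inference, each clip's combined spectrogram would be scored by every binary model: for single-label data such as NSynth, the highest-scoring model gives the predicted family, while for polyphonic material each score can be thresholded independently so several instruments may be detected at once.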