A Spiking Neural Network Architecture for Localizing Event-Trigger Indoor Moving Sound Sources

Date
2024
Authors
Roozbehi, Zahra
Supervisor
Narayanan, Ajit
Mohaghegh, Mahsa
Item type
Thesis
Degree name
Doctor of Philosophy
Publisher
Auckland University of Technology
Abstract

Imagine being blindfolded in a room and hearing a voice fading and moving around. How do we track the sound's origin and distance, and how do we distinguish what is being said? What computational methods and techniques exist for addressing this problem?

Sound source localization refers to the acoustic methods and technology used to determine the position of a sound source in three-dimensional space. Existing methods, however, struggle in real-world scenarios with background noise and multiple moving sound sources, and real-time sound source tracking and classification remain open challenges.

The aim of this thesis is to introduce and evaluate a novel approach based on spiking neural networks to address the challenge of sound localization in dynamic environments. We present the adaptive resonance theory-based reservoir spiking neural network (ART-rSNN) and demonstrate its application to real-time, multi-source sound detection and classification.
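
The abstract does not detail the internals of the ART-rSNN, so the sketch below is a rough, hypothetical illustration only: a generic reservoir of leaky integrate-and-fire spiking neurons driven by an input spike train. Every name and parameter here (ReservoirSNN, tau_m, v_thresh, the weight scales) is an assumption rather than the thesis's implementation.

import numpy as np

# Hypothetical sketch of a generic spiking reservoir; not the actual
# ART-rSNN from the thesis. All names and parameters are assumptions.
class ReservoirSNN:
    def __init__(self, n_neurons=100, tau_m=20.0, v_thresh=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.5, size=n_neurons)                # input weights
        self.w_rec = rng.normal(0.0, 0.1, size=(n_neurons, n_neurons))  # recurrent weights
        self.v = np.zeros(n_neurons)  # membrane potentials
        self.tau_m = tau_m            # membrane time constant (ms)
        self.v_thresh = v_thresh      # spike threshold

    def step(self, input_spike, dt=1.0):
        """Advance the reservoir one time step; return the spike vector."""
        spikes = (self.v >= self.v_thresh).astype(float)
        self.v[spikes > 0] = 0.0  # reset neurons that fired
        leak = -self.v / self.tau_m
        drive = self.w_in * input_spike + self.w_rec @ spikes
        self.v += dt * (leak + drive)
        return spikes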

Extensive simulations comparing our approach with conventional machine learning models reveal that, compared with our approach, these models struggle to detect and categorize multi-source sound events in real time. The ART-rSNN can dynamically and autonomously adjust its neuron configuration based on the sound cues it receives. This dynamic characteristic enables it to concentrate computation exclusively in the vicinity of the estimated sound sources, a departure from static methods.
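
The abstract describes the network recruiting and refining neurons around estimated sources. One common adaptive-resonance mechanism for this is a vigilance test: an input that matches no existing category closely enough triggers the allocation of a new unit. The fuzzy-ART-style sketch below is a hypothetical illustration of that idea, not the thesis's algorithm; art_update, vigilance, and lr are assumed names and parameters.

import numpy as np

# Hypothetical fuzzy-ART-style update, not the thesis's algorithm. A new
# unit is recruited, centred on the current sound cue, only when no
# existing unit matches the cue within the vigilance level.
def art_update(categories, cue, vigilance=0.8, lr=0.2):
    """categories: list of weight vectors; cue: non-negative cue vector."""
    best, best_match = None, -1.0
    for i, w in enumerate(categories):
        match = np.minimum(w, cue).sum() / (cue.sum() + 1e-9)  # fuzzy-ART match
        if match > best_match:
            best, best_match = i, match
    if best is not None and best_match >= vigilance:
        # Resonance: refine the winning category toward the cue.
        categories[best] = (1 - lr) * categories[best] + lr * np.minimum(categories[best], cue)
    else:
        # Mismatch: recruit a new unit on the current cue, concentrating
        # resources near the estimated source.
        categories.append(cue.copy())
    return categories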

Overall, our framework handles the spatio-temporal data analysis that this task requires while adapting to changing acoustic environments. What sets our work apart is its reliance on the measured power of the sound alone, without requiring prior spatial sound-source data for supervised learning. This distinctive feature improves the performance of our approach, especially in scenarios where other deep-learning approaches struggle to handle multiple sound sources using only raw time-domain signals.
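
As a hypothetical sketch of what a power-based, label-free cue could look like, the function below turns one analysis window of raw multi-microphone time-domain samples into a normalized per-microphone power vector; the array shape, window length, and the name power_cue are illustrative assumptions rather than the thesis's code.

import numpy as np

def power_cue(frames, eps=1e-12):
    """frames: array of shape (n_mics, n_samples) for one analysis window.
    Returns per-microphone power normalized to a unit-sum cue vector."""
    power = np.mean(frames ** 2, axis=1)  # mean-square power per microphone
    return power / (power.sum() + eps)    # normalized level pattern

# Example: a source nearer microphone 0 yields a cue weighted toward it.
rng = np.random.default_rng(0)
sig = rng.normal(size=1024)
frames = np.stack([1.0 * sig, 0.5 * sig, 0.25 * sig])  # attenuated copies
print(power_cue(frames))  # largest entry at index 0, no labels needed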

In conclusion, the dynamic adaptability of the ART-rSNN, coupled with its performance in noisy, multi-source environments, positions it as a promising advance in AI-based approaches to sound localization and classification.
