Brain-Inspired Audio-Visual Information Processing Using Spiking Neural Networks

Wendt, Anne

aut.embargo	No	en_NZ
aut.thirdpc.contains	Yes	en_NZ
aut.thirdpc.permission	Yes	en_NZ
dc.contributor.advisor	Whalley, Jacqueline
dc.contributor.advisor	Petrova, Krassie
dc.contributor.author	Wendt, Anne
dc.date.accessioned	2021-09-22T02:50:04Z
dc.date.available	2021-09-22T02:50:04Z
dc.date.copyright	2021
dc.date.issued	2021
dc.date.updated	2021-09-21T21:30:36Z
dc.description.abstract	Artificial neural networks are one of the most popular and promising approaches to modern machine learning applications. They are based on a mathematical abstraction of the intricate processing mechanisms in the human brain, while remaining sufficiently simple for efficient processing in conventional computers. Despite efforts to mimic the capabilities of the brain, however, they are limited in their contextual understanding of concepts and behaviours. With the aim of exploring ways to overcome these limitations, this thesis investigates alternatives that are closer to the original biological systems, with a focus on processing auditory and visual signals. Inspired by the functioning of human hearing and vision and by the brain’s capability to dynamically integrate newly perceived information with previous experiences and knowledge, this thesis presents the hypothesis that mimicking these processes more closely could lead to an enhanced analysis of such signals. The framework that was developed to investigate this hypothesis consisted of three separate but connected projects that looked into biologically inspired computational processing of auditory, visual, and combined audio-visual signals, respectively. One aim of designing the framework was to largely preserve the spectral, spatial, and temporal characteristics of the original signals through tonotopic and retinotopic mapping. For the auditory processing system, an encoding and mapping method was developed that could transform sound signals into electrical impulses (“spikes”) by simulating the human cochlea. The resulting spikes were then fed into a brain-shaped three-dimensional spiking neural network at the location of the auditory cortices. For the visual system, the method was developed analogously, simulating the human retina and feeding the resulting spikes into the location of the visual cortex.
A key advantage of this approach was that it facilitated a straightforward brain-like combination of input signals for the analysis of audio-visual stimuli during the third project. The approach was tested on two existing benchmark datasets and on one newly created New Zealand Sign Language dataset to explore its capabilities. While the sound processing system achieved good classification results on the chosen speech recognition dataset (91%) compared to existing methods in the same domain, the video processing system, which was tested on a gesture recognition dataset, did not perform as well (51%). The classification result for the combined audio-visual processing model (76.7%) fell between those of the individual models, and unique spike patterns for the five classes could be observed. Even though the models created in this work did not exceed the statistical achievements of conventional machine learning methods, they demonstrated that systems inspired by biological and neural mechanisms are a promising pathway to investigate audio-visual data in computational systems. Increasing the biological plausibility of the models is expected to lead to better performance and could form a pathway to a more intuitive understanding of such data. To broaden the applicability of the model, it is suggested that future work add other sensory modalities or signals acquired through different brain recording and imaging methods, and perform further theoretical and statistical analysis of the relationship between model parameters and classification performance.	en_NZ
dc.identifier.uri	https://hdl.handle.net/10292/14526
dc.language.iso	en	en_NZ
dc.publisher	Auckland University of Technology
dc.rights.accessrights	OpenAccess
dc.subject	Spiking neural networks	en_NZ
dc.subject	Speech recognition	en_NZ
dc.subject	Gesture recognition	en_NZ
dc.subject	New Zealand sign language	en_NZ
dc.subject	Machine learning	en_NZ
dc.subject	Signal mapping	en_NZ
dc.title	Brain-Inspired Audio-Visual Information Processing Using Spiking Neural Networks	en_NZ
dc.type	Thesis	en_NZ
thesis.degree.grantor	Auckland University of Technology
thesis.degree.level	Doctoral Theses
thesis.degree.name	Doctor of Philosophy	en_NZ

Files

Original bundle

Name: WendtA.pdf
Size: 19.82 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Doctoral Theses