Acoustics, Analysis and AI (3AI)
The Acoustics, Analysis and AI (3AI) group at IT:U Austria in Linz explores how mathematics, acoustics, and artificial intelligence can work together. We study interpretable, hybrid AI systems for sound analysis—combining deep learning with mathematical structure and human insight. Our research ranges from signal processing and hearing models to bioacoustics and mathematical foundations, aiming to bridge human and machine understanding of acoustic phenomena.
Connecting mathematics, acoustics, and artificial intelligence
Nowadays, artificial intelligence is everywhere — impacting our daily lives and transforming nearly all research domains, including acoustics. From speech recognition and noise reduction to spatial hearing and bioacoustics, AI-driven models are changing how we understand, simulate, and process sound. Deep neural networks and other learning-based models have shown enormous potential to benefit society and have become powerful tools for scientific discovery. At the same time, concerns remain that AI may replace or overshadow human reasoning and perception. The grand challenge for the future is therefore to understand how human and artificial intelligence can cooperate and complement each other—especially in the analysis and understanding of sound.
This interaction is explored scientifically at the intersection of acoustics, mathematics, and machine learning. Our goal is to build rigorous mathematical foundations and develop interpretable, hybrid AI systems that serve human understanding—particularly in the study of auditory phenomena and the design of intelligent acoustic technologies.
This group cooperates closely with the Acoustics Research Institute of the Austrian Academy of Sciences.
The group’s research combines theoretical mathematics, signal processing, numerical acoustics, and modern AI, for example in the following complementary project lines:
- Hybrid Data-Processing:
We study how human expert knowledge and data-driven machine learning can be combined effectively. In audio processing, this concerns whether to pre-process input signals using human-designed time-frequency representations or to use fully end-to-end models. Building on our recent work on hybrid filter banks, we pursue new mathematical and computational approaches that merge the strengths of both paradigms (a code sketch of this idea follows this list).
- Hearing for and with AI:
We develop interpretable models of human hearing that use AI to advance psychoacoustic understanding. In the opposite direction, we integrate insights from auditory perception to enable machines to “hear” and analyze sounds more effectively, bridging cognitive science and computational acoustics.
- Bioacoustics with AI:
Animal vocalizations and natural soundscapes contain valuable information about ecosystems and behavior. Using deep learning and signal analysis, we study the acoustic communication of species such as mice and elephants and develop AI-based measures for biodiversity and environmental monitoring.
- AI for Maths – Maths for AI:
This bidirectional line connects mathematical analysis and machine learning. We investigate how mathematical tools can help explain and predict the behavior of AI models—focusing on understanding structure, not merely approximations. Conversely, we explore how AI can assist in mathematical reasoning and support human learning in mathematics.
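As a concrete illustration of the hybrid data-processing line above, the sketch below shows one common way to combine both paradigms: a one-dimensional convolutional front end whose kernels are initialized as Gabor-like atoms at mel-spaced center frequencies and can then be fine-tuned together with the rest of a model. The filter counts, kernel lengths, and the Gabor parametrization are illustrative assumptions, not the group’s actual hybrid filterbank construction.

```python
# Minimal sketch (assumed parameters): a hybrid audio front end whose Conv1d
# kernels start as knowledge-based Gabor atoms and may be refined by training.
import numpy as np
import torch
import torch.nn as nn

def gabor_kernels(n_filters=40, kernel_size=401, sr=16000,
                  fmin=50.0, fmax=8000.0):
    """Windowed sinusoids at mel-spaced center frequencies (real part only)."""
    mel = np.linspace(2595 * np.log10(1 + fmin / 700),
                      2595 * np.log10(1 + fmax / 700), n_filters)
    freqs = 700 * (10 ** (mel / 2595) - 1)              # back to Hz
    t = (np.arange(kernel_size) - kernel_size // 2) / sr
    window = np.hanning(kernel_size)
    kernels = np.stack([window * np.cos(2 * np.pi * f * t) for f in freqs])
    return torch.tensor(kernels, dtype=torch.float32).unsqueeze(1)

class HybridFrontEnd(nn.Module):
    """Conv1d front end: knowledge-based initialization, data-driven refinement."""
    def __init__(self, n_filters=40, kernel_size=401, learnable=True):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size,
                              stride=160, padding=kernel_size // 2, bias=False)
        with torch.no_grad():
            self.conv.weight.copy_(gabor_kernels(n_filters, kernel_size))
        self.conv.weight.requires_grad = learnable       # False = fixed filterbank

    def forward(self, x):                      # x: (batch, 1, samples)
        return torch.log1p(self.conv(x) ** 2)  # log-compressed filter energies

frontend = HybridFrontEnd()
features = frontend(torch.randn(2, 1, 16000))  # -> (batch, filters, frames)
```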
The 3AI research group builds bridges between mathematical theory and real-world acoustics, and between human and artificial intelligence. Students and researchers joining the group can expect an interdisciplinary and open research environment that values analytical precision, creativity, and a collaborative spirit across the boundaries of mathematics, acoustics, and AI.
IT:U PhD Projects Acoustics, Analysis and AI (3AI)
Here we describe six PhD projects envisioned for the group 3AI of fellow professor P. Balazs.
The PhD curriculum at IT:U follows a 4-year path. In the first year, emphasis is placed on focused research lab modules.
The first year concludes with the PhD Proposal Presentation. Over the next three years, the content of the PhD thesis will be developed. This is accompanied by interdisciplinary research seminars and work as a project assistant. The PhD program concludes after the 4th year with the submission and defence of the PhD thesis.
A call will be sent out for three positions; candidates can apply for any of the projects. (This means that some projects may remain without a candidate.)
Building Improved USV Boxes
Home-cage monitoring of mice and other laboratory animals is increasingly being used in translational research, but several technical changes to conventional cages are needed to study acoustic communication. In this project we will design and construct a new type of mouse cage with integrated automated control to monitor, record, and play back sonic and ultrasonic vocalizations. The cage will feature integrated video recording alongside an improved acoustic environment, computer-controlled doors, and a microphone–speaker array capable of recording and playing back ultrasonic vocalizations with high fidelity. The video data will complement the microphone array, enabling simultaneous behavioral analysis, enhancing audio source separation, supporting advanced noise reduction, and allowing precise detection of individual vocalizations. Alongside the hardware development, the project will implement signal processing and machine learning algorithms for real-time detection, classification, and synthesis of mouse vocalizations, enabling closed-loop behavioral experiments. Special emphasis will be placed on co-design: optimizing the acoustic environment, minimizing handling of the animals, and ensuring robust, low-latency interaction between software and hardware. The resulting platform will support reproducible, high-throughput studies of communication, learning, and social behavior in laboratory mice.
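As a rough illustration of the kind of real-time detection the cage’s software stack would need, the following sketch flags candidate ultrasonic vocalizations as bursts of spectral energy in an assumed 30–110 kHz band; the sampling rate, band limits, and thresholds are hypothetical placeholders, not the project’s design parameters.

```python
# Illustrative sketch only: band-limited energy detection of candidate USVs.
import numpy as np
from scipy.signal import stft

def detect_usv_segments(audio, sr=250_000, fmin=30_000, fmax=110_000,
                        threshold_db=15.0, min_len_s=0.005):
    """Return (start, end) times of segments whose in-band energy exceeds the
    median band energy by `threshold_db` dB for at least `min_len_s` seconds."""
    f, t, Z = stft(audio, fs=sr, nperseg=512, noverlap=384)
    band = (f >= fmin) & (f <= fmax)
    energy_db = 10 * np.log10(np.sum(np.abs(Z[band]) ** 2, axis=0) + 1e-12)
    active = energy_db > np.median(energy_db) + threshold_db

    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = t[i]                         # segment begins
        elif not on and start is not None:
            if t[i] - start >= min_len_s:
                segments.append((start, t[i]))   # keep segments long enough
            start = None
    if start is not None:
        segments.append((start, t[-1]))
    return segments
```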
This project will be conducted together with D. Penn and S. Zala (VetMed Vienna). It will be supported by PostDocs at ARI (R. Abbasi).
It will be conducted together with the IT:U HANS – Digital Production Lab and/or the IT:U Design Lab.
See e.g.
* S. M. Zala, D. Reitschmidt, A. Noll, P. Balazs, D. J. Penn, “Sex-Dependent Modulation of Ultrasonic Vocalizations in House Mice (Mus musculus musculus)”, PLOS ONE (2017)
Optimizing Artificial Neural Networks to Probe the Functional Role of Spatial Hearing
Why is spatial hearing important? Is its primary role to help us localize objects in space, or rather to segregate a sound of interest from competing sources—such as during a cocktail party? Artificial neural networks (ANNs) offer a powerful framework to address such “why” questions about brain function. If an ANN trained on a particular task spontaneously reproduces a phenomenon observed in humans—while training on other tasks does not—this suggests that the phenomenon may emerge from optimizing the brain for that same task.
A striking example in spatial hearing is the history-dependent adaptation of absolute space perception. This has been speculated to reflect an adaptation that supports source segregation rather than precise localization.
This project will test that hypothesis by training ANNs on different objectives:
- sound source segregation (“what?”),
- spatial localization (“where?”), or
- both combined.
We will then examine under which training conditions human-like perceptual phenomena emerge. In addition, we will investigate the role of ecological validity by comparing networks trained on naturalistic, reverberant multi-source environments with those trained in simplified, anechoic conditions.
Ultimately, the project will shed light on whether and how representations of what and where interact in spatial hearing, offering insights into the functional relevance of this fundamental ability.
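A minimal sketch of the comparison described above, with hypothetical layer sizes and targets: a shared auditory encoder feeds a “what” (source identity) head and a “where” (azimuth) head, and the loss weights select whether the network is trained on segregation, localization, or both.

```python
# Sketch under assumed shapes: shared encoder with "what" and "where" heads.
import torch
import torch.nn as nn

class WhatWhereNet(nn.Module):
    def __init__(self, n_features=64, n_classes=10, n_directions=36):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_features, 128, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.what_head = nn.Linear(128, n_classes)      # source identity
        self.where_head = nn.Linear(128, n_directions)  # azimuth bin

    def forward(self, x):                               # x: (batch, feat, time)
        z = self.encoder(x)
        return self.what_head(z), self.where_head(z)

def training_loss(model, x, what_target, where_target, w_what=1.0, w_where=1.0):
    """Set w_where=0 for a 'what'-only network, w_what=0 for 'where'-only."""
    what_logits, where_logits = model(x)
    ce = nn.functional.cross_entropy
    return w_what * ce(what_logits, what_target) + \
           w_where * ce(where_logits, where_target)
```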
This project will be conducted together with R. Baumgartner (ARI Vienna).
See e.g.
* Meijer, D., Barumerli, R. & Baumgartner, R. How relevant is the prior? Bayesian causal inference for dynamic perception in volatile environments. eLife 14 (2025).
Machine Learning and Signal Processing for Bioacoustic Vocalizations
This project aims to develop signal-processing and machine learning tools for automated segmentation, classification, and parameter estimation of animal vocalizations. Building on recent advances in computational bioacoustics, spectrogram- or waveform-based feature extraction, combined with convolutional or recurrent neural network architectures (and possibly other ML approaches), will be used to detect and classify call types and to extract acoustic parameters such as fundamental frequency contours, harmonics, or temporal structure under real-world conditions with varying noise and domain shifts. Special attention will be paid to generalization: training and evaluating models across different recording conditions and species, designing effective data augmentation schemes, and deploying pretrained models and embeddings. The software and analysis tools will be open-source and interoperable, allowing reproducibility and seamless integration with diverse experimental platforms. The overall goal is to develop a robust and interpretable framework that enables biologists to scale up quantitative bioacoustic analysis.
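As one small example of the parameter estimation mentioned above, the sketch below extracts a fundamental-frequency contour and simple descriptors from a single recorded call using librosa’s pYIN implementation; the frequency range and the chosen descriptors are illustrative assumptions.

```python
# Illustrative sketch (assumed parameter values): f0 contour and basic
# descriptors for one call, the kind of acoustic parameters to be estimated.
import numpy as np
import librosa

def call_parameters(path, fmin=200.0, fmax=4000.0):
    y, sr = librosa.load(path, sr=None)                # keep native sample rate
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[voiced]                                    # voiced frames only
    return {
        "duration_s": len(y) / sr,
        "f0_mean_hz": float(np.nanmean(f0)) if f0.size else None,
        "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)) if f0.size else None,
    }
```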
This project will be conducted on mice, budgerigars, elephants and potentially other animals, together with M. Höschele, A. Stöger and/or A. Baotic (ARI Vienna) depending on the animal taxa to be worked on. It will be supported by PostDocs at ARI (R. Abbasi, D. Haider).
See e.g.
* R. Abbasi, P. Balazs, A. Marconi, D. Nicolakis, S. Zala, D. Penn, “Capturing the songs of mice with an improved detection and classification method for ultrasonic vocalizations (BootSnap)”, PLOS Computational Biology, Vol. 18(5): e1010049, preprint on bioRxiv (2022)
Building Improved Bioacoustic Monitors
This project addresses the joint challenge of building energy-efficient acoustic monitoring hardware and developing embedded machine learning algorithms for wildlife applications. Specifically, the goals are to design and prototype low-power acoustic monitors — either stationary sensors for zoos and field sites, or collar-mounted devices for large mammals such as giraffes or elephants — and equip them with advanced on-device signal processing and learning capabilities. The research will mainly focus on co-designing the hardware and algorithms: optimizing microphones, developing efficient power supply solutions, designing data transmission pipelines for long-term deployment, and implementing lightweight machine learning models for event detection, call classification, and active learning directly on the device. The overall goal is a comprehensive system that enables autonomous bioacoustic monitoring and real-time tracking with minimal energy consumption.
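To give a sense of the model scale targeted for on-device inference, here is a deliberately tiny call detector; the architecture and input size are assumptions for illustration, and an actual deployment would additionally require quantization or conversion for the target microcontroller.

```python
# A minimal sketch, not a deployment-ready design: a very small CNN for
# on-device call detection operating on a small mel-spectrogram patch.
import torch
import torch.nn as nn

class TinyCallDetector(nn.Module):
    """Binary call / no-call detector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1))                      # call / no-call logit

    def forward(self, x):                          # x: (batch, 1, mels, frames)
        return self.net(x)

model = TinyCallDetector()
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params}")                   # ~1.3k weights in this toy setup
```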
The developed system will be used for elephants, giraffes, cheetahs, gibbons, or general wildlife monitoring, e.g. in Austrian woods, together with A. Stöger and/or A. Baotic (ARI Vienna), depending on the animal taxa. It will be supported by PostDocs at ARI (D. Haider) and a PhD student at ARI (J. Železnik).
It will be conducted together with the IT:U HANS – Digital Production Lab and/or the IT:U Design Lab.
See e.g.
* Zeppelzauer, M., Hensman, S., & Stoeger, A. S. (2014). Towards an automated acoustic detection system for free-ranging elephants. Bioacoustics, 24(1), 13–29. https://doi.org/10.1080/09524622.2014.906321
An Interpretable Hearing Model
We will develop an interpretable model of human hearing, building on our research on stable neural networks and aiming at a continuous-learning approach that predicts human behavior in psychoacoustic experiments. In contrast to existing concepts, for the first task we will create an interpretable invertible neural network (INN) approach with learned filterbanks. The pipeline of a first model could be a hybrid filterbank based on an AUDlet, followed by a temporal convolution network (tCN) to model temporal integration in hearing and general INNs, with a task-specific decision network at the end. The level of complexity will be increased by including tensor, recursive, and concatenated filterbank approaches to incorporate published modeling attempts such as multiple looks, level dependency, and dual-pathway approaches. We will train and test our methods with a transfer learning approach on existing psychoacoustic datasets, predicting the results reported in published papers, and integrate the results into the Auditory Modeling Toolbox hosted at ARI.
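The following is a structural sketch only, with placeholder modules, of the pipeline outlined above: an auditory-style filterbank front end, a dilated temporal-convolution stage for temporal integration, and a task-specific decision network. The actual AUDlet filterbank and the invertible network stages are not reproduced here, and all sizes are assumptions.

```python
# Structural sketch with placeholder modules (not the project's model).
import torch
import torch.nn as nn

class DilatedTCNBlock(nn.Module):
    """One dilated temporal-convolution block with a residual connection."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.act(self.conv(x))

class HearingModelSketch(nn.Module):
    def __init__(self, n_channels=24, n_tasks=2):
        super().__init__()
        # Placeholder for an AUDlet-style auditory filterbank front end.
        self.filterbank = nn.Conv1d(1, n_channels, kernel_size=256,
                                    stride=64, bias=False)
        # Temporal integration stage (stand-in for the tCN described above).
        self.temporal = nn.Sequential(*[DilatedTCNBlock(n_channels, d)
                                        for d in (1, 2, 4, 8)])
        # Task-specific decision network.
        self.decision = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                      nn.Linear(n_channels, n_tasks))

    def forward(self, x):                          # x: (batch, 1, samples)
        return self.decision(self.temporal(self.filterbank(x).abs()))

prediction = HearingModelSketch()(torch.randn(1, 1, 48000))
```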
This project will be conducted together with B. Laback (ARI). See e.g.
* Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing. Nat Commun. 2024 Dec 4;15(1):10590. doi: 10.1038/s41467-024-54700-5. PMID: 39632854; PMCID: PMC11618365. https://www.nature.com/articles/s41467-024-54700-5
Interpretable Audio-Preprocessing
An ongoing debate in the design of machine learning models, particularly for applications involving acoustic data, is how input signals should be pre-processed and analyzed. That is, what is the best approach to feature extraction for audio? The most common approach uses fixed time-frequency representations as a human-engineered way of representing audio to the model. An alternative approach is to “let the machine learn the representation” by using the raw audio signal directly as model input. The aim of this project is to explore the connections between concepts from signal processing and machine learning from a theoretical and practical point of view, and to develop explanations as well as novel designs of feature extractors that leverage knowledge from both fields. An existing line of work within the project concerns a “hybrid” approach that combines classical filterbank design with an integrated learning step by the model. A future goal here is to embed this construction into auditory research and to design and use learned modulation filterbanks for feature extraction. Additional topics include the interpretation of recurrent convolutional layers as filterbanks on infinite sequence spaces and the link between gradient boosting and empirical mode decomposition (EMD).
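As a small sketch of the learned modulation filterbank idea mentioned above (shapes and filter counts are assumptions): a first filterbank, fixed or hybrid, produces subband envelopes, and a second, learnable convolution along the time axis extracts modulation features from each subband independently.

```python
# Minimal sketch under assumed shapes: a learnable modulation filterbank
# applied to the envelopes produced by a preceding filterbank stage.
import torch
import torch.nn as nn

class LearnedModulationFilterbank(nn.Module):
    def __init__(self, n_subbands=40, n_modulation_filters=8, kernel_size=33):
        super().__init__()
        # Grouped conv: each modulation filter acts on every subband envelope
        # independently (groups=1 would instead mix the subbands).
        self.mod = nn.Conv1d(n_subbands, n_subbands * n_modulation_filters,
                             kernel_size, padding=kernel_size // 2,
                             groups=n_subbands, bias=False)

    def forward(self, envelopes):          # envelopes: (batch, subbands, frames)
        return self.mod(envelopes)         # -> (batch, subbands * mod_filters, frames)

envelopes = torch.rand(2, 40, 200)         # e.g. log-compressed filterbank output
modulation_features = LearnedModulationFilterbank()(envelopes)
```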
This project will be conducted together with V. Lostanlen (LS2N CNRS Nantes) and will be supported by PostDocs at ARI (D. Haider).
See e.g.
* D. Haider∗, F. Perfler∗, V. Lostanlen, M. Ehler, P. Balazs, “Hold Me Tight: Stable Encoder–Decoder Design for Speech Enhancement”, Interspeech 2024, Kos Island (2024)
Potential Projects
Potential Master and Off-Lab Projects:
- Building a 3D Audio Recorder from scratch
- Building a 3D Audio Sound system from scratch
- Building a Measurement System for HRTFs from scratch
- Improve 3D Audio in other IT:U labs
- Build an audiometry system (from scratch)
- Do a sub-project of one of the PhD projects, focusing on certain details, e.g.
Development of bioacoustic and movement-monitoring collar-mounted sensors for African savannah elephants (Loxodonta africana) – MSc thesis project
Supervision: Prof. Dr. Peter Balazs & Prof. Dr. Angela Stöger
Co-supervision: Dr. Daniel Haider & Jure Železnik
Collar-mounted wildlife sensors enable continuous, minimally invasive monitoring of behavior, habitat use, and inter-individual acoustic communication. Yet, attributing specific vocalizations to the correct individual in free-ranging groups remains challenging when audio is collected without tightly synchronized motion and position data. The candidate in this project will design and prototype a field-ready elephant collar that integrates wide-band acoustic recording, high-rate tri-axial accelerometry (X/Y/Z), and GPS positioning in a time-synchronized, low-power package. The system will feature precise clocking and sensor-fusion pipelines so that subtle body movements and collar vibrations can be aligned with acoustic onsets, while GPS trajectories provide spatial constraints, together enabling robust focal-caller attribution. Emphasis will be placed on focal-caller detection as well as on power budgeting for longer deployments, a rugged enclosure and microphone placement that minimize environmental noise, and serviceable data logging. The outcome will be a validated hardware prototype with open documentation, calibration and synchronization procedures, and a thesis evaluating focal-caller detection accuracy from combined sensors versus audio-only baselines.
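As a toy example of the synchronization problem described above (signals, rates, and envelope extraction are hypothetical), the clock offset between the audio and accelerometer streams could be estimated by cross-correlating their activity envelopes:

```python
# Illustrative sketch: estimate the clock offset between audio and
# accelerometer streams from their activity envelopes.
import numpy as np
from scipy.signal import correlate

def estimate_offset_s(audio_env, accel_env, rate_hz):
    """Both envelopes must be resampled to the common rate `rate_hz`."""
    a = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-12)
    b = (accel_env - accel_env.mean()) / (accel_env.std() + 1e-12)
    xcorr = correlate(a, b, mode="full")
    lag = np.argmax(xcorr) - (len(b) - 1)      # samples by which audio lags accel
    return lag / rate_hz
```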
Development of sound event annotating tool with incorporated few-shot and active learning – MSc thesis project
Supervision: Prof. Dr. Peter Balazs & Prof. Dr. Angela Stöger
Co-supervision: Dr. Daniel Haider & Jure Železnik
The use of long-term autonomous recorders has been steadily increasing over the past few decades in the field of bioacoustics. This means that wildlife researchers and bioacousticians are collecting thousands of hours of recordings, from which they then manually extract sound events of interest. Although the development and use of machine learning algorithms for automatic sound event extraction and analysis are promising, these generally only work well on large, well-defined, curated datasets with previously identified sound events that can be used for training and validation. To make such methods applicable to the dataset at hand, the candidate in this project will design and prototype a modern annotation platform for audio and bioacoustics research that couples an intuitive labelling UI with state-of-the-art machine learning. The tool will provide waveform/spectrogram views, rapid keyboard/mouse labelling, and flexible schema management, allowing few-shot learners to “cold-start” models from just a handful of examples per class. An active-learning loop will continuously surface the most informative or uncertain clips for review, maximizing annotation efficiency and model quality with minimal human effort. Emphasis will be placed on developing a tool that can be used for wildlife monitoring and/or environmental soundscape analysis. The outcome will be a usable, extensible open-source tool, and a thesis evaluating gains in labelling speed/accuracy versus conventional workflows.
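A minimal sketch of the active-learning selection step such a tool could use: rank unlabeled clips by the entropy of the model’s predicted class probabilities and surface the most uncertain ones for human annotation. The function name and the review batch size are illustrative.

```python
# Uncertainty sampling sketch: pick the clips the model is least sure about.
import numpy as np

def select_for_review(probabilities, n_review=20):
    """probabilities: (n_clips, n_classes) array of model outputs."""
    p = np.clip(probabilities, 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p), axis=1)      # high entropy = uncertain
    return np.argsort(entropy)[::-1][:n_review]   # indices of clips to label next

# Example: 5 clips, 3 classes - the most ambiguous clips come first.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.4, 0.35, 0.25],
                  [0.6, 0.2, 0.2],
                  [0.34, 0.33, 0.33],
                  [0.8, 0.1, 0.1]])
print(select_for_review(probs, n_review=2))       # -> [3 1]
```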

Peter Balazs