AKOS: Audio Correlation and Synthesis
(2022 – 2026)
The project aims to develop a model that correctly correlates and aligns separated audio channels. Deep learning techniques for this task require large amounts of annotated training samples, which are not available. A second model is therefore developed to generate synthetic data for training the correlation model.
Tasks and goals
Deep learning models have become widely available, and audio can now be processed more effectively than in the past. However, the accuracy of these models depends on the amount and quality of the training data. Collecting and annotating real training data is time-consuming, which is why synthetic data will be used instead.
In the first phase of AKOS, a large body of representative real audio data covering a variety of characteristics was collected. This data was used to train a neural network capable of generating artificial data with the same characteristics. The synthetic data will be used to train a second model that correlates and aligns matching audio channels.
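To illustrate why synthetic data sidesteps manual annotation, the following minimal sketch builds a pair of channels with a known time offset, so the alignment label comes for free. It is not the AKOS generator or correlation model, both of which are neural networks; the white-noise signal, the function names `make_synthetic_pair` and `estimate_offset`, and the classical cross-correlation estimate are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def make_synthetic_pair(n_samples, offset, noise_std=0.05, seed=0):
    """Return (reference, delayed), where `delayed` lags `reference` by `offset` samples.

    Hypothetical stand-in for a learned generator: the signal is plain white
    noise, and `offset` (0 <= offset < n_samples) is the ground-truth label.
    """
    rng = np.random.default_rng(seed)
    reference = rng.standard_normal(n_samples)
    delayed = np.zeros(n_samples)
    # Shift the reference to the right by `offset` samples.
    delayed[offset:] = reference[: n_samples - offset]
    # Add independent noise so the two channels are not exact copies.
    delayed += noise_std * rng.standard_normal(n_samples)
    return reference, delayed


def estimate_offset(reference, delayed):
    """Estimate the lag of `delayed` relative to `reference` by cross-correlation."""
    corr = np.correlate(delayed, reference, mode="full")
    # In "full" mode the lags run from -(len(reference) - 1) to len(delayed) - 1.
    lags = np.arange(-(len(reference) - 1), len(delayed))
    return int(lags[np.argmax(corr)])


reference, delayed = make_synthetic_pair(n_samples=2000, offset=37)
estimated = estimate_offset(reference, delayed)
print(f"true offset: 37, estimated offset: {estimated}")
```

A classical estimator like this could also serve as a simple baseline against which a learned correlation model is compared, although the project description does not state that such a baseline is used.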
The final model will be evaluated on a set of annotated real data samples. The described procedure will be adapted and repeated until the evaluation results demonstrate the effectiveness of the approach.