
Progress Report

Introduction

Inspired by the importance of audio and video recording in everyday electronics, our project focuses on removing audible noise from audio signals. Noise is often mixed with the human voice in audio and video recordings, and the two have different features that can be extracted using DSP techniques learned both inside and outside of EECS 351. Our task focuses on recordings of human voice, such as audio or video clips, that are mixed with certain types of noise, such as train noise, people talking in the background, or machines running. We use DSP techniques such as frequency analysis, frequency filtering, wave decomposition, and independent component analysis to separate and remove the noise and reconstruct an audio signal with a clearer human voice.

Dataset

The dataset we are using consists of various recordings that represent different mixtures of voice and noise. It includes readings recorded by Raj in a café, as well as other noisy audio segments adapted from the internet, including a conversation between a student and a professor. We are also using a dataset provided on Kaggle for our independent component analysis model. The dataset can be found at: https://www.kaggle.com/datasets/wiradkp/mini-speech-diarization

Below are spectrogram plots of the coffee shop recording before and after initial processing of the data using simple filtering and wave decomposition methods. The vertical axis is frequency and the horizontal axis is time. After processing the signal with wave decomposition and a frequency filter, the human speech frequency range (approximately 300 Hz - 3400 Hz) is preserved, and noise within that range is also reduced by wave decomposition. The original coffee shop recording:

 

Coffee shop recording after simple filtering:
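For reference, before/after plots like the ones above can be generated with a short MATLAB sketch along the following lines; the file names here are placeholders rather than our actual data files:

[x, fs1] = audioread('coffee_shop.wav');            % original recording (placeholder name)
[y, fs2] = audioread('coffee_shop_filtered.wav');   % recording after simple filtering (placeholder name)

figure;
subplot(2,1,1);
spectrogram(x(:,1), hamming(1024), 512, 1024, fs1, 'yaxis');
title('Original coffee shop recording');
subplot(2,1,2);
spectrogram(y(:,1), hamming(1024), 512, 1024, fs2, 'yaxis');
title('Coffee shop recording after simple filtering');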

Independent Component Analysis:

For ICA, we have analyzed two recordings of different people speaking: one of a professor and one of a student. Here are the original recordings:

Student:

Professor:

 

 

The first figure shows a random mix of the student's audio signal with the professor's voice. The second figure shows the original professor audio signal over time, as well as the unmixed audio after independent component analysis. The third figure shows the original student audio file over time and its unmixed audio after ICA.

It is clear that the original audio has been recovered from the mixed file, with a small amount of noise and a higher amplitude caused by the random mixing matrix. In general, ICA works well when separating two voices.

I tried to use ICA to separate noise from speech directly, and it did not work. I think ICA treats the noisy audio as a single source instead of two separate ones. Therefore, I think that for our project we can pass the filtered audio signal to ICA and extract the voice we need.


Method

First Method: The first method that we looked at was frequency analysis and filtering. Noise such as train sounds or street noise often occupies different frequencies than the human voice. Therefore, a band-pass filter design is capable of eliminating noise with those characteristics. Through investigation, we found that a human voice is usually concentrated within roughly a 2000 Hz frequency range, with each person having a slightly different center frequency depending on their voice. Under such circumstances, we implemented a content-aware frequency filter, which analyzes the frequency component with the highest amplitude in the input recording and then determines the passband accordingly. This frequency filtering technique works on different voices with different center frequencies and can eliminate noise components that lie outside the frequency range of the human voice. The speech after the new filter process is as follows:

 
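Below is a minimal MATLAB sketch of the content-aware band-pass idea described above; the file names, the 2000 Hz bandwidth, and the clamping values are illustrative assumptions rather than our final implementation:

[x, fs] = audioread('mixed_speech.wav');       % placeholder input file
x = x(:,1);                                    % use a single channel

% Locate the dominant frequency component of the recording.
N    = length(x);
X    = abs(fft(x));
f    = (0:N-1) * fs / N;
half = 2:floor(N/2);                           % skip the DC bin
[~, k]  = max(X(half));
fCenter = f(half(k));                          % frequency with the largest magnitude

% Keep the center inside the typical speech range (300-3400 Hz) and
% open a passband of roughly 2000 Hz around it.
fCenter = min(max(fCenter, 300), 3400);
fLow    = max(300,  fCenter - 1000);
fHigh   = min(3400, fCenter + 1000);

y = bandpass(x, [fLow fHigh], fs);             % Signal Processing Toolbox band-pass filter
audiowrite('filtered_speech.wav', y, fs);      % placeholder output file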

Second Method: The speaker identification machine learning method can be used to amplify the results obtained from the first method. In further detail, multiple sets of audio files can be used to train the algorithm. This is done by segmenting each audio file into smaller pieces, because speech signals are stationary on short time scales. During each short window, the pitch and mel-frequency cepstral coefficients (MFCC) are used to estimate a pitch value and to determine whether the segment is voiced human speech, silence, or unvoiced, by examining an energy threshold. After acquiring the segments from each audio file, the algorithm goes through a training set to build accuracy, and it will then be tested using additional audio recordings to see how well it behaves on signals that were not used to train it.
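As a rough illustration of the per-segment features this method relies on, here is a minimal MATLAB (Audio Toolbox) sketch; the file name, frame length, and energy threshold are placeholder assumptions:

[x, fs] = audioread('training_speech.wav');    % placeholder training file
x = x(:,1);

% Frame-based features from the Audio Toolbox (default windows are tens of
% milliseconds long, over which speech can be treated as roughly stationary).
f0     = pitch(x, fs);                         % one pitch estimate per frame
coeffs = mfcc(x, fs);                          % one row of MFCCs per frame

% Very rough voiced / silence decision from short-time energy
% (30 ms frames and the 1% threshold are placeholder assumptions).
winLen = round(0.03 * fs);
frames = buffer(x, winLen);                    % non-overlapping 30 ms frames (zero-padded at the end)
energy = sum(frames.^2, 1);
voiced = energy > 0.01 * max(energy);          % frames likely to contain speech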

 

Speech Diarization: We also looked into the speaker diarization method. Modern diarization involves complicated machine learning and neural networks, which would be very difficult to implement in MATLAB given the time constraints. Later on, we researched previous studies on independent component analysis. It is an old-fashioned but classic method for solving the cocktail party problem. It is easier to implement, and there are many existing algorithms that we can use and build upon.

 

So far, we have tried Reconstruction ICA (rica) in MATLAB. It generates a model involving an inverse matrix that separates the individual voices. To test its functionality, we mixed two audio signals together with a random matrix and a random offset. Then, we ran the rica command on this mixed signal to create a rica model. After that, we extracted two features from the mixed signal using the model's transform function. The unmixed signals are well reconstructed, and the original signals are audible with slight, negligible noise. We also found a fastICA function, which we will dig into later.
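A minimal sketch of this rica experiment, assuming placeholder file names for the two recordings and a random 2-by-2 mixing matrix, looks roughly like this:

[s1, fs] = audioread('student.wav');           % placeholder file for speaker 1
[s2, ~ ] = audioread('professor.wav');         % placeholder file for speaker 2
n = min(length(s1), length(s2));
S = [s1(1:n,1), s2(1:n,1)];                    % n-by-2 matrix of source signals

A = rand(2);                                   % random 2-by-2 mixing matrix
b = rand(1,2);                                 % random offset
X = S * A' + b;                                % mixed observations (each column is one mixture)

Mdl = rica(X, 2);                              % learn 2 independent features (Statistics and ML Toolbox)
Z   = transform(Mdl, X);                       % unmixed estimates of the two voices

soundsc(Z(:,1), fs);                           % listen to one recovered voice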


 

Challenge

One challenge that we encountered was the difficulty of implementing a Wiener-filter-style method for noise elimination. Our initial thought was to treat the noise as a system whose input is the clean human voice and whose output is the human voice mixed with noise. Under such circumstances, we could compute the impulse response and frequency response of the system by taking a clean voice and the mixed sound and dividing them in the frequency domain. However, after obtaining the frequency responses of several recordings, we found that they are not adaptive and cannot be applied to other mixed voices whose clean voice is unknown. If the method is forced onto such recordings, the output becomes complete noise in which no voice component can be identified. Whether we can use this method at all, and if so how we could use it to determine and eliminate noise, is one of the challenges we are facing right now.
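For concreteness, the frequency-domain division we experimented with looks roughly like the sketch below (file names are placeholders); this is the step that did not generalize to recordings whose clean voice is unknown:

[clean, fs] = audioread('clean_voice.wav');    % known clean recording (placeholder name)
[mixed, ~ ] = audioread('mixed_voice.wav');    % the same voice mixed with noise (placeholder name)
n = min(length(clean), length(mixed));

Xc = fft(clean(1:n,1));
Xm = fft(mixed(1:n,1));

H = Xm ./ (Xc + eps);                          % estimated frequency response of the "noise system"
h = real(ifft(H));                             % corresponding impulse response

% Applying H (or h) to a different mixed recording whose clean voice is
% unknown did not recover intelligible speech, which is the challenge above.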

 

The biggest challenge with ICA is that the ICA function itself does not recognize noise. So the possible solution is to filter out the noise using frequency-based or other methods and then pass the signal into the ICA function. Eventually, we can select the desired signal to be outputted.

 

Another challenge is that the output of the ICA function does not guarantee the order of the signals, so it is hard to select the correct one. I would implement a comparison function that compares each unmixed signal with the original desired signal; the closest one will be chosen.
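A minimal sketch of such a comparison step, assuming the unmixed signals are the columns of a matrix and a reference of the desired voice is available, could look like this (function and variable names are placeholders):

function best = pickClosestSignal(Z, ref)
% pickClosestSignal  Choose the ICA output most similar to a reference.
%   Z   : n-by-k matrix whose columns are unmixed signals from ICA
%   ref : n-by-1 reference of the desired voice
%   ICA outputs have arbitrary sign and scale, so we compare with the
%   absolute value of the correlation coefficient.
    k = size(Z, 2);
    score = zeros(1, k);
    for i = 1:k
        c = corrcoef(Z(:,i), ref);
        score(i) = abs(c(1,2));
    end
    [~, idx] = max(score);
    best = Z(:, idx);
end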

 

Future Plan

  1. Select the best noise filter algorithms

  2. Integrate the filter with ICA

  3. Test the system on different signals / real life signals.

The second method has not been implemented yet. The layout of this method would include:

  1. Build the training model and test it with synthetic audio signals.

  2. Gather additional raw audio

  3. Test the algorithm with the newly collected audio signal to generate unbiased results. 

New Discovery

Independent Component Analysis:

Reconstruction ICA is a new DSP tool that we are trying to use right now. It is very helpful, as described above: ICA separates different independent voice signals to a great extent. It is relevant to the project because we are trying to remove the noise and select the desired voice signal, and ICA will help us filter out the other signals.


This is a preliminary report. For the most up-to-date information, please visit the main website.
