Data
The data we use consist of audio recordings that contain both human voice and background noise: a reading recorded in a coffee shop by Raj Patel, and a reading recorded in an activity center by Qianxu Li and Tianwei Liu. The recordings were collected to represent different situations in which noise needs to be eliminated: one with a single speaker, and one with two speakers taking turns. The time-domain representation and the audio of the coffee shop recording are shown below:
Figure: The Coffee Shop Reading Recording in time domain © Tianwei Liu
Audio: The Coffee Shop Recording © Raj Patel
In the time domain, one can see both the speech, which appears as the high peaks of the waveform, and the noise, which forms a roughly rectangular band mixed with the speech. The noise is present both while the speaker is speaking and during the pauses. The pauses therefore give a good indication of which part of the signal is the noise our program should eliminate.
More information about the noise can be found in the frequency-domain representation:
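The idea that the pauses reveal the noise can be illustrated with a minimal sketch. It is not the project's actual code: the function names `frame_energies` and `noise_only_frames`, the frame length, and the `ratio` threshold are all hypothetical, and a synthetic signal (low-level random noise with a 500 Hz tone in the second half) stands in for the coffee shop recording.

```python
import numpy as np

def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    return np.mean(frames ** 2, axis=1)

def noise_only_frames(signal, frame_len, ratio=0.1):
    """Boolean mask of frames whose energy is below ratio * max frame energy.

    The threshold rule is an assumption made for this illustration only.
    """
    energy = frame_energies(signal, frame_len)
    return energy < ratio * np.max(energy)

# Synthetic stand-in for a recording: noise everywhere, "speech" only in the second half.
rate = 8000                                  # assumed sample rate in Hz
t = np.arange(rate) / rate                   # one second of samples
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(rate)
speech = np.where(t >= 0.5, np.sin(2 * np.pi * 500 * t), 0.0)

mask = noise_only_frames(noise + speech, frame_len=400)
print(mask)  # True marks frames that look like noise only
```

Frames flagged as noise-only could then serve as a reference for the noise, under the assumption that the noise behaves similarly while the speaker is talking.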
Figure: The Coffee Shop Recording in frequency domain © Tianwei Liu
In this frequency-domain plot, multiple peaks are visible: one with a higher magnitude at around 500 Hz, and another with a lower magnitude at a lower frequency. Since only one person is speaking in this recording and the speech is the dominant sound, we attribute the frequency component with the highest magnitude to the speaker's voice and the other components mainly to noise.
The second recording we analyze is a reading by Qianxu and Tianwei, in which the two speakers speak alternately. It demonstrates how multi-speaker recordings can be handled, since each speaker's voice may occupy a different frequency band. The time-domain and frequency-domain representations of the recording are as follows:
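A frequency-domain view like the one described above can be obtained with a discrete Fourier transform. The following is a minimal sketch, not the project's own code: the helper names `magnitude_spectrum` and `dominant_frequency` are hypothetical, and a synthetic two-tone signal (a strong 500 Hz component and a weaker 120 Hz component, both chosen for illustration) stands in for the recording.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, magnitudes) of a real-valued signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest-magnitude component, ignoring DC."""
    freqs, spectrum = magnitude_spectrum(signal, sample_rate)
    spectrum[0] = 0.0                        # drop the DC offset
    return freqs[np.argmax(spectrum)]

# Synthetic stand-in: strong 500 Hz tone plus a weaker 120 Hz tone.
sample_rate = 8000                           # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate     # one second of samples
tone = 1.0 * np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)

peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)
```

On a real recording the peaks are broader and less clean than in this synthetic case, so the highest peak is only a rough indicator of where the speaker's voice lies.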
Figure: Two Speakers Recording in time domain © Tianwei Liu
Figure: Two Speakers Recording in frequency domain © Tianwei Liu
Audio: Two Speakers Recording © Qianxu Li & Tianwei Liu
Using these two recordings, we demonstrate how our system reduces and removes noise from noisy audio recordings with different characteristics while keeping the human voice audible.