Results
We have two demonstrations that show the outcome of our system. The first is the coffee shop recording by Raj. Since only one person is speaking in this piece, there is no need for the ICA algorithm to separate different sources. We therefore run the recording directly through the filtering and wavelet transform analysis pipeline, which gives the results below. The original audio is also listed for comparison.
Audio: Coffee Shop Recording before any processing © Raj Patel
Audio: Coffee Shop Recording after filtering © Tianwei Liu
Audio: Coffee Shop Recording after filtering & wavelet transform analysis © Tianwei Liu
As shown above, the filter alone eliminates some, but not all, of the noise, while the human voice is preserved because the passband is matched to the frequency range of the voice. The remaining noise falls within that passband and therefore survives the filter. After the wavelet transform analysis, however, most of the noise is eliminated thanks to its relatively low amplitude and sustained duration, while the human voice is somewhat distorted because we operate within its frequency range. These results match what we expected from our analysis.
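To make the single-speaker pipeline concrete, below is a minimal Python sketch of the two stages. The passband, wavelet, and threshold values here are illustrative assumptions, not the exact parameters of our implementation; see the repository linked at the end of this section for the real code.

```python
import numpy as np
import pywt
from scipy.signal import butter, sosfiltfilt

def denoise(x, fs, passband=(300.0, 3400.0), wavelet="db8", level=5):
    """Band-pass filtering followed by wavelet soft-threshold denoising.

    The passband, wavelet, and decomposition level are illustrative
    defaults, not the exact values used in our implementation.
    """
    # 1. Band-pass around the typical voice range to remove out-of-band noise.
    sos = butter(4, passband, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)

    # 2. Wavelet decomposition; soft-threshold the detail coefficients so
    #    low-amplitude, sustained components (residual noise) are suppressed.
    coeffs = pywt.wavedec(filtered, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # noise level estimate
    thr = sigma * np.sqrt(2 * np.log(len(filtered)))  # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(filtered)]
```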
When multiple speakers are present, ICA comes into play to separate the individual components before the passbands of the frequency filters are chosen. Using the two-speaker speech recorded by Qianxu and Tianwei, we first run the recording through a deep learning speech diarization algorithm to separate the speakers, then process each speaker's track through the frequency filter and wavelet transform analysis individually. This ensures that each track contains only one dominant frequency peak, making it easy to decide the corresponding passband. Finally, the two tracks are merged to form the final result, whose quality is similar to that of the processed single-speaker audio above. The intermediate pieces can be found in the Method section; the result, followed by a sketch of this multi-speaker pipeline, appears below:
Audio: Two Speaker Recording before any processing © Qianxu Li & Tianwei Liu
Audio: Two Speaker Recording after processing © Qianxu Li & Tianwei Liu
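The sketch below outlines the multi-speaker pipeline. It uses FastICA as a stand-in for the separation stage (our actual pipeline uses a deep learning speech diarization model for this step) and reuses the denoise function from the earlier sketch; the array shapes and the final normalization are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_and_denoise(mixtures, fs):
    """Separate a multi-speaker mixture, denoise each source, then remix.

    `mixtures` is an (n_samples, n_channels) array with at least as many
    channels as speakers. FastICA stands in here for the separation stage;
    the repository's pipeline uses deep learning speech diarization.
    `denoise` is the band-pass + wavelet function sketched earlier.
    """
    ica = FastICA(n_components=mixtures.shape[1], random_state=0)
    sources = ica.fit_transform(mixtures)       # one column per speaker
    cleaned = np.column_stack(
        [denoise(s, fs) for s in sources.T]     # per-speaker filtering
    )
    merged = cleaned.sum(axis=1)                # merge back into one track
    return merged / np.max(np.abs(merged))      # normalize to avoid clipping
```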
For a demonstration, or to view the entire algorithm, including the filter, wavelet transform, ICA, speech diarization, and MFCC components, please refer to the README file in the following GitHub repository.
https://github.com/Jerry-lqx/Noise-Cancellation-And-Speaker-Diarization
Link: GitHub repository for our algorithm © Qianxu Li
Future Work
In the future, our group would like to explore the librosa package for music and audio analysis in more depth. Understanding this package, especially the material related to inverse MFCC, would help us pinpoint where our mistake was during implementation and teach us how the related libraries and functions were engineered. Furthermore, more research needs to be done to construct an inverse MFCC package in MATLAB, with implementation instructions, so that we can build a preliminary algorithm that uses MFCC and ICA to achieve noise reduction and cancellation. Lastly, with more time, we would also like to test MATLAB's built-in deep learning models to pursue our goal of a perfect noise cancellation system.
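As a starting point for that exploration, librosa already exposes an approximate MFCC inversion. The sketch below, with hypothetical file names, shows the round trip we would like to study; mfcc_to_audio reconstructs the waveform only approximately, via a mel spectrogram and Griffin-Lim, and that lossiness is exactly what we need to understand.

```python
import librosa
import soundfile as sf

# Load a recording (sr=None keeps the file's native sample rate).
# The file name is hypothetical.
y, sr = librosa.load("coffee_shop.wav", sr=None)

# Forward transform: 20 MFCCs per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Approximate inversion: MFCC -> mel spectrogram -> audio (Griffin-Lim).
y_hat = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)

# Write out the reconstruction to compare against the original by ear.
sf.write("coffee_shop_reconstructed.wav", y_hat, sr)
```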