Concatenative Resynthesis for Extracting Bass Parts from Songs (2018)
Machine Learning / Music Information Retrieval (MIR)

This project builds on an existing system that removes noise from speech recordings. That system uses a collection of clean speech signals and a deep neural network to resynthesize clean speech from noisy observations, an approach called Concatenative Resynthesis that can produce extremely high-quality enhancements. The goal of the current study is a system that takes a song as input and returns the song's bass part as output. Applying Concatenative Resynthesis to this task requires designing new features and finding a configuration that can successfully resynthesize the bass part.
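The core selection step of concatenative resynthesis can be sketched as follows. This is a minimal illustration, not the project's implementation: cosine similarity stands in for the learned DNN similarity, and all names (`resynthesize`, `dict_chunks`, etc.) are hypothetical.

```python
import numpy as np

def resynthesize(mixture_feats, dict_feats, dict_chunks):
    """For each mixture chunk, pick the most similar clean dictionary
    chunk and concatenate the winners. Cosine similarity is a simple
    stand-in for the DNN-learned similarity described in the text."""
    def unit(x):
        # Normalize rows so dot products become cosine similarities.
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sims = unit(mixture_feats) @ unit(dict_feats).T  # (n_mix, n_dict)
    best = sims.argmax(axis=1)                       # best unit per chunk
    return np.concatenate([dict_chunks[i] for i in best])
```

The output is built entirely from clean dictionary audio, which is why the approach can reach higher audio quality than mask-based separation: it never plays back the corrupted mixture itself.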

Our experiments are performed on the MedleyDB dataset, a corpus of over 100 multitrack recordings: each song is provided both as the final mix and as the individual instrument recordings that were combined to make it. This lets us choose a test song with a good bass part and train the network on other bass-containing songs to resynthesize the test song's bass. For the input representation, we first tried gammatone filterbank features and later designed a custom feature.
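For readers unfamiliar with gammatone features, the sketch below computes log-energy features from a 4th-order gammatone filterbank using the standard Patterson/Holdsworth formulation (bandwidth 1.019 ERB). It is a generic illustration of the feature type, with assumed parameter values (40 ms impulse responses, 1024-sample frames), not the project's exact configuration, and the custom feature mentioned above is not shown.

```python
import numpy as np

def gammatone_features(signal, sr, center_freqs, frame_len=1024):
    """Log-energy features from a 4th-order gammatone filterbank."""
    t = np.arange(int(0.04 * sr)) / sr           # 40 ms impulse responses
    feats = []
    for fc in center_freqs:
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # ERB bandwidth at fc
        g = t**3 * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
        band = np.convolve(signal, g / np.abs(g).sum(), mode="same")
        # Frame the band output and take log energy per frame.
        n = len(band) // frame_len
        frames = band[: n * frame_len].reshape(n, frame_len)
        feats.append(np.log((frames**2).sum(axis=1) + 1e-10))
    return np.stack(feats, axis=1)               # (n_frames, n_bands)
```

Gammatone filters mimic cochlear frequency selectivity, giving finer resolution at low frequencies, which is one reason they are a natural first choice for a bass-focused task.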

We evaluate the results by comparing them to the original bass recordings, and we plan a listener survey comparing the system to other approaches. Preliminary listening tests suggest that the system resynthesizes bass with high audio quality and accurate rhythm and pitch when trained on parts of the test song in addition to other songs, and with good audio quality and accuracy when trained only on other songs.
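Alongside listening tests, one simple objective check when the reference bass recording is available is a signal-to-noise ratio. This is a generic metric sketch, not the evaluation protocol described above:

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR (dB) of an estimated bass track against the reference stem.
    Assumes both signals are time-aligned arrays of equal length."""
    noise = reference - estimate
    return 10 * np.log10(
        (reference**2).sum() / ((noise**2).sum() + 1e-12) + 1e-12)
```

More refined separation metrics (e.g. BSS Eval's SDR/SIR/SAR) additionally allow for gain and filtering differences, but plain SNR gives a quick first comparison.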

This project is valuable because it has the potential to produce much higher-quality separations than current approaches, enabling new musical applications of a promising source separation technique that has so far been applied only to speech.