Although January provides us with the opportunity to embrace the New Year and toss the events of 2016 behind us, we’re still savouring the remaining traces of the holiday season and our festive spirits before they’re gone for good.
A key feature of the holiday season is its carols. While you may picture a choir singing, a research group at U of T has a very different idea in mind.
Hang Chu, a first-year Ph.D. student in the Machine Learning Group at U of T’s computer science department, in collaboration with his advisors, professors Sanja Fidler and Raquel Urtasun, initially worked to train neural networks to produce generic songs from an input image. They soon discovered that their creation could be applied to holiday carols.
The project, called Song from PI, involves music generation using artificial intelligence. In layman’s terms, Chu explains, the team generates music as separate tracks of melody, chord, and drum. The team realized that they could apply this method of music generation to create “neural story singing.”
“The goal is to generate a song from just an image. It is using a series of different deep learning neural network technologies and it is analyzing the contents in the image, then using the automatically-recognized image content to generate lyrics. [The next step is to] generate music that accompanies the lyrics and string everything together,” Chu says.
A neural network is a popular type of machine learning model that is used for many tasks, such as image and speech recognition, and here, music generation.
The researchers began their project by crawling the internet for symbolic sheet music. They collected about 100 hours of music and used it as the training data for the music generation.
But what’s music without a little dancing?
Another part of the team’s project involves generating a stickman that dances along with the music. Chu explains that for this application, the team used the game Just Dance to train the robot with one hour of dancing data.
To train the neural network to create these computer-generated holiday jingles, the team worked through multiple stages.
“First of all, we have the input image. Then we have a neural network that analyzes the content of the image. So it takes something like a Christmas tree in the middle of your living room, and it generates captions,” Chu explains. “The captions will be phrases such as ‘Christmas tree’, or maybe ‘flowers,’ ‘gifts,’ and it generates a passage of captions. This is the first stage.”
In the second stage, another neural network takes the output of the first stage, the passage of captions, and generates lyrics from it. In the third stage, a different neural network takes those lyrics as its input and generates a melody, drum track, and chords that go well with the lyrics.
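For readers who think in code, the three stages described above can be pictured as a simple chain of functions, each feeding its output to the next. The sketch below is only an illustration of that flow, not the team’s implementation: every name in it (`image_to_song`, `toy_captioner`, `toy_lyricist`, `toy_composer`) is hypothetical, and the “toy” functions are hard-coded stand-ins for what would be trained neural networks in the real system.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for the three stages; none of these names
# come from the researchers' code.
Captioner = Callable[[str], List[str]]
Lyricist = Callable[[List[str]], List[str]]
Composer = Callable[[List[str]], Dict[str, List[str]]]


def image_to_song(image_path: str, captioner: Captioner,
                  lyricist: Lyricist, composer: Composer) -> dict:
    """Chain the three stages: image -> captions -> lyrics -> music tracks."""
    captions = captioner(image_path)   # stage 1: describe the image
    lyrics = lyricist(captions)        # stage 2: turn captions into lyrics
    tracks = composer(lyrics)          # stage 3: melody, chord, and drum
    return {"captions": captions, "lyrics": lyrics, "tracks": tracks}


# Toy stand-ins (assumptions for illustration only). A real system would
# replace each one with a trained neural network.
def toy_captioner(image_path: str) -> List[str]:
    # Ignores the image and returns fixed captions.
    return ["Christmas tree", "gifts", "flowers"]


def toy_lyricist(captions: List[str]) -> List[str]:
    # One lyric line per caption.
    return [f"I see {c.lower()} tonight" for c in captions]


def toy_composer(lyrics: List[str]) -> Dict[str, List[str]]:
    # One note, chord, and drum hit per lyric line, cycling fixed patterns.
    notes, chords, beats = ["C4", "E4", "G4"], ["C", "F", "G"], ["kick", "snare"]
    n = len(lyrics)
    return {
        "melody": [notes[i % len(notes)] for i in range(n)],
        "chord": [chords[i % len(chords)] for i in range(n)],
        "drum": [beats[i % len(beats)] for i in range(n)],
    }


song = image_to_song("living_room.jpg", toy_captioner, toy_lyricist, toy_composer)
print(song["lyrics"])
print(song["tracks"])
```

Passing each stage in as a function mirrors the article’s description of separate networks: any one stage could be swapped out without touching the other two.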
“We are currently working on making this a lab demo, so it can become an app or a webpage so that everyone can upload their own photo and get a song mixed with that photo,” Chu says.
Research on this “neural story singing” has motivated the team to take the technology in several directions. The demo is one immediate application, but they also hope to use the technology as an intelligent assistant for musicians. Composing chord or drum parts can become tedious, so Chu believes the technology could eventually reduce musicians’ workload.
“For the future directions, we were actually thinking of making the music even better. Music has different emotions, so we were thinking of if we can even incorporate that concept into our music generation. So, generating more rich emotions in that.
“For a second direction, we were thinking to make the dancing component better. We only have one hour of dancing, and that’s definitely not enough data for a robot to learn what dancing is, so we are thinking of improving that,” Chu says.
Interestingly, Chu played the accordion during childhood, and learned to play the guitar and piano later in life.
“I have been playing music basically my entire life. So this is also my passion: to do something in the machine learning scale that also involves music.”
It certainly shows that you never know what combining your passions may lead to—maybe even something as interesting as computer-generated Christmas carols.