Technion computer science researchers use artificial intelligence to generate musically appealing, personalized jazz solos that match individual human preferences
A paper titled “BebopNet: Deep Neural Models for Personalized Jazz Improvisation” was recently presented at the 21st International Society for Music Information Retrieval Conference (ISMIR 2020) and received the Best Research Award. Authored by M.Sc. students Nadav Bhonker (who has since graduated) and Shunit Haviv Hakimi, together with their advisor, Professor Ran El-Yaniv of the Henry and Marilyn Taub Faculty of Computer Science at the Technion – Israel Institute of Technology, the paper shows that it is possible to model and optimize for personalized jazz preferences.
Learning to generate music is an ongoing AI challenge. An even more difficult task is the creation of musical pieces that match human-specific preferences. In the BebopNet project, Bhonker and Haviv Hakimi, both amateur jazz musicians, focused on personalized, symbol-based, monophonic generation of harmony-constrained jazz improvisations. To tackle this challenge, they introduced a pipeline consisting of several steps: supervised learning of a language model on a corpus of solos, learning a high-resolution user-preference metric, and optimized generation using planning (beam search). The corpus consisted of hundreds of original jazz solos performed by saxophone giants including Charlie Parker, Stan Getz, Sonny Stitt, and Dexter Gordon. They presented an extensive empirical study in which they applied this pipeline to extract individual preference models implicitly defined by several human listeners. This approach enables an objective examination of subjective, personalized models whose performance is quantifiable.
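The planning step described above can be sketched in miniature: a beam search that expands candidate note sequences with a language model and re-ranks them with a learned preference score. The two models below are illustrative stubs, not BebopNet's actual networks, and the scoring weight `alpha` is a hypothetical parameter.

```python
import math

VOCAB = list(range(12))  # toy pitch classes standing in for full note tokens

def lm_logprobs(prefix):
    """Stub language model: favors small melodic steps from the last note."""
    last = prefix[-1] if prefix else 0
    scores = [-abs(p - last) for p in VOCAB]
    z = math.log(sum(math.exp(s) for s in scores))
    return {p: s - z for p, s in zip(VOCAB, scores)}

def preference_score(seq):
    """Stub user-preference metric: rewards varied pitch content."""
    return len(set(seq)) / len(seq)

def beam_search(start, steps, beam_width=3, alpha=0.5):
    """Keep the top beam_width sequences at each step, ranked by
    LM log-probability plus alpha times the preference score."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, lp in beams:
            for note, logp in lm_logprobs(seq).items():
                new_seq = seq + [note]
                rank = lp + logp + alpha * preference_score(new_seq)
                candidates.append((new_seq, lp + logp, rank))
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = [(seq, lp) for seq, lp, _ in candidates[:beam_width]]
    return beams[0][0]

solo = beam_search(start=0, steps=8)
print(solo)
```

In the real pipeline the stubs would be replaced by the trained note-level language model and the user-specific preference network; the beam search itself is the generic planning mechanism the paper names.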
A plagiarism analysis was also performed to ensure that the generated solos are genuine rather than concatenations of phrases previously seen in the corpus. “While our computer-generated solos are locally coherent and often interesting or pleasing, they lack the qualities of professional jazz solos related to general structure, such as motif development and variations,” the authors said.
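A plagiarism check of the kind described could work by flagging long note n-grams that a generated solo shares with the training corpus. The following is a hedged sketch under that assumption; the n-gram lengths and toy note sequences are illustrative, not taken from the paper.

```python
def ngrams(seq, n):
    """All contiguous n-note subsequences of seq, as a set of tuples."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def longest_copied_run(generated, corpus, max_n=8):
    """Length of the longest n-gram (up to max_n) that the generated
    sequence shares with any solo in the corpus; 0 if none overlap."""
    longest = 0
    for n in range(2, max_n + 1):
        gen = ngrams(generated, n)
        if any(gen & ngrams(solo, n) for solo in corpus):
            longest = n
    return longest

# Toy corpus of two short "solos" (MIDI note numbers).
corpus = [[60, 62, 64, 65, 67, 69], [72, 71, 69, 67, 65, 64]]
generated = [60, 62, 64, 66, 68, 70]  # copies only a 3-note run
print(longest_copied_run(generated, corpus))  # prints 3
```

A generated solo whose longest copied run stays short relative to typical phrase lengths would count as genuine rather than memorized material.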
Prof. El-Yaniv hopes to overcome this challenge in future research. Preliminary models trained on a smaller dataset were substantially weaker, and it is possible that a larger dataset would yield a substantially better model. Obtaining such a large corpus might require abandoning the symbolic approach and relying instead on audio recordings, which can be gathered in much larger quantities.
“Perhaps one of the main bottlenecks in AI art generation, including jazz improvisation, is how to evaluate quality meaningfully. Our work emphasizes the need to develop effective methodologies and techniques to extract and distill noisy human feedback that will be required for effective quality evaluation of personalized AI art. Such techniques are key to developing many cool applications,” noted Prof. El-Yaniv.
Click here for the paper
Click here for Audio Samples