A primer on generative machine learning models for music production

Could a software engineer with a background in artificial intelligence and no theoretical or practical knowledge about music make an album? This was the challenge that led me on a multi-year exploration of computational creativity. Along the way, I released an album and learned a lot about the field of machine learning for music production.

In this article, I’ll cover some recent advances in AI for music production, focusing on tools that you can incorporate into your music-making workflow today. For readers interested in the theoretical side of things, I’ll share a few pointers to additional resources at the end.

Music Composition

When people think about AI and music, they invariably imagine a future where machines are performers that can auto-magically create whole compositions: original songs, built on new concepts and ideas, personalized to sound good to you, the listener. While this is the holy grail of the field, it’s still very far from reality. But we are getting closer.

OpenAI’s MuseNet is probably the closest thing that exists to the vision of fully AI-driven music creation. It’s a deep learning model that can generate MIDI songs with 10 different instruments in many different styles. The most distinguishing characteristic of this project is that the instruments play well together: drum kicks hit in sync with the bass line, and pianos complement your strings. The main models are biased toward classical-sounding music, so they tend to create powerful orchestral pieces. However, with some tweaking and searching, you can find music in any style.

This song was made in a single MuseNet session in less than one hour, using Serum presets. Notice the complete style switch mid-song, which still stays coherent with the preceding musical structure.

Although the project has not been released as an open-source or commercial product, OpenAI provides a demo in their blog post that you can use to experiment with the model. The demo is minimal, but luckily the community has built UIs that expose the model’s full range of options. My favorite is MuseTree, an interactive tool that lets you generate songs iteratively.

Google’s Magenta Studio, while older and more limited, provides tools for generating, continuing, and interpolating simple melodies and drum beats. They are available as standalone apps or as Ableton Live plugins that you can easily incorporate into your music-making workflow.
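
If you prefer working in code, the models behind Magenta Studio are also scriptable in Python. Below is a minimal sketch of melody continuation with the MelodyRNN model, based on Magenta’s own examples. It assumes the magenta and note_seq packages are installed and that you’ve downloaded the pretrained basic_rnn.mag bundle from the Magenta project; exact APIs may differ between Magenta versions.

    import note_seq
    from note_seq.protobuf import generator_pb2, music_pb2
    from magenta.models.melody_rnn import melody_rnn_sequence_generator
    from magenta.models.shared import sequence_generator_bundle

    # A two-note primer (C4, E4) for the model to continue.
    primer = music_pb2.NoteSequence()
    primer.notes.add(pitch=60, start_time=0.0, end_time=0.5, velocity=80)
    primer.notes.add(pitch=64, start_time=0.5, end_time=1.0, velocity=80)
    primer.total_time = 1.0
    primer.tempos.add(qpm=120)

    # Load the pretrained bundle (downloaded separately from Magenta).
    bundle = sequence_generator_bundle.read_bundle_file('basic_rnn.mag')
    melody_rnn = melody_rnn_sequence_generator.get_generator_map()['basic_rnn'](
        checkpoint=None, bundle=bundle)
    melody_rnn.initialize()

    # Ask for eight more seconds of melody; higher temperature = more random.
    options = generator_pb2.GeneratorOptions()
    options.args['temperature'].float_value = 1.0
    options.generate_sections.add(start_time=1.0, end_time=9.0)

    continuation = melody_rnn.generate(primer, options)
    note_seq.sequence_proto_to_midi_file(continuation, 'continuation.mid')

The resulting MIDI file can be dropped straight into your DAW, which is exactly the workflow the Ableton Live plugins streamline.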

On the commercial side, one of the most compelling products is AIVA. It can create full-length songs in several musical styles with many instrument options. Songs can be edited and tweaked directly in their editor and later exported as MIDI, or as audio rendered with their built-in synth presets. There’s a free plan, so it's a good option if you want a low-commitment way to get started.

Now, I couldn’t finish a section on music composition without touching on the most ambitious project in this area: OpenAI’s Jukebox. This project skips MIDI entirely and generates raw audio that you can listen to directly. The results are still far from enjoyable without a lot of experimentation and tweaking, but even as it stands, you can use it to generate interesting musical ideas.

Sampling & Remixing

Have you ever imagined being able to sample those sick vocals from your favorite Miley Cyrus song? Me neither 🙂 But the technology is out there today, it’s pretty cool, and it’s not limited to Miley Cyrus, or even to vocals.

Recent advances in sound source separation have produced some impressive open-source projects. You can take your favorite samples and extract the drums, bassline, vocals, and more, almost as if you had access to the original master stems. The problem lies in that “almost”: separations generally have noticeable artifacts, like bits of vocals bleeding into the snares of your drums. These artifacts are typically hard to repair, but if you embrace them, they can actually give your samples a lot of character.

UMXL separations of Emilíana Torrini’s Unemployed In Summertime

Currently, the three best projects in this field are Spleeter from Deezer Research, Demucs from Facebook (Meta) Research, and Open-Unmix from Inria and Sony. All provide trained models that you can use directly in your code or through command-line utilities. Fortunately, there are also some UIs available, such as Spleeter Web.
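
To give you an idea of how little code this takes, here’s a minimal sketch using Spleeter’s Python API; the command-line tool exposes the same functionality. It assumes the pretrained 4-stem model, which Spleeter downloads automatically on first use, and a hypothetical input file my_sample.mp3.

    from spleeter.separator import Separator

    # Pretrained 4-stem model: vocals / drums / bass / other.
    separator = Separator('spleeter:4stems')

    # Writes one WAV per stem into stems/my_sample/.
    separator.separate_to_file('my_sample.mp3', 'stems/')

Demucs and Open-Unmix follow a similar pattern, each shipping pretrained models behind a simple command-line or Python interface.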

Some companies are developing proprietary source separation models, such as LALAL.AI. Their models are high quality and can also extract specific instruments, such as synths and guitars. Since there’s a free plan, it’s definitely worth a try.

FX

In a world where it feels like every new Top 40 melody is just a rehash of some song from last year’s Top 40, audio effects have gained traction as a way of making your music stand out from the crowd. In this new world of A E S T H E T I C S, why not go full futuristic and use some AI-based FX plugins?

GuitarML is one of the most interesting projects in this area. It’s a community of developers trying to replicate the sound of well-known physical guitar pedals using AI. The idea is simple: you show a model examples of how a guitar sounds with and without the effect, and let the algorithm learn how to transform one into the other. The model can then be used as an FX plugin that approximately replicates the original pedal. Models are available as VST plugins, or even as Raspberry Pi builds you can plug your instrument into directly.

Bassline from the previous separation passed through GuitarML’s Chameleon
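
To make the training idea concrete, here’s a toy sketch in PyTorch. To be clear, this is not GuitarML’s actual architecture, just a minimal illustration of learning a dry-to-wet mapping with a small recurrent network; the dry and wet tensors below are random stand-ins for real paired recordings.

    import torch
    import torch.nn as nn

    class PedalModel(nn.Module):
        """Maps a dry guitar signal, sample by sample, to a 'wet' one."""
        def __init__(self, hidden_size=32):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                                batch_first=True)
            self.head = nn.Linear(hidden_size, 1)

        def forward(self, x):            # x: (batch, samples, 1) dry audio
            out, _ = self.lstm(x)
            return self.head(out)        # (batch, samples, 1) predicted wet audio

    model = PedalModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Stand-ins for paired recordings; tanh mimics a crude distortion pedal.
    dry = torch.randn(8, 4096, 1)
    wet = torch.tanh(2.0 * dry)

    for step in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(dry), wet)
        loss.backward()
        optimizer.step()

Once trained on real recordings, a model like this can be exported and wrapped as a real-time plugin, which is essentially what the GuitarML projects do at much higher fidelity.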

Another interesting area is humanization. You can create a sentimental piano melody in your DAW using the most textured synth preset ever, and it will still sound mechanical in some way: the varying intensity and micro-randomness of a piano player pressing the keys are hard to replicate in modern music-making software.

As you can imagine, there are AI tools to give back that human vibe. For example, from the previously mentioned Google Magenta, we have Groove. Trained on real drummers’ performances, it adjusts the timing and velocity of programmed drum patterns to reproduce the “feel” of the original performances. VirtuosoNet takes a similar approach, but geared toward music scores.
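
For intuition, here’s a naive humanizer sketch using the pretty_midi library. It just adds random jitter to note timing and velocity, which is far cruder than what Groove does (Groove learns the feel from real performances), but it shows the kind of transformation involved. The file names are hypothetical.

    import random
    import pretty_midi

    pm = pretty_midi.PrettyMIDI('stiff_drums.mid')
    for instrument in pm.instruments:
        for note in instrument.notes:
            jitter = random.gauss(0, 0.01)   # ~10 ms of timing wobble
            note.start = max(0.0, note.start + jitter)
            note.end = max(note.start + 0.01, note.end + jitter)
            note.velocity = min(127, max(1, note.velocity + random.randint(-8, 8)))
    pm.write('humanized_drums.mid')

Groove goes much further: rather than adding random noise, it applies the systematic push, drag, and accenting it has learned from real drummers.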

Science

For anyone who wants to dig into the theoretical foundations of these tools, or who has the technical skills to work with bleeding-edge open-source software, this section offers some pointers for further exploration.

“Music Composition with Deep Learning: A Review” and “A Comprehensive Survey on Deep Music Generation” (both on arXiv) give a good overview of the current state of music composition research.

The Music Demixing Challenge (MDX) is the best place to stay on top of developments in sound source separation; every year, the main research labs and startups compete there. If you want to dig into the details of implementing a separation algorithm, this tutorial is an excellent starting point.

This list provides an extensive collection of open-source repositories for many tasks, from interactive piano composition to audio style transfer. It’s a fantastic resource, and by digging around, you’ll definitely find some exciting ideas.

Dear reader, as we all know, you are already a fantastic musician, one of the most creative minds of our generation. I sincerely hope you can incorporate some of these tools into your music-making process on your path to stardom. See you at a future AI Song Contest?


Pedro Oliveira is a machine learning engineer and data scientist at Tribe AI. An expert in knowledge graphs, NLP, recommender engines, and computational creativity, he also does research at the intersection of AI and electronic music production.
