Stems for DJs and More: Here’s How A New Format Will Work


The most important thing to know about Stems, a new multitrack specification for audio, is that it’s simple by design. That simplicity means that it could really take off as a way of sharing music with multiple tracks, for DJing or live-remix applications.

Stems won’t solve every problem of file exchange and sharing. It’s not a multichannel spatialization format. It’s not a sophisticated project format for storing metadata. I say that, because after we covered Stems at the beginning of this week, I found my inbox flooded with every use case for every file format imaginable, and complaints that Stems didn’t solve them. Some went as far as to get into video.

I get it: you have problems in search of solutions. Just be aware, solving every use case imaginable gets complicated fast. Take the industry standard on which Stems is based – MPEG-4. Covering everything from codecs to files, video to audio, the MPEG-4 spec has 31 parts and 15 levels, each containing still more specs inside the specs. There are international trade treaties that are simpler. And to anyone saying that there are already standards for complex project interchange and sophisticated multichannel audio – you’re absolutely right. That’s why Stems isn’t trying to be any of those things.

Instead, Stems is really a format for releasing music, and it’s intended to be as simple as possible.

Following the announcement of Stems, it seems there was some misinformation about the specs of the format. Some of this was simply technically wrong – like a report that Stems uses “MP3 files” (it uses AAC-encoded audio), or doesn’t support lossless audio (it does). And a lot of people tried to read into the future of Traktor – that’s fair, but it misses part of the point of Stems, which is to try to bring other developers onboard.

Since at least some of those developers are reading CDM, alongside producers and DJs, let’s take a look.

The “Stems” file itself is an MP4 container. Careful here – people use “MP4” interchangeably to refer to file formats and encoding, and they’re not the same thing. When you see a Stems file, you’ll see a single MP4 container – think of a box that can have different stuff in it. Technically, we’re talking MPEG-4 Part 14, but what’s important about this is that any software or hardware that can read an MP4 file can read a Stems file, and play it just like a normal stereo track. That includes iTunes or a CDJ, for example.
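For developers who want to see the "box" idea in practice, here is a minimal sketch of walking the top-level boxes of an MP4-style container in Python. It relies only on the general MP4 box layout (a 4-byte big-endian size followed by a 4-byte type) and says nothing about what Stems itself stores inside those boxes. The function names and the tiny in-memory demo file are my own, made up for illustration.

```python
import io
import struct


def list_top_level_boxes(stream):
    """Return (type, size) for each top-level box in an MP4-style stream."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            large = stream.read(8)
            if len(large) < 8:
                break
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            boxes.append((box_type.decode("latin-1"), None))
            break
        if size < header_len:
            break  # malformed box, stop rather than loop forever
        boxes.append((box_type.decode("latin-1"), size))
        stream.seek(size - header_len, io.SEEK_CUR)  # skip the payload
    return boxes


def make_box(box_type, payload):
    """Build one box for the demo below (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A fake three-box file held in memory, not a real Stems file.
demo = io.BytesIO(
    make_box(b"ftyp", b"M4A \x00\x00\x00\x00")
    + make_box(b"moov", b"")
    + make_box(b"mdat", b"\x00" * 16)
)
print(list_top_level_boxes(demo))  # [('ftyp', 16), ('moov', 8), ('mdat', 24)]
```

To try it on a real file, open it with open(path, "rb") and pass the file object in place of the in-memory demo.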

As I wrote last week, Stems also uses ID3 for adding metadata. That means the track itself can have all the usual cover art, bpm information, and so on, and each individual stem can be titled so you know what it is. On controllers with displays, a DJ/performer can see those titles, as well.

Saying something is a “container” doesn’t say how the file is encoded or what’s in it. Let’s look at that separately:

“Stems” includes five separate stereo tracks – four stems and one stereo mixdown. Remember, the idea is to have individual parts for your track. So “Stems” specifies four parts, each one stereo. (Note: stereo, not mono – this is 4 x 2 channels each.)

There is additionally a stereo mixdown – this is your normal stereo master, in other words. Let’s assume you’ve set up a track with a bass line, drums, synth lead, and vocals. You would bounce each of those to a separate stereo track, and additionally export the complete mix as you usually would.

Which you hear depends on what you’re using for playback. For software without Stems support, you’ll simply hear the stereo mixdown.

Software (and, possibly, hardware) with Stems support will mix the four parts together. The user will then have control over the level of the parts. That’s why the simplicity of four stereo tracks is necessary: DJ software can always count on Stems tracks to have the same four-part arrangement, and so can create a consistent interface. (I imagine some Stems-compatible software might also give a user a choice of whether to use the individual stems or ignore them.)
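To make that concrete, here is a rough sketch of what "mixing the four parts together" with per-part level control could look like. This is not how Traktor or any other product does it; the function name, the sample layout (lists of left/right float pairs between -1.0 and 1.0), and the hard clipping are all assumptions of mine for illustration.

```python
def mix_stems(stems, gains):
    """Sum four stereo stems into one stereo buffer, scaling each by its gain.

    stems: list of four stems, each a list of (left, right) float samples.
    gains: list of four level values, one per stem (1.0 = unchanged).
    """
    if len(stems) != 4 or len(gains) != 4:
        raise ValueError("expected exactly four stems and four gains")
    length = min(len(stem) for stem in stems)
    mixed = []
    for i in range(length):
        left = sum(gain * stem[i][0] for stem, gain in zip(stems, gains))
        right = sum(gain * stem[i][1] for stem, gain in zip(stems, gains))
        # Crude hard clip to keep the sum in range (an assumption, not a spec rule).
        mixed.append((max(-1.0, min(1.0, left)), max(-1.0, min(1.0, right))))
    return mixed


# One-sample example: drums, bass, lead, vocals, with the vocals pulled all the way down.
drums = [(0.5, 0.5)]
bass = [(0.25, 0.25)]
lead = [(0.125, 0.0)]
vocals = [(0.0, 0.125)]
print(mix_stems([drums, bass, lead, vocals], [1.0, 1.0, 1.0, 0.0]))  # [(0.875, 0.75)]
```

Because the part count is fixed at four, the gains list always has the same shape, which is exactly what lets a controller map four faders once and reuse that mapping for every track.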

(CC-BY-SA) jf1234. That's Russia's DJ Artyom

You’ll master your tracks as you always did – with some added work. Since there’s a separate stereo track, you’ll master Stems tracks for stereo the way you always have. And that will almost certainly involve some processing on the master bus (or the stereo mixdown file you’ve given to a mastering engineer).

For the individual parts, however, you’ll apply dynamics processing and the like individually. Obviously, you’ll want the mix of those parts to be in the dynamic range you want, with each also sounding good on its own. There was a lot of discussion of this, but it’s not a huge task; a mastering engineer ought to be able to handle it if you can’t yourself. That’s another reason to keep this to four tracks; the process is more manageable.

The default encoding is 256k VBR AAC. The encoding for Stems is the same audio compression as iTunes Plus: 256kbps variable bit rate AAC. The idea is to get high quality sound, with optimized file sizes.

Remember, you have five stereo tracks, not just one, so file size is important. (Variable bit rate means that you get that file size as small as possible without compromising quality.)
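A quick back-of-the-envelope calculation shows why. The sketch below assumes all five stereo tracks average 256kbps (with VBR the real figure will vary) and ignores container and metadata overhead; the function name and the six-minute example track are mine.

```python
def estimated_size_mb(duration_s, kbps=256, tracks=5):
    """Rough audio payload size in megabytes (1 MB = 1,000,000 bytes).

    Treats kbps as an average bit rate per stereo track, which is only
    an approximation for variable bit rate encoding.
    """
    total_bits = duration_s * kbps * 1000 * tracks
    return total_bits / 8 / 1_000_000


six_minutes = 6 * 60
print(estimated_size_mb(six_minutes, tracks=1))  # 11.52 (a plain stereo file)
print(estimated_size_mb(six_minutes))            # 57.6  (four stems plus the mixdown)
```

So under these assumptions, a six-minute track is about five times the size of its plain stereo equivalent, which adds up quickly across a large DJ library.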

Frankly, I think 256k AAC is just fine for listening and DJing, even on club systems. That’s part of why I was critical of the claims made by the new Tidal streaming service – and as many of you found, it’s very, very hard to tell any difference. Given DJs want to carry large libraries with them, this format seems optimal for the situation.

Also, because each stem is encoded separately, NI pointed out to me that the difference should be even less noticeable than it would be for just a stereo bounce (assuming mastering the stems has gone well). That makes some sense; it’ll actually be fun to play with these. (This week has left me with the overwhelming impression that we should schedule some blind listening tests. Hmm, Funktion One will have a room at Messe, I know…)

Lossless is an option. Stems allows for lossless in the spec. Now, having admitted that you probably can’t hear the difference in most listening use cases, I think this is useful for another reason – it means you could conceivably use Stems as a format for simple file interchange. In those cases, you might want the lossless format – not because you’ll be able to hear the difference, necessarily, but because you may want to retain full lossless content if a file will be fed through additional processing.

Let’s say you have a drum machine on the iPad. You could export a four-track, lossless project to work on further on your studio machine. It’s lossless, but it’s still smaller than an uncompressed PCM stream like WAV/AIFF – with exactly the same audio quality when played back. (That means it also takes up less space on your tablet.)

I’m saying this out loud as I hope some developers are listening.

Use cases:

First, let’s think about how important simplicity is in the things we already do. How many times was the easiest way to share music simply to bounce to an AAC MP4, then upload to SoundCloud or WeTransfer or Dropbox? How many times was the best way to get a remix done of a new track just sharing four stereo stems? Or mastering with nothing other than a WAV bounce?

Keeping Stems to just four stereo tracks is, I think, key. It forces the producer of the track to think through which parts are essential and which are logical to group together. That’s relevant even if you’re making Stems only for yourself. It also means you can have consistent hardware controller mappings or consistent software interfaces.

This opens up a number of interesting possible applications:

Creative DJing with stems. Yes, bad mash-ups are one use case. But the format also allows you to take apart tracks more creatively. In genres like techno, I think that will allow the use of more tracks as “building blocks” – including with hybrid live sets. In fact, I hope this stops a somewhat disturbing trend of releasing techno tracks that already sound like stems. (Ugh.) This can also couple with features like Traktor’s Remix Decks, obviously, but why not in Serato or on a CDJ, too?

Use of recognizable hooks. Another DJ friend pointed this out to me, while I was musing about subtle drum machine combinations. Outside genres like techno and more generally in DJing, this will clearly be a way to delight crowds with some recognizable bits of popular tracks – if those tracks embrace Stems, anyway.

Mobile remix apps. As I wrote before, this also opens up some new possibilities for, say, a label app that lets listeners remix tracks on their own. It could even allow some clever, standard means of messing about with track stems in games.

Unique live solutions. I like the idea of this for more left-field possibilities, too. For instance, I’ve lately been building environments in Pure Data that let me remix and “DJ” from my own tracks alongside live elements. Now, one problem has been the absence of a consistent way of exporting… well, stems. So I can absolutely see this as being a mechanism of exporting from a production environment to the live environment, and taking that on mobile. (Custom Raspberry Pi performance hardware? Sure.) Obviously, these won’t be widespread applications, but they’ll be brilliant hackday fodder, and some of us will find ways of using them.

What’s next? Well, now we wait. I’m told command-line and graphical tools for producing the formats are coming, in advance of a site going up in June.

Naturally, CDM will cover this both from the end user perspective – for producers and DJs – and the developer side, as well.

If all goes to plan, it isn't just for this. (CC-BY) magnetismus.

The post Stems for DJs and More: Here’s How A New Format Will Work appeared first on Create Digital Music.
