How Audio-over-IP Evolved

David Meyer | Nov 16, 2022
Whenever I embark on researching or writing about a given technology, I find myself starting by exploring its history, reflecting on how things were and how we got where we are. Perhaps that’s just the untamable geek in me, but I feel it really helps in providing grounding and context, also helping to explain many key terms and concepts. That was very much the case during the writing of the recent CEDIA white paper, Designing Timing-Aware Networks for AV, which focusses on audio networking, or audio-over-IP.

There was a time when we used to talk about a coming convergence of AV and IT. I clearly remember the sun-filled day, circa 2010, when I was sitting in an iconic Sydney pub with an Australian technology journalist and he said: “Everything is moving to IP.” I must acknowledge the foresight of that simple statement that now seems obvious, and I can’t help but wonder whether even he realized just how right he would be. IP underpins so many technologies now, very much including audio, with or without accompanying video. Professional audio has largely transitioned to IP distribution, SMPTE has transitioned post-production workflows to IP distribution with their ST 2110 suite of standards, and both audio- and AV-over-IP have grown in prominence in the home. And need I mention the extraordinary proliferation of streaming media from the internet? Mic drop.

The concept of using Internet Protocol (IP) to stream audio signals can be traced back as far as 1979, when the Internet Stream Protocol was first proposed by James Forgie from MIT Lincoln Laboratory. But it wasn’t until the 1990 release of the “experimental” version 2 of the protocol, published as RFC 1190, that things really started cooking. By then, Internet Protocol version 4 (IPv4) was already well established, best known for giving us the IP address structure that all home networks still use today. Then there’s IPv6, with hexadecimal addressing that allows so many combinations that every human cell on Earth could have its own address… with plenty to spare!

But did you ever wonder what happened to IPv5?  

Well, IPv5 was essentially the abovementioned Internet Stream Protocol v2, proposing the use of ST packets as an extension to IP packets, envisioning voice-over-IP (VoIP) as the main application. As networking technology further developed through the 1990s to improve reliability and reduce data collisions and latency, it eventuated that such streaming could be managed over a regular network using IP. As such, IPv5 and its separate ST packets became redundant, relegated to the annals of obsolete 20th-century technology.

As we entered the 2000s, it became apparent that there was great potential to use the IP network to distribute audio signals well beyond just voice: full-frequency, high-resolution audio across multiple channels or tracks (such as for separate instruments and vocals). But that’s where things actually started to diverge. Standards were progressively developed to bring timing awareness to networks for real-time applications, but not necessarily in time (excuse the pun) for manufacturers to jump on board. Some leapt ahead of the curve and instead developed proprietary approaches and protocols to manage the packetization and synchronization of audio data, which is the essence of audio-over-IP.

That’s essentially where we still are today, with the market fragmented to some degree between audio-over-IP approaches that harness embedded IEEE standards-based network technologies and timings, namely Audio-Video Bridging (AVB) and Time-Sensitive Networking (TSN), versus proprietary tech such as Dante, with both thrown into a mixing pot of other standards including AES67. It’s this divergence (with some overlap) that the white paper explores, along with what it means for integrators in designing networks for AV distribution.

Part of understanding any technology is getting your head around the terminology. Audio-over-IP and audio networking are quite self-explanatory. They’re simply two ways to describe the same thing: the conversion of audio signals into data packets suitable for sending over an IP network. Add video and the terms change to AV-over-IP (or AVoIP for short), media-over-IP, or even HDMI-over-IP. The one thing you generally won’t see is video-over-IP because that would shorten to VoIP, which circles back to the acronym for where it all started: voice-over-IP.
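For the more hands-on reader, here is a minimal sketch of what "converting audio signals into data packets" can look like in principle. It is purely illustrative and not the wire format of Dante, AVB/TSN, AES67 or any other technology. The 48 kHz sample rate, 1 ms packet time, 16-bit samples, and the names AudioPacket and packetize are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List

# Assumed values for illustration only.
SAMPLE_RATE_HZ = 48_000
PACKET_TIME_MS = 1
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * PACKET_TIME_MS // 1000  # 48 samples


@dataclass
class AudioPacket:
    """Hypothetical packet: a header pair plus raw audio payload."""
    sequence: int   # lets a receiver spot lost or reordered packets
    timestamp: int  # index of the first sample, in sample-clock ticks
    payload: bytes  # 16-bit signed big-endian PCM samples


def packetize(samples: List[int]) -> List[AudioPacket]:
    """Split a mono stream of 16-bit PCM samples into fixed-duration packets."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_PACKET)):
        chunk = samples[start:start + SAMPLES_PER_PACKET]
        payload = b"".join(s.to_bytes(2, "big", signed=True) for s in chunk)
        packets.append(AudioPacket(sequence=seq, timestamp=start, payload=payload))
    return packets


if __name__ == "__main__":
    # 100 dummy samples become two full packets and one short final packet.
    for p in packetize(list(range(-50, 50))):
        print(p.sequence, p.timestamp, len(p.payload))
```

The timestamp is the part that matters most for this discussion: it only means something if sender and receiver agree on the same sample clock, and that synchronization is where the various approaches differ.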

To complement the white paper and extend the conversation, I’ll personally be moderating a panel discussion at CEDIA Expo 2022 titled Deathmatch: Audio Networking/Audio-over-IP Technologies. The panel of experts will discuss the opposing approaches of AVB/TSN versus Dante and others, while also considering things like audio performance and how broad standards such as AES67 fit in. Spoiler alert, this won’t be a winner-takes-all gladiatorial spectacle, but it should be a really informative debate about the different approaches, what’s needed in network design and devices to help integrators in navigating the options for best outcomes.   

Access the Designing Timing-Aware Networks for AV white paper at cedia.net/whitepapers. It’s FREE for members!