Category Archives: Audio Processing and CODECS

The idea behind perceptual CODECs

Question: If you were to drop a rock and a tiny pin onto a table, and both land at the same time…do they both make a sound?

Yes they do.

Can you hear them both?  Probably not…


Logic would dictate that the reason is that the sound of the rock is so loud that it “covers” the sound of the tiny pin.  This concept is known as masking.  The really loud sound of the rock masked the tiny sound the pin made.


 The Rock And The Pin



This is a nice introduction to how perceptual audio coding works.  In order to make things like Internet Radio and iPods possible, we need some technology to make normally huge digital audio files smaller.

How is this done?  It’s pretty complicated, but really interesting once you get the hang of it.  Don’t worry. I’ll just explain it in really simple form.  For those wanting to know exactly what’s happening, you’ll have to do some searching, or wait for me to get around to writing about it! <evil grin>

A linear audio file, say the raw data from a CD, is quite huge.  It contains a LOT of audio information.  The premise behind encoders such as AAC and MP3 is that we don’t even notice most of those sounds, because we are so focused on what’s going on in the foreground (the loudest sounds in the recording).  So these “unnoticeable” sounds are removed, and the “hole” left behind is partially covered over by the loud sounds in the recording, and by the CODEC’s (COder / DECoder) ability to smooth over what you might otherwise hear when masking is insufficient.

The techniques used by .mp3 and AAC are quite good, and the vast majority of people run around totally unaware that they are only listening to 10-20% of what was originally in the linear digital sound files!

Unlike gzip or WinZip archives, the audio encoding process used by AAC and MP3 is destructive, and permanent!  This means that there is no way to get back to the original full quality from the encoded copy.

There are hybrid encoding methods out there (such as WavPack’s lossy mode) that create an additional “correction” file containing a description of what was removed to get the file size down, and the decoder uses this file to reconstruct the audio to its full glory from the encoded copy if the end user so chooses to go that route.  This also assumes the end user still has this correction file handy!  (Purely lossless formats such as FLAC, by contrast, shrink the file without discarding anything at all.)

While the perceptual codec method works fairly well, it isn’t perfect by any means.  The process of destructive perceptual coding creates unintended side effects in the recovered audio.  These side effects are commonly called “coding artifacts”.  This is why radio stations whose music libraries are composed mostly of MP2, MP3, and even AAC file sources have a challenge sounding their best on the dial.

For audible examples of these artifacts, check out the “Listening for coding artifacts” article.

(Photo by Barry Mishkind)

Audio Processing for HD Radio – Pt4

Part 4

By Cornelius Gould

For the past two installments, we have been focusing closely on some important operational functions of a typical modern-day perceptual codec. The previous articles were meant to build a better understanding of the delicate balancing act that must happen for these codecs to work in the first place.

This week, we bring audio processing back into the picture. We will investigate why it is really hard to use tried and true methods of audio processing for analog broadcasting in a new bit reduced world.

Let us start by examining why the use of a peak clipper is not a good idea for peak level control ahead of a perceptual codec. In an analog broadcast environment, a device known as a peak clipper is typically employed for peak modulation control. Any momentary “program spikes” in the audio waveforms that would cause the transmitter modulation to exceed 100% in FM (or 125% positive and 99% negative in AM broadcast) are simply “chopped off”. This “chopping” creates distortion, so most well-designed modern audio processors employ distortion management techniques to hide this distortion from our ears.
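To make the idea concrete, here is a toy Python sketch of a peak clipper (the function name and values are mine, not from any real processor): any sample beyond the ceiling is simply chopped off.

```python
def hard_clip(samples, ceiling):
    """Chop off any sample whose magnitude exceeds the ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# A waveform with momentary "program spikes" beyond 100% modulation (1.0):
audio = [0.2, 0.9, 1.4, -1.7, 0.5]
print(hard_clip(audio, 1.0))  # [0.2, 0.9, 1.0, -1.0, 0.5]
```

The spikes never make it past the ceiling, but as we will see, the flattened waveform tops come at a price.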

One of the side effects of using peak limiting clippers is that you inevitably create lots of rich harmonics. These harmonics can cause a broadcast station to exceed their legal spectrum mask (the amount of space a station takes up on the radio band), so special filters are employed to keep stations nice and legal. On the surface, this seems like it would be OK for coded digital audio use too…but this isn’t the case.

The audio bandwidth of analog broadcast audio from any of the traditional broadcast services can fit nicely within the confines of a 44.1 kHz digital signal and sound no different than the original analog broadcast signal when played back. When perceptual coding is added to the digital mix, all kinds of strange things start to happen.

Let us assume for the sake of this article that the perceptual codecs in our examples are using 44.1 kHz as their sample rate.

Let us go back a little and take another look at the issue of harmonics caused by the use of clippers as peak limiters. While strict filtering can be applied to limit the audio bandwidth to 10 or 15 kHz (well within the capabilities of a 44.1 kHz sample rate), one issue remains that causes poor performance from the perceptual coder: the radically transformed harmonic makeup of audio components due to clipping. This effect can be easily heard. It is the signature sound we have come to expect as the “sound of radio”. For the past 30 years or so, we have become accustomed to the less-than-detailed sound of radio high end. In this “radio sound”, recordings of cymbals sound like repetitive “sippy sounds”, and the natural “crack” of a snare drum is transformed into something that sounds more like a steam locomotive chugging up a hill – chugging at the tempo of the song.

The reason for these odd characteristics is mainly due to the buildup of loudness over the past 25 or more years. Loudness in recent years is determined by the amount of clipping applied to the processed audio signal. The more clipping, the louder you sound on the dial. Along with this loudness comes more distortion.

Illustration A
Illustration B

Caption: Clipping is when audio that exceeds a pre-determined level is simply “chopped off” to maintain absolute level control. Illustration “A” shows a tone with no clipping. Illustration “B” is the same tone, but with clipping to maintain the absolute signal level.

While the out of band distortion components are removed by filtering the clipped audio, and most of the harmonic content we think of as the sound of distortion is hidden by special processing algorithms within the processor, there is still a lot of actual distortion left behind. It is this remaining distortion that gives us the illusion of loudness.


A 6 kHz tone without any clipping.


When you clip a signal, you create a lot of harmonic content that wasn’t originally part of the recording.

While harmonics outside the target bandwidth of broadcast audio are removed, there is still an awful lot of “garbage” within the desired audio. This added content is referred to as “spectral spreading”. What was once a simple set of pure tones to make up a sound becomes a dizzying array of overtones spreading out over a large chunk of audio spectrum.
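You can see this spectral spreading numerically with a few lines of Python. The sketch below (a toy DFT, not anything from a real codec) clips a pure tone and measures its harmonics: the clean tone has energy at exactly one frequency, while the clipped version grows odd harmonics that were never in the original.

```python
import cmath
import math

N = 256
f0 = 8  # tone frequency in cycles per frame; harmonics land on bins 16, 24, ...
clean = [math.sin(2 * math.pi * f0 * n / N) for n in range(N)]
clipped = [max(-0.5, min(0.5, s)) for s in clean]  # hard clip at half amplitude

def dft_mag(x, k):
    """Magnitude of DFT bin k (normalized so a full-scale tone reads 0.5)."""
    return abs(sum(s * cmath.exp(-2j * math.pi * k * n / len(x))
                   for n, s in enumerate(x))) / len(x)

print("fundamental:", dft_mag(clean, f0), "->", dft_mag(clipped, f0))
print("3rd harmonic:", dft_mag(clean, 3 * f0), "->", dft_mag(clipped, 3 * f0))
```

Symmetrical clipping adds odd harmonics (3rd, 5th, and so on) across the band – exactly the “garbage” the codec later has to spend precious bits describing.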

The reason perceptual codecs have such a hard time with this material is because it makes it very difficult to find audio elements to remove without seriously altering the audio for the worse, but it still MUST remove a large amount of data to meet the target bit rate. What this means is instead of judiciously removing audio most of us would not perceive, the perceptual codec has to discard large amounts of data without prejudice, and that is not a pretty sound. It almost literally has to throw the baby out with the bath water!

Another thorny issue is the AM and FM pre-emphasis curve. Pre-emphasis boosts the extreme upper end of the audio spectrum. This is necessary for proper operation on the analog broadcast service. This pre-emphasis is employed before the clippers, so the spectral spreading issue is much more pronounced at higher audio frequencies.

As we learned, perceptual coding algorithms are a delicate balancing act, achieving audio that sounds normal to most people while possibly removing large amounts (if not most) of the original digital audio content. For HD Radio, we are indeed removing most of the original data, leaving behind a reconstructed facsimile of the original digital audio. This reconstructed audio is built up from tiny bits of data and a lot of “smoothing over” at the decoder end. The more channels we force out of the system, the more data we are asking the perceptual codec system to remove from the audio. This causes the handling of audio by the station processing to become more and more critical.

Because of this, there is little (if any) wiggle room in terms of efficient use of what few bits are available to reproduce what sounds like full-range digital audio.

Not all clipping is bad. If we were to clip the low frequency area of the audio spectrum (bass), most of its harmonic content stays in the area that is easy for a codec to reproduce.

A light amount of filtering can be added to the output of the low-frequency clipper to remove the harmonics that give the “raspy” or “blown woofer” sound. The result is not only more pleasant to the ear, but also easier for the codec to handle. With today’s bass-heavy recordings, a bass clipper has been a useful tool for processing manufacturers to preserve the sound of those recordings while maintaining absolute control over bass levels.

It is for these reasons that the best processing designs aimed at digital broadcasting are able to keep these bass clippers as part of their signal topology.

Since we cannot use clippers for overall peak control, audio processors designed for digital broadcasting these days use a device called a look-ahead limiter.

Look-ahead limiters operate by watching the audio levels going into the limiter; with the clever use of a delay line, they can catch peaks before they make it to the output.

When a peak is about to occur, the look-ahead limiter almost instantaneously lowers the audio level to accommodate the peak, then returns the audio to its previous level just about as quickly.

More advances in the processing art form

For the good part of 10 years now, anyone involved with audio processing and digital broadcasting (including myself) has dreamed of a solution where the audio processor and the perceptual codec are merged into one to provide optimum codec performance. This concept is similar to the one that led Bob Orban to design the Optimod 8000 back in the early 70’s, where the FM stereo generator is integrated with the overall audio processing design. This provided a quantum leap in loudness and FM stereo performance when it was introduced to the marketplace.

While this kind of tight integration of the audio processor and codec has not happened yet, many of us in this area have not been held back. Actually, now that a digital broadcast chain can be designed with pure digital connections in and out of everything, we are able to get as close as we will probably ever get to the integrated processor / perceptual coder concept.

The latest research by all in this particular area of audio processing involves some form of “codec conditioning”.

Imagine, if you will, an audio processing technology that can assist the codec in choosing the best material to remove, making its operation more efficient. Codec conditioning can range from something as simple as user-defined lowpass filters that help remove troublesome audio components from the program audio, to much more complicated processes that can do far more.

In the roughly 15 years since the first perceptual codecs went into widespread use in broadcasting, we have not only seen coding technology evolve in ways we could only dream of in terms of quality vs. data size, but we are also all witnesses to the next chapter of audio processing technology, meant to deal with mixing audio processing and coded audio. And it’s all happening right in front of our eyes!


Audio Processing for HD Radio – Pt3

Part 3 – Masking the Dropped Bits
By Cornelius Gould

Getting the best sounding audio from digital transmission requires learning some new techniques. Cornelius Gould continues laying the groundwork for understanding the new generation of audio processors.

Audio coding technology has come a long way from the simplest form of bit reducing technology, as represented by Huffman coding. Of course, once we have a general idea behind how it works, we can take our understanding to the next level.


In order to make a useful audio data-size reduction system, we need to have some pretty sophisticated algorithms in addition to Huffman Coding to really get somewhere. The first step towards an effective audio bit-rate reducing codec is to break the audio into smaller components.

If you split the audio spectrum into a bunch of bands, you can analyze the individual bands and their relationship with each other, and obtain a tremendous amount of useful information. The algorithm used to do this frequency splitting function is called a “filter bank.”

On modern codecs, such as AAC, the filter bank splits the audio into about 500 bands for analysis.
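As a rough illustration, here is a toy Python filter bank (a plain DFT sliced into a handful of bands – real codecs use hundreds of bands and much more sophisticated filters):

```python
import cmath
import math

def filter_bank(frame, n_bands):
    """Split one audio frame's spectrum into n_bands equal slices and
    report the energy found in each slice."""
    N = len(frame)
    half = N // 2  # only bins up to the Nyquist frequency are meaningful
    mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * n / N)
                    for n, s in enumerate(frame))) / N
            for k in range(half)]
    width = half // n_bands
    return [sum(m * m for m in mags[b * width:(b + 1) * width])
            for b in range(n_bands)]

# A low-frequency test tone: all of its energy lands in the lowest band.
frame = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
print(filter_bank(frame, 4))
```

Once the audio is split this way, each band’s level can be compared with its neighbors – which is where the masking analysis below comes in.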



The meaningful analysis of data from the filter bank can determine the amount of spectral masking needed to reduce the size of a digital audio file or stream.

To explain this portion of the discussion, let us draw some visual parallels. For example, on an average day while driving down the road, your eyes are “seeing” a lot of things. However, what we actually (mentally) see is an extremely filtered view of the world.

Unless there is something in the visual landscape that changes just enough to take our attention away from whatever we are concentrating on, we will never realize it is there. Our brains will simply throw away this extra “data” since it has little, if any, relevance to our survival at the moment.

It is this process of weighing the relevance of various stimuli from moment to moment that is the key to taking advantage of how much we can reduce the required amount of data in an audio (or video) file before it becomes just enough for us to notice that something is “not right.”

This “just enough” point is the threshold of our sensory perceptual mask. The same thing is true with audio. Frequencies can be easily masked by our auditory system.


Here is an example to keep in mind as we move into the explanation of frequency (auditory) masking.

If two tones really close in frequency are playing at the same time, and are equal in level, we hear a combination of the two. If one of the tones is decreased just slightly in level, we will only hear the louder tone, and no trace of the quieter one will be heard.

If you have the equipment and know how to conduct the above experiment, try it. It is a very interesting exercise in auditory masking!

This tone example I have described is the core concept behind how to use audio masking techniques to reduce the amount of data in an audio file. The filter bank analysis algorithm within the bit-reduction codec looks at the spectral content, decides where the “just noticeable difference threshold” is for its human subjects, and operates just above that.
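A crude sketch of that decision in Python (the 10 dB threshold and the neighbor-only rule are my invented simplifications – real psychoacoustic models use frequency-dependent masking curves):

```python
def apply_masking(band_levels_db, threshold_db=10.0):
    """Toy masking decision: drop a band if an adjacent band is more than
    threshold_db louder, on the assumption that our ears won't miss it."""
    keep = []
    for i, level in enumerate(band_levels_db):
        neighbours = band_levels_db[max(0, i - 1):i] + band_levels_db[i + 1:i + 2]
        masked = any(n - level > threshold_db for n in neighbours)
        keep.append(not masked)
    return keep

# The -20 dB band masks its -45 dB neighbour, which in turn masks -60 dB:
print(apply_masking([-20.0, -45.0, -60.0]))  # [True, False, False]
```

Every band marked False is data the encoder never has to send – that is where the file-size savings come from.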

This is accomplished mainly in two different steps.


The first step is temporal masking. This is where audio levels across the frequency domain are analyzed for the loudest and quietest elements in the program material.

If a loud signal and a really quiet one arrive really close in time with each other, chances are, we will not even perceive the quiet sound. The coding algorithm is written to assume this, so that the quiet sound is removed from the data.

Whether or not that little sound can be successfully removed depends on the quiet sound having a similar frequency makeup to the louder one, or on the louder sound having enough spectral energy in the frequency range of the smaller one to mask it. (You should notice that this is our tone experiment applied in a more complicated situation.)



A quick glance at our handy illustration here shows a visual representation of frequency and temporal masking. The red line represents sounds at really high frequencies (the treble region), while the brown and blue bars represent sounds in the lower (near-bass) region. The blue bar is louder than the brown one; therefore, sounds represented by the blue bar will mask those of the much quieter brown one.


A thing to remember is, in reality, there is not any masking going on. The masking is all in our heads, so to speak. The “auditory algorithm” in our brains assumes that the louder sound is more important than the quieter one, so we never “hear” the little sound.

To remove even more data, bit-reduction codec schemes will typically combine various data reduction algorithms.

Random noises, such as electronic hiss in recordings, are handled in interesting ways. First of all, most of this noise can be removed and then regenerated in the decoder portion of the system as random noise that falls within the frequency spectrum of interest. Locally generated noise is also useful in “smoothing over” some of the rough edges caused by the removal of so much audio data by the encoder.

This little technique also has the advantage of making a contribution to the reduction in the overall size of the digital audio data.


Now let us see how we can make Huffman Coding more effective for digital audio bit reduction.

One option is to add Huffman coding to the “outputs” of the individual frequency bands of the filter bank, after noise, frequency, and temporal masking are used to remove audio that we would not normally hear.

Let us go back to last month’s example of an audio file with a 60 Hertz hum in it. As you recall, I mentioned how Huffman coding can be used to eliminate the need to reproduce the 60 cycle hum in the background of the recording, thus removing a large amount of data. I then mentioned that just sending that entire audio clip through Huffman coding would do little, if any good without something more being done.

In this case, the low frequency filter banks (after all the masking reduction is performed) would only contain components of the 60 cycle hum. The Huffman coding algorithm can be presented with only this information, and is now free to do what it does best – remove the repetition and replace it with a much simpler descriptor.

A modified form of the Huffman coding concept is used in a new breed of high-efficiency codecs. These codecs include HE-AAC (aacPlus), mp3PRO, and the codec used in HD Radio (which I suspect is at least a close relative of HE-AAC).


Before we move on, I should point out something about perceptual codecs.

Sometimes, it is best to run perceptual codecs at a sample rate lower than the source. The advantage of running codecs at lower sample rates is that it greatly reduces the amount of potential “artifacting” in the decoded audio, but at the cost of lesser clarity.

This is why when using a standard mp3 codec, for example, dropping the sample rate from 48 or 44.1 kHz to 32 kHz sometimes provides the best quality for music. The drop in quality is minor, but the gains can be pretty dramatic.

With that out of the way, let us get back to this new technology.


A relatively new coding scheme has been developed by Coding Technologies to meld two encoding processes. This scheme is called Spectral Band Replication (SBR).

SBR is basically used in this way: the main codec (I like to think of it as the “base codec”) is run at half the normal sample rate of the desired target. For example, if you want to reproduce full 44.1 kHz digital audio (this is standard “CD Quality” audio), you would run the actual sample rate of the base codec at 22.05 kHz.

This will give you quality a little bit better than NRSC AM audio bandwidth from the codec with better overall codec performance. SBR technology is then used to recreate the energy lost by the lower sample rate.


The first step in the SBR process is to look at the high-frequency makeup of the audio material on the encoder side of the bit-reduction system and create a series of simple descriptors (or cues) for use later in the decoder. This simplified data is sent out as a data stream embedded in the encoded digital audio stream.

Then the decoder looks at the spectral content of the fundamental tones in the high frequency area of the base codec and combines this data with the cues sent by the SBR encoder to refine the process.

Random noises present in the high frequency area are regenerated by the decoder and are also used to “smooth over” the overall regenerated high frequency content.
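In toy form, the SBR idea looks something like this (an invented sketch of mine – the real SBR bitstream, envelopes, and transposition are far more sophisticated):

```python
def sbr_encode(spectrum):
    """Transmit the lower half of the spectrum in full, and only a coarse
    energy cue per line for the upper half."""
    half = len(spectrum) // 2
    return spectrum[:half], [abs(s) for s in spectrum[half:]]

def sbr_decode(low, cues):
    """Rebuild each high-frequency line by transposing the matching
    low-frequency line upward and rescaling it to the transmitted cue."""
    high = []
    for src, target in zip(low, cues):
        mag = abs(src)
        high.append(src * (target / mag) if mag else target)
    return low + high

spectrum = [1.0, 0.5, -0.25, 0.2, 0.1, 0.05, 0.02, 0.01]
low, cues = sbr_encode(spectrum)
print(sbr_decode(low, cues))
```

The decoder never receives the actual high-band data – only its rough energy envelope – yet the reconstruction lands close enough that most ears accept it.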

At this point, I think you may be beginning to see how the incorrect choice of audio processing can upset the delicate balance achieved by these perceptual codecs and provide a less than stellar example of the capabilities of HD Radio and web streaming to your listening audience.

This is where we will pick things up in next month’s installment!


– – –

Listening for Coding Artifacts

Written by Cornelius Gould

When you hear audio engineers talk about coded audio, inevitably you’ll hear us talk about “coding artifacts” in one way or another.  What are “coding artifacts”, and what exactly is it that we are hearing?  Listening for artifacts involves knowing where to listen.  In this case, one must listen in the background.  It is also easier to hear artifacts if you have a non-coded version of the audio to compare with.

Let’s turn to our examples.

No coding (Linear)

Coded Audio

Still can’t hear it?  One way to hear what the trained ear can pick up in the “coded” version of the song is to take the file and remove all of the mono information.  What you have left behind is audio that is *not* mono.   In technical terms, this audio is known as the difference between left and right, also referred to as “Left Minus Right” (L-R).  It is called that because you are subtracting the right channel from the left channel, which cancels out all mono information.  This is how basic karaoke boom-boxes work.  Typically, when songs are recorded, the lead singer’s voice is in the middle of the left and right sound fields (in other words, it is mono).  Generating L-R means that the lead vocal is one of several things in a typical stereo recording that is removed.   If you do this to mono audio, you’ll get no audio at all.

Anyway, since coding artifacts have no relationship with anything in the recording mix (because those artifacts weren’t originally part of the music) generating L-R on coded audio reveals these artifacts “as plain as day” for all to hear.
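In code, the difference channel is a one-liner. Here is a tiny Python example (made-up sample values, chosen to be exact in floating point):

```python
def difference_channel(left, right):
    """'Left minus Right': subtract the right channel from the left,
    sample by sample. Anything identical in both channels (mono) cancels."""
    return [l - r for l, r in zip(left, right)]

vocal = [0.5, 0.25, -0.5]                  # centred (mono) lead vocal
side = [0.125, 0.0, -0.25]                 # stereo content, left channel only
left = [v + s for v, s in zip(vocal, side)]
right = vocal
print(difference_channel(left, right))     # [0.125, 0.0, -0.25] -- vocal gone
```

The centred vocal subtracts to zero, leaving only the side content – and, in coded audio, the artifacts.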

Here’s the L-R mix of our song with *no* coding.

L-R Linear:

Here’s the same snippet, but this time the audio is coded, and then turned into difference (L-R) audio

L-R coded:

Now, knowing what you are supposed to be hearing, listen to the background of our first two clips again; you will eventually learn to hear what those of us who work closely with coded audio can hear in these demonstrations.

Happy listening!

Audio Processing for HD Radio – Pt2

PROCESSING GUIDE (Radio Guide Magazine)


Part 2

By Cornelius Gould

Getting the best sounding audio from digital transmission requires learning some new techniques. Cornelius Gould continues laying the groundwork for understanding the new generation of audio processors.

We are living in a unique time in communications history. For better or worse, mass entertainment continues to change from what was to what will be – and most of these changes revolve around the word “digital.”


Virtually all “digital” entertainment media use bit-reduced or perceptual coding; such delivery to the general public will be a fact of life for radio and TV, as well as those new forms of mass communication that have yet to be invented.

It has been shown in studies that the use of perceptual coding is “acceptable” to the vast majority of the population simply because very few people pay attention to things they are not supposed to hear. And that is the trick behind how these systems work!

A basic understanding of how this all works is key to knowing the best way to pre-condition your program audio for HD Radio and web streaming. To do this, we will look at some of the common components used in today’s audio bit-rate reduction schemes, such as HE-AAC and MP3.

Of course, a complete discussion on these processes is quite complex and beyond the scope of this series. Our purpose here is not to be so much an “all encompassing guide to codecs” as to help anyone new to audio processing for coded audio to understand how to best use their processing tools to gain impressive results from HD Radio, web streams, and Podcasts.


Since it is impossible to broadcast all of the digital information available in linear digital audio due to strict bandwidth restrictions, much of the data has to be selectively discarded in order to make it “fit” in a manner that is as transparent as possible to the majority of “listening ears.”

There are, of course, definite limitations as to how far you can take this data reduction idea. Anyone who has listened to a dialup quality Internet stream can attest to this!

Since the exact nature of the codec used for HD Radio is unknown due to the proprietary nature of the system, we will need to focus on a coding scheme that best mirrors the performance of HD Radio. This is why I decided to focus on the HE-AAC codec, since its performance does indeed come closest to what we get from HD Radio. For simplicity’s sake, I will have to describe all this as what appears to be a step-by-step process. It is far from it.


The first and most basic step of almost all bit reduction schemes is something called Huffman coding. Welcome to our Department of Redundancy Department.

Originally developed by MIT student David Huffman in 1952, Huffman coding looks for repetitive information and replaces it with a much smaller, simpler “description.” It is the most common means of data reduction for generic computer data, and we all use Huffman coding every day in the form of “.zip” files.

As an example, consider a digital audio file that contains a 60 Hertz hum component at –20 dB. This hum in our recording never changes. It is just there in the background as a constant.

Now, this hum can take up an awful lot of repetitive data just to reproduce it digitally, especially if it is a really long file. What the Huffman coding scheme for digital audio brings to the table is the ability to replace all of that with a simple descriptor.


What this descriptor does is to tell the decoder: “In the background of this entire file, there is a 60 Hertz waveform, you need to re-create this.” It would also relay to the decoder that this 60 Hertz hum is -20 dB down, and to keep generating the “hum audio” until told to stop.

The decoder would then generate the appropriate hum at the prescribed specifications as part of the background of our recording. The encoder does not necessarily send the actual “hum audio” data.

All that is sent in our case is a description of the “hum audio” for the decoder to regenerate locally. Bam! – An awful lot of the data is removed and we have now made the audio file much smaller.
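For the curious, here is a compact Python sketch of Huffman’s algorithm itself (a generic text example of mine, not an audio codec’s actual table format): frequent symbols end up with short codes, so repetitive data shrinks.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code table: repeatedly merge the two rarest symbols,
    growing code words from the bottom of the tree up."""
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

data = "AAAAAAAAAAAABBBC"          # highly repetitive, like our steady hum
table = huffman_code(data)
bits = sum(len(table[s]) for s in data)
print(table, "-", bits, "bits instead of", 8 * len(data))
```

The dominant symbol gets a one-bit code, so sixteen 8-bit characters squeeze down to 20 bits of payload (plus the table).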


Whenever I talk to anyone about bit-reduced audio, I inevitably fall back to describing bit-reduced video.

This is because our western society is so visually oriented there are lots of descriptive words for things we see, but very few to describe what we hear. In fact, most of us in western lands will typically notice strange things visually before we notice things aurally.

I can use this to advantage as an illustration in terms of bit-reduced audio vs. video, as there are a lot of parallels with video bit-reduction. Visually, I can describe an entire series of phenomena (including the exact visual parallels with audio) in a couple of paragraphs, and the majority of individuals will understand what I am talking about. On the other hand, I could write an entire book on just one coding phenomenon using audio terms, and very few would have a clue as to what I am talking about, even after reading the entire book!

Therefore I will describe how Huffman coding is used for video in a visual way and we can go on to draw parallels with audio from there.


A form of Huffman coding can be seen every day on digital cable services including the popular cable services over “dish” type systems.

In these systems, the need to give the consumer more and more channels has resulted in removing more and more data from all the existing channels to “squeeze” additional things into the existing bandwidth. The side effects of doing this are subtle to most people, as they happen outside their normal realm of perception.

A JPEG picture with some artifacts from data reduction

This is also true for audio coding. It can be heard, but you need to know where and how to listen in order to hear the side effects. More on that later. For now, to see Huffman coding in action on video, it is simply a matter of knowing where to look. These systems are called perceptual coders for the way they use a kind of sleight-of-hand trickery to accomplish their goal.


In video services, its action can be observed best by watching background images. Most of what we see on TV is static, with only a small portion of the TV screen containing actual changing (moving) images. Since our brains are wired to pay attention to moving things, we will typically only concentrate on the moving parts of the picture.

For example, a person is talking on the TV screen. We, as humans, immediately focus on the fact that his mouth is moving, then read the person’s facial expressions and listen to the words to tell the rest of the story. Very few people even notice what is happening around the actor on the screen.

With this in mind, Huffman coding schemes for video are quite interesting. It all happens by managing background images. The Huffman coding part of a bit-reduced video encoding scheme communicates with the decoder in this way: “Here is the data that makes up the background of this scene. Paint it once, and keep repeating it until I send an update for it…”

The description conversation continues: “If the change in video is small, then I will send only the data that makes up that specific moving image, and the portions of the background that need to be updated.” From there, the decoder makes the appropriate changes.
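That “paint it once, then send only updates” conversation can be sketched in a few lines of Python (an invented toy of mine, treating a frame as a flat list of pixel values):

```python
def encode_frame(prev, current):
    """Send only the (position, value) pairs for pixels that changed."""
    return [(i, p) for i, (q, p) in enumerate(zip(prev, current)) if p != q]

def decode_frame(prev, updates):
    """Repaint the previous frame, patching in just the changed pixels."""
    frame = list(prev)
    for i, p in updates:
        frame[i] = p
    return frame

background = [7, 7, 7, 7, 7, 7]
next_frame = [7, 7, 9, 9, 7, 7]      # a small moving object appears
updates = encode_frame(background, next_frame)
print(updates)                        # [(2, 9), (3, 9)] -- 2 pixels, not 6
print(decode_frame(background, updates))
```

When only a small region moves, the update list is tiny – but if the whole frame changes at once (a fast camera pan), the scheme has to send nearly everything, which is exactly when the blocky artifacts appear.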


There are limitations here. For example, think about what happens to video when the entire screen has to change (update) rapidly, such as when the camera is panning around quickly, or is shaking around a lot. Just try to make out any kind of details in the pictures. You typically cannot. It is usually a jumble of blocky images and bright jagged pixel squares for any bright images that zip across the screen.

If you were to compare this to the original, you would most likely notice that the original does not have these annoying artifacts. It is highly likely that you could easily see all the images clearly on the original, even though there is a lot of camera movement. Now please remember this as we go along with our discussion on digital audio bit reduction.


I highlighted the word “artifacts” above. In the bit-reduced audio/visual world, “artifacts” refers to the side effects incurred from throwing away so many bits that what is left can no longer be reproduced in a transparent manner.

The same thing can be observed when taking images from your digital camera and, for example, manipulating these images to make the file size smaller for use on web site pages, or for e-mail.

The JPEG (.jpg) image format is the still-picture equivalent of perceptual audio/video coding. You can remove an amazing amount of digital data from a picture and have it still look the same – or at least very close to the original. Remove too much data, and the image starts to have strange-looking things happen with color transitions, and with sharpness.

When large changes happen, the artifacts become very apparent

These “strange things” are the still-frame version of the audio/video artifacts I will be referring back to repeatedly in this series.

As you have seen, the two JPEG-compressed images show a visual parallel to audio artifacts. The first picture shows the typical quality of an image ready to post on a website. The second is the same picture at the same size, but with more data compression, which results in lots of visual artifacts. The more data you force the bit-reducing algorithm to throw away, the harder it is for the decoder to hide what was removed from the original.
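JPEG’s actual machinery (a frequency transform followed by quantization) is more involved, but the core trade-off – fewer bits per value means larger errors – can be shown with a toy quantizer of my own devising. Here a smooth 0–255 gradient is stored with fewer and fewer distinct levels, and the worst-case error (the visible “banding”) grows accordingly:

```python
def quantize(levels, pixels):
    """Keep only `levels` distinct values per pixel (throwing bits away)."""
    step = 256 // levels
    return [(p // step) * step + step // 2 for p in pixels]

ramp = list(range(256))                 # a smooth gradient, 0..255
for levels in (64, 4):
    err = max(abs(p - q) for p, q in zip(ramp, quantize(levels, ramp)))
    print(f"{levels} levels: worst-case error {err}")
```

At 64 levels the error is too small to notice; at 4 levels the gradient collapses into obvious bands – the still-frame cousin of the blocky video and gritty audio described above.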

Audio can also contain just as much “artifacting” resulting in many strange sounds and noises that were not part of the original recording.


Now, in the descriptions above I mentioned something very important: if you directly compare an unencoded copy of the program material with the encoded results, the changes become much more obvious.

With that in mind, let us return to our audio recording. That example paints an unrealistic picture to present to a Huffman encoder: using Huffman coding alone on any audio material would not work very well.

The reason is that Huffman coding alone would not remove enough information to make any appreciable reduction in the amount of data needed to reproduce this audio file. This is why we cannot simply “zip” our audio on the fly and have a decoder “unzip it back to normal.”

Why is that? Because there is a lot else going on in the recording: the principal audio we are actually trying to capture, room noises, and so on. These other sounds are random in nature and do not lend themselves easily to Huffman coding. Other techniques have to be exploited to make Huffman encoding a more effective tool.
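You can see this stall for yourself with any generic lossless compressor. Below, Python’s `zlib` (standing in for the Huffman-style lossless stage) is given two buffers of the same size: one steady and predictable like a repeating tone, the other random like wideband room noise. The exact percentages will vary from run to run, but the contrast is dramatic:

```python
import os
import zlib

size = 100_000
repetitive = b"\x00\x7f" * (size // 2)   # a steady, predictable "tone"
noisy = os.urandom(size)                 # random bytes, like wideband noise

for name, data in [("repetitive", repetitive), ("noisy", noisy)]:
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: compressed to {ratio:.1%} of original size")
```

The repetitive buffer shrinks to a tiny fraction of its size, while the noise-like buffer barely shrinks at all (it can even grow slightly from header overhead). Real-world audio sits much closer to the noisy case, which is why lossless coding alone cannot deliver the reductions broadcasting needs.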

As we will see as we move along in our series, the audio must be broken up into smaller pieces, which allows the use of other data reduction tools (and in some cases, cascading these tools) to effectively remove enough data to create a smaller file while leaving a trail of vitally necessary “digital bread crumbs” behind for the decoder to reconstruct something that sounds pretty close to the original audio.


What we basically learned this time around is that what makes perceptual coding possible is the relationship between the bit-reducing COder and the DECoder (CODEC).

The Encoder’s job is to decide what information to throw away, what information to simplify, and what information to keep. The Decoder’s job is to take this information and present it to the end user in a way that makes the processes used by the encoder as inaudible as technically possible.

As we move deeper into what is going on between the encoder and decoder, it should become a bit clearer how to (and how not to) process your audio for bit reduced media.


Added note for my website: You can train your ears to hear some of these coding artifacts by going here:


Audio Processing for HD Radio – 1

Audio Processing for HD Radio –
Getting the best out of bit-rate reduced transmission systems.
(Originally in Radio Guide’s Processing Guide magazine)
Part 1 – Changes
By Cornelius Gould

2005 will be remembered as the year digital transmission exploded onto the scene. Many engineers worked tight schedules modifying existing transmitter sites – or in some cases completely rebuilding them – in order to implement this new technology.

As with any new enhancement to the broadcast service, there is always some learning curve to overcome to get the most reliable results from this new service.

With HD Radio technology, this learning curve is pretty intense, as it involves a completely separate transmission system that makes completely separate radio waves existing around your main AM or FM signal.


Unlike previous advancements to the broadcast medium, this time around we are faced with a brand new technology whose inner workings are shrouded in mystery. Some broadcast engineers are faced with a scenario where their best efforts fall short due to some mysterious process within their digital transmitters – and the resulting audio might not sound very nice.

Others are following the guidelines presented by audio processing manufacturers and are having good results, but they still want to have a better grasp as to what is going on “under the hood” to better understand the HD Radio beast they have to handle.

While driving around my section of the country with an HD Radio receiver, I find it interesting to listen to how HD Radio is being implemented by broadcasters. One thing that jumps out at me is how many systems still need work in many basic areas.


The most common problem is a lack of diversity delay on the analog channel. This is most apparent when a listener with an HD Radio drives in and out of the optimum reception conditions for the digital carriers.

Essentially what happens is they find themselves jumping back by as much as eight seconds in time when the HD carriers are decoded, only to shoot ahead as many seconds when the radio rolls back to analog service. As a result, the listener can totally miss entire sentences of a conversation.

Other issues involve audio quality and audio consistency.

I have heard stations with as much as a 12 dB level difference between the digital and analog services. Another annoyance comes from stations whose digital transmitters are fed with the clipped, pre-emphasized FM analog processed audio.

Even if the de-emphasis is turned on (to make it flat again), the resulting audio heard on the digital service will still be very unpleasant to the ear.


The above examples show there is a lot that needs to be learned by the broadcast engineering community about this technology. If this system is to be successful, proper adjustment and implementation is essential.

Of course, I am aware there are many people out there who feel HD Radio should not have been allowed to be used due to its use of the spectrum within what has been traditionally considered the “guard bands” of AM and FM signals. While there is considerable debate as to the validity of many aspects of this new service, I will not be addressing these issues.

The point of this series is not to convince anyone to change their opinions on the validity of this system one way or another, but rather to help point broadcasters in the right direction to get the best audio performance from what we have to work with today. Since my specialty is audio processing, naturally my focus is in that area.

I will pick apart what is going on (as best as anyone outside the iron gates of Ibiquity can), and my hope is that these tips – along with an understanding of both the audio processing and the system in general – will help you get the best audio performance from your digital transmission system.


Broadcasting with HD Radio technology is an entirely different beast. From a technology point of view, it shares almost nothing in common with the legacy broadcasting technology with which we have become accustomed.

The biggest difference – and the hardest concept for many broadcasters to grasp – is that HD Radio is not a “linear” transmission system.

Analog broadcasting can be thought of as a linear process. That is, every sound that leaves the audio processor and enters the transmitter will be sent over the air with very little change.

On the other hand, HD Radio is not a linear process. Only a portion of the audio you feed into the HD Radio system actually makes it to your listeners. The art of deleting large amounts of audio data while preventing the human ear from “hearing” it – for the most part – is called “Perceptual Coding.”

In fact, most of the audio data is thrown away and, through some neat ear trickery and the proper use of technology, very few people will ever know!


What we are describing here is not a difference between digital audio and analog audio. Digital audio can be linear too.

Analog audio is given its name because the entire process works by literally copying sound waveforms electrically onto some medium – making a literal image of the sound on the medium of choice.

For our discussion here, this medium is a radio wave. As the sounds from the mouths of your announcers strike a microphone, their voices are instantly turned into an electrical signal which travels through your audio chain and changes the radio signal in direct proportion to the sound at the studio microphone.

In the case of digital audio, the sounds of your announcers are still picked up by a microphone and the electrical signals are turned into digital data. This is done in a device known as an Analog to Digital converter. (The reverse happens in a Digital to Analog converter.)
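A toy model of that Analog-to-Digital step may help make “linear digital audio” concrete. Assuming the standard CD parameters (44,100 samples per second, 16 bits per sample), the sketch below “measures” a pure 1 kHz tone thousands of times a second and rounds each measurement to a 16-bit integer – which is all an A/D converter fundamentally does. The `sample_tone` helper is purely illustrative:

```python
import math

SAMPLE_RATE = 44_100                     # CD sample rate, samples per second
BIT_DEPTH = 16                           # CD bit depth
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1    # 32767, the largest 16-bit sample

def sample_tone(freq_hz, duration_s):
    """Toy A/D conversion: measure a sine-wave 'microphone signal'
    SAMPLE_RATE times per second, rounding each measurement to an
    integer that fits in BIT_DEPTH bits."""
    n = int(SAMPLE_RATE * duration_s)
    return [round(FULL_SCALE * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
            for t in range(n)]

samples = sample_tone(1000, 0.01)   # just 10 ms of a 1 kHz tone
print(len(samples), "samples =", len(samples) * BIT_DEPTH, "bits, mono")
```

Even 10 milliseconds of one mono channel costs thousands of bits, which foreshadows the bandwidth problem discussed next.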


There is one major problem with the basic concept of such a linear digital transmission system: assuming the process is meant to be of “CD Quality” – whatever that is – we find the system takes an enormous amount of data to accomplish its task when compared to the analog system.

By way of comparison, the analog system can deliver the same sound quality as digital with only 20 thousand Hertz of electrical space. Digital, on the other hand, requires almost 1.5 million Hertz of electrical bandwidth to reproduce the same kind of audio. (While this watered-down explanation is not entirely technically accurate, it is meant to get the point across to as many readers as possible.)

If the quality is the same, and digital is not as efficient as analog, why even bother with digital?


The advantage digital audio has over analog is that the process of converting audio into digital bits is inherently immune to the noise present in any transmission or storage medium. In other words, for all its disadvantages, the main thing you gain is the ability to make endless copies of the data and still have it sound as good as the original.

Please note that this benefit assumes you are not changing the data in any way during the copying process. This is an important factor to remember for reasons that will become apparent very soon.

There is no way to broadcast full linear CD Quality audio to listeners with the transmission systems in use for the past 80 or 90 years. Remember: it takes about 1.5 million Hertz of electrical space to reproduce linear digital audio; the most electrical bandwidth any Digital Audio Broadcast (DAB) service in existence has to work with is about 256 thousand Hertz of space.
For broadcasters using IBOC (HD Radio), the space available is even less: about 96 thousand Hertz. And this assumes there is only one digital program service; there is even less space available if secondary (or tertiary) channels are used for “Multicasting.”

The digital audio needs to “fit” in about 1/16 its “normal” bandwidth

So, how do you squeeze 1.5 million Hertz of data into 96 thousand Hertz of space?
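Before answering, it is worth checking the arithmetic behind that squeeze. Using the standard CD parameters, and taking the roughly 96 thousand number for a single HD Radio program stream from the figures above, a quick sketch gives:

```python
# Linear "CD quality" audio: 44,100 samples/sec x 16 bits x 2 channels
cd_bits_per_second = 44_100 * 16 * 2
print(f"linear CD audio: {cd_bits_per_second:,} bits/sec")   # 1,411,200

# A single HD Radio program stream has roughly 96,000 to work with
hd_bits_per_second = 96_000
ratio = cd_bits_per_second / hd_bits_per_second
print(f"must shrink by a factor of about {ratio:.1f}")        # ~14.7
```

That factor of roughly fifteen is where the “about 1/16” figure comes from – and it is far more than any lossless scheme can deliver, which leads directly to the destructive methods described next.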


Digital Audio Broadcast services have to use methods to permanently, and destructively, discard most of the digital audio data in order to make it all “fit” within the tight spectrum constraints.

The method of throwing away this “excess” data is commonly called “bit reduction” – where varying amounts of digital bits of data are discarded to make what is left fit within signal bandwidth constraints.

Now, remember what I said before: perfect copies of digital audio data contain no noise or errors so long as there is no change in the digital data across many copies. Bit-reduced audio involves major changes to the digital data right from the first copy and, as a result, the decoded data bears very little resemblance to the original source.

The trick is for the decoded bit-reduced audio to be perceived as a “good enough” (if not nearly perfect) copy of the original. Audio processing becomes extremely important here, as optimum audio performance can enhance what is left of the audio – and can even make or break the entire process.


Over the past nine or so years, this is the area in which I have been working. How can an audio processor enhance this process for the better? What new processes can be developed specifically for this new technology?

As my wife and friends can tell you, I am obsessed with these questions. What I could not have realized back then is how much of what I learned over the years of doing this is paying off now in such a major way.

I got involved with mixing audio processing with bit-rate-reduced perceptual coding technology back in 1996 when a friend and I decided to start up a 24/7 Internet radio station. Of course, the big thing that stuck out at me was the quality of the coded audio.

It was not good, of course, and I set out to see just how far I could take improving the audio quality. What started out sounding like gravelly telephone-grade programming rapidly evolved into something that sounded more like an AM radio broadcast within a month of intense audio processing work.


Along the way I became intrigued by these perceptual audio CODECs and how you can use audio processing to get the most out of their performance.

It also did not hurt that this interest took hold when I started to work for Telos Systems – one of the leaders in the handling of coded audio for broadcast applications. If I ever needed to know why certain CODECs behaved the way they did, the answers were in a thick, deeply technical reference book somewhere in their library!

Since that time, with every new CODEC that is released, I eagerly jump on board to see what it can do – and then, immediately after that, what I can do with it audio performance-wise.


I have not done much research work with AM or FM audio processing in quite some time, although between my normal day job and my dealings with small non-commercial stations I still spend lots of time adjusting what I call “legacy broadcast audio processing” on a regular basis.

But, by far, most of my research fun comes from learning how I can make perceptual CODECs “play” at peak performance through the use of external audio processing. To make bitrate reduced perceptual CODECs work at their best level, I find it necessary to research as much as possible about the technology in question.

This is also the same sort of information the broadcast engineer in the field needs to understand to make HD Radio technology play at its best.

After all, how good would your ability to make your legacy AM or FM station sound its best be if you did not understand certain fundamentals, such as the internal design of the transmitter, the choice of transmission line and antenna, and the way all of that can affect your audio processing efforts?


The major audio processing manufacturers have been doing a great job of staying ahead of the curve for you. Each of them has come up with decent presets that work acceptably right out of the box, but you and I know that the best results come from hand-tailoring your processing to your facility and market.

Doing this with bit reduction CODECs requires some knowledge of what is going on under the digital radio transmitter hood. As this series progresses, my goal is to shed some light on this and point you in the right direction to learn more as you need it.

For example, while the exact nature of the CODEC used for HD Radio is a complete mystery to anyone outside Ibiquity corporate circles, a reasonable guess by many (including me) is that it is either the HE-AAC CODEC or some derivative closely related to it.


During my audio processing experimentation with both the HE-AAC / aacplus technology and HD Radio, I find the results to be extremely similar – close enough that I can test ideas at home in my workshop using aacplus and implement them the next day through the HD Radio system with virtually identical results.

With that correlation in mind, I plan to base our discussions around making HE-AAC / aacplus sound its best with audio processing. To start this series, we need to look at how the HE-AAC bit reducing CODEC operates.

In a previous article (Radio Guide, September 2003), I touched upon the basics of perceptual coding, although it was somewhat outside the scope of that series. If you want to go back into the archives, the article was titled “The Rock and The Pin” – a simplified discussion of perceptual audio coding, but one that will make a nice foundation as we start this series.