Sound Representation

Signals

When sound is transmitted or stored, it may need to change form, hopefully without being destroyed.

A signal is an undulating curve

Sound moves fast: in air, at about 340 m/sec, or roughly 760 miles per hour. Its two important characteristics are frequency (aka pitch) and amplitude (aka loudness). Frequency is measured in hertz (Hz), or cycles per second. Humans can hear frequencies between about 20 Hz and 20,000 Hz (20 KHz). Amplitude is measured in decibels (dB); we will see later that, in a digital recording, the precision with which amplitude is recorded is called the "bit resolution."

Consider music:

Sound is simply pressure waves in air, caused by drums, guitar strings or vocal cords
Converted to electrical signals by a microphone
Converted to magnetism when it's put on a master tape and edited
Converted to spots on a CD when the CD is manufactured
Converted to laser light, then electricity when played by a CD player
Converted back to sound by a speaker

A similar story can be told about moving images (sequences of static images) stored on videotape or DVD and played on your home VCR or DVD player.

Degradation

Any time signals are transmitted, there will be some degrading of quality:

signals may fade with time and distance
signals may get combined with interference from other sources (static)
signals may be chopped up or lost

When we continue to transmit and transform signals, the effect is compounded. Think of the children's game of "telephone." Or think about photocopies of photocopies of photocopies...

Example

This is the transmitted signal:

and this is the received signal (dashed) compared to the transmitted signal:

The horizontal axis here is time. The vertical axis is some physical property of the signal, such as electrical voltage, pressure of a sound wave, or intensity of light.

The degradation may not be immediately obvious, but there is a general lessening of strength and there is some noise added near the second peak.

There doesn't have to be much degradation for it to have a noticeable and unpleasant cumulative effect!

Analog Signals

The pictures we saw above are examples of analog signals:

An analog signal varies some physical property, such as voltage, in proportion to the information that we are trying to transmit.

Examples of analog technology:

photocopiers
old land-line telephones
audio tapes
old televisions (intensity and color information per scan line)
VCRs (same as TV)

Analog technology always suffers from degradation when copied.

Digital Signals

With a digital signal, we convert the information into numbers, convert the numbers into bits, and then use a physical (analog) signal to transmit the bits.

A digital signal uses some physical property, such as voltage, to represent one bit of information at a time.

Suppose we want to transmit the number 6. In binary, that number is 110. We first decide that, say, "high" means a 1 and "low" means a 0. Thus, 6 might look like:

The heavy black line is the signal, which rises to the maximum to indicate a 1 and falls to the minimum to indicate a 0.
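Here is a minimal Python sketch of this encoding. It assumes, hypothetically, that "high" is 5 volts and "low" is 0 volts. The names `to_bits`, `encode`, `HIGH_VOLTS` and `LOW_VOLTS` are our own illustrations and do not come from any real transmission standard.

```python
# Hypothetical signal levels chosen for illustration only.
HIGH_VOLTS = 5.0  # represents a 1
LOW_VOLTS = 0.0   # represents a 0

def to_bits(n: int) -> list[int]:
    """Return the binary digits of a non-negative integer, most significant first."""
    return [int(ch) for ch in format(n, "b")]

def encode(n: int) -> list[float]:
    """Map each bit to a signal level: 1 -> HIGH_VOLTS, 0 -> LOW_VOLTS."""
    return [HIGH_VOLTS if bit else LOW_VOLTS for bit in to_bits(n)]

print(to_bits(6))  # [1, 1, 0]
print(encode(6))   # [5.0, 5.0, 0.0]
```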

Degrade/Restore Digital Signals

The signals used to transmit bits degrade, too, because any physical process degrades. However, and this is the really cool part, the degraded signal can be "cleaned up," because we know that each bit is either 0 or 1. Thus, the previous signal might be degraded to the following:

Despite the general erosion of the signal, we can still figure out which are the 0s and which are the 1s, and restore it to:

This restoration isn't possible with analog signals, because with analog there aren't just two possibilities. Compare a photocopy of a photocopy ... with a copy of a copy of a copy of a computer file. The computer files are (very probably) perfect copies of the original file.

The actual implementation of digital transmission is somewhat more complex than this, but the general technique is the same: two signals that are easily distinguishable even when they are degraded.
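Restoring a degraded digital signal comes down to deciding, for each bit, which of the two levels the received value is closer to. The sketch below is a simplified model. It assumes the same hypothetical 0 V / 5 V levels and a threshold halfway between them. The received values are made up for illustration.

```python
HIGH_VOLTS = 5.0   # hypothetical level for a 1
LOW_VOLTS = 0.0    # hypothetical level for a 0
THRESHOLD = (HIGH_VOLTS + LOW_VOLTS) / 2   # halfway between the two levels

def restore(received: list[float]) -> list[int]:
    """Decide each bit by checking which side of the threshold it falls on."""
    return [1 if level > THRESHOLD else 0 for level in received]

def regenerate(bits: list[int]) -> list[float]:
    """Produce clean, full-strength levels again for the recovered bits."""
    return [HIGH_VOLTS if bit else LOW_VOLTS for bit in bits]

received = [4.1, 3.3, 0.9]      # made-up degraded version of 6 (binary 110)
bits = restore(received)
print(bits)                     # [1, 1, 0]
print(regenerate(bits))         # [5.0, 5.0, 0.0]
```

What makes this work is the wide gap between the two levels. As long as noise never pushes a value across the threshold, the restored bits are exact.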

Summary of Digital Communication

The main point here is that digital transmission and storage of information offer the possibility of perfect (undegraded) copies, because we are only trying to distinguish 1s from 0s, and because of mathematical error checking and error correcting.

Converting Analog to Digital

If digital is so much better, can we use digital for music and pictures? Of course! To do that, we must convert analog to digital, which is done by sampling.

Sampling measures the analog signal at different moments in time, recording the physical property of the signal (such as voltage) as a number. We then transmit the stream of numbers. Here's how we might sample the analog signal we saw earlier:

Reading off the vertical scale on the left, we would transmit the numbers 0, 5, 3, 3, -4, ... (The number of bits we use to represent each of these numbers is called the bit resolution. In some sense, it is the sound equivalent of an image's bit depth.)
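Here is a minimal sketch of sampling followed by rounding to a fixed bit resolution. It assumes the signal's values lie between -1 and 1 and uses a signed integer scale. The 1 Hz test tone, 8 samples per second and 4-bit resolution are arbitrary choices for illustration, and the function names are our own.

```python
import math

def sample(signal, rate_hz: int, duration_s: float) -> list[float]:
    """Measure signal(t) every 1/rate_hz seconds for duration_s seconds."""
    n = int(rate_hz * duration_s)
    return [signal(i / rate_hz) for i in range(n)]

def quantize(values: list[float], bits: int) -> list[int]:
    """Round each value (assumed to lie in [-1, 1]) to the nearest integer
    on a signed scale from -top to top, where top = 2**(bits - 1) - 1."""
    top = 2 ** (bits - 1) - 1          # 7 for 4 bits, 32767 for 16 bits
    return [round(v * top) for v in values]

def one_hz(t: float) -> float:
    return math.sin(2 * math.pi * t)   # a 1 Hz test tone

print(quantize(sample(one_hz, 8, 1.0), 4))   # [0, 5, 7, 5, 0, -5, -7, -5]
```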

Converting Digital to Analog

Of course, at the other end of the process, we have to convert back to analog, which is called "reconstructing" the signal. This is essentially done by drawing a curve through the points. In the following picture, the reconstructed curve is dashed:

In the example, you can see that the first part of the curve is fine, but there are some mistakes in the later parts.
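The simplest way to "draw a curve through the points" is to connect neighboring samples with straight lines (linear interpolation). Real players use smoother reconstruction filters, so treat this only as a simplified sketch. It assumes one sample per unit of time and uses the values 0, 5, 3, 3, -4 read off the earlier figure.

```python
def reconstruct(samples: list[float], rate: float, t: float) -> float:
    """Estimate the signal at time t (t >= 0) by linear interpolation."""
    pos = t * rate                 # how many sample intervals after the start
    i = int(pos)                   # index of the sample just before t
    if i >= len(samples) - 1:      # at or past the last sample: hold its value
        return samples[-1]
    frac = pos - i                 # fraction of the way to the next sample
    return samples[i] + frac * (samples[i + 1] - samples[i])

samples = [0, 5, 3, 3, -4]         # values read off the earlier figure
for t in [0.5, 1.5, 3.5]:
    print(t, reconstruct(samples, 1, t))   # 2.5, then 4.0, then -0.5
```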

The solution to this has two parts:

the vertical axis must have fine enough resolution, so that we don't have to round off by too much, and
the horizontal axis must be fine enough, so that we sample often enough.

In the example above, it's clear that we didn't sample often enough to capture the detail between samples. If we double the sampling rate, we get the following, which is much better:

In general, finer resolution (more bits on the vertical axis) and faster sampling get you better quality (more faithful reproduction of the original signal), but the size of the file increases accordingly.
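To see the tradeoff concretely, the following sketch does four things:

- digitizes a hypothetical 3 Hz test tone at a few sampling rates and bit resolutions
- reconstructs it by connecting the samples with straight lines
- reports the worst error it finds
- reports the number of bits needed per second

The tone, the rates and the 1,000 check points are arbitrary choices for illustration. As either setting increases, we expect the error to shrink and the bit count to grow.

```python
import math

def tone(t: float) -> float:
    """Hypothetical test signal: a 3 Hz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 3 * t)

def digitize(rate_hz: int, bits: int, duration_s: float = 1.0) -> list[float]:
    """Sample tone() rate_hz times per second, round each sample to a
    signed integer scale with the given bit resolution, then scale back."""
    top = 2 ** (bits - 1) - 1
    n = int(rate_hz * duration_s)
    return [round(tone(i / rate_hz) * top) / top for i in range(n)]

def reconstruct(samples: list[float], rate_hz: int, t: float) -> float:
    """Connect neighboring samples with straight lines."""
    pos = t * rate_hz
    i = min(int(pos), len(samples) - 2)   # stay within the last segment
    frac = pos - i
    return samples[i] + frac * (samples[i + 1] - samples[i])

def worst_error(rate_hz: int, bits: int, checks: int = 1000) -> float:
    """Largest gap between original and reconstruction at evenly spaced times."""
    samples = digitize(rate_hz, bits)
    last_t = (len(samples) - 1) / rate_hz   # only check where we have samples
    times = [last_t * k / checks for k in range(checks + 1)]
    return max(abs(reconstruct(samples, rate_hz, t) - tone(t)) for t in times)

for rate, bits in [(16, 4), (32, 4), (32, 8), (64, 8)]:
    print(f"{rate} samples/s, {bits} bits: {rate * bits} bits per second, "
          f"worst error {worst_error(rate, bits):.3f}")
```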

The Nyquist Sampling Theorem

How often must we sample? The answer is actually known, and it's called the Nyquist Sampling Theorem (first articulated by Nyquist and later proven by Shannon). Roughly, the theorem says:

Sample twice as often as the highest frequency you want to capture.

For example, the highest sound frequency that most people can hear is about 20 KHz (20,000 cycles per second), with some sharp ears able to hear up to 22 KHz. (Test yourself with this online tone generator or this hearing test.) So we can capture music by sampling at 44 KHz (44,000 times per second), which captures frequencies up to 22 KHz. That's how fast music is sampled for CD-quality music (actually, 44.1 KHz).

Exercise on the Nyquist Theorem

If the highest frequency you want to capture is middle C on the piano, at what frequency do you have to sample? 2*261.626 Hz = 523.252 Hz
If you want to record piano music (the highest key on a piano is C8), at what frequency do you have to sample? 2*4186 Hz = 8372 Hz
If you want to record whale sounds, at what frequency do you have to sample? 2*10,000 Hz, or 20 KHz
If you want to record sounds that bats can hear, at what frequency do you have to sample? 2*100,000 Hz, or 200 KHz
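The Nyquist calculation is just a doubling. Running it the other way tells us the highest frequency a given sampling rate can capture. This is a minimal sketch: the function names are our own, and the frequencies are the ones used in the exercises above.

```python
def nyquist_rate(highest_hz: float) -> float:
    """Minimum sampling rate needed to capture frequencies up to highest_hz."""
    return 2 * highest_hz

def highest_capturable(sampling_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can capture."""
    return sampling_rate_hz / 2

exercises = {
    "middle C": 261.626,
    "C8 (top piano key)": 4186,
    "whale sounds": 10_000,
    "bat-audible sounds": 100_000,
}
for name, f in exercises.items():
    print(f"{name}: sample at least {nyquist_rate(f):,} times per second")

print(highest_capturable(44_100))  # 22050.0 (CD audio)
```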

File Size

The size of an uncompressed audio file depends on the number of bits per second, called the bit rate, and on the length of the sound (in seconds).

We've seen that there are two important contributions to the bit rate, namely:

sampling rate (horizontal axis), and
bit resolution (vertical axis)

As the sampling rate is doubled, say from 11KHz to 22KHz to 44KHz, the file size doubles each time. Similarly, doubling the bit resolution, say from 8 bits to 16 bits, doubles the file size.

As we've seen, the sampling rate for CD-quality music is 44KHz. The bit-resolution of CD-quality music is 16: that is, 16-bit numbers are used on the vertical axis, giving us 2¹⁶=65,536 distinct levels from lowest to highest. Using this, we can actually calculate the bit rate and the file size:

bit rate (bits per second) = bit-resolution * sampling rate
file size (in bits) = bit rate * recording time

For example, how many bits is 1 second of monophonic CD music?

16 bits per sample * 44,000 samples per second * 1 second = 704,000 bits
Therefore, 704,000 / 8 bits per byte = 88,000 bytes ≈ 88 KB

That's 88 KB for one second of music! (Note that there are 1000 bytes in 1KB, so 88000/1000 is 88KB.)
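The two formulas translate directly into code. This is a minimal sketch. It uses 44,000 samples per second, as these notes do (real CDs use 44,100), and 1 KB = 1,000 bytes. The function names are our own.

```python
def bit_rate(bits_per_sample: int, samples_per_second: int) -> int:
    """bit rate (bits per second) = bit resolution * sampling rate"""
    return bits_per_sample * samples_per_second

def file_size_bits(rate_bps: int, seconds: int) -> int:
    """file size (in bits) = bit rate * recording time"""
    return rate_bps * seconds

size = file_size_bits(bit_rate(16, 44_000), 1)   # one second, mono
print(size)               # 704000 bits
print(size // 8)          # 88000 bytes
print(size // 8 / 1000)   # 88.0 KB
```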

Channels

And that's not even stereo music! To get stereo, you have to add another 88 KB per second for the second channel, for a total bit rate of 176 KB/second.

An hour of CD-quality stereo music would be:

176 KB/sec * 3600 seconds/hour = 633,600 KB ≈ 634 MB

634 MB is about the size of a CD. In fact, it is not accidental that a CD can hold about 1 hour of music; it was designed that way.

Exercise: Bit Rate & File Size

If you record an hour of piano music (the highest key on a piano is C8) in mono at 16 bits, what is the bit rate and file size? bit-rate is 16*2*4186, and file size is bit-rate*3600
If you record an hour of bat-audible sounds in stereo at 32 bits, what is the bit-rate and file size? bit-rate is 32*2*2*100,000, and file size is bit-rate*3600
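Adding the number of channels as a third factor lets one function answer both exercises and the CD example. This is a minimal sketch. It assumes sampling at exactly the Nyquist rate and the 44,000 samples per second used above. The function name is our own, and sizes are in bytes.

```python
def audio_size(bits_per_sample: int, sampling_rate_hz: int,
               channels: int, seconds: int) -> tuple[int, int]:
    """Return (bit rate in bits/s, uncompressed file size in bytes)."""
    rate = bits_per_sample * sampling_rate_hz * channels
    return rate, rate * seconds // 8

HOUR = 3600
print(audio_size(16, 44_000, 2, HOUR))        # (1408000, 633600000): CD stereo, ~634 MB
print(audio_size(16, 2 * 4186, 1, HOUR))      # (133952, 60278400): piano, mono, 16 bits
print(audio_size(32, 2 * 100_000, 2, HOUR))   # (12800000, 5760000000): bats, stereo, 32 bits
```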

Exercise: Form for Bit-Rate and File Size

Consider the following form to compute bit-rate and file size. Fill in the missing function definitions.

Choices

What are the practical implications of various choices of sampling rate and bit-resolution?

If you're recording full-quality music, you'd want to use the 44 KHz and 16-bit choices that we've done some calculations with.
If you're recording speech in your native tongue (where your brain can fill in lots of missing information), you can cut many corners. Even a high-pitched woman's voice will not have the high frequencies of a piccolo, so you can probably reduce the numbers to 11KHz and 8-bit. Furthermore, you will only need one channel (monophonic sound), not two channels (stereo). Altogether, that saves you a factor of 16 in bit rate (4 from the sampling rate, 2 from the bit resolution and 2 from the channels).
Speech in a foreign language is harder to understand, so it might make sense to use 22KHz and 16-bit resolution. Of course, you would still use one channel, not two. This is still a factor of 4 decrease in the bit rate.
Music before about 1956 wasn't recorded in stereo, so you would only need one channel for that.
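The savings factors in these choices can be checked by comparing bit rates (sampling rate times bit resolution times channels). Here is a small sketch using the numbers from the list above. The function name is our own.

```python
def bit_rate(sampling_rate_hz: int, bits: int, channels: int) -> int:
    """Bits per second of uncompressed audio."""
    return sampling_rate_hz * bits * channels

cd_music = bit_rate(44_000, 16, 2)
native_speech = bit_rate(11_000, 8, 1)
foreign_speech = bit_rate(22_000, 16, 1)
print(cd_music // native_speech)    # 16
print(cd_music // foreign_speech)   # 4
```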

Compression

The bandwidth of many internet connections cannot keep up with the bit rate of CD playback. Think of how long it would take to download an hour of CD-quality music over a slow modem.
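As a rough check, the sketch below compares the CD stereo bit rate (16 bits * 44,000 samples per second * 2 channels) with a dial-up modem. The 56 kbit/s modem speed is our own assumption, typical of dial-up connections, and not a figure from these notes.

```python
CD_STEREO_BPS = 16 * 44_000 * 2   # bits per second for CD-quality stereo
MODEM_BPS = 56_000                # assumed dial-up modem speed, bits per second

slowdown = CD_STEREO_BPS / MODEM_BPS
# Downloading takes `slowdown` seconds per second of music, so an hour of
# music takes about `slowdown` hours.
print(round(slowdown, 1))   # 25.1
```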

So, is it impossible to have sound and movies on your web pages? No, thanks to sound compression techniques. We have seen how GIF and JPG manage to compress images to a fraction of what they would otherwise require. In the case of sound and video, we have file formats that support very powerful compression, such as QuickTime, AVI, RealAudio and MP3. Read more about the history of MP3 (or history of MP3).

The tradeoffs among different compression formats and different bit rates are explained well in this 2007 article on audio formats from the New York Times. (This article is available only on-campus or with a password.)

A discussion of the technology behind these compression schemes is beyond the scope of this course. They are similar in spirit to the JPEG compression algorithm, in that they are lossy compression schemes. That is, they discard information, ideally the information whose loss least degrades the perceived quality of the music.

Some compression algorithms take advantage of the similarity between the two channels of stereo, so adding a second channel might add only 20-30% to the file size.

What do you think?

Discussion Topics

Modems
Does vinyl sound better than CD? (see this discussion of vinyl versus CD on answers.google.com)
CD versus MP3
DVD audio representation that samples at 192 KHz with 24 bits per sample (see this Wikipedia description)
Digital TV
Digital cell phones

Summary

Music, video, voice, pictures, data and so forth are all examples of signals to be transmitted and stored.
Signals inevitably degrade when transmitted or stored.
With analog signals, there's no way to tell whether the received signal is accurate or not, or how it has been degraded.
With digital signals, we can, at least in principle, restore the original signal and thereby attain perfect transmission and storage of information.
We can convert analog signals to digital signals by sampling.
We can convert digital signals back to analog signals by reconstructing the original signal. If the original sampling was good enough, the reconstruction can be essentially perfect.
Wellesley's LTS has a nice page on digital audio

A condensed version of these notes can be found here.

Even More Info & Examples -- OPTIONAL

Note that beyond here is information that we think you might find interesting and useful, but which you will not be responsible for. It's for the intellectually curious student.

Error Detection

Suppose we have a really bad burst of static, so a 1 turns into a 0 or vice versa. Then what? We can detect errors by transmitting some additional, redundant information. A common approach is to transmit a "parity" bit: an extra bit that is 1 if the original binary data has an odd number of 1s, and 0 otherwise. Therefore, the transmitted bits (data plus parity bit) always have an even number of 1s. This is called "even" parity. (There's also "odd" parity.)

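How does this help? If the receiver gets a byte with an odd number of 1s, there must have been an error, so we ask for a re-transmission. Thus, we can detect single-bit errors in transmission. (If two bits are flipped, the number of 1s is even again, so that error goes unnoticed.)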
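Even parity is easy to compute: count the 1s and take the remainder mod 2. Here is a minimal sketch using bit strings. The function names are our own, and the parity bit is placed last (rightmost).

```python
def even_parity_bit(data: str) -> int:
    """1 if the data has an odd number of 1s, else 0."""
    return data.count("1") % 2

def add_parity(data: str) -> str:
    """Append the parity bit so the whole string has an even number of 1s."""
    return data + str(even_parity_bit(data))

def passes_check(received: str) -> bool:
    """True if the received bits still have an even number of 1s."""
    return received.count("1") % 2 == 0

def flip(bits: str, i: int) -> str:
    """Flip the bit at index i (0 = leftmost), simulating static."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

sent = add_parity("00001011")                 # '000010111'
print(passes_check(sent))                     # True
print(passes_check(flip(sent, 0)))            # False: one flipped bit is detected
print(passes_check(flip(flip(sent, 0), 1)))   # True: two flips go unnoticed
```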

You can see some examples of parity using the following form. The parity bit is the last (rightmost) one, with the red outline.

Exercise on Parity

Assuming even parity, what is the parity bit for each of the following:

00001011₂ 1, because there are 3 ones in the number
01001011₂ 0, because there are 4 ones in the number
23₁₆ 1, because there are 3 ones total in the number (convert it from hex to binary)
FF₁₆ 0, because there are 8 ones total in the number (convert it from hex to binary)
DEADBEEF₁₆ 0, because its eight hex digits contain 3, 3, 2, 3, 3, 3, 3 and 4 ones, for a total of 24, which is even
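You can double-check the exercise answers by converting each number to an integer first, which takes care of the hex-to-binary step. Here is a minimal sketch. The function name is our own.

```python
def even_parity(value: int) -> int:
    """Even-parity bit for a non-negative integer: 1 if it has an odd number of 1 bits."""
    return bin(value).count("1") % 2   # bin() gives e.g. '0b100011'

checks = {
    "00001011 (binary)": int("00001011", 2),
    "01001011 (binary)": int("01001011", 2),
    "23 (hex)": int("23", 16),
    "FF (hex)": int("FF", 16),
    "DEADBEEF (hex)": int("DEADBEEF", 16),
}
for label, value in checks.items():
    print(label, even_parity(value))   # parity bits 1, 0, 1, 0, 0 in order
```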

Error Correction

With some additional mathematical tricks, we can detect not only that a bit is wrong but also which bit is wrong, which means we can correct the value. Thus, we don't even have to ask for re-transmission; we can just fix the problem and go on.

Try it with the following JavaScript form. Type in a number, and it will tell you the binary code to transmit. Then, take the bits and add any single-bit error you want. (In other words, change any 1 to 0 or any 0 to 1.) If you click on "receive," it will tell you which bit is wrong and correct it. If you think I'm cheating, you can type the bits into another browser!

Note: for technical reasons, the parity bits are interspersed with the data bits. In our example, the parity bits are bits 1, 2, 4 and 8, numbering from the left starting at 1. (Notice that those bit position numbers are all powers of two.) So, that means the seven data bits are bits 3, 5, 6, 7, 9, 10, and 11.

Exercise on Error Correction

Type in a number and click on the transmit button
Change one of the bits (either a data bit or a parity bit).
Click on the receive button.
Gasp in amazement that it figures out what bit you modified
Repeat

What if more than one bit is wrong? What if a whole burst of errors comes along? There are mathematical tricks involving larger chunks of bits to check whether the transmission was correct. If not, re-transmission is often possible.

How Hamming Codes work

The error correcting code we saw above may seem a bit magical. And, indeed, the algorithm is pretty clever. But once you see it work, it becomes somewhat mechanical.

Here's the basic idea of this error-correcting code. (This particular code is a Hamming code. The classic Hamming (7,4) code sends 7 bits, 4 of which are data, and is easy to visualize using Venn diagrams. The form above applies the same idea to 11 bits, 7 of which are data.) The general idea is this:

If we have 11 bit positions total, number them from left to right as 1-11.
Write down the 11 numbers for the bit positions in binary. For example, position 5 (which is a data bit, since 5 is not a power of 2) is expressed as 0101.
The 1 bits in the binary number expressing the position define which parity bits check that position. For example, position 5 (0101) is checked by the parity bit at position 1 and the one at position 4.
The positions of the parity bits that are "wrong" add up to the position of the wrong bit. So if the parity bits at positions 1 and 4 are wrong, that means bit 5 is the one that is wrong.
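The four steps above can be sketched in Python for the 11-bit code used by the form, with parity bits at positions 1, 2, 4 and 8 and data at positions 3, 5, 6, 7, 9, 10 and 11. This is our own minimal sketch, not the form's JavaScript. It assumes even parity for each check and only handles single-bit errors.

```python
N = 11
PARITY_POSITIONS = [1, 2, 4, 8]
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11]

def encode(data: str) -> str:
    """Turn 7 data bits into an 11-bit code word (positions 1-11 from the left)."""
    if len(data) != 7:
        raise ValueError("expected 7 data bits")
    word = [0] * (N + 1)                    # index 0 unused, so index == position
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = int(bit)
    for p in PARITY_POSITIONS:
        # Parity bit p checks every position whose binary form contains p.
        word[p] = sum(word[i] for i in range(1, N + 1) if i & p and i != p) % 2
    return "".join(str(b) for b in word[1:])

def decode(received: str) -> tuple[str, int]:
    """Return (corrected data bits, position of the bad bit, or 0 if none)."""
    word = [0] + [int(b) for b in received]
    bad = 0
    for p in PARITY_POSITIONS:
        if sum(word[i] for i in range(1, N + 1) if i & p) % 2 == 1:
            bad += p                        # this check failed: add its position
    if 0 < bad <= N:                        # single-bit error: flip it back
        word[bad] ^= 1
    data = "".join(str(word[pos]) for pos in DATA_POSITIONS)
    return data, bad

sent = encode("1011001")
garbled = sent[:4] + ("1" if sent[4] == "0" else "0") + sent[5:]   # flip position 5
print(decode(garbled))   # ('1011001', 5)
```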

For more detail, see this general algorithm

Solution to Exercise with Form for Bit-Rate and File Size

Solution to the Exercise with Form for Bit-Rate and File Size is here.