How information is represented

Every description of a computer needs to explain how the computer handles information: numbers, text, pictures, sound, movies, instructions.

The computer is an electronic device. Each of its wires can either carry electric current or... not carry current. So, like a light switch, it understands only two states. It turns out that this is enough to make the whole idea work. In fact, any system that can represent at least two states can represent information. Take, for example, the Morse code that is used in telegraphy. Morse is a sound transmission system that can carry a short beep (represented by a dot) and a long beeeeeep (represented by a dash). Any letter or number can be represented by a combination of these two symbols.

Similarly with computers. To represent a number, we use the binary number system, not the decimal number system that we use in everyday life. In the binary system, any number can be represented using only two symbols, 0 and 1. (Morse is almost, but not quite, a binary system, because of the pauses between letters. A system closely related to Morse is used by computers to do data compression; more about this later.) Here is how the binary numbers correspond to our decimal numbers:

Decimal   Binary
0         0
1         1
2         10
3         11
4         100
5         101
6         110
7         111
8         1000
9         1001
10        1010
11        1011
12        1100
13        1101
14        1110
15        1111
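The correspondence between decimal and binary numbers can be reproduced with a few lines of Python. This is only a sketch: the function name to_binary is made up for this example, and it uses the repeated-division-by-2 method, which is one common way to find the bits.

```python
def to_binary(n):
    """Return the binary digits of a non-negative whole number as a string."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        # The remainder after dividing by 2 is the next bit, from right to left.
        digits = str(n % 2) + digits
        n //= 2
    return digits


# Print the decimal numbers 0 to 15 next to their binary form.
for n in range(16):
    print(n, to_binary(n))
```

Python also has this built in: format(n, "b") gives the same string.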

And so on. Both systems are positional: a great idea that we owe to Indian and Arab mathematicians, because before them, calculating in Roman numerals was tough (DCCCLXXXII + CXVIII = M, you know...) and calculating in Greek numerals was almost impossible (omega pi beta + rho iota eta = alpha).

Positional means that the position of each symbol within the number determines its value. Thus, 3 in the rightmost position means three, but one position to the left it means thirty (30). We do it without thinking, but we all know that the meaning of 1492 is:

1492 = 1*1000 + 4*100 + 9*10 + 2*1

Notice how on the far right we have 1 = 10^0, then going left 10 = 10^1, 100 = 10^2, and 1000 = 10^3. Thus the meaning of every digit is defined by the power of 10 in that position.

Similarly, number 10011 in binary means 19 because

19 = 1*16 + 0*8 + 0*4 + 1*2 + 1*1

The numbers 16, 8, 4, 2, 1 are, as you know, powers of 2, starting on the right and moving left: 1 = 2^0, 2 = 2^1, 4 = 2^2, 8 = 2^3, and 16 = 2^4.
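The same rule works for decimal and for binary, which a short Python sketch can make visible. The function name positional_value is an assumption of this example, and it only handles bases up to 10, where every digit is one of the characters 0 to 9.

```python
def positional_value(digits, base):
    """Add up digit * (power of the base), starting from the rightmost digit."""
    total = 0
    power = 1  # base to the power 0, the value of the rightmost position
    for digit in reversed(digits):
        total += int(digit) * power
        power *= base  # moving one position left multiplies the weight by the base
    return total


print(positional_value("1492", 10))  # 1*1000 + 4*100 + 9*10 + 2*1
print(positional_value("10011", 2))  # 1*16 + 0*8 + 0*4 + 1*2 + 1*1
```

The two calls print 1492 and 19, matching the sums worked out by hand.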

The decimal system is also called "base 10" and the binary "base 2", because every digit in a number contributes to the whole based on the power of 10 (or power of 2) with which it is multiplied.

Of course, we can have positional systems on different bases, like base 12 (AKA "a dozen") and base 7 (AKA a week).

A byte is an 8-bit binary number. To convert between a byte and a decimal number, use the place values of its eight bits, from left to right:

128 64 32 16 8 4 2 1
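If you want to try byte conversions yourself, here is a minimal Python sketch built on the eight place values 128, 64, 32, 16, 8, 4, 2, 1. The names PLACE_VALUES, byte_to_decimal and decimal_to_byte are invented for this example.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def byte_to_decimal(bits):
    """Add the place values of the positions that hold a 1."""
    if len(bits) != 8 or any(b not in "01" for b in bits):
        raise ValueError("expected exactly eight characters, each 0 or 1")
    return sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == "1")


def decimal_to_byte(n):
    """Go through the place values from largest to smallest, taking each one that fits."""
    if not 0 <= n <= 255:
        raise ValueError("a byte can only hold the numbers 0 to 255")
    bits = ""
    for value in PLACE_VALUES:
        if n >= value:
            bits += "1"
            n -= value
        else:
            bits += "0"
    return bits


print(byte_to_decimal("00010011"))
print(decimal_to_byte(19))
```

The first call prints 19 (16 + 2 + 1) and the second prints 00010011, so the two functions undo each other.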

What's so grand about a positional system? Arithmetic calculations are much easier than in non-positional systems. Can you imagine what second grade would be like if you had to calculate that XLVIII + LXVII = CXV?
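To check a Roman sum like this one, the easy route is to translate each numeral into our positional system first, which rather proves the point. Here is a small Python sketch; the names ROMAN_VALUES and roman_to_int are made up for this example, and it assumes well-formed numerals.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a well-formed Roman numeral to an ordinary integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # A smaller symbol written before a larger one is subtracted (XL = 40).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("XLVIII"), "+", roman_to_int("LXVII"), "=", roman_to_int("CXV"))
```

This prints 48 + 67 = 115, which is a lot easier to verify than the Roman version.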

Fundamental Relationship

One of the key themes of this course is representation. Computers represent lots of interesting things, such as colors, pictures, music, videos, as well as mundane things like numbers and text, or even complex things like programs. Ultimately, you know that, at the lowest level of all those representations, there are bits.

One of the important aspects of these binary representations is the relationship between the number of bits and the power of the representation. Just a few decades ago, engineers could have devised a representation for, say, videos, but no one would have used it because the representation required too many bits, and disk space and memory sizes were so much smaller than they are now.

Let's be concrete for just a minute. You saw above that with one bit, you can number two things: the one labeled "zero" and the one labeled "one," because there are two possible patterns: {0, 1}. With two bits, you can number four things, because there are four possible patterns: { 00, 01, 10, 11 }.
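You can list the patterns for any number of bits with a short Python sketch. The function name bit_patterns is an assumption of this example; product comes from Python's standard itertools module.

```python
from itertools import product


def bit_patterns(n_bits):
    """Return every possible pattern of n_bits zeros and ones, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


print(bit_patterns(1))
print(bit_patterns(2))
print(len(bit_patterns(3)))
```

The first two lines print the patterns for one and two bits, and the last line prints 8, the number of patterns for three bits.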

Here's a table of the relationship between the number of bits and the number of patterns:

Number of Bits   Number of Patterns
1                2
2                4
3                8
4                16
5                32
6                64
7                128
8                256
...
16               ≈ 65,000
...
24               ≈ 16 million
...
32               ≈ 4 billion
...
N                2^N

This fundamental relationship is important and is also unintuitive, because it is exponential. That is, if you double the number of bits, you don't get twice as many patterns, you get the square. (Compare 2 bits with 4, or 3 bits with 6.)
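A few lines of Python show both the exact counts and the "doubling the bits squares the patterns" effect. This is just a sketch; the function name pattern_count is invented for this example.

```python
def pattern_count(n_bits):
    """Number of distinct patterns that n_bits bits can form: 2 to the power n_bits."""
    return 2 ** n_bits


# Exact values behind the rounded figures for 16, 24 and 32 bits.
for n in (8, 16, 24, 32):
    print(n, "bits:", pattern_count(n), "patterns")

# Doubling the number of bits squares the number of patterns.
for n in (2, 3):
    print(n, "bits:", pattern_count(n), "->", 2 * n, "bits:", pattern_count(2 * n))
```

The reason is that 2^(2N) = (2^N)^2: going from 2 bits to 4 takes you from 4 patterns to 16, and going from 3 bits to 6 takes you from 8 to 64.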

We'll see this relationship come up in, for example, our discussion of indexed color, because the limits we place on the number of bits in the representation result in a limit on the number of colors. This is a key idea.

Text Representation

Text is represented with the so-called ASCII code. Years ago, the manufacturers of early computers decided to represent every possible character (visible or invisible, like the space or the newline) with a number. The result was (partially) the code you see below.

Figure: The ASCII character set table.

Every cell in the table holds three pieces of information: the character (e.g. SP for space, the digits 0 to 9, or uppercase letters A to Z), its ASCII code (a decimal number from 0 to 127), and its binary representation with 8 bits.

For example, the letter E has the ASCII code 69 and the binary representation 0100 0101 (64 + 4 + 1 = 69). The 8 bits are depicted in groups of four, because four bits are used to represent a single digit in the hexadecimal system, which we will discuss later.
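Python can look up the ASCII code of a character and its 8 bits for you. In this sketch the function name ascii_info is made up for this example; ord and format are standard Python built-ins.

```python
def ascii_info(character):
    """Return the ASCII code of a character and its 8 bits in two groups of four."""
    code = ord(character)
    if code > 127:
        raise ValueError("not an ASCII character")
    bits = format(code, "08b")  # 8 binary digits, padded with zeros on the left
    return code, bits[:4] + " " + bits[4:]


for character in ["E", "A", "a", "0", " "]:
    print(repr(character), ascii_info(character))
```

For the letter E this gives the code 69 and the bits 0100 0101.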

The first two rows of the table represent so-called control characters, characters that are not visible, such as backspace (BS), escape (ESC), CR (carriage return - an old word for enter), etc. If you are interested in all acronyms, the AsciiTable website explains them in detail.

Unicode

The original ASCII system had space for 128 symbols (later 8-bit extensions raised that to 256), enough to represent all English characters, punctuation marks, numbers etc. It turns out that there are other languages on Earth besides English (;-), and recent software is being written to accommodate those, too, via a much larger code called Unicode. We have already been using Unicode when putting this tag in our HTML, which selects UTF-8, one of the standard ways of encoding Unicode as bytes:

    <meta charset="utf-8" >

Your Head First HTML & CSS book talks about Unicode on page 239.
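To get a feel for how UTF-8 stores characters from different languages, here is a small Python sketch. The function name utf8_bytes and the sample characters are choices made for this example; str.encode is standard Python.

```python
def utf8_bytes(character):
    """Return the Unicode number of a character and its UTF-8 bytes as 8-bit strings."""
    encoded = character.encode("utf-8")
    return ord(character), [format(byte, "08b") for byte in encoded]


for character in ["E", "\u00e9", "\u03a9", "\u65e5"]:  # E, é, Ω, 日
    number, byte_list = utf8_bytes(character)
    print(character, number, len(byte_list), "byte(s):", " ".join(byte_list))
```

The plain English letter E still fits in one byte, exactly as in ASCII, while the other characters need two or three bytes each.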

Bits and Bytes

A group of bits used to represent a character came to be known as a byte. Remember that a byte is 8 bits. The Wikipedia page on the byte also discusses the history of the word and the names for larger groups of bytes, such as the kilobyte (KB), megabyte (MB), and gigabyte (GB).

A note on the standard abbreviations

Historically, computer scientists have often used kilobyte to mean 1024 bytes, because 1024 is pretty close to 1000 and because bytes often come in chunks whose size is a power of 2, and 1024 = 2^10. For example, if the 4 GB on a flash drive are counted this way, the drive won't hold exactly 4 billion bytes, but a bit more (4 × 2^30 = 4,294,967,296 bytes) because of this difference. Hard drives and network speeds, on the other hand, are usually measured in powers of 10 rather than powers of 2. In practice, it rarely matters.
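The gap between the two ways of counting is easy to compute. In this Python sketch the function name size_in_bytes and its parameters are invented for this example; "steps" is 1 for kilo, 2 for mega and 3 for giga.

```python
def size_in_bytes(amount, steps, binary):
    """Bytes in `amount` units, counting each step as 1024 (binary) or 1000 (decimal)."""
    factor = 1024 if binary else 1000
    return amount * factor ** steps


four_gb_binary = size_in_bytes(4, 3, binary=True)
four_gb_decimal = size_in_bytes(4, 3, binary=False)

print("4 GB counted in powers of 2: ", four_gb_binary)
print("4 GB counted in powers of 10:", four_gb_decimal)
print("Difference in bytes:         ", four_gb_binary - four_gb_decimal)
```

For 4 GB the two counts are 4,294,967,296 and 4,000,000,000 bytes, a difference of about 7 percent.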

Computers these days come with huge amounts of storage space on the hard drive (often hundreds of GB), but they are usually able to process only a few GB of it at a time (in their main memory, or RAM). We will use these symbols often in future lectures.