CS110: How Computers Work

A computer is a complex electronic machine, but its operations can be understood sufficiently in terms of a few interconnected components:

In the picture note that the components are connected through a set of wires called the bus.

This is called the fetch/execute cycle. Note that the processor is so much faster than the I/O devices that I/O doesn't appear prominently here. A modern processor is able to execute millions of instructions while waiting for an I/O device, even a fast device like the disk or the network.

The main point here is that the computer doesn't know how to do anything "automatically": there's always some program code (set of instructions) telling it how to do something.

The input/output unit is connected to the usual peripherals, such as the keyboard, the mouse, the various "drives" (hard drives, floppy drives, zip drives, DVD drives, CD-ROM drives, etc.), monitors, printers, etc. So, the components and operations of a computer are remarkably simple. Complication enters only for performance reasons, but any computer that you are likely to see these days contains the above components.

Kinds of Memory

Looking at the picture above, you see a big yellow box labeled Memory, but you'll also see, if you look closely, a hard drive attached to the bus. What's the difference? Does it matter?

Yes, it does. The memory we describe there consists of super-fast memory chips that sit right on the circuit board, just a few inches from the processor. It is sometimes called RAM, or Random Access Memory. The most important properties of RAM are

If you've ever lost a document you were working on using your computer because the power went out or your battery ran out, that's because the stuff you were working on was in volatile memory. The files that are saved on the hard drive weren't lost.

The hard drive of your computer is (usually) a spinning disk with magnetic coating that stores the zeros and ones of your files as a pattern of magnetization. The exact representation isn't important, but two properties of hard drives are crucial:

There are other kinds of drives, such as solid-state drives, but the same properties appear. These properties mean that your computer mostly works by copying information (both programs and data, such as your files) from long-term storage on the hard drive into fast memory in the RAM, and that's where it stays while the computer is running. But you should save your stuff in RAM to hard disk on occasion, in order to avoid losing your work should someone trip on your power cord and pull the plug, or your battery unexpectedly die.

How information is represented

To complete this simple description of a computer we need to explain how the computer handles information: numbers, text, pictures, sound, movies, instructions.

The computer is an electronic device. Each of its wires can either carry electric current or... not carry current. So, like a light switch, it understands only two states. It turns out that this is enough to make the whole idea work. In fact, any system that can represent at least two states can represent information. Take, for example, the Morse code that is used in telegraphy. Morse is a sound transmission system that can carry a short beep (represented by a dot) and a long beeeeeep (represented by a dash). Any letter or number can be represented by a combination of these two symbols.
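As a small illustration of building letters out of just two symbols, here is a minimal Python sketch of a Morse encoder. The table covers only six letters and the function name to_morse is made up for this example; it is not a full translator.

```python
# A partial Morse table (six letters only): a sketch, not the full code.
MORSE = {
    "A": ".-",
    "E": ".",
    "N": "-.",
    "O": "---",
    "S": "...",
    "T": "-",
}

def to_morse(text):
    """Encode text letter by letter; the space between codes stands for the pause."""
    return " ".join(MORSE[letter] for letter in text.upper())

print(to_morse("sos"))   # ... --- ...
```

Notice that the encoder needs a separator between letters. That pause is the reason Morse is not quite a two-symbol system.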

Similarly with computers. To represent a number, we use the binary number system, not the decimal number system that we use in everyday life. In the binary system, any number can be represented using only two symbols, 0 and 1. (Morse is almost, but not quite, a binary system, because of the pauses between letters. A system closely related to Morse is used by computers to do data compression; more about this later.) Here is how the binary numbers correspond to our decimal numbers:

Decimal	Binary
0	0
1	1
2	10
3	11
4	100
5	101
6	110
7	111
8	1000
9	1001
10	1010
11	1011
12	1100
13	1101
14	1110
15	1111

And so on. Both systems are positional: a great idea that originated in India and came to us by way of Arab mathematicians. Before that, counting in Roman was tough (DCCCLXXXII + CXVIII = M, you know...) and counting in Greek was almost impossible (omega pi beta + rho iota eta = alpha).

Positional means that the position of each symbol within the number determines its value. Thus, 3 has a different meaning in the rightmost position than it has one position to the left (30). We do it without thinking, but we all know that the meaning of 1492 is:

1×1000 + 4×100 + 9×10 + 2×1

The decimal system is also called "base 10" and the binary "base 2". You probably have not realized it, but you have been using the binary system when you deal with old-fashioned units of measure: 8 ounces in a cup, two cups in a pint, two pints in a quart, four quarts in a gallon, and so forth.

There are lots of other old-fashioned units that are rarely used. Here's a complete set:

2 Jacks	1 Gill
2 Gills	1 Chopin
2 Chopins	1 Pint
2 Pints	1 Quart
2 Quarts	1 Pottle
2 Pottles	1 Gallon
2 Gallons	1 Peck

2 Pecks	1 DemiBushel
2 DemiBushels	1 Bushel
2 Bushels	1 Kilderkin
2 Kilderkins	1 Barrel
2 Barrels	1 Hogshead
2 Hogsheads	1 Pipe
2 Pipes	1 Tun

So, a gallon contains 8 pints and a tun contains 256 gallons.
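Because every unit in the list is exactly twice the previous one, these counts are just powers of two. The following Python sketch checks the two figures above; the list UNITS and the function how_many are names invented for this illustration.

```python
# The units from the tables above, smallest first; each is twice the one before.
UNITS = ["Jack", "Gill", "Chopin", "Pint", "Quart", "Pottle", "Gallon", "Peck",
         "DemiBushel", "Bushel", "Kilderkin", "Barrel", "Hogshead", "Pipe", "Tun"]

def how_many(small, large):
    """How many `small` units fit in one `large` unit: 2 to the number of doublings."""
    steps = UNITS.index(large) - UNITS.index(small)
    if steps < 0:
        raise ValueError(f"{small} is larger than {large}")
    return 2 ** steps

print(how_many("Pint", "Gallon"))   # 8
print(how_many("Gallon", "Tun"))    # 256
```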

Of course, we can have positional systems on different bases, like base 12 (AKA "a dozen") and base 7 (AKA a week).

What's so grand about a positional system? Arithmetic calculations are much easier than in non-positional systems. Can you imagine what second grade would be like if you had to calculate that XLVIII + LXVII = CXV?
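To check conversions like those in the decimal/binary table, here is a minimal Python sketch. It uses one common hand method (repeated division by 2) to go from decimal to binary, and the positional rule to go back. The function names to_binary and from_digits are chosen for this example.

```python
def to_binary(n):
    """Convert a non-negative integer to binary digits by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # the remainder is the next digit, right to left
        n //= 2
    return "".join(reversed(digits))

def from_digits(digits, base):
    """Value of a digit string in a base up to 10, using the positional rule."""
    value = 0
    for d in digits:
        # Shifting left by one position multiplies everything so far by the base.
        value = value * base + int(d)
    return value

print(to_binary(13))             # 1101
print(from_digits("1101", 2))    # 13
print(from_digits("1492", 10))   # 1492
```

The same from_digits function works for base 7 or any other base up to 10, which is the point of a positional system: only the base changes, not the rule.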

Exercise on Binary

It's important to be comfortable with binary as a number system. Try the following conversions:

Decimal	Binary

Fundamental Relationship

One of the key themes of this course is representation. Computers represent lots of interesting things, such as colors, pictures, music, videos, as well as mundane things like numbers and text, or even complex things like programs. Ultimately, you know that, at the lowest level of all those representations, there are bits.

One of the important aspects of these binary representations is the relationship between the number of bits and the power of the representation. Just a few decades ago, engineers could have devised a representation for, say, videos, but no one would have used it because the representation required too many bits, and disk space and memory sizes were so much smaller than they are now.

Let's be concrete for just a minute. You saw above that with one bit, you can number two things: the one labeled "zero" and the one labeled "one," because there are two possible patterns: {0, 1}. With two bits, you can number four things, because there are four possible patterns: { 00, 01, 10, 11 }.

Here's a table of the relationship between the number of bits and the number of patterns:

Number of Bits	Number of Patterns
1	2
2	4
3	8
4	16
5	32
6	64
7	128
8	256
...
16	≈ 65,000
...
24	≈ 16 million
...
32	≈ 4 billion
...
N	2^N

This fundamental relationship is important and is also unintuitive, because it is exponential. That is, if you double the number of bits, you don't get twice as many patterns, you get the square. (Compare 2 bits with 4, or 3 bits with 6.)
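The relationship can be checked directly. This Python sketch lists every pattern for a small number of bits and confirms that doubling the bits squares the count; the function name patterns is chosen for this example.

```python
from itertools import product

def patterns(n_bits):
    """List every pattern of 0s and 1s of the given length."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

print(patterns(2))        # ['00', '01', '10', '11']
print(len(patterns(3)))   # 8

# Doubling the number of bits squares the number of patterns.
for n in (2, 3, 8):
    assert 2 ** (2 * n) == (2 ** n) ** 2

print(2 ** 16, 2 ** 24, 2 ** 32)   # 65536 16777216 4294967296
```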

We'll see this relationship come up in, for example, our discussion of indexed color, because the limit we place on the number of bits in the representation results in a limit on the number of colors. This is a key idea.

Text Representation

Text is represented with the so-called ASCII code. Years ago, the manufacturers of early computers decided to represent every possible character (visible or invisible, like the space or the newline) with a number. The result was (partially) the code you see below.

To find the code for a particular character in the table, add its row number on the left (32, 48, etc.) to its column number on top. So, for example, A is represented by decimal number 65 (i.e., 64 + 1) or binary number 01000001. The greeting "Hi!" is represented by the sequence 72 105 33 or in binary 010010000110100100100001. Of course some care must be taken to recognize when we are looking at a number and when we are looking at a string of characters. But that's not difficult.
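The "Hi!" example can be reproduced in a few lines of Python. This is a sketch that relies on Python's built-in ord function, which returns a character's code; the helper names ascii_codes and ascii_bits are made up for the illustration.

```python
def ascii_codes(text):
    """The decimal code of each character."""
    return [ord(ch) for ch in text]

def ascii_bits(text):
    """The same codes written as 8-bit binary numbers, run together."""
    return "".join(format(ord(ch), "08b") for ch in text)

print(ascii_codes("Hi!"))   # [72, 105, 33]
print(ascii_bits("A"))      # 01000001
print(ascii_bits("Hi!"))    # 010010000110100100100001
```

Nothing in the 24 bits themselves says "this is text". Read as a single binary number, the same bits mean something quite different, which is why the computer has to keep track of what kind of data it is looking at.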

Control Characters

You'll notice that the table above starts with ASCII code 32, which is for the space character; yes, even the space character (sometimes denoted SPC) needs to be represented. The actual code starts at 0 (which is the null character, sometimes used to represent the end of a string), but the first 32 characters are "control" characters, because they were used to control the early printers. For example, the TAB character is ASCII code 9. Since those characters are not interesting in the context of this class, we've omitted them from the table.

Line Endings

If all we had to worry about was characters, text representation would be pretty straightforward. However, text is organized into lines, and for historical reasons, one of the subtle differences among Windows, Mac and Unix/Linux is how line endings are represented. In the olden days before Windows, Macs and Unix, the early teletype printers used two control characters at the end of each line: the carriage return character to move the print head back to the left, and the linefeed character to move the paper up by one line.

Classic Mac OS (before Mac OS X) represented the end of a line with a carriage return character (CR, which is ASCII code 13). Linux and modern macOS use a line feed character (LF, also called newline or NL, which is ASCII code 10). Windows uses both, CR followed by LF, just like the olden days.

Usually, when you transfer a text file from system to system, the FTP program (Fetch, WinSCP, or whatever) substitutes the appropriate line ending. The "text mode" of transfer says to make these substitutions; "binary mode" makes no substitutions and is more suitable for non-text files, such as images or programs. Note that HTML and CSS are both kinds of text. Most FTP programs have a "guess which mode" setting that usually works pretty well, but can occasionally make a mistake.
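The substitution that "text mode" performs can be sketched in Python as follows. This is only an illustration of the idea, not how any particular FTP program is implemented, and the function name convert_line_endings is invented here.

```python
CR = "\r"   # carriage return, ASCII code 13
LF = "\n"   # line feed, ASCII code 10

def convert_line_endings(text, ending):
    """Rewrite every line ending in text (CR LF, lone CR, or lone LF) as `ending`."""
    # Handle the two-character ending first so it is not counted as two endings.
    unified = text.replace(CR + LF, LF).replace(CR, LF)
    return unified.replace(LF, ending)

windows_text = "one\r\ntwo\r\n"
print(repr(convert_line_endings(windows_text, LF)))        # 'one\ntwo\n'
print(repr(convert_line_endings("one\ntwo\n", CR + LF)))   # 'one\r\ntwo\r\n'
```

Running such a substitution on an image or a program would corrupt it, because bytes 13 and 10 in those files are not line endings. That is what "binary mode" avoids.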

Unicode

The original ASCII code uses 7 bits and so has space for 128 symbols (8-bit extensions have space for 256), enough to represent all English characters, punctuation marks, numbers, etc. It turns out that there are other languages on Earth besides English ;-) and recent software is being written to accommodate those, too, via a much larger code called Unicode. You may want to read up on that in your spare time.
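As a brief taste, the Python sketch below looks up the Unicode number of three characters and counts the bytes each one needs in UTF-8. UTF-8 is one widely used way of storing Unicode text; it is not covered in these notes and is brought in here only as an example. The function name describe is made up.

```python
def describe(ch):
    """Return a character's Unicode number and its size in bytes under UTF-8."""
    return ord(ch), len(ch.encode("utf-8"))

print(describe("A"))        # (65, 1)    same number as in ASCII
print(describe("\u00e9"))   # (233, 2)   e with an acute accent
print(describe("\u20ac"))   # (8364, 3)  the euro sign
```

Characters that were already in ASCII keep their old codes, so plain English text looks the same either way.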

Instruction Representation

Once you can represent numbers and characters, you can also represent instructions! It was this observation that led von Neumann and his collaborators to create a general purpose, reprogrammable computer. Again, one needs to keep track of whether one is looking at instructions or at a string of characters.
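To make this concrete, here is a toy machine sketched in Python. The four opcodes and the two-number instruction format are invented for this illustration and do not correspond to any real processor. A single list of numbers serves as memory and holds both the program and its data, and the loop is a bare-bones fetch/execute cycle.

```python
# Hypothetical opcodes, invented for this sketch.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3

def run(memory):
    """Fetch and execute two-number instructions (opcode, address) until HALT."""
    accumulator = 0
    pc = 0   # program counter: address of the next instruction
    while True:
        opcode, address = memory[pc], memory[pc + 1]   # fetch
        pc += 2
        if opcode == HALT:                             # execute
            return memory
        elif opcode == LOAD:
            accumulator = memory[address]
        elif opcode == ADD:
            accumulator += memory[address]
        elif opcode == STORE:
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown opcode {opcode}")

# Addresses 0-7 hold the program; addresses 8-10 hold the data.
memory = [LOAD, 8, ADD, 9, STORE, 10, HALT, 0, 72, 33, 0]
run(memory)
print(memory[10])   # 105
```

The 1 at address 0 is an instruction only because the machine happens to fetch it as one. Stored at address 8, the same number would simply be data.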

In the future we will learn how the computer represents images, sound and movies.

Bits and Bytes

A group of bits used to represent a character came to be known as a byte. Remember that a byte is 8 bits. The Wikipedia article on the byte also discusses the history of the word and the names for larger groups of bytes, such as kilobytes, megabytes, gigabytes, and terabytes.

A note on the standard abbreviations:

Uppercase B is used for bytes while lowercase b is used for bits. Network speeds, where bits go across a wire one at a time, are usually measured in bits per second, or bps. File sizes are always in bytes, hence kB or MB.
The abbreviation for kilo is a lowercase k, hence kB for kilobytes. The abbreviations for the other prefixes (mega, giga, tera, peta ...) are all uppercase: M, G, T, P ....

Historically, computer scientists have often used kilobyte to mean 1024 bytes, because 1024 is pretty close to 1000 and because bytes often come in chunks whose size is a power of 2, and 1024=2¹⁰. For example, if you buy a 4 GB flash drive, it won't hold exactly 4 billion bytes, but a bit more because of this difference. Hard drives and network speeds, on the other hand, are usually measured in powers of 10 rather than powers of two. In practice, it rarely matters.
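The size of the gap between the two conventions, and the bits-versus-bytes distinction in the abbreviations, can be worked out with a short Python sketch. The function names bytes_in and transfer_seconds are invented for this example, and the transfer calculation ignores any network overhead.

```python
PREFIXES = ["k", "M", "G", "T", "P"]   # kilo, mega, giga, tera, peta

def bytes_in(count, prefix, binary=False):
    """Bytes in `count` units, taking a kilo as 1000 (default) or 1024 (binary)."""
    base = 1024 if binary else 1000
    power = PREFIXES.index(prefix) + 1
    return count * base ** power

def transfer_seconds(size_in_bytes, bits_per_second):
    """Time to send a file over a link, at 8 bits per byte."""
    return size_in_bytes * 8 / bits_per_second

print(bytes_in(1, "k", binary=True))   # 1024
print(bytes_in(4, "G"))                # 4000000000
print(bytes_in(4, "G", binary=True))   # 4294967296
print(transfer_seconds(bytes_in(1, "M"), 8_000_000))   # 1.0
```

The last line shows why the capital B matters: a 1 MB file takes a full second on an 8 Mbps link, not an eighth of a second.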

Computers these days come with huge amounts of storage space on the hard drive (often hundreds of GB), but they are usually able to process only a few GB of them at a time (their main memory or RAM). We will use these symbols often in future lectures.
