Encryption

Today, we'll talk about how machines can communicate in a secure way. A little while ago, we talked about analog and digital signals, and this lecture will remind you of that one, but we're talking about a higher level: not just how information gets from point A to Point B, but how it can get there without anyone in between knowing what is being said.

Lots of us had fun with secret codes and such when we were kids. We'd get secret decoder rings in our cereal boxes and make up messages to send to each other. Could be useful if they were intercepted by the teacher! We hope that today's lecture evokes that sense of fun.

A condensed version of these notes can be found here.

The Cast of Characters

When we discuss encryption, it's traditional to name the people involved, just so we can keep things straight. The cast of our little drama is:

Alice, who wants to say something privately to Bob
Bob, who wants to hear from Alice
Eve, the person who is trying to eavesdrop on their conversation.

The story is that Alice is sending Bob a message through some insecure medium (not whispering in his ear): think postcard or radio. Alice might be a field agent and Bob is her contact at HQ. Alice and Bob might not even know each other: perhaps Alice is trying to persuade Bob to defect, and Eve is trying to find traitors.

Media

The messages might be sent by radio, or phone, or email, or carrier pigeon. All media are potentially insecure. Radio waves can be picked up by other receivers, phone wires can be tapped, as can computer networks. Even carrier pigeons can be captured.

Ciphers

One approach is to put the message in code. Humans have invented lots of codes in history, and code-making and code-breaking have often been pivotal. Indeed, Alan Turing and his code-breakers in England may have done more to win WWII against Hitler than General Patton, because they were able to break the codes sent with the Enigma machine used by the Germans. Similarly, the Navajo Indians, who used their native language as an unbreakable code in the war in the Pacific, contributed greatly to the war effort without ever firing a shot.

Terminology:

plaintext: (n) the message we want to send, in readable form
ciphertext: (n) the message in coded form, unreadable without special information
encrypt: (v) to turn plaintext into ciphertext
decrypt: (v) to turn ciphertext back into plaintext
cryptanalysis: (n) cracking a code; the attempt to decrypt without the required special information
cryptography: (n) The study of codes and code-breaking.

Caesar Codes

Caesar codes are named because Julius Caesar used them to communicate with his troops, so they're hardly new. They're also not particularly hard to break, but if you have to write your message on a postcard, it's better than nothing. The idea is letter substitution. A popular and easy way of doing it is rotating: Imagine the letters of our message are each printed on a wheel, and we simply turn the wheels so each letter is replaced by one that is n letters before or after it in the alphabet. For example, we replace each letter with the one preceding it in the alphabet (and “A” is replaced with “Z”).

Plaintext:

IBM

Ciphertext:

HAL

Plaintext:	IBM
Ciphertext:	HAL

(You may remember that the computer in Arthur C. Clarke's classic science fiction novel 2001: A Space Odyssey was the HAL 9000. Many people believe this was a thinly veiled reference to IBM, using a -1 rotation code.)

Caesar actually used a rotation of 3, so A was written as D, B as E, and so forth. Thus, “ATTACK” would be sent as “DWWDFN”.

Another popular Caesar code is called rot13 and has been used for many years on the Internet as a way to hide things from inadvertent reading, such as movie spoilers or dirty jokes. You have to decode it using rot13 to read the spoiler or the dirty joke. Rot13 is just a Caesar code incrementing each letter by 13. This is convenient because you can decode using the same software as encoding, since rotating twice by 13 brings a letter back to where it started.

If you want to encrypt using a Caesar code by hand, you can use the following form. Choose an amount to rotate by, then write your message using the letter from the cipher alphabet instead of the one from the plain alphabet — in other words, the one below the one you really mean. For example, setting rotation to 1, you can see “B” below “A” and “C” below “B” and so forth. Setting it to 25, you can see “H” below “I” and “A” below “B” and “L” below “M”, giving “IBM” for “HAL”.

Exercise 1

Use the form above to translate some simple words into code. Translate the first two using Caesar's code (rotate by 3) and the last by rot13.

plaintext	ciphertext
BY SEA
BY LAND
ATTACK

You can check your answers and try out other Caesar codes using the following form. Note that this form converts the string to uppercase. Also, it only rotates the letters; it leaves the numbers, punctuation and other characters alone. It could be easily re-written to handle more stuff.

So, now you and a friend can exchange secret messages, say by email, just by encrypting them using this form. You just have to agree on a rotation amount (and keep it secret!).

ASCII Codes implementing Caesar Ciphers

Earlier, we ignored punctuation and anything other than a letter. Obviously, that makes for a bad code. What we really should do is treat all characters the same. How is this done? First, we have to say what a character is.

As you know, everything in a computer is just numbers. How does it deal with characters? Essentially, early computer designers agreed on a set of characters and a standard numbering for each one. For example, A is a 65. One such system was ASCII: American Standard Code for Information Interchange. Here's part of the ASCII code:

You read this table just like an addition table. Add together the row and column headers, and you find the numerical value (ASCII code) of the character at the intersection of that row and column.

Note: The ASCII code is not about encryption; it's just a standard for numbering characters. The existence of such a numbering means that we can do rotation codes numerically, like this:

   encoded_char = (plaintext_char + rotation_amount) % 128;

ASCII is now being supplanted by UNICODE, which is a vastly larger code, designed to handle all the world's languages.

Code Breaking

Caesar ciphers are relatively easy to break, as is any cipher based just on substitution of letters. For example, the most common letter in the ciphertext is probably the encypted form of a very common letter, such as “e” or “t” or “a”. With more sophisticated statistics, and trial and error, both of which computers are good at, it's fairly easy to crack a substitution cipher.

Vigenere Cipher

One way to spoil the statistics of a Caesar cipher is to use multiple Caesar ciphers. In other words, suppose we rotate the first letter of the plaintext by 2, the second letter by 0, and the third letter by 19. Then we repeat, rotating by 2, 0, 19, until we're done with the plaintext. Such a technique is called a Vigenere Cipher, named for its inventor, Blaise de Vigenere from the court of Henry III of France in the sixteenth century. It was considered unbreakable for some 300 years!

Another way to think of the Vigenere Cipher is as follows. Write down a keyword, such as “cat”. Then, use the index of each letter as the amount to rotate. For example, if the keyword is “cat”, then we will use a 2 rotation, a 0 rotation, and a 19 rotation (“t” is the 19^th letter of the alphabet if you start from zero: A is the 0^th letter). Write down the keyword above the plaintext and use it to select the correct rotation. It helps to have a table of all the rotations:

Here's the idea. To encrypt the message:

ATTACK AT DAWN

using the keyword “CAT”, we write the keyword above the message repeatedly:

keyword

CATCATCATCATCA

plaintext

ATTACK AT DAWN

ciphertext

Now, below, we compute each letter of the ciphertext by looking at the intersection of the row of the keyword and the column of the plaintext. (In fact, because of the way the table is set up, it doesn't matter whether the keyword character is the row or the column.) In our example:

keyword

CATCATCATCATCA

plaintext

ATTACK AT DAWN

ciphertext

CTMCCD AM DTYN

Notice:

The two T's in “ATTACK” were encrypted differently, unlike the simple substitution cipher. This is why we can't just do a simple frequency analysis to crack the code.
The letter “A” is encrypted four times, yielding only three different results, since there are only three letters in the keyword. If the plaintext were much longer, a sophisticated frequency analysis might be able to determine information about the keyword and plaintext. A longer keyword would give more variety and hence more security.

Exercise 2

Use the table above to encrypt “ATTACK” using the Vignere cipher and the keyword “CODE”. Notice how the A's and T's never have the same substitutions.

Because a Vigenere cipher uses more than one substitution alphabet, it's one of a bunch of ciphers known as polyalphabetic.

Enigma

The Germans in WWII used an encryption scheme based on a polyalphabetic cipher. They built machines to do the encryption and named them Enigma machines. They considered the encryption by the Enigma machine unbreakable, and relied on it to communicate with their U-boats in the North Atlantic. The Enigma machine looked like manual typewriters with keys and hammers, but internally the keys were attached to drums that did the substitutions by “re-wiring”: mapping the keys to different hammers. Actually, there were three (later four) rotors that did the substitutions and after each letter was typed, the rotors turned like a car odometer (the rightmost fastest, and so forth).

The British managed to steal one of the machines and figure out how it worked. (This was the story of the movie U-571, except that in the movie the Americans stole the machine, but in real life the British were the heros.) Unfortunately, they still needed the keyword (the settings of the rotors). A group of mathematicians led by Alan Turing (and including quite a few women) were able to analyze the transmissions and, eventually, crack the code. This was a major turning point in the war, and the Allies went to enormous effort and sacrifice to conceal the fact that they had cracked the code.

Here are some links about the Enigma machine:

Private Key Encryption

Now that we understand something about encryption, let's go back to our initial scenario and see how it works. Alice wants to send a message to Bob, without Eve or anyone else being able to read the message. She encrypts it using the secret key, and sends it to Bob. Even if Eve intercepts the message, she won't be able to read it. Bob uses the secret key to decrypt the message and read what Alice says.

Suppose that Alice is behind enemy lines and Bob is back at home base. She gets to a radio transmitter, gets out her secret code book, encrypts her message using today's key, and sends it to Bob. Bob has to get out the matching code book to decrypt the message. If Eve is able to capture the code book, disaster!

If Eve captures the code book, Alice needs to send a new secret key to Bob. She needs to send it securely, so that Eve can't read it. Thus, we're back at square one: Alice needs to send a key securely to Bob, so that she can send a message securely to Bob. Consider the secret codes we used earlier, when we said you could send email messages in code, just by agreeing on a rotation amount, but if you had to send the rotation amount by email, you're stuck!

For thousands of years, this was the essential paradox of encryption: you had to have a secure way of communicating in order to have a secure way of communicating. A real-life chicken-and-egg problem!

The kernel of the paradox is that all of the encryption methods that we have discussed so far are kinds of private key systems: they required a shared private key.

Public Key Encryption

In 1976, Whitfield Diffie and Martin Hellman solved the problem of deriving a shared private key over an insecure channel, an invention that came to be known as Diffie-Hellman key exchange. While it contains some cool ideas that are still in use today, we won't purse it further in this course. (If you're interested, there is a good Wikipedia article on Diffie-Hellman key exchange.) Instead, we will discuss the RSA cryptographic system, which has some very interesting properties.

The year after Diffie and Hellman came out with their cryptographic system, three MIT professors (Ron Rivest, Adi Shamir and Leonard Adleman) created the first published, practical implementation of public key cryptography. The main ideas of a public key cryptosystem are:

Instead of one key, you have two: one to encrypt and a different one to decrypt.
The encryption key can be public.
Knowing the encryption key doesn't help you figure out the decryption key.

How does this help? Very simple: Bob knows that Alice wants to send him a message, so he creates a pair of keys. He advertises his public key all over the world (maybe he puts it on his web page). Alice sees it, and so does Eve. Alice downloads it and uses it to encrypt her message. She radios it to Bob, and Eve intercepts it, but Eve can't decrypt the message — only Bob can. No one but Bob can decrypt the message, because only Bob has the secret key.

Imagine that we send information around in lockable trunks (rather than, say, paper envelopes).

With private key encryption, there's one key for each trunk. The same key is used to lock and unlock it. Therefore, Alice and Bob have to have the same key. If Alice wants to send a message securely to Bob, and he doesn't have the key, she needs to send him that first. That's a problem. Why?
With public key encryption , there are two keys for each trunk. If one key is used to lock the trunk, it can't be used to unlock the trunk; only the other key can unlock the trunk. They have to cancel each other out. If Alice wants to send a message securely to Bob, she first gets his public key, maybe off his web page or something. She then locks the trunk with the public key and sends it to Bob. He then unlocks it with the private key!

A pictoral comparison of Private and Public Key Encryption
Private Key	Public Key

Implementing Public Key Encryption

The preceding description is relatively straightforward, but the implementation is not. The solution, called the RSA method, involves advanced number theory that can cause grown men to burst into tears (or maybe it's just me), but the idea is very cool. (If you're interested, please consult the Wikipedia article on the RSA method, but be warned that the article assumes a good deal of knowledge of number theory, but there are good examples and references.)

The pair of keys are mathematically related, and
You can reasonably efficiently generate them and use them to encrypt and decrypt messages, but
To try to determine one key from the other involves solving a problem that is mathematically “hard.”

The last fact makes the RSA method difficult to crack. For their contribution, the members of the RSA team received the 2002 TURING AWARD, which is considered the Nobel-equivalent for Computer Science.

Breaking RSA

What does “hard” mean? Consider factoring a number. The state of the art is little better than trying all the possible factors to see if they divide evenly into the number. Suppose the number is 100 digits long. How many numbers would you have to try? All the primes up to the square root of the 100 digit number, and the square root will have 50 digits. So, it's about

1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

Yes, but computers are fast, right? Suppose we have a fast computer that can try a billion factors every second. Then we only have to wait for

1,000,000,000,000,000,000,000,000,000,000,000,000,000,000

seconds. Since there are about 32 million seconds in a year, this will only take

31,000,000,000,000,000,000,000,000,000,000,000

years.

The centerpiece of the RSA method uses numbers that are the products of two very large prime numbers, forcing Eve to buy very fast computers and wait a very long time to crack the encryption.

Public Key Encryption in practice

When you access a secure web server, your browser and the server become Alice and Bob. They want to communicate securely, but they don't know each other and they didn't make prior arrangements to have a secret key. Instead, they use the idea of public key encryption. When the little padlock icon closes, your browser has established a cryptographically secure connection to the server, and so your credit card numbers and other private information is safe from the Eves on all the network connections between you and the server.

Is your information safe? Not entirely. Maybe someone can hack into the server. Maybe someone at the destination isn't trustworthy. Maybe the server is a complete fraud, put up by Eve. Maybe someone was looking over your shoulder when you typed. There is probably no such thing as perfect security. We all have to decide how much effort to put into security.

Encryption isn't just for web traffic. It could also be used for email and for telephone messages, including cell phones. Right now, anyone with a radio scanner and a little know-how can listen to cell phone conversations. Similarly, tapping a phone is very easy. Encryption lets us use any of these insecure media to send a private message.

Digital Signatures and Authentication

Suppose Alice wants to send Bob the message:

Call off the attack, it's a trap! Signed, Alice

She encrypts her message with Bob's public key and radios it to him. Meanwhile, Eve sends Bob the message:

Go on with the attack, it's all clear! Signed, Alice

She also encrypts the message with Bob's public key and radios it to him. She's pretending to be Alice! What is Bob to think?

There's a cool aspect to public key encryption that we haven't mentioned. We said that one key decrypts what the other encrypts. In fact, both keys can encrypt, and the two keys are opposites, which means that one can decrypt what the other encrypts.

So, here's what Alice does: she encrypts her message with her own private key. Bob gets it and successfully decrypts it with Alice's public key. (She has a web site with her public key listed.) He then realizes that only Alice could have sent this message, since only her private key can create a message that her public key can decrypt.

Thus, public key encryption can give us digital signatures . The purpose of a regular, real-life signature is that, presumably, only you can sign your name the way you do. By comparison to a known signature on file, your bank or any other interested party can verify that something has been signed by you.

The “signing” of digital certificates, which we discuss in the lecture on certificates, actually uses digital signatures just like this. It just involves a different use of public key encryption: a trusted third party signs the public key, using its own private key! This is confusing, so think about it for a while...

Similarly, if there was a public key on file for a person, they can use their private key to “sign” an electronic document, even an email.

Key Escrow

We've been taking the side of Alice and Bob, helping them keep information away from Eve. What if Alice and Bob are bad guys and Eve is the government? What's the point of getting a court order to tap a phone line if the conversation is encrypted? What can Uncle Sam do?

Throughout the 90s, the U.S. government did several things:

It categorized strong crypto systems as munitions, like napalm and cluster bombs, so that it was illegal to export them from the United States. That's why there were domestic versions of Netscape and international versions. The international version didn't have the high-powered crypto.
It established the Data Encryption Standard (DES) and required computer manufacturers to use it for any encryption.
The DES required that the government be given a copy of any key generated, so that law enforcement could decrypt any message that it deemed necessary to decrypt, like getting a court order to tap a phone.

There was a lot of resistance. Computer manufacturers didn't want to have separate international and domestic versions of software. Privacy advocates and civil libertarians worried about the government's power to invade privacy. (The keys would be split in half with the halves held by different agencies, to prevent a single agency running amok and violating civil rights.) Practical people pointed out that the bad guys wouldn't use crypto systems that the U.S. government had the keys to; they'd use foreign-made crypto systems. Eventually, the government caved in, but in our current climate, the desire to clamp down on cryptography will rise again.

Discussion Topics

How safe do you feel shopping online using secure connections? What about insecure connections?
What do you think of digital signatures?
Should the U.S. government allow private citizens to use encryption?
Should the U.S. government require private citizens to use key escrow?
Should the U.S. government forbid the export of strong encryption systems?

Lesson Summary

Encryption makes a message unreadable by those who lack the special key. On computers, this is always a symbolic/numerical operation, such as substituting letters or rotating the alphabet.
Ciphers (procedures for encryption) vary in how difficult they are to crack. Substitution ciphers and the like can be quite easy, depending on how much ciphertext there is.
There are two broad categories of ciphers: private key and public key.
Private key ciphers have the disadvantage that Alice and Bob have to have a secure way to tranfer the key. While this may be fine in many real-world applications, it doesn't work well across the internet.
Public key ciphers allow Alice to send Bob a message by encrypting it with Bob's public key, so that no one but Bob can read it.
Public key ciphers also allow Alice to sign her message by encrypting it with her own private key, so that when Bob is able to decrypt it using her public key, he knows it's from Alice.
Key escrow systems allow the government to decrypt messages. There is still a lot of debate about this. Privacy advocates, civil libertarians and computer industry executives are against it. Law enforcement officials and others are for it.
What do you think?

Solutions to Exercises

[an error occurred while processing this directive]

keyword	`CATCATCATCATCA`
plaintext	`ATTACK AT DAWN`
ciphertext	`CTMCCD AM DTYN`

rotation amount:
plaintext:
ciphertext: