Bcrypt

To have secure logins, we need a way to store users' passwords.

Because hackers have been known to penetrate into secure servers, we need extra layers of defense. In particular, we will not store the plaintext password, but rather a one-way hash of the password.

Hashed Passwords

How does that work? There are two phases to the process: creating an account and verifying a password when someone logs in.

Imagine that Neville Longbottom is creating an account and he wants to use dilligrout as his password.

Account Creation

Neville fills out the form, and the server receives the username (nlongbottom) and password (dilligrout). It computes a hash of the password, and stores the username and hash:

username passwd
nlongbottom 591747ca7ce5608a92d17236d3b1d0d7a80968f764bc13288e4471d0bd30060b

The system then discards the plaintext password, not storing it anywhere.

Login

Later, when Neville logs in, he provides the same username and password (he carefully wrote the password down so he wouldn't forget it!). The system again computes the hash of dilligrout and gets 591...60b. That matches the stored hash, and Neville is logged in.
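The two phases can be sketched in a few lines of Python. This is only an illustration: it assumes SHA-256 from the standard library as the one-way hash and a plain dictionary in place of the database, and the names accounts, hash_password, create_account, and check_login are made up for the example.

```python
import hashlib

# Stand-in for the database table: username -> hex digest of the password.
accounts = {}

def hash_password(password):
    """Return the SHA-256 hex digest of the password (an assumed hash choice)."""
    return hashlib.sha256(password.encode('utf-8')).hexdigest()

def create_account(username, password):
    """Store only the hash; the plaintext is never saved."""
    accounts[username] = hash_password(password)

def check_login(username, password):
    """Re-hash the supplied password and compare with the stored hash."""
    stored = accounts.get(username)
    return stored is not None and hash_password(password) == stored

create_account('nlongbottom', 'dilligrout')
print(check_login('nlongbottom', 'dilligrout'))   # the password Neville wrote down
print(check_login('nlongbottom', 'dilligrowt'))   # a misspelled attempt
```

Note that the server never needs the plaintext again: both phases only ever run the hash forward.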

Hackers

If Malfoy hacks into the server and gets Neville's hashed password, he can't reverse the computation to get "dilligrout" and so he can't use the hashed password to log in. His only hope is to run the one-way hash forward on many, many possible passwords, hoping that one of them will hash to the 591...60b value. If he hits on "dilligrout", it will, of course, match, and he'll know Neville's password.
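Here is a rough sketch of that guessing attack, again assuming SHA-256 as the hash. The function names and the word list are invented for the illustration.

```python
import hashlib

def sha256_hex(text):
    """One-way hash used throughout this sketch (an assumed choice)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

def guess_password(stolen_hash, candidates):
    """Hash each candidate forward; return the first match, else None."""
    for candidate in candidates:
        if sha256_hex(candidate) == stolen_hash:
            return candidate
    return None

stolen = sha256_hex('dilligrout')    # the value Malfoy found on the server
wordlist = ['password', 'gryffindor', 'dilligrout', 'mimbulus']
print(guess_password(stolen, wordlist))
print(guess_password(stolen, ['password', '123']))
```

The attack only succeeds if the real password is among the candidates, which is why the attacker wants each guess to be as cheap as possible.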

Salt

The scheme above has two weaknesses. First, a user can choose a stupid password, one the attacker can easily guess. Second, two or more users might choose the same password, and the attacker needs to guess only one to get them all. The notion of adding salt to the password addresses these weaknesses.

First, let's understand the weaknesses. Suppose Goyle chooses '123' as his password, because he has trouble remembering a better password. If the attacker tries all strings of lowercase letters and digits that are 5 or fewer characters long (there are fewer than a billion of them), the attacker will certainly succeed. Even better, the attacker can pre-compute the hash values of common passwords like '123' and just compare to the stored hashes.
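To check the "fewer than a billion" claim, we can count the non-empty strings of lowercase letters and digits with lengths 1 through 5. The variable names are just for this example.

```python
import string

alphabet = string.ascii_lowercase + string.digits    # 26 letters + 10 digits

# Count the strings of each length from 1 to 5 and add them up.
total = sum(len(alphabet) ** length for length in range(1, 6))
print(len(alphabet), total)
```

The total is about 62 million, comfortably under a billion.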

Furthermore, suppose Crabbe also uses that password. His hashed password is exactly the same as Goyle's, so when the attacker hashes '123' and searches the password database, they get two matches, so both accounts are compromised.
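Both weaknesses show up in a small sketch. It assumes SHA-256 as the unsalted hash, and the table contents and list of common passwords are invented for the illustration.

```python
import hashlib

def sha256_hex(text):
    """Unsalted one-way hash (an assumed choice for this sketch)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

# The stolen table: username -> unsalted hash.
stolen_table = {
    'goyle': sha256_hex('123'),
    'crabbe': sha256_hex('123'),
    'nlongbottom': sha256_hex('dilligrout'),
}

# The attacker hashes common passwords once, ahead of time.
precomputed = {sha256_hex(p): p for p in ['123', 'password', 'qwerty']}

# A single lookup per row reveals every account with a common password.
cracked = {user: precomputed[h]
           for user, h in stolen_table.items() if h in precomputed}
print(cracked)
```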

Instead, we create a random string, called salt, at the time that each person creates their account, and store that salt along with the hashed password. The password algorithm combines the salt and the password (essentially creating a longer password), hashes the longer string, and proceeds as usual.

username salt passwd
nlongbottom xocivu cdde1bed702518ef735ac8530915dff3689a4594687c9993254eb785ad6d91d6
crabbe Z98xbs 174c4a5799b09c0adae18bfe566c23684e68345fc4feca8b0b98bd1a1b5ce1c5
goyle v5bnws 0478d2c697c0dcdc276e44514069f44a369defee06bcf4a2a19c9e3020cc2e43

Thanks to the salt, now the hacker

  1. can't pre-compute the hashes, since they involve a custom salt, and
  2. can't match multiple users because even if they have the same password, they will have unique salt strings and therefore hashed values
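A minimal sketch of salting, under some assumptions: SHA-256 stands in for the hash, the salt is simply prepended to the password, and the helper names are hypothetical. Bcrypt's own scheme differs in the details.

```python
import hashlib
import secrets

def salted_hash(password, salt):
    """Hash the salt and password together (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256((salt + password).encode('utf-8')).hexdigest()

def new_record(password):
    """Create a fresh random salt and return (salt, hash) for storage."""
    salt = secrets.token_hex(8)
    return salt, salted_hash(password, salt)

def verify(password, salt, stored_hash):
    """Recompute the hash with the stored salt and compare."""
    return salted_hash(password, salt) == stored_hash

goyle_salt, goyle_hash = new_record('123')
crabbe_salt, crabbe_hash = new_record('123')
print(goyle_hash == crabbe_hash)                 # same password, different salts
print(verify('123', goyle_salt, goyle_hash))
```

Because the salt is stored next to the hash, the server can always redo the computation at login, while a table of pre-computed unsalted hashes is of no use to the attacker.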

Thus, salt is an important ingredient in password security. Most password schemes use salt; even the original Unix password encryption of the 1970s used salt (although not very much). Bcrypt does too.

Reading

Please read/skim the following references, particularly if you'd like to understand the ideas behind using Bcrypt.

Practical Implementation

From here on, we leave theory and concepts behind and focus on creating a secure Flask login procedure, which we'll also do in class.

Here's an outline of the sections that follow:

  • Installing Bcrypt
  • Hashing with Bcrypt
  • Encoding strings and Decoding byte arrays
  • Bcrypt for Logins
  • Inserting new users
  • Dealing with Concurrency

Installing Bcrypt

We've installed Flask and PyMySQL to our virtual environments. To install bcrypt, we do a similar procedure. We'll do this in class together, but if you want to do it now, here's how:

source ~/cs304/venv/bin/activate
pip install bcrypt

Hashing An Array of Bytes

Here's the simplest way to use bcrypt. It's not yet ready for logins, but we can start:

import bcrypt

# on signup
passwd1 = b'secret'
salt1 = bcrypt.gensalt()
hash1 = bcrypt.hashpw(passwd1, salt1)

# later
login = b'secret'
hash2 = bcrypt.hashpw(login, hash1)
hash2 == hash1

The password check works iff that last comparison is true. Now, we have to learn some things before we can turn it into working code for our Flask apps.

The Dual Nature of hashpw

The hashpw function above is used in two different ways, and that's often confusing. The first argument is always the plaintext password. The second is either:

  • brand new salt for a new password or
  • the stored value from an existing password

So, in the second case, where is the salt? It's in the hashed value of the existing password. It's just the first 29 characters of the 60-character hashed value. The hashpw function just pulls out the salt and ignores the rest of the hashed string.

Here's a slightly more verbose example:

>>> salt1 = bcrypt.gensalt()
>>> hash1 = bcrypt.hashpw(passwd1, salt1)
>>> salt1
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rO'
>>> hash1
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
>>> len(salt1)
29
>>> len(hash1)
60
>>> hash1[0:29]
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rO'
>>> hash1[0:29] == salt1
True

While this dual-use feature is confusing at first, it's useful because it means we only have to worry about storing the hashed password, since it includes the salt.
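To make the layout concrete, here is a sketch that splits the sample hash from the transcript above into its pieces using plain string operations (no bcrypt import needed). The function name and dictionary keys are my own labels, not part of the bcrypt package.

```python
def split_bcrypt_hash(hashed):
    """Split a 60-character bcrypt hash string into labeled parts."""
    if len(hashed) != 60:
        raise ValueError('expected a 60-character bcrypt hash')
    # The string looks like $version$cost$<22 salt chars><31 digest chars>
    _, version, cost, rest = hashed.split('$')
    return {
        'version': version,           # '2b' in the sample
        'cost': int(cost),            # the work factor, 12 in the sample
        'salt_field': hashed[:29],    # the 29 characters gensalt() produced
        'salt': rest[:22],            # the salt characters themselves
        'digest': rest[22:],          # the remaining 31 characters
    }

sample = '$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
parts = split_bcrypt_hash(sample)
print(parts['salt_field'])
print(parts['digest'])
```

The 29-character salt field is the 7-character prefix ($2b$12$) plus 22 salt characters, which is why slicing hash1[0:29] recovers the value of salt1.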

Encoding

The bcrypt algorithm (like other hashing algorithms) works on arrays of bytes. A Python string is similar to an array of bytes, but not quite the same.

(In fact, the primary difference between Python2 and Python3 is a clearer distinction between byte arrays and strings. In Python 3, strings are Unicode objects, not arrays of bytes. However, bcrypt needs and returns arrays of bytes, which means encoding and decoding.)

To convert a string into a byte array, we choose an encoding. An encoding is a data representation for characters. There are a number of encodings, but we will use UTF-8.

Try copy/pasting this into a Python shell:

msg = 'summer (été) in France is fun! 😁'
print(msg)
enc = msg.encode('utf-8')
print(enc)
hello = '你好'
print(hello)
print(hello.encode('utf-8'))

The result looks like:

>>> msg = 'summer (été) in France is fun! 😁'
>>> print(msg)
summer (été) in France is fun! 😁
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'summer (\xc3\xa9t\xc3\xa9) in France is fun! \xf0\x9f\x98\x81'
>>> hello = '你好'
>>> print(hello)
你好
>>> print(hello.encode('utf-8'))
b'\xe4\xbd\xa0\xe5\xa5\xbd'

The encoding of s in UTF-8 is just s or (in hexadecimal) [73], a one-byte sequence. But the encoding of é (formally known as latin small letter e acute) is (in hexadecimal) [c3, a9], a two-byte sequence. The smiley emoji is encoded as [f0, 9f, 98, 81], a four-byte sequence. We don't have to learn more now; just know that UTF-8 has a byte sequence for pretty much every character in every language and all the official emoji.

The Chinese greeting encodes as [e4 bd a0 e5 a5 bd] a sequence of 6 bytes.
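If you'd like to see those byte sequences directly, this small sketch prints each one in hexadecimal. The helper name utf8_hex is made up, and the characters are written as Unicode escapes so the code survives copy/paste.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as a list of two-digit hex strings."""
    return ['{:02x}'.format(byte) for byte in text.encode('utf-8')]

samples = {
    's': 's',
    'e acute': '\u00e9',            # the accented e in the French example
    'smiley': '\U0001F601',         # the emoji from the example above
    'greeting': '\u4f60\u597d',     # the two-character Chinese greeting
}

for label, text in samples.items():
    print(label, len(text), utf8_hex(text))
```

Notice that len() of the string counts characters, while the list of bytes can be longer.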

As it happens, encoding a string consisting just of letters commonly found in English (ASCII characters) in UTF-8 will look just the same as the original string. Try copy/pasting this:

msg = 'hi there, how are you?'
print(msg)
enc = msg.encode('utf-8')
print(enc)

The result looks like:

>>> msg = 'hi there, how are you?'
>>> print(msg)
hi there, how are you?
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'hi there, how are you?'

But they are different datatypes:

>>> type(msg)
<class 'str'>
>>> type(enc)
<class 'bytes'>

In fact, if you have sharp eyes, you can see the difference when we print:

>>> msg = 'hi there, how are you?'
>>> print(msg)
hi there, how are you?
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'hi there, how are you?'

A bytes object has a b before it. You can see this difference in our first example of Bcrypt. The password was the bytes object b'secret' not the string 'secret'.

There is a companion method, .decode() that can convert a bytes object into a string, given an encoding, such as UTF-8. Since we store strings in the database, it's best to decode the hashed value before we store it in the database. (Actually, since the hash output of bcrypt is all ASCII, you won't see much difference and some automatic conversions might work, but it's better to be clear and precise.)
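Here is the round trip in miniature, using the sample hash from earlier as the bytes value. Nothing here calls bcrypt; it only shows the conversions.

```python
# The sample bcrypt output from earlier, as a bytes object.
hashed = b'$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'

stored = hashed.decode('utf-8')      # str, suitable for a database column
restored = stored.encode('utf-8')    # bytes again, ready to hand to bcrypt

print(type(stored).__name__, len(stored))
print(restored == hashed)
```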

Because bcrypt requires bytes rather than str, we will have to encode and decode. Sorry about that. But we're now ready to use Bcrypt for our logins.

Using Bcrypt for Accounts

When a person "registers" with our app and creates an account, they'll give us a username and password. Whenever someone attempts to log in, we need to check that they know the user's password. So, there are two events we have to deal with: (1) storing a password and (2) checking a password.

Storing A Password

Feel free to try this, but we'll also do it in class. Here, we have an assignment statement: passwd1 = .... In our Flask application, the passwd1 will be read from an "account request" HTML form that the user submitted when they apply for an account. I'm calling it passwd1 since it's the first time we've seen it.

import bcrypt

passwd1 = 'secret'
hashed = bcrypt.hashpw(passwd1.encode('utf-8'), bcrypt.gensalt())
stored = hashed.decode('utf-8')

This creates a hashed password that we can store in a database.

Checking a Password

To check a password, we'd read the previously hashed value from the database. Below, we'll use the same variables as above, so stored is still the stored hash (a string). I'll use passwd as the login password. In a Flask app, it would be read from a "login" HTML form they submitted when they login.

passwd = 'secret'

hashed2 = bcrypt.hashpw(passwd.encode('utf-8'), stored.encode('utf-8'))
hashed2_str = hashed2.decode('utf-8')
hashed2_str == stored

The password is correct iff the last comparison, hashed2_str == stored, is true.
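A small aside on that final comparison: an ordinary == on strings may stop at the first differing character, so some code prefers a comparison whose running time doesn't depend on where the strings differ. The sketch below uses hmac.compare_digest from the standard library; the helper name hashes_match is hypothetical, and the sample hash from earlier stands in for the stored value. The bcrypt package also provides bcrypt.checkpw(password, hashed), which does the rehash and the comparison in one call.

```python
import hmac

def hashes_match(rehashed, stored):
    """Compare two ASCII hash strings without stopping at the first difference."""
    return hmac.compare_digest(rehashed, stored)

stored = '$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
print(hashes_match(stored, stored))
print(hashes_match(stored[:-1] + 'x', stored))   # last character changed
```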

In the above code, we would store the stored value in the database, and pull it out later to do the checking.

Inserting into the Database

Inserting into the database is, seemingly, pretty straightforward.

You need to create a table to store the usernames and passwords. You might also want an integer UID as a compact way to identify a user, for foreign keys and the like, since the username is likely to be variable length and might be silly things like allusernamesaretaken.

Here's a table we'll use in class:

create table userpass(
       uid int auto_increment,
       username varchar(50) not null,
       hashed char(60),
       unique(username),
       index(username),
       primary key (uid)
);

We specified above that usernames are unique: unique(username) and we'll build an index based on that: index(username), so that lookups by username will be fast. Then we'll create an integer UID for each username and use that number for foreign keys and the like.

Note that the output of bcrypt.hashpw is always a 60-character string, so char(60) is ideal for our hashed column.

So, to store a username and password, we just do the following:

    username = request.form.get('username')
    passwd1 = request.form.get('password1')
    hashed = bcrypt.hashpw(passwd1.encode('utf-8'),
                           bcrypt.gensalt())
    stored = hashed.decode('utf-8')
    # debugging output; the plaintext password is deliberately not printed
    print(type(passwd1), hashed, stored)
    conn = dbi.connect()
    curs = dbi.cursor(conn)
    try:
        curs.execute('''INSERT INTO userpass(uid,username,hashed)
                        VALUES(null,%s,%s)''',
                     [username, stored])
        conn.commit()
    except Exception as err:
        print('something went wrong', repr(err))

We wrap the INSERT statement in a try/except in case anything goes wrong. But what could go wrong?

Concurrency

A real-world app will have thousands or millions of users. It'll run 24/7 and accept lots of account requests, possibly at the same time. That is, our apps will have to deal with concurrency. We'll talk more about the concept and how to handle it later in the course, but it rears its head now, so let's deal with this small piece of concurrency.

Here's the terrible scenario:

  1. Fred and George Weasley both decide to join our app.
  2. As twins, they sit down to do it together
  3. Being pranksters, they both enter the same username: 'wheezes'

Through the vagaries of fate, one of them will get inserted into the database first. Suppose that's Fred. When George's INSERT happens, there's already a wheezes in the database. Since usernames have to be unique, this gets an error.

So, there's the exception.

But wait, suppose we check first, and only insert the username if the search shows no results. Like this:

        try:
            curs.execute('''SELECT * FROM userpass 
                            WHERE username = %s''',
                         [username])
            row = curs.fetchone()
            if row is None:
                curs.execute('''INSERT INTO userpass(uid,username,hashed)
                                VALUES(null,%s,%s)''',
                             [username, stored])
                conn.commit()
        except Exception as err:
            print('something went wrong', repr(err))

Ah, but the even worse scenario is this:

  1. Fred and George Weasley both decide to join our app.
  2. As twins, they sit down to do it together
  3. Being pranksters, they both enter the same username: 'wheezes'
  4. The App processes Fred up to the select and fetchone() and discovers that wheezes is an available username, and then
  5. The App processes George to the same point, discovering that wheezes is still available (because Fred's INSERT hasn't happened yet).
  6. The App switches back to Fred and inserts wheezes.
  7. The App switches back to George and gets the insert error.

Ick. What's the solution? We could just let the error happen and use try/except to catch it:

        try:
            curs.execute('''INSERT INTO userpass(uid,username,hashed)
                            VALUES(null,%s,%s)''',
                         [username, stored])
            conn.commit()
        except Exception as err:
            flash('Sorry; that username is taken')

This technique is an excellent use of try/except.

(The technique also relies on the fact that INSERT is atomic, a concept we'll learn when we get to concurrency later in the course.)
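Here is a runnable sketch of the "just insert and catch the error" idea. To keep it self-contained, it swaps in SQLite from the Python standard library as a stand-in for MySQL, so the table definition and the ? placeholders differ from our real code. It also catches only the uniqueness error (sqlite3.IntegrityError here; PyMySQL has a similarly named pymysql.err.IntegrityError) rather than every Exception, so other failures aren't reported as "username taken". The function name try_insert and the placeholder hash are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL userpass table.
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE userpass(
                    uid INTEGER PRIMARY KEY AUTOINCREMENT,
                    username VARCHAR(50) NOT NULL UNIQUE,
                    hashed CHAR(60))''')

def try_insert(conn, username, stored):
    """Return True if the row was inserted, False if the username exists."""
    try:
        conn.execute('INSERT INTO userpass(username, hashed) VALUES (?, ?)',
                     [username, stored])
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False

fake_hash = 'x' * 60    # placeholder, not a real bcrypt hash
print(try_insert(conn, 'wheezes', fake_hash))   # Fred gets there first
print(try_insert(conn, 'wheezes', fake_hash))   # George is turned away
```

There is no separate SELECT, so there is no window between checking and inserting for the other twin to slip into.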

Last Insert ID

Another thing we often want to do when a user has just been inserted, especially when there's an auto_increment field, is to find out the value that they were assigned: the value of the auto_increment field.

We could do a search, but there's a better way. A feature of MySQL (and some other databases) is a function called last_insert_id() which returns the most recent auto_increment value in the current connection. In other words, the database server deals with the concurrency issue. It doesn't tell you the most recent value for anyone, it tells you the most recent value for you.

So, we just need to do:

        curs.execute('select last_insert_id()')
        row = curs.fetchone()
        uid = row[0]
        flash('FYI, you were issued UID {}'.format(uid))

(The cursor above is a tuple cursor, not a dictionary cursor, so row[0] gets the first element of the tuple, which in this case is the only element of the tuple.)
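The per-connection behavior can be demonstrated with a sketch. It again uses SQLite as a stand-in, where the analogous function is last_insert_rowid() rather than MySQL's last_insert_id(). Two connections to the same temporary database file play the roles of two simultaneous requests; the function and variable names are invented.

```python
import os
import sqlite3
import tempfile

def demo_last_insert_id():
    """Insert from two connections; return (Fred's own id, max uid in table)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.db')
        conn_fred = sqlite3.connect(path)
        conn_george = sqlite3.connect(path)
        try:
            conn_fred.execute('''CREATE TABLE userpass(
                                     uid INTEGER PRIMARY KEY AUTOINCREMENT,
                                     username VARCHAR(50) NOT NULL UNIQUE)''')
            conn_fred.commit()
            conn_fred.execute('INSERT INTO userpass(username) VALUES (?)',
                              ['fred'])
            conn_fred.commit()
            # George's insert happens before Fred asks for his ID.
            conn_george.execute('INSERT INTO userpass(username) VALUES (?)',
                                ['george'])
            conn_george.commit()
            own = conn_fred.execute('SELECT last_insert_rowid()').fetchone()[0]
            biggest = conn_fred.execute(
                'SELECT max(uid) FROM userpass').fetchone()[0]
            return own, biggest
        finally:
            conn_fred.close()
            conn_george.close()

fred_uid, max_uid = demo_last_insert_id()
print(fred_uid, max_uid)
```

Fred's connection reports his own ID even though George's row is newer, whereas a query using max would have handed Fred George's ID.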

Final Code

We've come a long way and taken some detours through encoding/decoding and dealing with concurrency, but here's our finished login code, in several parts.

Route to Join the App (Storing a Username and Password)

Here's the code to hash and store a username and password when they create an account. The account creation form is one of those where they ask for the password twice, to avoid problems with typos in the password field. There's code below to check that the two passwords are the same. Afterwards, we store some important values (username and UID) in the session. But most of it should be familiar from the code above.

@app.route('/join/', methods=["POST"])
def join():
    username = request.form.get('username')
    passwd1 = request.form.get('password1')
    passwd2 = request.form.get('password2')
    if passwd1 != passwd2:
        flash('passwords do not match')
        return redirect( url_for('index'))
    hashed = bcrypt.hashpw(passwd1.encode('utf-8'),
                           bcrypt.gensalt())
    stored = hashed.decode('utf-8')
    # debugging output; the plaintext password is deliberately not printed
    print(type(passwd1), hashed, stored)
    conn = dbi.connect()
    curs = dbi.cursor(conn)
    try:
        curs.execute('''INSERT INTO userpass(uid,username,hashed)
                        VALUES(null,%s,%s)''',
                     [username, stored])
        conn.commit()
    except Exception as err:
        flash('That username is taken: {}'.format(repr(err)))
        return redirect(url_for('index'))
    curs.execute('select last_insert_id()')
    row = curs.fetchone()
    uid = row[0]
    flash('FYI, you were issued UID {}'.format(uid))
    session['username'] = username
    session['uid'] = uid
    session['logged_in'] = True
    session['visits'] = 1
    return redirect( url_for('user', username=username) )

Login Route (Checking a Username and Password)

Here's the code to check a person's username and password and log them in:

@app.route('/login/', methods=["POST"])
def login():
    username = request.form.get('username')
    passwd = request.form.get('password')
    conn = dbi.connect()
    curs = dbi.dict_cursor(conn)
    curs.execute('''SELECT uid,hashed
                    FROM userpass
                    WHERE username = %s''',
                 [username])
    row = curs.fetchone()
    if row is None:
        # Same response as wrong password,
        # so no information about what went wrong
        flash('login incorrect. Try again or join')
        return redirect( url_for('index'))
    stored = row['hashed']
    print('database has stored: {} {}'.format(stored,type(stored)))
    # debugging output; print only the type, never the plaintext password
    print('form supplied passwd of type: {}'.format(type(passwd)))
    hashed2 = bcrypt.hashpw(passwd.encode('utf-8'),
                            stored.encode('utf-8'))
    hashed2_str = hashed2.decode('utf-8')
    print('rehash is: {} {}'.format(hashed2_str,type(hashed2_str)))
    if hashed2_str == stored:
        print('they match!')
        flash('successfully logged in as '+username)
        session['username'] = username
        session['uid'] = row['uid']
        session['logged_in'] = True
        session['visits'] = 1
        return redirect( url_for('user', username=username) )
    else:
        flash('login incorrect. Try again or join')
        return redirect( url_for('index'))

Logout Route

Before we conclude, we should look at how to log out. Since we trust the information in the session, and the session records the fact that someone is logged in and their username and uid, all we have to do to log out is remove those values:

@app.route('/logout/')
def logout():
    if 'username' in session:
        username = session['username']
        session.pop('username')
        session.pop('uid')
        session.pop('logged_in')
        flash('You are logged out')
        return redirect(url_for('index'))
    else:
        flash('you are not logged in. Please login or join')
        return redirect( url_for('index') )

In fact, you can see that this code determines whether someone is logged in by whether 'username' is a key in the session dictionary. When we pop that value out of the session dictionary, the person will be logged out.
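The same logic can be tried outside Flask with an ordinary dictionary standing in for the session. This is a sketch, not the route itself: the function name log_out and the sample values are made up, and it passes a default to pop so that a missing key doesn't raise KeyError.

```python
def log_out(session):
    """Remove the login keys; return True if someone was logged in."""
    was_logged_in = 'username' in session
    for key in ('username', 'uid', 'logged_in'):
        session.pop(key, None)    # the default avoids KeyError if a key is absent
    return was_logged_in

# A plain dict standing in for the Flask session of a logged-in user.
session = {'username': 'nlongbottom', 'uid': 7, 'logged_in': True, 'visits': 3}
print(log_out(session), session)
print(log_out(session))           # a second logout finds nobody logged in
```

As in the route above, keys unrelated to the login (such as visits) are left alone.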

Conclusion

This has been a long, complex journey, with detours to learn about encoding and a little about concurrency, but the result is that we've learned

  • how to join an app
  • how to store passwords in hashed form
  • how to check passwords on login
  • how to log users in and out of our app

Summary

BCRYPT

  • Passwords should never be stored in plaintext in your database; they should be one-way hashed with a cryptographically secure hash algorithm, such as SHA-256
  • Passwords can still be cracked by brute force.
  • To thwart a brute-force approach, we can use an extremely slow hashing algorithm, such as bcrypt
  • To use bcrypt, we have to encode strings as byte arrays. I suggest UTF-8.
  • The bcrypt algorithm also uses salt, which is an additional random string that means that two accounts that use the same password will have different hash values. The function is bcrypt.gensalt().
  • bcrypt stores all the info it needs in the hash value, so you only need to store that one value, which is a 60-byte array.
  • The byte array should be decoded (again using UTF-8) into a string before it is stored in the database.
  • Practical Code:
    • register: hashed1 = bcrypt.hashpw(register.encode('utf-8'), bcrypt.gensalt())
    • login: hashed2 = bcrypt.hashpw(login.encode('utf-8'), hashed1.encode('utf-8'))
  • A purported password matches if and only if the newly hashed value matches the old hashed value: hashed2 == hashed1

LOGINS

  • Your app can log someone in by putting their userid in the session
  • Your app can log someone out by removing their userid from the session
  • Because of concurrency, when registering someone new by inserting someone into a table with an auto_increment column, your app should use the MySQL last_insert_id() function to determine the ID; not anything involving, say, max.