Bcrypt and Login¶
To have secure logins, we need a way to store users' passwords.
Because hackers have been known to penetrate into secure
servers, we need extra layers of defense. In particular, we will not
store the plaintext password, but rather a one-way hash of the
password.
- Hashed Passwords
- Salt
- Reading
- Practical Implementation
- Installing Bcrypt
- Hashing An Array of Bytes
- The Dual Nature of hashpw
- Encoding
- Using Bcrypt for Accounts
- Storing A Password
- Checking a Password
- Inserting into the Database
- Concurrency
- Last Insert ID
- Final Code
- Route to Join the App (Storing a Username and Password)
- Login Route (Checking a Username and Password)
- Logout Route
- Requiring Login
- Conclusion
- Summary
Hashed Passwords¶
How does that work? There are two phases to the process: creating an account and verifying a password when someone logs in.
Imagine that Neville Longbottom is creating an account and he wants to
use dilligrout as his password.
Account Creation¶
Neville fills out the form, and the server receives the username
(nlongbottom) and password (dilligrout). It computes a hash of
the password, and stores the username and hash:
username | passwd |
---|---|
nlongbottom | 591747ca7ce5608a92d17236d3b1d0d7a80968f764bc13288e4471d0bd30060b |
The system then discards the plaintext password, not storing it anywhere.
Login¶
Later, when Neville logs in, he provides the same username and
password (he carefully wrote the password down so he wouldn't forget
it!). The system again computes the hash of dilligrout and gets
591...60b. That matches the stored hash, and Neville is logged in.
Hackers¶
If Malfoy hacks into the server and gets Neville's hashed password, he
can't reverse the computation to get "dilligrout" and so he can't use
the hashed password to log in. His only hope is to run the one-way hash
forward on many, many possible passwords, hoping that one of them
will hash to the 591...60b value. If he hits on "dilligrout", it
will, of course, match, and he'll know Neville's password.
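The three steps above (account creation, login, and Malfoy's guessing attack) can be sketched in a few lines. This is only an illustration: it assumes SHA-256 as the one-way hash (the notes don't say which hash produced the value in the table), and the names one_way_hash, accounts, check_login and guess_password are made up for this sketch.

```python
import hashlib

def one_way_hash(password: str) -> str:
    """Return the hex SHA-256 digest of the UTF-8 encoded password (assumed hash)."""
    return hashlib.sha256(password.encode('utf-8')).hexdigest()

# Account creation: keep only the username and the hash, never the plaintext.
accounts = {'nlongbottom': one_way_hash('dilligrout')}

def check_login(username: str, password: str) -> bool:
    """Re-hash the submitted password and compare it with the stored hash."""
    stored = accounts.get(username)
    return stored is not None and one_way_hash(password) == stored

def guess_password(stolen_hash: str, candidates):
    """The attacker's only option: hash guesses forward until one matches."""
    for guess in candidates:
        if one_way_hash(guess) == stolen_hash:
            return guess
    return None

print(check_login('nlongbottom', 'dilligrout'))   # True
print(check_login('nlongbottom', 'mimbulus'))     # False
print(guess_password(accounts['nlongbottom'],
                     ['123', 'password', 'dilligrout']))  # dilligrout
```

Notice that the attacker never reverses the hash; they only succeed because the right password happened to be on their list of guesses.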
Salt¶
The scheme above has two weaknesses. First, a user can choose a stupid password, one the attacker can easily guess. Secondly, two or more users might choose the same password, and the attacker needs to only guess one to get them all. The notion of adding salt to the password addresses these weaknesses.
First, let's understand the weaknesses. Suppose Goyle chooses '123' as his password, because he has trouble remembering a better one. If the attacker tries every string of lowercase letters and digits up to 5 characters long (there are about 62 million of them, far fewer than a billion), the attacker will certainly succeed. Even better for the attacker, they can pre-compute the hash values of common passwords like '123' and just compare them to the stored hashes.
Furthermore, suppose Crabbe also uses that password. His hashed password is exactly the same as Goyle's, so when the attacker hashes '123' and searches the password database, they get two matches, so both accounts are compromised.
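Here is a small sketch of both weaknesses. It counts the short passwords an attacker would have to try, then uses a pre-computed table to crack every account that shares a common password. As before, SHA-256 is an assumption standing in for "some unsalted one-way hash", and the stolen table and the list of common passwords are invented for the example.

```python
import hashlib
import string

def sha256_hex(text: str) -> str:
    """Unsalted one-way hash (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

# How many strings of lowercase letters and digits have length 1 through 5?
alphabet = string.ascii_lowercase + string.digits      # 36 symbols
search_space = sum(len(alphabet) ** n for n in range(1, 6))
print(search_space)   # 62193780

# A stolen, unsalted password table (hypothetical data).
stolen = {
    'goyle': sha256_hex('123'),
    'crabbe': sha256_hex('123'),
    'nlongbottom': sha256_hex('dilligrout'),
}

# The attacker hashes common passwords once, ahead of time...
precomputed = {sha256_hex(p): p for p in ['123', 'password', 'qwerty']}

# ...and then cracking is just a dictionary lookup per stored hash.
cracked = {user: precomputed[h] for user, h in stolen.items() if h in precomputed}
print(cracked)   # {'goyle': '123', 'crabbe': '123'}
```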
To fix both problems, we create a random string, called salt, at the time
that each person creates their account, and store that salt along with
the hashed password. The password algorithm combines the salt and
the password (essentially creating a longer password), hashes the
longer string, and proceeds as usual.
username | salt | passwd |
---|---|---|
nlongbottom | xocivu | cdde1bed702518ef735ac8530915dff3689a4594687c9993254eb785ad6d91d6 |
crabbe | Z98xbs | 174c4a5799b09c0adae18bfe566c23684e68345fc4feca8b0b98bd1a1b5ce1c5 |
goyle | v5bnws | 0478d2c697c0dcdc276e44514069f44a369defee06bcf4a2a19c9e3020cc2e43 |
Thanks to the salt, now the hacker
- can't pre-compute the hashes, since they involve a custom salt, and
- can't match multiple users because even if they have the same password, they will have unique salt strings and therefore hashed values
Thus, salt is an important ingredient in password security. Most password schemes use salt; even the original Unix password encryption of the 1970s used salt (although not very much). Bcrypt does too.
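To make the idea concrete, here is a minimal sketch of salted hashing using only the standard library. It is not how Bcrypt does it internally; it assumes SHA-256 as the hash and simply prepends the salt to the password, and all of the function names are invented for this example.

```python
import hashlib
import secrets

def new_salt() -> str:
    """Generate a fresh random salt (16 hex characters) for one account."""
    return secrets.token_hex(8)

def salted_hash(password: str, salt: str) -> str:
    """Hash the salt and password together (SHA-256 is an assumption)."""
    return hashlib.sha256((salt + password).encode('utf-8')).hexdigest()

def create_account(table: dict, username: str, password: str) -> None:
    """Store the salt next to the hash; the plaintext is discarded."""
    salt = new_salt()
    table[username] = (salt, salted_hash(password, salt))

def check_login(table: dict, username: str, password: str) -> bool:
    """Look up the user's salt, re-hash the attempt, and compare."""
    if username not in table:
        return False
    salt, stored = table[username]
    return salted_hash(password, salt) == stored

table = {}
create_account(table, 'crabbe', '123')
create_account(table, 'goyle', '123')

# Same password, different salts, so the stored hashes differ.
print(table['crabbe'][1] != table['goyle'][1])
print(check_login(table, 'goyle', '123'))
print(check_login(table, 'goyle', '1234'))
```

Because each account has its own salt, a table of pre-computed hashes for '123' no longer matches either row.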
Reading¶
Please read/skim the following references, particularly if you'd like to understand the ideas behind using Bcrypt.
- Dustin Boswell: How to handle Passwords. I like this presentation a lot because it gives a kind of play-by-play history of attacks and defenses. Unfortunately, that site has been taken down, but I found a copy archived by the Internet Archive (the WayBack machine) in April 2019: Dustin Boswell on How to Handle Passwords.
- Coda Hale: How to Safely Store a Password. You know where this person stands!
- bcrypt python module usage notes
Practical Implementation¶
From here on, we leave theory and concepts behind and focus on creating a secure Flask login procedure, which we'll also do in class.
Here is an outline of the sections that follow:
- Installing Bcrypt
- Hashing with Bcrypt
- Encoding strings and Decoding byte arrays
- Bcrypt for Logins
- Inserting new users
- Dealing with Concurrency
Installing Bcrypt¶
We've installed Flask and PyMySQL to our virtual environments. To install bcrypt, we do a similar procedure. We'll do this in class together, but if you want to do it now, here's how:
source ~/cs304/venv/bin/activate
pip install bcrypt
Hashing An Array of Bytes¶
Here's the simplest way to use bcrypt. It's not yet ready for logins, but we can start:
import bcrypt
# on signup
passwd1 = b'secret'
salt1 = bcrypt.gensalt()
hash1 = bcrypt.hashpw(passwd1, salt1)
# later
login = b'secret'
hash2 = bcrypt.hashpw(login, hash1)
hash2 == hash1
The password check works iff that last comparison is true. Now, we have to learn some things before we can turn it into working code for our Flask apps.
The Dual Nature of hashpw¶
The hashpw function above is used in two different ways, and that's
often confusing. The first argument is always the plaintext
password. The second is either:
- brand new salt for a new password or
- the stored value from an existing password
So, in the second case, where is the salt? It's in the hashed value
of the existing password. It's just the first 29 characters of the
60-character hashed value. The hashpw function just pulls out the
salt and ignores the rest of the hashed string.
Here's a slightly more verbose example:
>>> salt1 = bcrypt.gensalt()
>>> hash1 = bcrypt.hashpw(passwd1, salt1)
>>> salt1
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rO'
>>> hash1
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
>>> len(salt1)
29
>>> len(hash1)
60
>>> hash1[0:29]
b'$2b$12$jRgsewn8B5Hz5DSPv9x/rO'
>>> hash1[0:29] == salt1
True
While this dual-use feature is confusing at first, it's useful because it means we only have to worry about storing the hashed password, since it includes the salt.
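If you're curious what is packed into those 60 bytes, here is a small sketch that takes apart the example hash shown above without calling bcrypt at all. The function name parse_bcrypt_hash is made up for this sketch; the layout it assumes (a version tag, a two-digit cost, then 22 characters of salt followed by 31 characters of digest) is the standard bcrypt string format as I understand it.

```python
def parse_bcrypt_hash(hashed: bytes) -> dict:
    """Split a 60-byte bcrypt hash string into its parts (illustrative helper)."""
    if len(hashed) != 60:
        raise ValueError('expected a 60-byte bcrypt hash')
    # The string looks like $<version>$<cost>$<22-char salt><31-char digest>
    _, version, cost, rest = hashed.split(b'$')
    return {
        'version': version.decode('ascii'),
        'cost': int(cost),
        'salt_with_prefix': hashed[:29],   # what gensalt() returned
        'salt': rest[:22],
        'digest': rest[22:],
    }

# The example value from the interpreter session above.
hash1 = b'$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
parts = parse_bcrypt_hash(hash1)

print(parts['salt_with_prefix'])   # b'$2b$12$jRgsewn8B5Hz5DSPv9x/rO'
print(parts['version'], parts['cost'])              # 2b 12
print(len(parts['salt']), len(parts['digest']))     # 22 31
```

The cost of 12 is the work factor: the key setup is repeated 2**12 = 4096 times, which is what makes each guess slow for an attacker.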
Encoding¶
The bcrypt algorithm (like other hashing algorithms) works on arrays of bytes. A Python string is similar to an array of bytes, but not quite the same.
(In fact, the primary difference between Python2 and Python3 is a clearer distinction between byte arrays and strings. In Python 3, strings are Unicode objects, not arrays of bytes. However, bcrypt needs and returns arrays of bytes, which means encoding and decoding.)
To convert a string into a byte array, we choose an encoding. An encoding is a data representation for characters. There are a number of encodings, but we will use UTF-8.
Try copy/pasting this into a Python shell
msg = 'summer (été) in France is fun! 😁'
print(msg)
enc = msg.encode('utf-8')
print(enc)
hello = '你好'
print(hello)
print(hello.encode('utf-8'))
The result looks like:
>>> msg = 'summer (été) in France is fun! 😁'
>>> print(msg)
summer (été) in France is fun! 😁
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'summer (\xc3\xa9t\xc3\xa9) in France is fun! \xf0\x9f\x98\x81'
>>> hello = '你好'
>>> print(hello)
你好
>>> print(hello.encode('utf-8'))
b'\xe4\xbd\xa0\xe5\xa5\xbd'
The encoding of s in UTF-8 is just s or (in hexadecimal) [73], a
one-byte sequence. But the encoding of é (formally known as latin
small letter e acute) is (in hexadecimal) [c3, a9], a two-byte
sequence. The smiley emoji is encoded as [f0, 9f, 98, 81], a
four-byte sequence. We don't have to learn more now; just know that
UTF-8 has a byte sequence for pretty much every character in every
language and all the official emoji. The Chinese greeting encodes as
[e4, bd, a0, e5, a5, bd], a sequence of 6 bytes.
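You can check those byte sequences yourself. The helper utf8_hex below is just a name I made up for this sketch; it encodes a string as UTF-8 and prints each byte in hexadecimal. The characters are written as escape sequences so the example doesn't depend on how your editor saves the file.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return ' '.join('{:02x}'.format(b) for b in text.encode('utf-8'))

samples = {
    's': 's',
    'e acute': '\u00e9',             # é
    'smiley': '\U0001F601',          # 😁
    'ni hao': '\u4f60\u597d',        # 你好
}

for name, text in samples.items():
    # characters in the string vs. bytes in its UTF-8 encoding
    print(name, len(text), len(text.encode('utf-8')), utf8_hex(text))
# s 1 1 73
# e acute 1 2 c3 a9
# smiley 1 4 f0 9f 98 81
# ni hao 2 6 e4 bd a0 e5 a5 bd
```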
As it happens, encoding a string consisting just of letters commonly found in English (ASCII characters) in UTF-8 will look just the same as the original string. Try copy/pasting this:
msg = 'hi there, how are you?'
print(msg)
enc = msg.encode('utf-8')
print(enc)
The result looks like:
>>> msg = 'hi there, how are you?'
>>> print(msg)
hi there, how are you?
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'hi there, how are you?'
But they are different datatypes:
>>> type(msg)
<class 'str'>
>>> type(enc)
<class 'bytes'>
In fact, if you have sharp eyes, you can see the difference when we print:
>>> msg = 'hi there, how are you?'
>>> print(msg)
hi there, how are you?
>>> enc = msg.encode('utf-8')
>>> print(enc)
b'hi there, how are you?'
A bytes object has a b before it. You can see this difference in
our first example of Bcrypt. The password was the bytes object
b'secret', not the string 'secret'.
There is a companion method, .decode(), that can convert a bytes
object into a string, given an encoding, such as UTF-8. Since we store
strings in the database, it's best to decode the hashed value before
we store it in the database. (Actually, since the hash output of
bcrypt is all ASCII, you won't see much difference and some
automatic conversions might work, but it's better to be clear and
precise.)
Because bcrypt requires bytes rather than str, we will have to
encode and decode. Sorry about that. But we're now ready to use Bcrypt
for our logins.
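Here is a quick sketch of the round trip, using the example hash from earlier as a stand-in for real bcrypt output so that it runs without bcrypt installed. It also shows why being precise matters: in Python 3 a str never compares equal to a bytes object, even when they look the same.

```python
# A bcrypt-style hash as bytes (the example value from earlier in these notes).
hashed = b'$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'

stored = hashed.decode('utf-8')     # str: what we put in the database
restored = stored.encode('utf-8')   # bytes: what we pass back to bcrypt later

print(type(stored).__name__, type(restored).__name__)   # str bytes
print(restored == hashed)    # True: the round trip loses nothing
print(stored.isascii())      # True: the hash is plain ASCII
print(stored == hashed)      # False: comparing str with bytes never matches
```

That last line is the trap: if you forget to decode (or encode) one side, a correct password will silently fail to match.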
Using Bcrypt for Accounts¶
When a person "registers" with our app and creates an account, they'll give us a username and password. Whenever someone attempts to log in, we need to check that they know the user's password. So, there are two events we have to deal with: (1) storing a password and (2) checking a password.
Storing A Password¶
Feel free to try this, but we'll also do it in class. Here, we have an
assignment statement: passwd1 = ... . In our Flask application,
passwd1 will be read from an "account request" HTML form that the
user submitted when they applied for an account. I'm calling it
passwd1 since it's the first time we've seen it.
import bcrypt
passwd1 = 'secret'
hashed = bcrypt.hashpw(passwd1.encode('utf-8'), bcrypt.gensalt())
stored = hashed.decode('utf-8')
This creates a hashed password that we can store in a database.
Checking a Password¶
To check a password, we'd read the previously hashed value from the
database. Below, we'll use the same variables as above, so stored is
still the stored hash (a string). I'll use passwd as the login
password. In a Flask app, it would be read from a "login" HTML form
that the user submitted when they log in.
passwd = 'secret'
hashed2 = bcrypt.hashpw(passwd.encode('utf-8'), stored.encode('utf-8'))
hashed2_str = hashed2.decode('utf-8')
hashed2_str == stored
The password is correct iff the last comparison, hashed2_str == stored,
is true.
In the above code, we would store the stored value in the
database, and pull it out later to do the checking.
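One optional refinement, which is not something these notes require: the final == comparison can be swapped for a constant-time comparison from the standard library, so that the time taken doesn't depend on how many leading characters match. The helper name hashes_match is invented for this sketch, and the two hash strings are placeholders rather than real bcrypt output. (Recent versions of the bcrypt module also offer bcrypt.checkpw(password, hashed), which does the re-hash and the comparison in one call.)

```python
import hmac

def hashes_match(rehashed: str, stored: str) -> bool:
    """Compare two hash strings in constant time (illustrative helper)."""
    return hmac.compare_digest(rehashed.encode('utf-8'),
                               stored.encode('utf-8'))

# Placeholder values standing in for hashed2_str and stored.
stored = '$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
good_attempt = '$2b$12$jRgsewn8B5Hz5DSPv9x/rOMReSY5lvmBYLebW7TtDP4hsc64gbuwq'
bad_attempt = '$2b$12$jRgsewn8B5Hz5DSPv9x/rOXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

print(hashes_match(good_attempt, stored))   # True
print(hashes_match(bad_attempt, stored))    # False
```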
Inserting into the Database¶
Inserting into the database is, seemingly, pretty straightforward.
You need to create a table to store the usernames and passwords. You
might also want an integer UID as a compact way to identify a user,
for foreign keys and the like, since the username is likely to be
variable length and might be silly things like allusernamesaretaken.
Here's a table we'll use in class:
create table userpass(
uid int auto_increment,
username varchar(50) not null,
hashed char(60),
unique(username),
index(username),
primary key (uid)
);
We specified above that usernames are unique: unique(username), and
we'll build an index based on that: index(username), so that lookups
by username will be fast. Then we'll create an integer UID for each
username and use that number for foreign keys and the like.
Note that the output of bcrypt.hashpw is always a 60-character
string, so char(60) is ideal for our hashed column.
So, to store a username and password, we just do the following:
username = request.form.get('username')
passwd1 = request.form.get('password1')
hashed = bcrypt.hashpw(passwd1.encode('utf-8'),
                       bcrypt.gensalt())
stored = hashed.decode('utf-8')
print(passwd1, type(passwd1), hashed, stored)
conn = dbi.connect()
curs = dbi.cursor(conn)
try:
    curs.execute('''INSERT INTO userpass(uid,username,hashed)
                    VALUES(null,%s,%s)''',
                 [username, stored])
    conn.commit()
except Exception as err:
    print('something went wrong', repr(err))
We wrap the INSERT statement in a try/except in case anything goes
wrong. But what could go wrong?
Concurrency¶
A real-world app will have thousands or millions of users. It'll run 24/7 and accept lots of account requests, possibly at the same time. That is, our apps will have to deal with concurrency. We'll talk more about the concept and how to handle it later in the course, but it rears its head now, so let's deal with this small piece of concurrency.
Here's the terrible scenario:
- Fred and George Weasley both decide to join our app.
- As twins, they sit down to do it together
- Being pranksters, they both enter the same username: 'wheezes'
Through the vagaries of fate, one of them will get inserted into the
database first. Suppose that's Fred. When George's INSERT happens,
there's already a wheezes in the database. Since usernames have to
be unique, this gets an error.
So, there's the exception.
But wait, suppose we check first, and only insert the username if the search shows no results. Like this:
try:
    curs.execute('''SELECT * FROM userpass
                    WHERE username = %s''',
                 [username])
    row = curs.fetchone()
    if row is None:
        curs.execute('''INSERT INTO userpass(uid,username,hashed)
                        VALUES(null,%s,%s)''',
                     [username, stored])
        conn.commit()
except Exception as err:
    print('something went wrong', repr(err))
Ah, but the even worse scenario is this:
- Fred and George Weasley both decide to join our app.
- As twins, they sit down to do it together
- Being pranksters, they both enter the same username: 'wheezes'
- The App processes Fred up to the select and fetchone() and discovers that wheezes is an available username, and then
- The App processes George to the same point, discovering that wheezes is still available (because Fred's INSERT hasn't happened yet).
- The App switches back to Fred and inserts wheezes
- The App switches back to George and gets the insert error.
Ick. What's the solution? We could just let the error happen and use
try/except to catch it:
try:
    curs.execute('''INSERT INTO userpass(uid,username,hashed)
                    VALUES(null,%s,%s)''',
                 [username, stored])
    conn.commit()
except Exception as err:
    flash('Sorry; that username is taken')
This technique is an excellent use of try/except.
(The technique also relies on the fact that INSERT is atomic, a concept we'll learn when we get to concurrency later in the course.)
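If you'd like to watch the duplicate-username error happen without a MySQL server, here is a self-contained sketch. Be aware of the substitutions: it uses Python's built-in sqlite3 module instead of PyMySQL, so the placeholders are ? rather than %s and the table definition is SQLite's dialect. The function try_register and the fake hash strings are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # throwaway in-memory database
conn.execute('''CREATE TABLE userpass(
                  uid INTEGER PRIMARY KEY AUTOINCREMENT,
                  username VARCHAR(50) NOT NULL UNIQUE,
                  hashed CHAR(60))''')

def try_register(conn, username, stored):
    """Just INSERT and let the unique constraint reject duplicates."""
    try:
        conn.execute('INSERT INTO userpass(username, hashed) VALUES (?, ?)',
                     [username, stored])
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Someone else already holds this username.
        conn.rollback()
        return False

print(try_register(conn, 'wheezes', 'freds-hash'))     # True
print(try_register(conn, 'wheezes', 'georges-hash'))   # False
```

Catching the specific IntegrityError, rather than every Exception, means that other failures (say, a lost connection) aren't misreported as "username is taken". PyMySQL has a corresponding pymysql.err.IntegrityError, if you want to do the same in the Flask app.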
Last Insert ID¶
Another thing we often want to do when a user has just been inserted,
especially when there's an auto_increment field, is to find out the
value that they were assigned: the value of the auto_increment field.
We could do a search, but there's a better way. A feature of MySQL
(and some other databases) is a function called last_insert_id()
which returns the most recent auto_increment value in the current
connection. In other words, the database server deals with the
concurrency issue. It doesn't tell you the most recent value for
anyone, it tells you the most recent value for you.
So, we just need to do:
curs.execute('select last_insert_id()')
row = curs.fetchone()
uid = row[0]
flash('FYI, you were issued UID {}'.format(uid))
(The cursor above is a tuple cursor, not a dictionary cursor, so
row[0] gets the first element of the tuple, which in this case is
the only element of the tuple.)
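Python's database API also exposes the same information as an attribute of the cursor, lastrowid, which saves the extra query. The sketch below demonstrates it with the built-in sqlite3 module so it can run anywhere; that is a substitution for MySQL, and while I believe PyMySQL cursors offer the same attribute, treat that as an assumption to verify. The usernames and hash strings are placeholders.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # throwaway in-memory database
conn.execute('''CREATE TABLE userpass(
                  uid INTEGER PRIMARY KEY AUTOINCREMENT,
                  username TEXT NOT NULL UNIQUE,
                  hashed TEXT)''')
curs = conn.cursor()

curs.execute('INSERT INTO userpass(username, hashed) VALUES (?, ?)',
             ['fred', 'freds-hash'])
fred_uid = curs.lastrowid      # ID assigned by the INSERT we just ran

curs.execute('INSERT INTO userpass(username, hashed) VALUES (?, ?)',
             ['george', 'georges-hash'])
george_uid = curs.lastrowid
conn.commit()

print(fred_uid, george_uid)   # 1 2
```

Either way, the point is the same: ask the database which ID it gave to your insert, rather than guessing with something like max(uid).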
Final Code¶
We've come a long way and taken some detours through encoding/decoding and dealing with concurrency, but here's our finished login code, in several parts.
Route to Join the App (Storing a Username and Password)¶
Here's the code to hash and store a username and password when they create an account. The account creation form is one of those where they ask for the password twice, to avoid problems with typos in the password field. There's code below to check that the two passwords are the same. Afterwards, we store some important values (username and UID) in the session. But most of it should be familiar from the code above.
@app.route('/join/', methods=["POST"])
def join():
    username = request.form.get('username')
    passwd1 = request.form.get('password1')
    passwd2 = request.form.get('password2')
    if passwd1 != passwd2:
        flash('passwords do not match')
        return redirect( url_for('index'))
    hashed = bcrypt.hashpw(passwd1.encode('utf-8'),
                           bcrypt.gensalt())
    stored = hashed.decode('utf-8')
    print(passwd1, type(passwd1), hashed, stored)
    conn = dbi.connect()
    curs = dbi.cursor(conn)
    try:
        curs.execute('''INSERT INTO userpass(uid,username,hashed)
                        VALUES(null,%s,%s)''',
                     [username, stored])
        conn.commit()
    except Exception as err:
        flash('That username is taken: {}'.format(repr(err)))
        return redirect(url_for('index'))
    curs.execute('select last_insert_id()')
    row = curs.fetchone()
    uid = row[0]
    flash('FYI, you were issued UID {}'.format(uid))
    session['username'] = username
    session['uid'] = uid
    session['logged_in'] = True
    session['visits'] = 1
    return redirect( url_for('user', username=username) )
Login Route (Checking a Username and Password)¶
Here's the code to check a person's username and password and log them in:
@app.route('/login/', methods=["POST"])
def login():
    username = request.form.get('username')
    passwd = request.form.get('password')
    conn = dbi.connect()
    curs = dbi.dict_cursor(conn)
    curs.execute('''SELECT uid,hashed
                    FROM userpass
                    WHERE username = %s''',
                 [username])
    row = curs.fetchone()
    if row is None:
        # Same response as wrong password,
        # so no information about what went wrong
        flash('login incorrect. Try again or join')
        return redirect( url_for('index'))
    stored = row['hashed']
    print('database has stored: {} {}'.format(stored,type(stored)))
    print('form supplied passwd: {} {}'.format(passwd,type(passwd)))
    hashed2 = bcrypt.hashpw(passwd.encode('utf-8'),
                            stored.encode('utf-8'))
    hashed2_str = hashed2.decode('utf-8')
    print('rehash is: {} {}'.format(hashed2_str,type(hashed2_str)))
    if hashed2_str == stored:
        print('they match!')
        flash('successfully logged in as '+username)
        session['username'] = username
        session['uid'] = row['uid']
        session['logged_in'] = True
        session['visits'] = 1
        return redirect( url_for('user', username=username) )
    else:
        flash('login incorrect. Try again or join')
        return redirect( url_for('index'))
Logout Route¶
Before we conclude, we should look at how to logout. Since we trust the information in the session, and the session records the fact that someone is logged in and their username and uid, all we have to do to logout is remove those values:
@app.route('/logout/')
def logout():
    if 'username' in session:
        username = session['username']
        session.pop('username')
        session.pop('uid')
        session.pop('logged_in')
        flash('You are logged out')
        return redirect(url_for('index'))
    else:
        flash('you are not logged in. Please login or join')
        return redirect( url_for('index') )
In fact, you can see that this code determines whether someone is
logged in by whether 'username' is a key in the session
dictionary. When we pop that value out of the session dictionary,
the person will be logged out.
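One small hardening worth knowing about: pop with a single argument raises a KeyError if the key is missing, which could happen if the session somehow holds 'username' but not 'uid'. Passing a default avoids that. The sketch below uses a plain dict as a stand-in for the Flask session (which behaves like a dictionary), and clear_login is a name I made up.

```python
def clear_login(session) -> bool:
    """Remove the login keys; return True if someone was logged in."""
    was_logged_in = 'username' in session
    for key in ('username', 'uid', 'logged_in'):
        session.pop(key, None)   # the default means no KeyError if key is absent
    return was_logged_in

# A plain dict standing in for the Flask session; note that 'uid' is missing.
session = {'username': 'nlongbottom', 'visits': 3}

print(clear_login(session), session)   # True {'visits': 3}
print(clear_login(session))            # False
```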
Requiring Login¶
For many apps, you will have endpoints where you want to require that a user be logged in. For example, an app where you allow people to post information should probably require that the user be logged in. For concreteness, let's suppose the endpoint is like this:
@app.route('/post_stuff/', methods=['GET', 'POST'])
def post_stuff():
    uid = session.get('uid') # the user has to be logged in
    conn = dbi.connect()
    if request.method == 'GET':
        return render_template('form_for_posting.html')
    else:
        helper.post_stuff(conn, uid, request.form.get('stuff_to_post'))
Now, you might object that people have to log in when they go to the
homepage and so they literally can't navigate to the /post_stuff/
endpoint without going through login. However, remember that a
nefarious user can enter any URL into their browser at any time. So,
they can skip your login form, put /post_stuff/ in the location box
of the browser and try to post stuff without being authenticated. Or
they can post using a Python program or other client (as we will learn
later this semester).
We need to prevent this. How to do so? The easiest way is to just begin your handler function with a little incantation like this:
if 'uid' not in session:
    flash('you must be logged in')
    return redirect(url_for('login_page'))
The flashing is optional, but might be worthwhile; that's up to you. The resulting handler is like this:
@app.route('/post_stuff/', methods=['GET', 'POST'])
def post_stuff():
    if 'uid' not in session:
        flash('you must be logged in')
        return redirect(url_for('login_page'))
    uid = session.get('uid') # we know this will work
    conn = dbi.connect()
    if request.method == 'GET':
        return render_template('form_for_posting.html')
    else:
        helper.post_stuff(conn, uid, request.form.get('stuff_to_post'))
Notice that because the three-line incantation uses return, we don't
need to indent the rest of the function body within an else.
There are fancier techniques that avoid copy/pasting that code into every handler function that needs it, and we can cover that later. But the idea above is sufficient.
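As a preview, one such technique is a decorator that wraps a handler with the login check. The sketch below is deliberately Flask-free so that it runs on its own: a plain dict stands in for session, a string stands in for the redirect response, and the name login_required is my own choice for this example.

```python
from functools import wraps

session = {}   # stand-in for flask.session in this sketch

def login_required(handler):
    """Wrap a handler so it only runs when 'uid' is in the session."""
    @wraps(handler)   # keep the handler's original name
    def wrapper(*args, **kwargs):
        if 'uid' not in session:
            # In Flask this would be flash(...) and redirect(url_for('login_page'))
            return 'redirect: login_page'
        return handler(*args, **kwargs)
    return wrapper

@login_required
def post_stuff():
    return 'posted by uid {}'.format(session['uid'])

print(post_stuff())    # redirect: login_page
session['uid'] = 7
print(post_stuff())    # posted by uid 7
```

In a real Flask app, the @login_required line would go below the @app.route line, so that the route is registered with the wrapped function.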
Note that the code above doesn't care who is logged in; just that
someone is. (It's also important that the logout code removes
(that is, pops) the uid key from the session.) If the
endpoint requires a particular kind of user to be logged in,
obviously the code gets more complex.
You definitely need to do something to defend your endpoints against usage by unauthorized people.
Conclusion¶
This has been a long, complex journey, with detours to learn about encoding and a little about concurrency, but the result is that we've learned
- how to let users join an app
- how to store passwords in hashed form
- how to check passwords on login
- how to log users in and out of our app
Summary¶
BCRYPT¶
- Passwords should never be stored in plaintext in your database; they should be one-way hashed with a cryptographically secure hash algorithm, such as SHA-256
- Passwords can still be cracked by brute force.
- To thwart a brute-force approach, we can use an extremely slow hashing algorithm, such as bcrypt
- To use bcrypt, we have to encode strings as byte arrays. I suggest UTF-8.
- The bcrypt algorithm also uses salt, which is an additional random string that means that two accounts that use the same password will have different hash values. The function is bcrypt.gensalt().
- bcrypt stores all the info it needs in the hash value, so you only need to store that one value, which is a 60-byte array.
- The byte array should be decoded (again using UTF-8) before it is stored
- Practical Code:
  - register:
    hashed1 = bcrypt.hashpw(register.encode('utf-8'), bcrypt.gensalt()).decode('utf-8')
  - login:
    hashed2 = bcrypt.hashpw(login.encode('utf-8'), hashed1.encode('utf-8')).decode('utf-8')
- A purported password matches if and only if the newly hashed value matches the old hashed value: hashed2 == hashed1
LOGINS¶
- Your app can log someone in by putting their userid in the session
- Your app can log someone out by removing their userid from the session
- Because of concurrency, when registering someone new by inserting someone into a table with an auto_increment column, your app should use the MySQL last_insert_id() function to determine the ID; not anything involving, say, max.