I agree! It's good to know what the experts suggest, what the standards are, and why they are useful.
For sure. The goal is to thwart brute-force attacks. Since brute-force requires trying lots and lots of passwords, slowing the attacker down is critical.
If it takes a millisecond to hash a password, the attacker can try a million passwords per second (per processor).
If it takes a second to hash a password, the attacker can try one password per second (per processor).
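To see that work factor in action, here is a minimal sketch using Python's standard-library PBKDF2 as a stand-in for bcrypt (an assumption for runnability; both let you tune how slow each hash is):

```python
import hashlib
import os
import time

def hash_password(password: str, salt: bytes, iterations: int) -> bytes:
    # PBKDF2 stands in for bcrypt here: both take a tunable work factor
    # that makes each individual hash deliberately expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)

# Time a "cheap" setting versus an "expensive" one.
for iterations in (1_000, 1_000_000):
    start = time.perf_counter()
    hash_password("hunter2", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>9} iterations: {elapsed:.4f} s per guess")
```

The attacker's guess rate is roughly 1/elapsed per core, so raising the work factor by 1000x cuts their throughput by 1000x.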
Second question first: the output of the hashpw()
function is a 60-byte array, which we can convert (encode) into a
60-character string and then store in the database.
The choice of char(60) versus varchar(60) makes
a tiny difference: char(60) has slightly less
overhead (time and space). So
it might be worth doing, because in this class we are trying to think
about best practices, and the best practice is that if something is
fixed-length, use fixed-length fields.
But, honestly, it's negligible.
Convenience of the programmer. It's nice to have one return value from the hashpw function and only one thing to store in the database.
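That single 60-character value actually bundles everything verification needs. Here is a sketch that picks apart a made-up example hash in the standard bcrypt format (the literal below is illustrative, not a real hash):

```python
# A bcrypt hash packs algorithm, cost, salt, and digest into one
# 60-character string; this example value is made up but well-formed.
example = "$2b$12$abcdefghijklmnopqrstuvABCDEFGHIJKLMNOPQRSTUVWXYZ01234"

_, algorithm, cost, rest = example.split("$")
salt, digest = rest[:22], rest[22:]

print("algorithm:", algorithm)     # "2b"  (bcrypt variant)
print("cost:     ", cost)          # "12"  (2**12 rounds)
print("salt:     ", salt)          # 22 base64-style characters
print("digest:   ", digest)        # 31 characters of hash output
print("length:   ", len(example))  # 60
```

Because the salt and cost ride along inside the string, the verifier needs no extra columns: one field in the database, one return value from hashpw().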
Great question! Yes, a mutex would solve the problem of concurrency. The weakness of a mutex is that if we have several web applications running, all in separate processes, we then have to coordinate mutexes across multiple processes. That's much harder.
Since the DBMS is a single point of access, and the INSERT/EXCEPT technique even works across processes, let's do that.
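Here is a minimal sketch of that INSERT/EXCEPT idea using SQLite from the Python standard library (an assumption; the class's DBMS may differ, but the technique is the same): let the database's uniqueness constraint reject duplicates, no matter which process races in first.

```python
import sqlite3

# The PRIMARY KEY on username is what enforces uniqueness,
# even across concurrent inserts from separate processes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (username TEXT PRIMARY KEY, hash TEXT)")

def register(username: str, pw_hash: str) -> bool:
    try:
        conn.execute("INSERT INTO user VALUES (?, ?)", (username, pw_hash))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Another request (possibly another process) got there first.
        return False

print(register("alice", "$2b$12$..."))  # True: first insert succeeds
print(register("alice", "$2b$12$..."))  # False: duplicate is rejected
```

The key design point: instead of checking first and inserting second (a race window), we attempt the insert and handle the exception, so the DBMS is the single arbiter.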
Another great question! We could have a second table of old passwords (1:N relationship with the USER table), and we check that their "new" password doesn't successfully hash to any of those old values, then we can accept the new password.
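A sketch of that history check, with PBKDF2 standing in for bcrypt's verify step (the function names here are hypothetical, not the class's actual API): hash the candidate password against each old record's salt and reject it on any match.

```python
import hashlib

def hash_with_salt(password: str, salt: bytes) -> bytes:
    # Store salt alongside the digest, as bcrypt does inside its string.
    return salt + hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def checkpw_like(password: str, stored: bytes) -> bool:
    # Re-hash the candidate with the stored record's own salt.
    salt = stored[:16]
    return hash_with_salt(password, salt) == stored

def is_reused(new_password: str, old_hashes: list[bytes]) -> bool:
    # old_hashes plays the role of the 1:N old-passwords table.
    return any(checkpw_like(new_password, old) for old in old_hashes)

old = [hash_with_salt("winter2023", b"A" * 16),
       hash_with_salt("spring2024", b"B" * 16)]
print(is_reused("winter2023", old))  # True: seen before, reject it
print(is_reused("summer2025", old))  # False: genuinely new, accept it
```

Note that because each old record has its own salt, the new password must be re-hashed once per old record; there is no way to compare hashes directly.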
I'm not sure why you would think so; there must be some misconception. I can think of two, but I could be wrong.
That's not the case. The hashed passwords are in the database, and the contents of the database are sensitive: they should be kept secret. But the database is not being put on GitHub. GitHub only saves files in your directory.
Of course, if you mysqldump the contents
of the database to your directory and then commit and push
that dumped file to GitHub, that would be bad. So don't do that.
Historically, people did that: they created a "secure" algorithm and kept it secret. But often their algorithm wasn't as secure as they thought, and their data was compromised.
In part, this is because security experts never saw their algorithm. The experts could have warned them.
This approach is now derisively known as "security through obscurity."
The modern thinking is to have strong, public algorithms, designed and studied by experts, where the "private" stuff is either unnecessary (as with Bcrypt) or hidden by design, as with, say, your private key in public-key encryption. (The public key algorithm is in the clear, but to compromise the security, the attacker needs to know the private key, and they don't.)
Are either of these guesses correct?
Let's do it!