Episode: 1598
Title: HPR1598: Hashing and Password Security
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1598/hpr1598.mp3
Transcribed: 2025-10-18 05:37:49

---

It's Wednesday the 17th of September 2014. This is HPR episode 1,598, entitled Hashing and Password Security, and it is part of the series Privacy and Security. It is hosted by Ahuka and is about 26 minutes long. Feedback can be sent to Wilnick at Wilnick.com or by leaving a comment on this episode. The summary is: Understanding password security begins with understanding hashing.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hello, this is Ahuka. Welcome to Hacker Public Radio and another episode in our series on Security and Privacy. And what I want to do now is go off in a somewhat different direction. We've taken a fairly in-depth look at encrypting email in various ways. And we just put in an episode about a sensible security model based on some work of Bruce Schneier, who I greatly admire as an expert on the subject. So now what I want to do is get into some issues about passwords and start unpacking some of the technology involved with this. Now, the most common way of providing secure access to any kind of system these days is through the use of passwords. If you're like me, then you are probably asked to create a password for just about every website you go to these days if you want to do anything at all. Certainly e-commerce, but also for posting comments or posting to social media, any of these things. And it just becomes a requirement that you have a password. Now, the implication is that that's going to provide a level of security. Now, following on our last tutorial, we should ask a few questions about just how effective this measure is.
After all, someone posting in your name to Twitter is significantly different from someone accessing your bank account, although there are certainly examples of people causing a lot of trouble for other people by posting in their name. Now, since the assets being protected are very different, it would be reasonable to approach the problem of security somewhat differently in these cases. But given the ubiquity of passwords as the authentication for online accounts, we need to look at the security involved. Now, note that I'm approaching this from the standpoint of the owner of the site in question for this tutorial. And then we'll follow up with one that takes a look at it from the side of the user. Now, the asset you are trying to protect is access to your account on some website or IT system. Given that you are not the owner, you don't get to choose how this is done. The owner chooses and does so for their own reasons. If they do a bad job, there can be potential consequences, of course. But you will always have to agree to follow their rules if you use their system. Passwords are not the ideal way to protect this access. In fact, I read things almost every day that talk about major changes coming, and I would be very surprised if, 10 years from now, passwords were still the primary way that we authenticate people. But for now, it's what we have. So, following on our last episode, the first question is, what are the threats? Now, one threat is you. If you can be tricked into revealing your password, then you have given away the access. And what are called social engineering attacks are frequently successful. In fact, they are probably the most successful way for bad guys, let us say, to get into systems. Now, if you are asked by someone you believe to be legitimate to tell them your password, will you?
One approach that often works in large organizations is to call someone on the phone and say something like, I'm so and so from the IT department and I need to check your password. Maybe they use words like verification as part of this. And instead of making a phone call, it could be an email asking you to click a link, and it might look very official and have the right graphics from the company. If you can be tricked like this, they're in. Now, generally, these kinds of attacks are examples of a targeted attack against a specific company or individual. If the attacker has a good reason to want to get in, it is worth putting in the effort. But you should recognize that this kind of attack does require a lot of effort to get a single password. And it is only worth doing if the payoff is great. If you're trying to steal the intellectual property of a competitor, for instance, it is worth doing. But the vast majority of cases are not like this. Most attackers are looking for a financial reward, and that usually comes when you can get large numbers of passwords, which can then be leveraged to get into bank accounts and the like. One of the big weaknesses of passwords, as we saw last time, is that really secure passwords are hard to remember. And since every site you go to now requires an account to do pretty much anything, we all end up with hundreds of passwords. And people being what they are, the odds are very good that most people are using the same password on a whole bunch of sites. Now, if that password is used to post a comment on someone's blog, you probably don't feel like it is a security risk. And in that context, it isn't. But if you also used that same password for your bank account, anyone who could hack the blog site and get your password would then try to use it at the bank and, bingo, they now have access to all of your money. Of course, most hackers are not targeting blog sites because there are not large numbers of passwords to be harvested there.
They go after the large sites that have millions of accounts and passwords. And when they get them, they can then try to use them at other sites. And if the other sites have a standard way of creating account names, or if you always use the same account name each time, combining an account name and password is a piece of cake. And with computers to automate the process, you can try large numbers of them every second. Even if only 10% of them work, that can be a huge payoff for a cyber criminal. So this is what you need to worry about. Now, we'll get into the steps you can take to protect yourself in the next episode, but for this one, I want to look at how the site owner should protect your password. The worst case scenario is that they simply store your password in the clear in a database, which means they don't encrypt it in any way. They just take the text and stick it in some database field somewhere. All right, in that case, anyone who can get into the system has all of the passwords. Game over. The owner may try to interpose a password to access the database, but this is not sufficient security. A good clue that the owner is doing something like this is that they limit the length of the password. You see, they may be using a database that sets the field length on what they store, and you cannot exceed that field length. If you ever encounter a system that says your password cannot have more than eight or 12, or whatever number of characters, be extremely suspicious. And for anything really important, test it. They may let you enter a 20 character password, but try logging in using only the first 19 characters. If that works, you have pretty compelling evidence that they not only store passwords in the clear, but that they throw away any characters over some limit. You may not care if it's someone's blog, but I would never do online banking that way. I would either change banks or just decide to never use the online function.
And if you are not using the online function, you should see if there's any way to instruct the bank to ignore any attempt at online activity. Now in the US, two recent court cases are pointing towards an interesting standard here. They involve attackers who are able to fraudulently access bank accounts to send wire transfers to accounts outside the United States. In one case, the bank had urged the customer to adopt better security practices, essentially requiring multiple authorizations for any transfers. And the customer declined in writing to do so. The fraud occurred, the customer sued the bank for negligence and the court sided with the bank since it had urged better practice on the customer. In a very similar case, the customer prevailed in the lawsuit because there was no evidence that the bank had any good security around the process. The moral of the story is that you need to take advantage of any good security practices offered and that you should turn off anything you don't need to use. I personally would never allow any wire transfers to be initiated through online banking. I would require the bank to only accept such transfers if I show up in person with good identification. Or maybe even turn off the ability to do wire transfers if you don't need it. Most people don't. Now, back to passwords. At a minimum, passwords should be stored in an encrypted form to make it difficult for an attacker to get them, even if they have access to the database where they are stored. Note that I said, difficult, not impossible. A sufficiently motivated hacker has any number of tools available to get encrypted passwords. But anytime you can put a speed bump in their path, that is good. And that brings us to the concept of a hash. Hashing uses cryptography to generate an encrypted form of the password and employs a one-way function to do so. 
A one-way function is something we looked at before with public key cryptography, and it means a function that is easy to compute in one direction, but extremely difficult to compute in reverse. So here are the characteristics of a good hashing function. And if you check the show notes, I've given you a link to a Wikipedia article that addresses this. So here is what a good hash function should do. First, it is easy to compute the hash value for any given message. We need to be computing hash values all the time, so if it takes up a lot of computer processing power, that becomes a problem. You need to do it quickly and easily. Second, it should be infeasible to generate a message that has a given hash. All right? One of the things we do with hashing is we use it to attest to the authenticity of something. So if I could start with a hash and then work backwards to generate a message that had that hash, that would be a problem. So you want something where, if you start with the hash value, you cannot create a message that is going to generate it, or at least not at all easily. Third, it should be infeasible to modify a message without changing the hash. So this is very important again because the hash is used to attest to the authenticity of the message. And finally, it is infeasible to find two different messages with the same hash. Now, like all other forms of cryptography, advances in computer technology can make older forms vulnerable to brute force attack. As you can see from the Wikipedia article that I've quoted, the term we use is infeasible, not impossible. For instance, the MD5 algorithm was once used for this purpose. It was developed in 1991 by Ron Rivest, who is the R of RSA, but a flaw was found in 1996 and it is no longer used for security. It is still used for verifying file integrity, though, since it meets condition three above. If even one bit gets flipped in a file, the MD5 hash will be completely different.
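To make the one-bit-flip point concrete, here is a minimal sketch in Python using the standard `hashlib` module. The sample data and the helper names `flip_bit` and `differing_bits` are made up for illustration and are not from the episode. It hashes a short byte string with MD5, flips a single bit, hashes again, and counts how many of the 128 output bits changed.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions that differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Stand-in for a downloaded file; any bytes would do.
original = b"Pretend this is the content of a downloaded ISO file."
corrupted = flip_bit(original, 0)  # simulate one flipped bit in transit

md5_original = hashlib.md5(original).digest()
md5_corrupted = hashlib.md5(corrupted).digest()

print(md5_original.hex())
print(md5_corrupted.hex())
print(differing_bits(md5_original, md5_corrupted), "of 128 bits differ")
```

When you run it, the two hex strings should look unrelated, even though the inputs differ by a single bit. That is the property that makes a published checksum useful for spotting a corrupted download.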
So if you ever download ISOs on the internet, chances are they will come with an MD5 hash that can be used to validate that the file has not been corrupted. And for that purpose, it is perfectly good. But for security purposes, it should be avoided. I recently read an alarmist article on how every password was crackable, and noted that in this article, all of the passwords were hashed with MD5. Needless to say, I was skeptical of the conclusions. Now, when MD5 was replaced, the next replacement was SHA1, which stands for Secure Hash Algorithm number one. This was designed by the NSA and was required in many government applications. In 2005, however, weaknesses were found, which led to SHA2. And recently, a competition led to a new replacement, which will be known as SHA3. SHA2 has never been found to have a weakness, but because it shares some features with SHA1, the government decided it wanted an alternative that used a very different approach. So for password security, you would want to use either SHA2 or SHA3. Now, since SHA3 is very recent, it's not much in use yet. So SHA2 is what you would hope to encounter. Now, note that SHA1 is still in use, but Microsoft, as an example, announced that it will stop accepting SSL certificates signed using SHA1 by the year 2017. So its days are numbered, as indeed they should be. So, looking at what a hash should do, what does it all mean? First, creating a hash is supposed to be easy. That is what one-way functions are all about. This is similar to what we saw with PGP, which is a different technology, where you could generate a key pair on a home computer. Generating a hash should be similarly easy and require very little computing power. And entropy is not a factor here, so you can generate thousands of hashes without any problem. The second criterion says that you cannot reverse the process easily.
If you have a given hash, there is no way to generate the message that produced it, at least with the current technology. That is the other half of one-way functions. Now, technically, this is not impossible in the strict sense. Given enough computing power, it is theoretically possible to try every conceivable message, compute the resulting hash, and compare it to the hash in question. This is the principle used in dictionary attacks, which we will discuss below. But this also means that hashing is not a good way to encrypt a message to someone, since there would be no way for them to decrypt the message. The third criterion says that any change at all to the message, even a single bit changed, would cause the hash to be completely different. And we do mean completely. The resulting hash would look totally different from the original. This makes it excellent for ensuring that the contents have not been altered, which is why hashes are used to validate downloads. Note that this is essentially the same function that digitally signing your email performs. It assures that the message you sent has not been altered in any way. The last criterion says it should be highly unlikely that two different messages would have the same hash. This is called a collision in cryptography, and would allow an attack. Again, it is not impossible, just highly unlikely. Also, one characteristic of hashing functions that is not on the list but is worth knowing is that they generate hashes of the exact same length regardless of the original message length. This turns out to be useful in understanding password hashes. So, password hashes in use. In most cases, you enter a password on a login page for some kind of online site, and your password is transmitted in the clear to the server. That means you could be vulnerable to what is called a man in the middle attack, which is to say that if an attacker can get between you and the online site, they can see your password.
For that reason, it is important to make sure you have a secure connection, generally one that uses SSL to establish an encrypted connection to the server. This is a whole topic in itself, and we will be getting to the discussion of SSL certificates. So, I won't go into it any further here. In any case, your password goes to the server, and the server employs a hashing function, hopefully SHA2 or better, and stores the hash in its database. Since all hashes are the same length, the database administrators can set aside a fixed field length to store the resulting hash, which tends to make DBAs happy. When you later try to log in and type in your password, the server repeats the hashing function on the password, and compares the hash it gets with what it stored in the database. And if they match, you are allowed in. And given the way that hashes work, any difference at all results in a totally different hash. So, there can never be a concept of close enough. You and I might accept something that is 95% the same, but for hashed passwords, that is not acceptable at all. Hashes stored in this way are not susceptible to a frontal brute force attack. There is no computation that takes the hash and gives you back the original password. But because the hashing function is generally well known and deterministic, there is an alternative attack that often does succeed. You can compute a so-called dictionary that contains the hashes for all known dictionary words, and all popular passwords, and then do a lookup of any given hash against this table. An attacker can get a lot of passwords this way, because so many people exercise poor judgment. If you use the word "password" as your password, or "123", or "letmein", all of these, by the way, are known to be frequently used.
They will be found in this kind of an attack. And if you try to use what is called leet speak to disguise your words, which is where you use a number in place of a letter, such as a three in place of the letter E, or a one in place of the letter L, you will still get caught. Attackers know all about that, and the dictionaries have all of those entries as well. So in essence, if an attacker can get a database of a million passwords, they can run all of the hashes against a dictionary, and in short order, they can get as many as 50% of the passwords, just from a direct comparison. And one thing that helps them is that a lot of people use the same password, and all of their hashes will be identical. You can see this in the periodically issued lists of the most common passwords. Now there is a countermeasure called a salted hash. The idea here is to add a random element to each password, so that the hash is harder to look up. And if any two people use the same password, the hashes will be different, because their salt is different. Of course, that random number, or random element, has to be recorded, so that you can log in each time, and that can mean an added field or perhaps table in the database. Now, if an attacker gets the database, they get the salt as well, but the computation gets much more difficult. Suppose you have a password hash X, and a known salt of Y that was used to calculate it. The only way you can recover the password is to create a new dictionary that combines every possible password in your original dictionary with that known random number and compute the resulting hash. And if you succeed in this, all you have is one password. You would need to repeat this process for every password you have, which is what makes this computationally infeasible for most attackers. A good description of hashing, with a discussion of how to do it properly, can be found at codeproject.com, and I have a link in the show notes for this.
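The difference between unsalted and salted hashes can be sketched in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: the tiny word list, the user names, the 16-byte salt length, and the helper functions. SHA-256 stands in for "SHA2", and simply prepending the salt to the password is one simple way to combine them, not necessarily what any particular site does.

```python
import hashlib
import hmac
import os
from typing import List, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Attacker's side: hash a word list once, then reuse it against every account.
wordlist = ["password", "123456", "letmein", "p4ssw0rd"]
lookup = {sha256_hex(word.encode()): word for word in wordlist}

# Unsalted storage: two users with the same password get identical hashes.
stolen_unsalted = {
    "alice": sha256_hex(b"letmein"),
    "bob": sha256_hex(b"letmein"),
}


def crack_unsalted(stored_hash: str) -> Optional[str]:
    """One table lookup per stolen hash, with no new hashing work."""
    return lookup.get(stored_hash)


# Site owner's side: store a fresh random salt next to each hash.
def make_salted_record(password: str) -> Tuple[bytes, str]:
    salt = os.urandom(16)
    return salt, sha256_hex(salt + password.encode())


def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    """Repeat the hash using the stored salt and compare the results."""
    candidate = sha256_hex(salt + password.encode())
    return hmac.compare_digest(candidate, stored_hash)


def crack_salted(salt: bytes, stored_hash: str, words: List[str]) -> Optional[str]:
    """The attacker must rehash the whole word list for this one salt."""
    for word in words:
        if sha256_hex(salt + word.encode()) == stored_hash:
            return word
    return None


alice_salt, alice_hash = make_salted_record("letmein")
bob_salt, bob_hash = make_salted_record("letmein")

print(crack_unsalted(stolen_unsalted["alice"]))        # found by lookup
print(alice_hash == bob_hash)                          # same password, different hashes
print(crack_unsalted(alice_hash))                      # precomputed table no longer helps
print(crack_salted(alice_salt, alice_hash, wordlist))  # still guessable, one account at a time
```

Note that a weak password is still recoverable from a salted hash. The salt only forces the attacker to redo the dictionary work separately for each account, instead of looking everything up in one precomputed table. A single fast SHA-256 call is used here only to keep the sketch short. The CodeProject article mentioned above covers how to do this properly.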
That article is an excellent and detailed discussion, which I recommend if you are interested. They also bring up another countermeasure that is worth combining with the salting, and that is a technique known as key stretching. Essentially, this means using a hashing algorithm that is deliberately slow to execute. For hashing a single password at the server level when a customer comes calling, the added time is not significant, but when you are trying to compute an entire dictionary of millions and millions of hashes, slowing down the attacker can make a big difference. So, now we've seen how the site owner can make things more secure. Next time we need to look at your own responsibility. So, this is Ahuka, signing off for Hacker Public Radio and reminding you as always to please support free software.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.