|By Kalle Rosenbaum
This article discusses the basics of cryptographic hashes, which
You can think of a cryptographic hash as a fingerprint. A person produces the same fingerprint of her left thumb every time it’s taken, but it’s difficult to find another person with the same left thumb fingerprint. The fingerprint doesn’t disclose any information about the person other than her left thumb fingerprint. You can’t know what math skills she has or what eye color she has by looking at her fingerprint.
A fingerprint of a file is called a cryptographic hash. To create a cryptographic hash of a file, you send the file into a computer program called a cryptographic hash function. Suppose you want to create a cryptographic hash, a fingerprint, of your favorite cat picture.
Figure 1. Creating a cryptographic hash of a cat picture – the input is the cat picture and output is a big number of 32 bytes.
The hash in the picture is a 256-bit number. 256 bits equals 32 bytes. It means that to store the number in a file, the file will be 32 bytes big, which is tiny compared to the size of the 1.21 Megabyte cat picture.
The word “hash” means something which is chopped into small pieces or mixed up. It’s a good description of what a cryptographic hash function does. It takes the cat picture and performs a mathematical calculation on it. Out comes a big number that doesn’t remotely look like a cat. You can’t “reconstruct” the cat picture from the hash – a cryptographic hash function is a one-way function. Let’s see what happens when you change the cat picture a tiny bit and run that cat picture through the same cryptographic hash function:
Figure 2. Hashing a modified cat picture. Can you spot the difference? The cryptographic hash function certainly did.
This hash turns out completely different than the first hash. Let’s compare them:
See how that tiny change to the cat picture made a huge difference in the hash value?
Why are cryptographic hash functions useful?
Cryptographic hash functions can be used as an integrity check, to detect changes in data. Suppose you want to send your favorite cat picture to your friend Fred via email, but you suspect that the picture may be accidentally corrupted during transfer. How would you and Fred make sure that the picture Fred receives is the same as the one you sent?
Figure 3. Checking file integrity. You calculate the cryptographic hash of the cat picture and send the picture and the hash to Fred. Fred calculates the cryptographic hash of the received file and compares it to the hash provided by you in the email.
You compose an email to Fred and attach the cat picture to the email. But you also calculate the cryptographic hash, the digital fingerprint, of the cat picture. That hash is written down in the body of the email. The cryptographic hash function is standard software and available on both your computer and Fred’s computer.
When Fred receives this email, he saves the cat picture in a file on his computer and calculates the hash of that file. If the result is the same as the hash in the email, Fred knows for sure that the file isn’t accidentally corrupted.
How does a cryptographic hash function work?
The answer is complex, and we won’t go there in this article. To help you understand the operation of a cryptographic hash function, we’ll create a simplistic cryptographic hash function. Well, it’s not cryptographic, but we’ll come to that later. Let’s call it a hash function for now.
Suppose that you want to hash a file containing the six bytes a1 02 12 6b c6 7d. You want the hash to be a one byte number, 8 bits. We can construct a hash function using addition modulo 256, which means to wrap around to 0 when the result of an addition reaches 256:
Figure 4. Simplistic hash function using byte-wise addition modulo 256.
The result is the decimal number 99. What does 99 say about the original input a1 02 12 6b c6 7d? Not much. 99 looks just as random as any other single byte number.
If you change the input, the hash will change, even though there is a chance that the hash will remain 99. After all, there are just 256 different possible outputs of this simple hash function. With real cryptographic hash functions, like the one we used to hash the cat picture, the chance is unimaginably small. We will soon get a glimpse of that probability.
Properties of a cryptographic hash function
A cryptographic hash function is a function that takes any digital input data and produces a fixed-length output. In the example with the emailed cat picture, the input is the cat picture of 1.21 MB and the output is a 256 bit number. The function will output the exact same hash each time the same input is used. But it’ll output a totally different value when even the slightest variation of the input is used. The output of a cryptographic hash function is often referred to as a hash or a digest. I’m using term hash here, but either is equally valid.
Let’s have a look at what properties you can expect from a cryptographic hash function. We’ll illustrate the properties using the SHA256 (Secure Hash Algorithm with 256-bit output) algorithm, because it’s the one that Bitcoin uses the most. Several different cryptographic hash functions exist, but they provide the same basic properties:
- The same input will always produce the same hash.
- Slightly different inputs will produce different hashes.
- The hash is always of the same fixed size. For SHA256 it’s 256-bits.
- Trial-and-error is the only known way to find an input that gives a certain hash.
Figure 5. A cryptographic hash function, SHA256, in action. The input “Hello!” will give you the same output every time, but the slightly modified input “Hello” will give you a totally different output.
The first three properties are illustrated in the diagram above. The fourth property of cryptographic hash functions is what makes it a cryptographic hash function and this needs a bit more elaboration. Some variations to the fourth property exist, all of which are desirable for cryptographic hash functions:
Figure 6. Different desirable properties for cryptographic hash functions. For collision resistance, X can be anything, as long as the two inputs give the same output X.
It’s hard to find two inputs that give the same hash.
It’s hard to find an input that gives a certain hash.
It’s hard to find an input that gives the same hash as a certain other input.
Illustration of “hard”
The term “hard” in this context means astronomically hard. It’s silly to even try. We’ll have a look at second preimage resistance as an example of what “hard” means, but a similar example can be made for any of the three variants.
Suppose that you want to find an input to SHA256 that results in the same hash as Hello!:
You can’t change the input a little bit to fool the function to “not notice”. It’ll notice and output a totally different hash. The only way to find an input, other than Hello!, that gives the hash 334d016f755cd6dc58c53a86e183882f8ec14f52fb05345887c8a5edd42c87b7 is to try different inputs one by one and check if it produces the desired hash.
Table 1. Finding an input with the same hash as “Hello!” Nearly impossible.
As you can see, we aren’t successful. Let’s think about how much time it’d take for a typical desktop computer to find such an input. It can calculate about 60 million hashes per second and the expected number of tries needed to find a solution’s 2255. The result’s 2255 / (60*106) s ≈ 1068 s ≈ 3*1061 years, or about
I think we can stop trying, don’t you? I don’t think buying a faster computer will help either. Even if we had one trillion computers and ran them concurrently it’d take about 3*1049 years. Preimage resistance, second-preimage resistance and collision resistance are extremely important in Bitcoin. Most of its security relies on these properties.