Why We Should Probably Stop Visually Verifying Checksums
Hello there! Thanks for stopping by. Let me get straight into it and start things off with what a checksum is, to be inclusive of all audiences here, as defined by Wikipedia [1]:
“A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.
“The procedure that generates this checksum is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm usually outputs a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.” [1]
In short, a one-way cryptographic hash function takes the data in a file (or a string, etc.) and generates a fixed-length sequence from it. A small change in the source (string/file) will create a completely different hash. That makes it good for integrity checks – checking whether someone has modified the original.
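You can see this "avalanche" behaviour for yourself with a few lines of Python, using the standard hashlib module (the input strings here are just examples):

import hashlib

# Two inputs differing by a single character...
print(hashlib.sha256(b"spiderlabs poc").hexdigest())
print(hashlib.sha256(b"spiderlabs poC").hexdigest())
# ...produce two completely unrelated digests.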
So, with that in mind, take a look at the SHA256 hash below, which is a checksum of a file. Take it in.
[First SHA256 hash shown here, followed by a long stretch of intentional blank space so that the two hashes cannot be seen at the same time.]
Now, without looking above (try and block it out!), look at the SHA256 hash below.
[Second SHA256 hash shown here.]
Without cheating, and being honest with yourself: is the second hash the same as the first? Results may vary here, and may be somewhat skewed, as the blog post title has rather given the game away. I did, however, run this experiment in a presentation on this very subject, without letting on the full details of the talk beforehand so as to get an unbiased view, and I had mixed results.
In the presentation, I displayed the first hash on the screen and gave no context; I just told the audience that it was a SHA256 hash and to take it in. I then displayed a nice green landscape picture slide to refresh the mind and reset it. (Had I simply switched straight to the next slide, with the second hash, without any padding slide in between, the like-for-like character replacement would have been instantly apparent to the human eye.) I then displayed the second hash on the screen and asked: who thinks this hash is the same as the first? I had some hands up. I then displayed another landscape picture as a reset, showed the first hash again, and asked whether it was the same as the second one I had just shown. People were unsure, and now also somewhat confused. The point is that we as humans have real trouble when tasked with visually verifying checksums.
And it is with this that I kick off this blog post.
So, let’s look at some science. Back in 2020, some researchers [2] carried out an experiment in which they asked participants to wear eye trackers while they visually verified checksum values. Let me treat you to a little extract from it, with the most important part coming at the end.
“Our in-situ experiments demonstrate that the verification process is taxing, with a median of around ten back-and-forth that the eyes of the participants have to do between the checksum shown on the Web page and the output of the program used to compute the checksum of the downloaded file. It also demonstrates that, despite being explicitly asked to verify the checksums, more than one-third of our participants failed to detect the mismatch (i.e., partial pre-image attack) between the checksum displayed on a fake download webpage and the one computed from the (corrupted) downloaded file. Our in-depth eye-tracking analysis shows that users pay more attention to the first digits of the checksums, which reduces drastically the security provided by such checksums.”
What is really interesting are the eye-tracker heat maps shown below, focusing on the browser and terminal windows where the participants were asked to compare the checksum values. In (c), on the right of the images below, you can clearly see the eyes focusing on only the first and last digits of the checksum; these are the cases where the participants failed to detect the planted mismatch. In (a) and (b), by contrast, the eyes are all over the checksum; these are associated with participants correctly identifying either a correct checksum or a planted mismatch.
So, my first thought was: this is bad from a user perspective. Checksums are there to show us whether a file has been tampered with, and not verifying them properly opens the way for the injection or replacement of files with malicious ones.
I then put my offensive red team hat on. It would be quite something if we could modify a file and have its checksum be not too far out from the original – at the start and end, anyway. There may be scenarios where you’re carrying out a red team engagement and want to test/abuse this very thing. But how viable would something like this be? Would it actually be realistic and achievable in the timescales of a red team gig? We clearly can’t spend a year getting to a point where we’re good to go. So, I did what I usually do to try and answer questions like these: I got to work creating a proof of concept in Python.
So, for this proof of concept, we’re going to need a target. By target, I mean some legitimate file/code/program into which we want to inject arbitrary code. Let me introduce you to spidey.py. It simply writes “spiderlabs poc” to the screen.
Target – spidey.py
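(The code was originally shown as an image; based on the description, and the padded versions later in this post, spidey.py is presumably just the following one-liner.)

print("spiderlabs poc")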
The SHA256 checksum of this program is:
62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc
Remember this checksum for later.
That was a joke – we’re no good at that!
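As an aside, if you want to follow along, checksums like these can be generated with the sha256sum command-line utility, or with a couple of lines of Python (the file name here is assumed):

import hashlib

# Read the file as raw bytes and print its SHA256 hex digest.
with open("spidey.py", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())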
We want to inject something into the program that actually does something. In this instance, we will run a directory listing (“ls -la”) with subprocess.call. The full program should look like the below, introducing injectedspidey.py.
Injection – injectedspidey.py
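(Again, the code was originally shown as an image; reconstructed from the padded versions shown below, injectedspidey.py looks like this.)

import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])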
The SHA256 checksum of this modified program is:
0eba77546b47db6aed9e2a1bf195ed22216751f4a7025ea36bba3e9e5ac4714f
Now, we have a little bit of a problem. If we put the two checksums (from spidey.py and injectedspidey.py) side by side, they look nothing alike. Even if, as the research suggests, people only hold the first and last digits in their minds when visually verifying checksums (at least when the hashes aren’t lined up line by line, as below), this swap will likely fail, because those start and end values are way out.
So, we need to come up with a solution: some padding that changes the SHA256 checksum of the injected file without impacting the actual code execution, which we can vary until the checksum looks right.
So, I present to you: padding 1. We add a comment (which will be ignored during execution) using the ‘#’ symbol, followed by the number ‘1’.
Padding 1:
import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#1
We then check the SHA256 checksum of this, which is:
d772b3ab6c3eeec6b39ba30745ce1202eb771e77b7909ec5dceccf2e1b7eccc9
If we compare this to the original target checksum, we can see that we’re out of luck:
62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc
So let’s try that again, changing the ‘1’ to a ‘2’.
Padding 2:
import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#2
SHA256 hash of this one?
5045cda3eac046ecc15800b006d2d80e40b18686a66fef828ca84ecbc87a760a
Still out of luck.
We need to automate this whole thing: dynamically build the program, insert the comment, check the SHA256 checksum and, if it doesn’t match the first and last n digits of our target, increase the comment number and repeat, until we’re happy with the match against the target.
I wrote a program to do this very thing. I am not going to share it, because that isn’t the intention of this blog post – I wanted to raise awareness.
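That said, the loop is simple enough to sketch. The below is a minimal, single-threaded reconstruction of the idea (my own illustration, not the author’s tool; the exact base-file bytes, trailing newline and file names are assumptions, so the winning number would come out differently):

import hashlib

# Target checksum (spidey.py) and the injected program body.
TARGET = "62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc"
BASE = "import subprocess\nprint(\"spiderlabs poc\")\nsubprocess.call(['ls','-la'])\n"
N = 4  # how many leading and trailing hex digits must match

counter = 0
while True:
    counter += 1
    candidate = BASE + "#%d\n" % counter
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    if digest[:N] == TARGET[:N] and digest[-N:] == TARGET[-N:]:
        break

# Write out the winning padded program and report the match.
with open("badspidey.py", "w") as f:
    f.write(candidate)
print(counter, digest)

In practice you would parallelise this across cores, but the logic is just: hash, compare the ends, increment, repeat.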
So, the automation process would carry on like this, creating resultant files with 3, 4…
Padding 3:
import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#3
Padding 4:
import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#4
and so on… larger numbers… 124353543…
import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#124353543
Sometime later, at a very large number (one billion, nine hundred and seventy-one million, eight hundred and fifteen thousand, four hundred and forty-six, to be exact…) we find ourselves a winner. The program reports that we have a match for the first 4 and last 4 digits. That is roughly what you would expect: matching 4 + 4 hex digits is a 32-bit constraint, so on average it should take around 2^32 (about 4.3 billion) attempts, and here we got a little lucky.
1971815446
^^^^^ we have a winner!!!
Let’s build it, putting that number into the comment section and checking the checksum.
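Following the padding pattern above, badspidey.py should look like this (a reconstruction; the exact bytes are assumed):

import subprocess
print("spiderlabs poc")
subprocess.call(['ls','-la'])
#1971815446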
The SHA256 checksum of the injected code (badspidey.py) is:
62d7ef41005d9f380446205e2e0efa3692681b8f73f8712318af6183f60fcadc
If we look at our target (spidey.py), the checksum is:
62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc
The first 4 and last 4 digits clearly match. Side by side, the target file (T) and injected file (I) checksums are:
T: 62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc
I: 62d7ef41005d9f380446205e2e0efa3692681b8f73f8712318af6183f60fcadc
And that’s why we should probably stop visually verifying checksums. It only took about 7 hours on an 8-core PC to do this, and that’s with my non-optimised code.
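The safer habit is to let the machine compare the full values for you. On Linux, sha256sum -c will check files against a list of expected checksums; in Python, a straight string comparison of the complete digests does the job (the expected value and file name below are placeholders):

import hashlib

expected = "62d75963b0689a5292d778ec6ac2192835a55b8fa11a257bb0ca6413b7a6cadc"
with open("spidey.py", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

# Compare the entire digest, not just the first and last few characters.
print("OK" if actual == expected else "MISMATCH")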
[1] https://en.wikipedia.org/wiki/Checksum
[2] “A Study on the Use of Checksums for Integrity Verification of Web Downloads”, Alexandre Meylan, Mauro Cherubini, Bertil Chapuis, Mathias Humbert, Igor Bilogrevic, Kévin Huguenin. https://dl.acm.org/doi/fullHtml/10.1145/3410154