Supporting the unsupported sha256(sha256($pass).$salt) hash type
Travis dropped
TL;DR - You can skip right to the results for the code.
A list that recently hit Hashes.org, with 1 million records and a low crack rate, looked like an interesting target, given that the community had recovered less than 0.5% of the list. On taking a closer look, we find out why;
Out of the box, only John the Ripper (JtR) Jumbo carries support for the XenForo hash scheme1 as a dynamic hash2.
JtR’s dynamic hashes don’t run on OpenCL, and a benchmark on a single core resulted in 4MH/s, while during a run we hit around 80MH/s across 20 CPU Cores. (H/T Solar Designer for the correction.)
What we had was a pretense for writing a custom OpenCL kernel for Hashcat—perfect. Adding to the effort was the lack of documentation around Hashcat (and sparse code comments).
My general methodology was running Hashcat in single-hash mode against the kernel and using printf() debugging statements to understand what was going on under the hood. Hashcat’s other kernels and its OpenSSL-style functions were definitely key to getting something working in a limited amount of time.
To start with, we identify an OpenCL a0 kernel (or set of kernels) that closely matches our workload and input type. For example, if your hash is salted, find a salted hash of the same type, and likewise if it is iterated.
Copy it to its own kernel file, designated as m[Hashcat Mode Flag]_a0-pure.cl. Search/replace this source to fix up the mode function names. In this guide, I used m01410_a0-pure.cl6 as the donor kernel. I’ll be using m01415_a0-pure.cl for the new mode 1415.
Adding the mode
Next up, add the new mode to Hashcat.
We add a new enum value SHA256_PW_SHA256_SLT to interface.h and set it equal to 1415:
HT_* is CLI / Help-related Text and a new entry HT_01415 is added to interface.c:
A test hash ST_HASH_01415 plus its corresponding plaintext ST_PASS_HASHCAT_PEANUT are used for self-testing:
And in the main hash configuration block, the values were lifted from case 1410:, sha256($plain.$salt).
An important bit here is to set kern_type to the enum added to interface.h, in this case KERN_TYPE_SHA256_PW_SHA256_SLT. parse_func is a function pointer, set to sha256s_parse_hash - which will be invoked to parse the hash list. If your hash format can’t be consumed by these functions exactly, it will be rejected. Hashcat has a robust set of these parsing functions, and you’ll want to hunt through them in interface.c7.
Quick and dirty test
After a make clean && make, Hashcat should list your new type. Benchmark mode won’t be useful for testing, as the a0 kernels aren’t used here. Running a simple single-hash test against the new kernel will cause Hashcat to build it from the OpenCL directory to the kernels directory. This is when you’ll receive compile-time errors.
In this case, I had known hash/salt/plaintext values (recoverable from JtR and publicly listed). These are essential. You’ll find out how to use the raw results of intermediate steps in the scheme for debugging in the next section.
I used this example for testing:
Steps from the command line:
Reviewing; with the $salt value b4d93efdf7899fed0944a8bd19890a72764a9e169668d4c602fc6f1199eea449 where sha256('peanut') is 5509840d0873adb0405588821197a8634501293486c601ca51e14063abe25d06 and sha256(sha256('peanut') . $salt) is 00050655d5d6b8a8c14d52e852ce930fcd38e0551f161d71999860bba72e52aa. As expected by sha256(sha256($pass).$salt).
Hashcat test:
We now have a custom Hashcat build that lists our new type. Awesome. Let’s get these self-tests to pass with a single hash.
Kicking out the jams
You gotta have it, baby, you can’t do without
Oh, when you get that feeling you gotta sock ’em out
The majority of the work on an a0 kernel will be focused on these two functions:
__kernel void m01415_mxx
Invoked to handle cracking multiple target hashes at once. We can ignore this during testing.
__kernel void m01415_sxx
Invoked in single-hash mode (which is how we’re testing). We can focus our development effort here.
Let’s take a walk through the donor kernel’s m01410_sxx function:
In this kernel, Hashcat doesn’t bother comparing all the bytes in the hash buffer — a basic optimization technique. It’s unlikely to collide with a human-meaningful candidate as selected by Hashcat, even when doing partial matches. The bytes are configured in the dgst_pos* values of the hashconfigstruct:
Use values from another kernel that has the same final hash round as your target.
The kernel prepares the salt buffer once and carries out operations usually performed by the Hashcat sha256 library’s sha256_update_swap function.
Now is a good time to browse through the methods in inc_hash_sha256.cl8. These aren’t documented in any way, and while they resemble other low-level C crypto implementations, some things have been changed to permit more efficient cracking. It helps to select methods out of this library, and those in its hash family (e.g., sha512, sha1), to understand how these functions can be used in various contexts.9
The function’s main loop permutes the password root candidate tmp.i according to rules identified in the rules_buf array. It then establishes a sha256 context, updates the context with the password candidate buffer, updates it (minus the swap already performed) with the salt, and then finalizes the hash. It then compares the bytes outlined in hashconfig->dgst_pos* for a match.
Because our source function is sha256($plain.$salt) and our target is sha256(sha256($plain).$salt), we know we need to add a round of sha256 to the plaintext before updating the final sha256 context, ctx.
Check out some other modes that might carry out similar operations…
Mode
Algorithm
Type
40
md5($salt.utf16le($pass))
raw hash, salted, and/or iterated
3710
md5($salt.md5($pass))
raw hash, salted, and/or iterated
140
sha1($salt.utf16le($pass))
raw hash, salted, and/or iterated
4520
sha1($salt.sha1($pass))
raw hash, salted, and/or iterated
1440
sha256($salt.utf16le($pass))
raw hash, salted, and/or iterated
1740
sha512($salt.utf16le($pass))
raw hash, salted, and/or iterated
30
md5(utf16le($pass).$salt)
raw hash, salted, and/or iterated
130
sha1(utf16le($pass).$salt)
raw hash, salted, and/or iterated
1430
sha256(utf16le($pass).$salt)
raw hash, salted, and/or iterated
1730
sha512(utf16le($pass).$salt)
raw hash, salted, and/or iterated
Some nice candidates include m02610_a0-pure.cl, m04700_a0-pure.cl, m04520_a0-pure.cl, m04500_a0-pure.cl, and m04400_a0-pure.cl. This builds up more examples of how the Hashcat developers used the libraries and OpenCL environment efficiently.
Let’s take a look at an example of a salted hash, with an extra round of hash on the plaintext. Thankfully, we have one in the same hash family: 452010, sha1($salt.sha1($pass)).
Here’s the core routine:
We can see that this routine uses two sha1 contexts: ctx1 to contain sha1($pass) and ctx to contain sha1($salt.sha1($pass)). Additionally, this kernel introduces a new macro, uint_to_hex_lower8_le, and adds some logic to the *_sxx function:
uint_to_hex_lower8_le looks up a byte in l_bin2asc, which returns a lowercase ASCII representation in 2 bytes. We rely on this to convert the raw hex output of the first sha1 round to the lowercase ASCII representation that is usually composed in these salted routines.
The next step is understanding how to use this construct for sha256. We can find that sha1 has an output of 40 bytes, while sha256 outputs 64 bytes. Let’s set this up in our new kernel. First, add the uint_to_hex_lower8_le macro to the beginning of the file. Then add the routine to build the l_bin2asc array. Next, let’s change the core crack routine:
Next up, we need to address the extra bytes available in sha256’s output:
In my case, it took a few attempts to get the bit shifting / byte ordering correct. Here’s what I did:
We know, ultimately, that w0, w1, w2, and w3 needed to hold the ASCII representation of the sha256 hash, using our example ‘peanut’:
The correct array would look like this:
Using these hardcoded values, Hashcat passes the self-test… Progress!
We can play with byte ordering and bit shifting until w0[0] holds 0x35353039 (which, unless you’re shifting on a regular basis, isn’t natural):
You can debug the kernel by alternating rm -rf ./hashcat/kernels with your Hashcat single-hash test command e.g../hashcat -m 1415 -a 0 “00050655d5d6b8a8c14d52e852ce930fcd38e0551f161d71999860bba72e52aa:b4d93efdf7899fed0944a8bd19890a72764a9e169668d4c602fc6f1199eea449” peanut.txt. This is how I nailed down byte ordering and conversion.
With the right values in w0, w1, w2, and w3, we can move the *_sxx changes over to *_mxx. Just reapply your changes there while making sure the final comparison function is COMPARE_M instead of COMPARE_S.
Results
On an eight Nvidia 1080 GPU cracking rig from sagitta, we hit 605 MH/s with this unoptimized kernel — 7 times faster than CPU mode.
I’ll need to wait for another rainy day (and a corpus with a shorter salt) to hack on an optimized kernel.
You can find the final implementation at this GitHub Gist.