The final stage of the GCHQ challenge was a small (5kB) x86 Windows/cygwin binary (available here). Analyzing it in IDA Pro, I could see that it expects a 24 byte license file with the following format:

The above could be deduced from the following reverse-engineered code:

If a valid license is found, it calls a function that requests the following URI from the specified host (which should obviously still be /hash/A/B/C/key.txt


  • A = 32-bit hex key from stage 1
  • B = 32-bit hex key #1 from stage 2
  • C = 32-bit hex key #2 from stage 2

Note that the hash itself is used, and not the password corresponding to the hash. Misdirection, again, in order to waste some time for those who didn’t bother to actually understand the code.

It’s now clear that we need two 32-bit values that somehow relates to stage 2, and the so called “firmware” array is a rather obvious choice. It consists of two 32-bit values, and it wasn’t used in the stage2 challenge itself. We also need one 32-bit value from stage 1. Remembering that the payload actually starts out with a “useless” jump over four bytes (e.g a 32-bit value) that are never used in any way, this is quite an obvious choice as well.

This gives us the following URL:

Which contains the following key:

When entering the correct keyword, we get this page:

“So you did it. Well done! Now this is where it gets interesting. Could you use your skills and ingenuity to combat terrorism and cyber threats? As one of our experts, you’ll help protect our nation’s security and the lives of thousands. Every day will bring new challenges, new solutions to find – and new ways to prove that you’re one of the best.”

Interestingly, the salary for the “Cyber Specialist” position at GCHQ is £25,446 for a GC10 (Executive Officer) and £31,152 for a GC9 (Higher Executive Officer). Comparing this with the salaries in the corporate world makes it quite clear why GCHQ has to try so hard to find competent people to recruit. After all, for people with skills there are lots of opportunities for interesting work with a much higher paycheck. For juniors with potential, but no real experience, it might be an interesting opportunity though. I’m sure that the work can be quite stimulating. :)

After cracking stage 1 of the GCHQ challenge, we get the URL to the Javascript code available here.

It defines a virtual machine with four general registers, two segment registers, one flag register and of course the instruction pointer. It also includes an array of two values named “firmware”, which seems rather strange. These values are not used, nor mentioned in the definition of the virtual machine that is included in a comment further down in the javascript code. The use for these values will actually become apparent in the third and final stage.

Besides specifying the register values and an array with the current memory contents, there is a 54 line comment describing the virtual machine architecture, the instruction encoding and the instruction set:

As you can see, it then throws an exception saying that VM.exec is not yet implemented. Our job is to implement this function, based on the information provided in the comment, and executing the embedded code in our virtual machine to see what it reveals. Although I don’t really see how this would provide a demonstration of anything but quite basic skills, I decided to play along and continue with the challenge. I decided to make a C implementation instead though.

First, I decided to only implement a disassembler and reverse-engineer the code by hand instead of actually implementing the virtual machine. After finding out that the initial piece of code actually decrypts the next part of the code, and that the decrypted code decrypts a string, I decided to actually implement the VM instead of just a disassembler.

Corresponding C-code:

The decrypted code is as follows, and quite similar to the previous code:

Corresponding C-code (sort of):

The instructions at 0x0112-0x0114 are not necessary, and could either be there for misdirection or possibly to indicate a hidden challenge. Further signs of this lies in the fact that there are a couple of instructions further down that are never executed:

There is also a 0xcc at 0x0140, that isn’t even a valid a valid instruction in the instruction set for the virtual machine, but corresponds to a breakpoint in x86 assembler. Last but not least there are large chunks of memory that are neither read, written or executed. First of all, there are some zero bytes that are obviously just there for filling. What is more suspicious are these:

There are some interesting patterns in this data, especially when interpreting it as three 112 bytes chunks. Each chunk consists of 20 bytes < 0x80, 25 bytes >= 0x80, 26 bytes < 0x80, 26 bytes >= 0x80 and finally 15 bytes < 0x80. In other words, there is a pattern in which bytes have the most significant bit set in each of these chunks. This pattern could be the result of each chunk being XOR-encoded using the same algorithm as has been used in the previous code, with some specific start and step values. I decided not to waste any more time on this track unless the challenge seemed to lead to a dead end though. After the second decryption loop it will reveal the following string at 0x01b0:

Yet another URL, this time to an x86 Windows/cygwin executable. To get this string I could just have dumped the virtual machine memory after execution, since I had already reverse-engineered the code I decided to get each character after it has been decoded by reading r0 at ip = 0x010c (0x010c == 268), using the code available here.

Note that I have not bothered to implement any bounds checking when reading values from memory, so running the VM on arbitrary/random code will most likely result in a crash. :) In this case I had already seen that the code was “well behaved” and just wanted to go on to the next stage of the challenge though.

I heard about “the code” at during the first friday of december (2011-12-02), and cracked the final stage on sunday two days later. The reason for not cracking it all during friday evening was, unfortunately, not because it presented much of a challenge but because I was out partying pretty much 24/7 during friday and saturday (including after parties until dawn both nights). :)

I didn’t want to publish any writeup until the challenge was over, even though I heard that there were others who didn’t bother waiting. As a man who loves challenges of all kinds, I would hate to be the one spoiling it for someone who actually wanted to give it a try though.

The first part of the code was this picture:

I recognized it as x86-assembler at the first glance, because of the “eb 04” (jmp $+6), the multiple “31 cX” (xor reg, reg), the “90”:s in the end (NOPs) and last but not least the “cd 80” (int 0x80) to mention a few of the immediately recognizable opcodes. The int 0x80 also tells me that this is most likely a Linux-based payload, since 0x80 is the interrupt used for making system calls in Linux. Looking at the code, I could see that it actually just calls exit(0) after the relevant code has been executed.

By looking at the code in a disassembler I could see that it has a decode-loop, that looks very much like the RC4-cipher (initializing a 256 byte buffer with values 0 to 255, swapping values around based on what corresponds to the RC4-key (0xdeadbeef) followed by a decryption loop. Not that it really matters which crypto is being used, since the key is embedded into the code.

Looking closer at the decryption loop we can see that it is actually slightly different from the original RC4 cipher, even though the same key scheduling is used.

Original RC4 can be implemented like this:

This implementation changes j to the xor-byte in each iteration:

This is actually pretty funny, considering that GCHQ themselves have published an (very brief) explanation of the challenge now after it’s over, which describes the crypto as RC4. So from what I can tell, this is an unintentional bug rather than by design. A mistake that is easily made when you’re hand coding assembler, and not being careful of which registers are used for what, but a rather silly mistake nonetheless.

Anyway. Analyzing the code further we can see that the payload seems to be missing some important parts, like the buffer that is to be decrypted for instance. This, on the other hand, is definitely by design. :)

See the “AAAA” (41 41 41 41) at the end of the payload? That’s a tag that is checked by this piece of code:

So far so good. If the tag isn’t there, it jumps ahead to the part of the code that calls exit(0):

Then it continues with checking for another tag (“BBBB” = 42 42 42 42), this tag is _not_ included in the payload:

Of course, we can just append this tag to the code to bypass it. This doesn’t really help us much though, since the code then continues with reading a byte count to decrypt, followed by the actual encrypted data:

After this follows the code that actually performs the decryption, and finally it calls exit(0) just like when the tag on the stack is not found. The exit(0) could be replaced by code that writes the decrypted data to stdout, or we can simply set a breakpoint there (or manually insert a 0xcc = int3 there instead) so we can read the decrypted data in a debugger.

So, where is the missing data? My first thought was that the data is probably either hidden in the HTML code of the page, or that it is hidden within the image of the payload. Hiding it within the image of the payload seemed pretty likely, since it actually was a bit odd that they used an image at all instead of plain text that would allow for copy & paste instead of manually writing the payload down from the image.

There are several ways to hide data within an image, some more refined than others. The easiest way to do it is to simply inject the data to the hide into some form of meta data. Using “exiftool” we can extract the following text field (although I first noticed it with a simple “strings”):

This is quite obviously base64-encoded data, judging from the “==” at the end and the range of characters being used. The following command line reveals its decoded contents:

As you can see, it starts with the “BBBB” that the payload looked for after the end of the payload, followed by an 32-bit LSB integer (32 00 00 00 = 0x00000032 = 50 = The size of the encrypted buffer), and finally the 50 bytes of encrypted data.

In other words, the full payload is as follows:

By embedding this payload into a small C program (available here), and injecting a breakpoint at the exit(0) call by manually changing the “cd” in “cd 80” (int 0x80, the system call interrupt) to “cc” (int3) lets me read the decrypted string in a debugger.

This gives us the URL for the next stage of the challenge (available here).