This is my write-up for the maze challenge in the 31C3 CTF, that I played with the Hacking For Soju team. We “only” got 10th place (out of the 286 teams that scored any points at all), but considering that only me, capsl and avlidienbrunn had time to spend any time on it (and I was able to score 170 out of our 340 points, which would have given me the #33 spot if I had played alone), it wasn’t too shabby! :) If I hadn’t been so sleepy/off during large parts of the CTF, I would probably have been able to score a bit more. Greets to capsl for brainstorming about the potential ROP-scenarios btw!

To make this write-up more useful for people that want to learn, I have tried to make it quite detailed.

The information for the challenge was:

The provided tar.gz file contained the binary for the challenge: maze

To run it as a server process on your own system, you can use the following command:

When connecting to the target, the following message is displayed:

Since the name of the challenge is “maze”, and we are presented with a message that states that we can only go east, the valid inputs are presumably east, west, north and south, and we are supposed to navigate through a maze in order to eventually reach some sort of goal. In order to understand exactly what is going on, and to see if there are any vulnerabilities that we can exploit along the way, our next step is to load the binary in IDA Pro and analyze the code. But first, let’s see what kind of protections are enabled for the binary in question, using checksec.sh.

I would suggest that you make a habit of checking which exploit mitigations have been enabled for your target before you start auditing it. For real-world targets, this will give you an idea of the preconditions you will probably have to meet in order to achieve reliable exploitation of the target (i.e. if information leaks are necessary, and so on). For CTF challenges, it can even provide hints on what kind of vulnerabilities to look for.

In this case, NX is enabled, so we will probably have to use a ROP based payload. PIE (Position-Independent Executable) is not enabled, so we will be able to use ROP gadgets from the executable itself even if ASLR is enabled on the target system. However, the binary is rather small, so there is probably only a limited number of suitable ROP gadgets available. Stack canaries are not enabled, which is a strong indication that the vulnerability to look out for in this case is probably a stack-based buffer overflow.

Do not rely completely on the information you determine this way though. In some cases (i.e, a poorly organized CTF), the binary running on the actual target is slightly different than the one provided, or some protections have been explicitly disabled/enabled on the target system.

Also note that the binary is a 64-bit Linux executable. Analyzing 64-bit (x86_64, to be specific) code has recently become a lot less time consuming, due to the release of Hex-Rays x64. Hex-Rays is an excellent decompiler plugin for IDA Pro, that lets you interactively work with the decompiled code in order to make sense of it. Note that there are still corner-cases that Hex-Rays is not able to handle, especially when dealing with obfuscated code and code that has been explicitly designed in order to make static analysis difficult. If you find something strange in the decompiled code, always perform manual analysis of the assembly code in question.

After the initial pass through the Hex-Rays decompiler, the code shown below is produced (note that irrelevant code, i.e the automatically inserted stub that is calling __libc_start_main() and the code handling initialization and cleanup, has been manually removed from the source code listing). While reading it, try to see if you can spot the vulnerability:

As you can see, this is a really small and simple program, and the Hex-Rays output is pretty readable as-is. When working with the code in IDA, I can clean things up further though. Renaming functions and variables, changing types and function definitions, adding comments, and manually fixing mistakes that has been made by IDA:s automatic code analysis, can make the code a lot easier to understand, and make it easier to spot vulnerabilities in the code.

After working with the code manually in IDA for a while, we end up with the following code. If you have not found the vulnerability yet, look for it once again while reading through the updated code listing:

If you did not find the vulnerability this time either, you need to practice more. ;) In the original code, it was easy to overlook (although still quite visible, when taking the stack/frame pointer offsets in the automatically produced comments into account). In the revised code, the vulnerability is rather obvious though. Note how the decompiled code in the append_highscore() function (originally, sub_400CE0) has changed:

Before:

After:

As you can see, 1056 bytes is read into a 1056 byte buffer, but, it begins reading at offset 1. This means that we are able to overflow the buffer with one byte. While this is obviously a rather contrived example, off-by-one vulnerabilities are often found in the wild as well. Typical cases include people doing things like buf[sizeof(buf)] = X, or allocating room for strings without taking the terminating NUL-byte into account.

So, what can we accomplish with a 1-byte-overflow? Well, that obviously depends on what is stored right after the buffer in question, and if our target is running on a little-endian or a big-endian architecture. For little-endian architectures, including x86 and x86_64, the least significant byte (LSB) of a value, including pointers, is stored first. So, if we are overwriting the LSB of an address that is originally 0xbadc0ded, it is the ‘ed’ at the end of the address that will be overwritten, and not the ‘ba’ in the beginning of it. To make this clearer, here is an illustration of how the 32-bit value 0xbadc0ded is stored in memory (each hex-character represents 4 bits, so one byte is represented by two hex characters):

badc0ded-lsb

Overwriting the LSB of an address will thus shift it slightly, with up to 255 bytes depending on what the original value was. If the original value is 0x123456, overwriting the LSB with 00 will change it to 0x123400, effectively subtracting 0x56 from the original value. So, what is stored right after our buffer in this case? Well, the automatically produced comment for our buf-variable will reveal that right away. “[bp-420h]” means that the buffer is located at rbp (the current base/frame pointer value) minus 0x420 (and 0x420 = 1056). Note that rbp points to the location of the saved rbp value, that will be restored when returning from the function. Our 1-byte-overflow will overwrite the least-significant-byte of the saved rbp value, which means that we will be able to control (part of) rbp after append_highscore() has returned back into reached_goal().

Regarding frame pointers, note that they are actually forming a linked list. The current frame pointer value (rbp) points to the saved frame pointer value in the current function’s stack frame, and the saved frame pointer value points to the saved frame pointer of the calling function’s stack frame, and so on. Below you can see how the stack frames for each function in maze are linked together, at the time when append_highscore() is executed. Since the stack grows down, towards lower addresses, the ‘buf’ buffer in append_highscore()’s stack frame is stored at a lower address than all the saved frame pointer values.

maze stack

This description is a bit simplified though, since each function, including append_highscore(), saves a few other registers as well. While this is not apparent when looking in Hex-Rays, when looking at the actual assembly code in IDA we can see that the stack space that is being reserved for the buffer on the stack is actually 0x408 = 1032 bytes, and since this probably includes some extra padding added by gcc, the original size of the buffer in the source code was probably less than this. Most likely it was declared to be 1016 bytes, and the buf[1016] = ‘\0’ was actually buf[sizeof(buf)] = ‘\0’ in the original source code, and since the contents of the rbx register is pushed to the stack right before the stack space for buf is reserved, this actually means that the LSB of the saved rbx register is being overwritten with a NUL-byte regardless of if we are exploiting the overflow with read() or not.

function prologue for append_highscore()

When returning from append_highscore(), back into reached_goal(), the saved contents of the rbx, r12 and r13 registers are restored from the stack, as well as the saved frame pointer (rbp). In this case, reached_goal() is not using any of those registers before returning into main(), but if it did, that could have potentially resulted in other exploitable scenarios as well. Even though it did not matter in this particular case, always aim to have a complete understanding of everything that is going on, since there will be times when paying attention to those small details are crucial to success.

If reached_goal() would have referenced any local stack variables after append_highscore() returned, and if the assembly code produced to reference those variables used the base/frame pointer, that could have resulted in potentially exploitable side-effects for us as well. In cases such as this, when the function that was calling the vulnerable function is returning to a function one step further up the call-chain, an even more convenient opportunity arises though. For code that is compiled to use frame pointers, the function epilogue usually ends with “leave; ret”, or equivalently, “mov rsp, rbp; pop rbp; ret”. As you can see, this sets rsp = rbp (i.e, if we controlled the contents of rbp, we now control the stack pointer), pops the new rbp value, and finally returns (i.e. pops the return address, using the stack pointer that is now under our control). Looking at the code, we can see that it uses a variation of this, that sets rsp = rbp-0x28, restores 5 registers (5*8=40=0x28) and then pops rbp and returns.

function epilogue for reached_goal()

So, by exploiting the 1-byte-overflow of a stack buffer in append_highscore(), we will actually be able to control the stack pointer when returning from the reached_goal() function. In other words, we have control over where the return address is about to be retrieved. Pointing rsp into a buffer that we control the contents of will thus allow us to control the return address. Ideally, we want to point rsp into a buffer containing a full ROP chain to exploit the program in question. If we would have known the address of the system() function, and if the RDI register (that is used for storing the first function argument, in the standard x86_64 ABI) had contained the address of a buffer that we control the contents of, the full ROP chain in question would have only consisted of a return into system(). :) Always keep in mind all the parameters that are under your control (i.e register values, buffer contents, and so on), and information you currently have (base addresses of libraries, the executable, stack/heap addresses, and so on), and what information you can potentially deduce. Sometimes there are case-specific shortcuts you can take in order to achieve reliable exploitation of a particular target.

As I mentioned earlier, overwriting the least-significant-byte of the saved rbp value will allow us to shift it to an address slightly further up or down the stack, with up to a maximum of 255 bytes (up to 248 bytes in this case, since the saved rbp value will obviously be aligned to an 8-byte-boundary). Shifting it to a higher address will not do us any good, since we do not control the contents of any stack buffers allocated there. Shifting it to a lower address can potentially point it into the ‘buf’ buffer though, depending on what the LSB of the original saved rbp value is. By overflowing with a NUL-byte, we ensure that we are effectively subtracting as much as possible from the saved rbp value, and maximize our chances of pointing rbp into our buffer.

Due to ASLR, the LSB of the saved rbp value will vary (and sometimes, the LSB will even be 0 to begin with), so it is possible that we need to make a few attempts in order to exploit this. When ASLR is enabled, the base address (and once again, keep in mind that the stack grows down towards lower addresses) of the stack will be randomized, and unlike the randomization of base addresses for dynamically loaded libraries, and PIE-binaries, the offset within the memory page will be randomized as well. For libraries/PIE-binaries, the 12 least significant bits will always remain the same, and can be used to narrow down the potential binary versions that are running on a system that you are exploiting in cases where you have found an information leak.

We now know where the vulnerability is, and have a rough idea of how to exploit it (i.e, overflow the LSB of the saved rbp with a NUL-byte, hoping that it is enough to point it into our buffer, where we will store a ROP chain). This was the easy part. ;) To actually trigger the vulnerability, it turns out that we have to solve the maze. Since the maze is fairly large (179×95), we obviously don’t want to go through the process of solving the maze manually on each exploitation attempt (as I mentioned, the exploit will not work every time due to ASLR). Since the maze is static and hardcoded into the binary, we can just solve it manually and send the solution after connecting though.

Initially, I decided to make a small python script in order to visualize the maze. By analyzing the, slightly hairy, algorithm of the check_valid_moves_loop, we can deduce that maze_map[] is actually an array containing the x- and y-coordinates of all the occupied squares (the walls) in the maze. Even if we would not have been able to deduce this by looking at the code alone, it could have been deduced by analyzing the data in the array in question. Especially when visualizing it. :)

I made this script to extract the maze data from the binary, and create a PNG file:

This is the resulting PNG file:
maze

I then wrote a simple recursive algorithm in order to solve the maze:

By connecting and sending the full solution to the maze we reach the vulnerable part of the code, i.e. the code that reads a name to add to the highscore list, in a fraction of a second. So, even if we need to make a few attempts in order to exploit it, due to ASLR, it will not make any real difference for us. Time to pwn. ;)

Since the binary is so small, we don’t have a lot of suitable ROP gadgets to play with. We need to find ways to use the ones we have as effectively as possible. My first attempt was to see if I could return into reached_goal(), right before the call to fopen(). We don’t actually need to get code execution on the target to solve this challenge, to get the flag we only need to be able to read a file (and based on the other levels, it seemed like a reasonable guess that the flag was stored in either /home/user/flag or /home/user/flag.txt). The code in reached_goal() reads the current highscore file, and prints the last of the names in question.

There is a problem with this approach though. We need to populate RDI with a pointer to a string with the filename we want to read, and at this point, we do not know the address of any buffer under our control. This problem is possible to solve by returning into read() though, in order to populate a buffer at an address of our choosing. A suitable address for this purpose must obviously be a valid and writable one, and since the target is a non-PIE-binary, the data segment for the executable itself resides at a fixed and known address. We can simply use 0x6060A0, which is the address of the .data section.

Note that read() is a libc-function, and since we do not know the address where libc is mapped, we can not return directly into read(). If we had known the libc base address, we could have just returned directly into system() at this point, after populating RDI with the address of the “/bin/sh” string within libc itself. We solve this by returning into the PLT-entry for read(), since read() is one of the functions that are imported by the non-PIE target binary. The PLT-entry for read(), which acts as a trampoline into read() in libc, is stored at 0x400850.

Since read() takes three arguments, that are passed in RDI, RSI and RDX respectively, we need to make a ROP chain that populates those registers before returning into read(). Ideally, our target binary would have contained instruction sequences such as “pop rdi; ret”, “pop rsi; ret” and “pop rdx; ret”, or even “pop rdi; pop rsi; pop rdx; ret”, but those instruction sequences are not commonly found in practice. Since x86 and x86_64 use variable-length instructions, that do not have to be aligned to any n-byte-boundary, it is possible to return into the middle of existing instructions though. By looking at partial instructions as well, we can find “pop rdi; ret” at 0x400f63 (the “pop rdi” instruction, opcode 0x5F, is actually the second byte of “pop r15” instruction) and “pop rsi; pop r15; ret” at 0x400f61 (the “pop rsi”, opcode 0x5E, is the second byte of a “pop r14” instruction), as you can see below:

maze-rop-gadgets

It doesn’t matter that there is a “pop r15” instruction between the “pop rsi” and the “ret” instruction, as long as we are able to populate the registers we care about, without any side effects that cause the program to crash (invalid memory accesses, etc), it suits our purposes just fine.

Looking for ROP gadgets manually can be time-consuming though, especially for small binaries where there are few naturally occuring instruction sequences that are useful. Alternatives include using PEDA, Python Exploit Development Assistance for GDB, as can be seen below:

As you can see above, there are unfortunately no suitable “pop rdx” gadgets. There may be other ways for us to populate RDX though, and for our purposes, we don’t need to populate RDX with any specific value. Any non-zero value that is not too small is fine. The code we want to execute is read(0, ptr, N), where ptr is a pointer to a buffer that we are reading data into and N just needs to be at least as large as the data we want to read. As long as RDX still contains a non-zero value after the read(), even 1 might have been ok, if we can chain multiple calls to read().

For a more complete listing of ROP gadgets, that we can inspect manually in order to see if we can find anything useful, the ROPgadget tool by Jonathan Salwan can be used:

There does not seem to be any obvious gadgets available for setting RDX, and unfortunately, RDX is set to 0 (as a side effect of the call to fclose() before returning from append_highscore(), at least when running it on my own system while testing) when the function epilogue for reached_goal() is executed. Since RDX can be set as a side-effect when calling functions, we can try looking for “harmless” functions to return into in order to set RDX to a non-zero value though.

We also still have the problem of not knowing the base address of libc, and maybe there’s a way to solve both of these problems at once. :) By returning into the PLT-entry for puts(), that prints a string (or rather, prints any data up until the first NUL-byte it encounters), with RDI set to an address that contains a pointer into libc (such as a GOT-entry), we are able to both set RDX as a side-effect of the call to puts(), as well as leak a libc address that can be used to calculate the address of arbitrary libc functions. The fact that puts() also happens to set RDX as a side-effect was just a lucky coincidence, but if it hadn’t, there were a number of other functions we could try to call for that purpose.

Our original plan of simply returning into reached_goal(), right before the call to fopen(), is now obsolete. Since we have now leaked a libc address, we can simply use the read() in order to read a second stage ROP chain into a known location and then pivot the stack into that. The exploit will read the leaked address (a pointer to puts() in libc, by reading the GOT-entry for puts() in the address space of the non-PIE binary), calculate the base address of libc from that, and then the address of system(). Since we have just read arbitrary data into a known location, we can also place an arbitrary command string to be executed there, rather than using the “/bin/sh” string from libc. This also makes it more suitable for cases where we don’t know which libc version is used on the target system, since we only have to bruteforce one offset (between puts() and system()) rather than also having to know the address of the “/bin/sh” string. Another possibility, in that case, would be to use puts()-calls to leak data at page-boundaries below the leaked puts()-address, in order to find the base address of libc, and then implement symbol resolving by parsing the ELF header. That was actually what I ended up doing on the cfy-challenge, after my attempts that assumed an Ubuntu 14.04 libc failed (it turned out to be Ubuntu 14.10). :P

The only remaining piece of the puzzle at this point are gadgets to perform the stack pivot, into our second stage ROP chain. For this, we can use a “pop rbp; ret” gadget, that can be found at address 0x400AB0, in order to populate RBP. Then we use the “leave; ret”-equivalent in the function epilogue of reached_goal(), that I have already mentioned earlier, in order to point RSP into our second stage ROP chain. For the first stage ROP chain we also need a simple ret-gadget (such as the one at 0x400F64), since we do not know the exact offset into our buffer where the stack will be shifted (it will vary with each execution). By just filling the start of the buffer with addresses of ret-instructions, it will keep on returning until it reaches our ROP chain that we have placed right at the end of the buffer.

To sum it up. The gadgets we need are:

  • 0x400F64: Prepended to ROP chain for “NOP sled” effect (ret)
  • 0x400F63: Set RDI, i.e. the 1st function argument (pop rdi; ret)
  • 0x400F61: Set RSI, i.e. the 2nd function argument (pop rsi; pop r15; ret)
  • 0x400AB0: Set RBP, to prepare for the stack pivot (pop rbp; ret)
  • 0x400E96: Stack pivot (lea rsp, [rbp-0x28]; pop {rbx,r12-r15,rbp}; ret)

Note that the three fi

Besides these ROP gadgets, we also need:

  • 0x606028: GOT-entry for puts(), used to leak a libc address
  • 0x400850: PLT-entry for read(), returned into to read our 2nd stage ROP chain
  • 0x4007F0: PLT-entry for puts(), returned into to print the leaked address
  • 0x606XXX: Scratch buffer, that our 2nd stage ROP chain is read into

Initially, I used 0x6060A0 as the scratch buffer address, i.e. the start of the .data section. That resulted into running out of stack space in system() though, since the memory below this address will be used as stack space for functions that we are returning into from our 2nd stage ROP chain. I changed it to 0x606500, to give the stack more room to grow, and now we finally have a full working exploit. :)

As a final touch, I implemented support for providing a full interactive pty-session rather than a lousy interactive shell with no job control. ;) What good is pwning, if you can’t run vim on your targets?! :)

Sample session:

Note that you may have to run it multiple times to succeed, due to ASLR. If arrow-up+enter is too cumbersome, just run while true; ./maze-xpl.py; done :)

Source code for exploit provided below:

This was the only challenge remaining for us (ClevCode Rising) in the GITS 2015 Teaser CTF (http://ghostintheshellcode.com/2015-teaser/final_scores.txt), after I had solved the Citadel challenge and my team mate Zelik had solved Lost in Time. With no previous GNU Radio experience, I tried my luck, and was able to come very close to solving this in time to win the entire CTF. Unfortunately I had reversed the output-bits from the deinterleave-block… Next time, I won’t do the same mistake. ;)

Since it was a pretty fun challenge, I took the time to fix up my GNU Radio Companion diagram after the challenge ended and I had been made aware of my mistake, and made a cleaned up and minimized Python solution script as well.

This was the information given in the challenge:

Along with this picture:
dontpanicshiftkeying.jpg

And this data file:
key.iq

My final solution was this piece of Python code:

Or, this GNU Radio Companion file:
dontpanic.grc

Which yields this image, containing the flag! :)
key

This is my writeup for the Vuln 500 challenge in the Codegate Quals 2012 competition.

The vulnerability is a straight forward format string vulnerability in a SUID Linux/x86 program. Since ASLR & NX was activated, it was not quite as straight forward to exploit though. Since partial RELRO was used as well, DTORS was read-only, but the GOT still writable. The only function call after the vulnerability is triggered is to __stack_chk_fail() though, and this is only called if the stack cookie for main() has been corrupted.

One way to exploit this vulnerability would be to use a ROP based payload, chaining gadgets from within the (non-randomized) .text section of the binary, and/or from glibc by bruteforcing its base address. Since a pointer to the format string was passed as the first argument of printf() right before this, we can return directly into system() which will use the format string pointer as its argument. This makes things a whole lot easier for us. :D

We still have to overcome the ASLR though, and we need to overwrite both the stack cookie on the randomized stack and the GOT-entry for __stack_chk_fail() with the address of system() in glibc which has a randomized base address. Looking at the stack pointer value between different executions about 20 bits of its address seems to be randomized though, and about eight bits of the glibc base address. This means that a bruteforce attack will take quite some time, unless we can figure out a way to exploit it more efficiently.

Fortunately, there are a few shortcuts. First of all, instead of overwriting the stack cookie on the stack, we can overwrite the value it checks the cookie against instead. This is stored at %gs:0x14, which is mapped to an address that is randomized as well, but that always seems to be located at a fixed offset beneath the glibc base address. See example below, the cookie is always stored at the system() address minus 0x39a2c in this particular program. This means we only have to bruteforce the glibc base address, which only has about eight bits randomized.

Using this, we can exploit it in within a couple of hundred attempts, which doesn’t take long. Note that we will use system-5 instead of the direct address of system(), since the latter happens to contain a NUL-byte, and the instruction at system()-5 is “harmless” (it just moves zero to edi, and does not dereference memory or anything else that might cause a crash).

There is, however, an even better way to do it. When “ulimit -s unlimited” / setrlimit(RLIMIT_STACK, {RLIM_INFINITY}) is used to make the stack as large as possible, glibc will always be mapped at the same address. This means we don’t have to do any bruteforcing at all, so our exploit will always work on the first attempt. In this case, the cookie at %gs:0x14 moves to a fixed address mapped after glibc instead of before it though.

Since my test program and the vulnerable programs are both linked to the same libraries, the addresses will be the same.

The last address we need to determine is the address to the GOT-entry for __stack_chk_fail():

The final version of my exploit is as follows:

This is the output when running it:

The final stage of the GCHQ challenge was a small (5kB) x86 Windows/cygwin binary (available here). Analyzing it in IDA Pro, I could see that it expects a 24 byte license file with the following format:

The above could be deduced from the following reverse-engineered code:

If a valid license is found, it calls a function that requests the following URI from the specified host (which should obviously still be www.canyoucrackit.co.uk): /hash/A/B/C/key.txt

Where:

  • A = 32-bit hex key from stage 1
  • B = 32-bit hex key #1 from stage 2
  • C = 32-bit hex key #2 from stage 2

Note that the hash itself is used, and not the password corresponding to the hash. Misdirection, again, in order to waste some time for those who didn’t bother to actually understand the code.

It’s now clear that we need two 32-bit values that somehow relates to stage 2, and the so called “firmware” array is a rather obvious choice. It consists of two 32-bit values, and it wasn’t used in the stage2 challenge itself. We also need one 32-bit value from stage 1. Remembering that the payload actually starts out with a “useless” jump over four bytes (e.g a 32-bit value) that are never used in any way, this is quite an obvious choice as well.

This gives us the following URL:
http://www.canyoucrackit.co.uk/hqDTK7b8K2rvw/a3bfc2af/d2ab1f05/da13f110/key.txt

Which contains the following key:
Pr0t3ct!on#cyber_security@12*12.2011+

When entering the correct keyword, we get this page:
http://www.canyoucrackit.co.uk/soyoudidit.asp

“So you did it. Well done! Now this is where it gets interesting. Could you use your skills and ingenuity to combat terrorism and cyber threats? As one of our experts, you’ll help protect our nation’s security and the lives of thousands. Every day will bring new challenges, new solutions to find – and new ways to prove that you’re one of the best.”

Interestingly, the salary for the “Cyber Specialist” position at GCHQ is £25,446 for a GC10 (Executive Officer) and £31,152 for a GC9 (Higher Executive Officer). Comparing this with the salaries in the corporate world makes it quite clear why GCHQ has to try so hard to find competent people to recruit. After all, for people with skills there are lots of opportunities for interesting work with a much higher paycheck. For juniors with potential, but no real experience, it might be an interesting opportunity though. I’m sure that the work can be quite stimulating. :)

After cracking stage 1 of the GCHQ challenge, we get the URL to the Javascript code available here.

It defines a virtual machine with four general registers, two segment registers, one flag register and of course the instruction pointer. It also includes an array of two values named “firmware”, which seems rather strange. These values are not used, nor mentioned in the definition of the virtual machine that is included in a comment further down in the javascript code. The use for these values will actually become apparent in the third and final stage.

Besides specifying the register values and an array with the current memory contents, there is a 54 line comment describing the virtual machine architecture, the instruction encoding and the instruction set:

As you can see, it then throws an exception saying that VM.exec is not yet implemented. Our job is to implement this function, based on the information provided in the comment, and executing the embedded code in our virtual machine to see what it reveals. Although I don’t really see how this would provide a demonstration of anything but quite basic skills, I decided to play along and continue with the challenge. I decided to make a C implementation instead though.

First, I decided to only implement a disassembler and reverse-engineer the code by hand instead of actually implementing the virtual machine. After finding out that the initial piece of code actually decrypts the next part of the code, and that the decrypted code decrypts a string, I decided to actually implement the VM instead of just a disassembler.

Corresponding C-code:

The decrypted code is as follows, and quite similar to the previous code:

Corresponding C-code (sort of):

The instructions at 0x0112-0x0114 are not necessary, and could either be there for misdirection or possibly to indicate a hidden challenge. Further signs of this lies in the fact that there are a couple of instructions further down that are never executed:

There is also a 0xcc at 0x0140, that isn’t even a valid a valid instruction in the instruction set for the virtual machine, but corresponds to a breakpoint in x86 assembler. Last but not least there are large chunks of memory that are neither read, written or executed. First of all, there are some zero bytes that are obviously just there for filling. What is more suspicious are these:

There are some interesting patterns in this data, especially when interpreting it as three 112 bytes chunks. Each chunk consists of 20 bytes < 0x80, 25 bytes >= 0x80, 26 bytes < 0x80, 26 bytes >= 0x80 and finally 15 bytes < 0x80. In other words, there is a pattern in which bytes have the most significant bit set in each of these chunks. This pattern could be the result of each chunk being XOR-encoded using the same algorithm as has been used in the previous code, with some specific start and step values. I decided not to waste any more time on this track unless the challenge seemed to lead to a dead end though. After the second decryption loop it will reveal the following string at 0x01b0:

Yet another URL, this time to an x86 Windows/cygwin executable. To get this string I could just have dumped the virtual machine memory after execution, since I had already reverse-engineered the code I decided to get each character after it has been decoded by reading r0 at ip = 0x010c (0x010c == 268), using the code available here.

Note that I have not bothered to implement any bounds checking when reading values from memory, so running the VM on arbitrary/random code will most likely result in a crash. :) In this case I had already seen that the code was “well behaved” and just wanted to go on to the next stage of the challenge though.

I heard about “the code” at www.canyoucrackit.co.uk during the first friday of december (2011-12-02), and cracked the final stage on sunday two days later. The reason for not cracking it all during friday evening was, unfortunately, not because it presented much of a challenge but because I was out partying pretty much 24/7 during friday and saturday (including after parties until dawn both nights). :)

I didn’t want to publish any writeup until the challenge was over, even though I heard that there were others who didn’t bother waiting. As a man who loves challenges of all kinds, I would hate to be the one spoiling it for someone who actually wanted to give it a try though.

The first part of the code was this picture:

I recognized it as x86-assembler at the first glance, because of the “eb 04” (jmp $+6), the multiple “31 cX” (xor reg, reg), the “90”:s in the end (NOPs) and last but not least the “cd 80” (int 0x80) to mention a few of the immediately recognizable opcodes. The int 0x80 also tells me that this is most likely a Linux-based payload, since 0x80 is the interrupt used for making system calls in Linux. Looking at the code, I could see that it actually just calls exit(0) after the relevant code has been executed.

By looking at the code in a disassembler I could see that it has a decode-loop, that looks very much like the RC4-cipher (initializing a 256 byte buffer with values 0 to 255, swapping values around based on what corresponds to the RC4-key (0xdeadbeef) followed by a decryption loop. Not that it really matters which crypto is being used, since the key is embedded into the code.

Looking closer at the decryption loop we can see that it is actually slightly different from the original RC4 cipher, even though the same key scheduling is used.

Original RC4 can be implemented like this:

This implementation changes j to the xor-byte in each iteration:

This is actually pretty funny, considering that GCHQ themselves have published an (very brief) explanation of the challenge now after it’s over, which describes the crypto as RC4. So from what I can tell, this is an unintentional bug rather than by design. A mistake that is easily made when you’re hand coding assembler, and not being careful of which registers are used for what, but a rather silly mistake nonetheless.

Anyway. Analyzing the code further we can see that the payload seems to be missing some important parts, like the buffer that is to be decrypted for instance. This, on the other hand, is definitely by design. :)

See the “AAAA” (41 41 41 41) at the end of the payload? That’s a tag that is checked by this piece of code:

So far so good. If the tag isn’t there, it jumps ahead to the part of the code that calls exit(0):

Then it continues with checking for another tag (“BBBB” = 42 42 42 42), this tag is _not_ included in the payload:

Of course, we can just append this tag to the code to bypass it. This doesn’t really help us much though, since the code then continues with reading a byte count to decrypt, followed by the actual encrypted data:

After this follows the code that actually performs the decryption, and finally it calls exit(0) just like when the tag on the stack is not found. The exit(0) could be replaced by code that writes the decrypted data to stdout, or we can simply set a breakpoint there (or manually insert a 0xcc = int3 there instead) so we can read the decrypted data in a debugger.

So, where is the missing data? My first thought was that the data is probably either hidden in the HTML code of the www.canyoucrackit.co.uk page, or that it is hidden within the image of the payload. Hiding it within the image of the payload seemed pretty likely, since it actually was a bit odd that they used an image at all instead of plain text that would allow for copy & paste instead of manually writing the payload down from the image.

There are several ways to hide data within an image, some more refined than others. The easiest way to do it is to simply inject the data to the hide into some form of meta data. Using “exiftool” we can extract the following text field (although I first noticed it with a simple “strings”):

This is quite obviously base64-encoded data, judging from the “==” at the end and the range of characters being used. The following command line reveals its decoded contents:

As you can see, it starts with the “BBBB” that the payload looked for after the end of the payload, followed by an 32-bit LSB integer (32 00 00 00 = 0x00000032 = 50 = The size of the encrypted buffer), and finally the 50 bytes of encrypted data.

In other words, the full payload is as follows:

By embedding this payload into a small C program (available here), and injecting a breakpoint at the exit(0) call by manually changing the “cd” in “cd 80” (int 0x80, the system call interrupt) to “cc” (int3) lets me read the decrypted string in a debugger.

This gives us the URL for the next stage of the challenge (available here).

This is my writeup for the twenty-fifth challenge in the PlaidCTF 2011 competition. The information for the challenge was:

“Amalgamated has banned the use of Solitaire due to loss of productivity.
The only employee who would write a new game for everyone only likes ‘retro’ games, and has placed a text-adventure version of pacman on a company server.
We don’t believe he could have coded this securely, and the server contains a vital key.
Connect to the game here and find the key.
nc a9.amalgamated.biz 60123”

There was no binary available for download, so this level was actually completely blind. Connecting to the game we get:

Typing “help” gave a list of valid commands:

Walking around in the maze eating pills we notice a few things. First I noticed that after eating 16 pills, each increasing my number of points with 8, my number of points suddenly becomes -128. This indicates that a signed 8-bit integer is used to store the number of points.

We also notice that the number of power points seem to determine which item that is listed in our status. Each time we eat a powerpill the number of points increase with 40, and each time we make a move the number of power points (if any) decreases with one. The somehow odd “t” command which is described as “wait” decreases the number of power points with one as well, and surely this must be for a reason.

Moving on through the game eating pills and powerpills as we encounter them, we will sooner or later run into this situation:

We cannot eat another powerpill since this would put us over the maximum of 128. However, if a signed 8-bit integer is used for the number of powerpoints as well the real maximum should be 127. Let’s use the “t” command a few times until our number of powerpoints is 88, and let’s see what happens when we eat the powerpill:

Notice anything strange? Well, besides now having a negative number of powerpoints (which we sort of expected) we can also see that our item-string is now “n” instead of “The some-name Trophy!”. Since the number of powerpoints seemed to determine the item that is listed, it seems likely that the number of powerpoints is used as an index into an array of items/trophies and that when a negative index is being used (-128) it will point into some other string that in this case happened to be “n”.

So where does this “n” come from? Well, since “n” is one of the commands for this game it could certainly be a command string that I’ve entered earlier. It’s not the last command I’ve entered though, so if “n” is indeed a command it is probably stored in a command history buffer. Looking back through the scrollback we also notice:

See the “Command limit reached.” message? We got that on our 127:th command, which probably means that a history of 127 commands are being stored in an array. If this array is stored right before the item string array, the “n” listed as our item after getting a negative number of powerpoints may be taken from this command history array. So, let’s see what happens if we use “t” or some other command until the command limit is reached again:

As you can see I’ve jumped ahead a bit to show you how this bug is turned into an exploitable vulnerability. The very first command entered after the command limit has been reached is used as the item-string, and is actually being passed directly as the format string argument to presumably printf(). This allows to read memory, like we do above, or even write to arbitrary addresses using the %n format string specifier.

After the number of power points have been wrapped into -128, it stays there, so now we can just wrap around the command history buffer each time we want to use a different format string. I developed the following script for being able to play around, enter different format strings and examine their output: