Joel Eriksson
Vulnerability researcher, exploit developer and reverse-engineer. Have spoken at BlackHat, DefCon and the RSA conference. CTF player. Puzzle solver (Cicada 3301, Boxen)

CanYouCrackIt.co.uk / GCHQ Challenge Solution – Stage 2

After cracking stage 1 of the GCHQ challenge, we get the URL to the Javascript code available here.

It defines a virtual machine with four general registers, two segment registers, one flag register and of course the instruction pointer. It also includes an array of two values named “firmware”, which seems rather strange. These values are not used, nor mentioned in the definition of the virtual machine that is included in a comment further down in the javascript code. The use for these values will actually become apparent in the third and final stage.

Besides specifying the register values and an array with the current memory contents, there is a 54 line comment describing the virtual machine architecture, the instruction encoding and the instruction set:

  exec: function()
  {
    // virtual machine architecture
    // ++++++++++++++++++++++++++++
    //
    // segmented memory model with 16-byte segment size (notation seg:offset)
    //
    // 4 general-purpose registers (r0-r3)
    // 2 segment registers (cs, ds equiv. to r4, r5)
    // 1 flags register (fl)
    //
    // instruction encoding
    // ++++++++++++++++++++
    //
    //           byte 1               byte 2 (optional)
    // bits      [ 7 6 5 4 3 2 1 0 ]  [ 7 6 5 4 3 2 1 0 ]
    // opcode      - - -             
    // mod               -           
    // operand1            - - - -
    // operand2                         - - - - - - - -
    //
    // operand1 is always a register index
    // operand2 is optional, depending upon the instruction set specified below
    // the value of mod alters the meaning of any operand2
    //   0: operand2 = reg ix
    //   1: operand2 = fixed immediate value or target segment (depending on instruction)
    //
    // instruction set
    // +++++++++++++++
    // 
    // Notes:
    //   * r1, r2 => operand 1 is register 1, operand 2 is register 2
    //   * movr r1, r2 => move contents of register r2 into register r1
    // 
    // opcode | instruction | operands (mod 0) | operands (mod 1)
    // -------+-------------+------------------+-----------------
    // 0x00   | jmp         | r1               | r2:r1
    // 0x01   | movr        | r1, r2           | rx,   imm 
    // 0x02   | movm        | r1, [ds:r2]      | [ds:r1], r2
    // 0x03   | add         | r1, r2           | r1,   imm
    // 0x04   | xor         | r1, r2           | r1,   imm 
    // 0x05   | cmp         | r1, r2           | r1,   imm 
    // 0x06   | jmpe        | r1               | r2:r1
    // 0x07   | hlt         | N/A              | N/A
    //
    // flags
    // +++++
    // 
    // cmp r1, r2 instruction results in:
    //   r1 == r2 => fl = 0
    //   r1 < r2  => fl = 0xff
    //   r1 > r2  => fl = 1
    // 
    // jmpe r1
    //   => if (fl == 0) jmp r1
    //      else nop
    
    throw "VM.exec not yet implemented";
  }

As you can see, it then throws an exception saying that VM.exec is not yet implemented. Our job is to implement this function, based on the information provided in the comment, and executing the embedded code in our virtual machine to see what it reveals. Although I don’t really see how this would provide a demonstration of anything but quite basic skills, I decided to play along and continue with the challenge. I decided to make a C implementation instead though.

First, I decided to only implement a disassembler and reverse-engineer the code by hand instead of actually implementing the virtual machine. After finding out that the initial piece of code actually decrypts the next part of the code, and that the decrypted code decrypts a string, I decided to actually implement the VM instead of just a disassembler.

0000:  movr r1, 0x4       ; r1 = 4 = Start of loop
0002:  movr r3, 0xaa      ; r3 = 170 = Value to XOR with
0004:  movm r0, [ds:r2]   ; Get encrypted byte
0006:  xor  r0, r3        ; Use XOR with r3 to decrypt
0008:  movm [ds:r2], r0   ; Write decrypted byte back to memory
000a:  add  r2, 0x1       ; Increment pointer to next encrypted byte
000c:  add  r3, 0x1       ; Increment value to XOR with
000e:  cmp  r2, 0x50      ; Compare pointer to end of encrypted data
0010:  movr r0, 0x14      ; Point r0 to instruction after decryption loop
0012:  jmpe r0            ; Jump to r0 if decryption is completed (r2 == 0x50)
0013:  jmp  r1            ; Jump to r1 (start of loop = 0004)
0014:  xor  r0, r0        ; Decryption completed. Set r0 = 0
0016:  jmp  0x10:r0       ; Jump to 0x10:0 = 0x0100 = The buffer that was just decrypted 

Corresponding C-code:

key = 0xaa;
for (i = 0; i < 0x50; i++)
        buf[i] ^= key++;

The decrypted code is as follows, and quite similar to the previous code:

0100:  movr r2, 0x0       ; r2 = 0
0102:  add  ds, 0xc       ; ds = ds + 0xc = 0x1C
0104:  movr r1, 0x8       ; r1 = 8 = Start of loop
0106:  movr r3, 0x32      ; r3 = 50 = Value to XOR with
0108:  movm r0, [ds:r2]   ; Get encrypted byte
010a:  xor  r0, r3        ; Use XOR with r3 to decrypt
010c:  movm [ds:r2], r0   ; Write decrypted byte back to memory
010e:  add  r2, 0x1       ; Increment pointer to next encrypted byte
0110:  add  r3, 0x3       ; Increment value to XOR with by three
0112:  cmp  r2, 0x0       ; Compare pointer with 0, indicating that it has wrapped (it's only a byte)
0114:  jmpe r3            ; Jump to r3 if it has wrapped. This will never happen...
0115:  cmp  r0, 0x0       ; Compare decrypted byte with 0
0117:  movr r0, 0x1b      ; r0 = 0x1b
0119:  jmpe r0            ; If decrypted byte was 0, jump to 0x011b = Instruction after the loop
011a:  jmp  r1            ; Jump to r1 (start of loop = 0x0108)
011b:  hlt                ; Halt execution

Corresponding C-code (sort of):

key = 0x32;
do {
        buf[i] ^= key;
        key += 3;
} while (buf[i]);

The instructions at 0x0112-0x0114 are not necessary, and could either be there for misdirection or possibly to indicate a hidden challenge. Further signs of this lies in the fact that there are a couple of instructions further down that are never executed:

0132:  add  ds, 0x10
0134:  jmp  r1

There is also a 0xcc at 0x0140, that isn't even a valid a valid instruction in the instruction set for the virtual machine, but corresponds to a breakpoint in x86 assembler. Last but not least there are large chunks of memory that are neither read, written or executed. First of all, there are some zero bytes that are obviously just there for filling. What is more suspicious are these:

0150:  7D 1F 15 60 4D 4D 52 7D 0E 27 6D 10 6D 5A 06 56   }..`MMR}.'m.mZ.V
0160:  47 14 42 0E B6 B2 B2 E6 EB B4 83 8E D7 E5 D4 D9   G.B.............
0170:  C3 F0 80 95 F1 82 82 9A BD 95 A4 8D 9A 2B 30 69   .............+0i
0180:  4A 69 65 55 1C 7B 69 1C 6E 04 74 35 21 26 2F 60   JieU.{i.n.t5!&/`
0190:  03 4E 37 1E 33 54 39 E6 BA B4 A2 AD A4 C5 95 C8   .N7.3T9.........
01a0:  C1 E4 8A EC E7 92 8B E8 81 F0 AD 98 A4 D0 C0 8D   ................
01b0:  AC 22 52 65 7E 27 2B 5A 12 61 0A 01 7A 6B 1D 67   ."Re~'+Z.a..zk.g
...
0200:  37 7A 07 11 1F 1D 68 25 32 77 1E 62 23 5B 47 55   7z....h%2w.b#[GU
0210:  53 30 11 42 F6 F1 B1 E6 C3 CC F8 C5 E4 CC C0 D3   S0.B............
0220:  85 FD 9A E3 E6 81 B5 BB D7 CD 87 A3 D3 6B 36 6F   .............k6o
0230:  6F 66 55 30 16 45 5E 09 74 5C 3F 29 2B 66 3D 0D   ofU0.E^.t\?)+f=.
0240:  02 30 28 35 15 09 15 DD EC B8 E2 FB D8 CB D8 D1   .0(5............
0250:  8B D5 82 D9 9A F1 92 AB E8 A6 D6 D0 8C AA D2 94   ................
0260:  CF 45 46 67 20 7D 44 14 6B 45 6D 54 03 17 60 62   .EFg }D.kEmT..`b
0270:  55 5A 4A 66 61 11 57 68 75 05 62 36 7D 02 10 4B   UZJfa.Whu.b6}..K
0280:  08 22 42 32 BA E2 B9 E2 D6 B9 FF C3 E9 8A 8F C1   ."B2............
0290:  8F E1 B8 A4 96 F1 8F 81 B1 8D 89 CC D4 78 76 61   .............xva
02a0:  72 3E 37 23 56 73 71 79 63 7C 08 11 20 69 7A 14   r>7#Vsqyc|.. iz.
02b0:  68 05 21 1E 32 27 59 B7 CF AB DD D5 CC 97 93 F2   h.!.2'Y.........
02c0:  E7 C0 EB FF E9 A3 BF A1 AB 8B BB 9E 9E 8C A0 C1   ................
02d0:  9B 5A 2F 2F 4E 4E 00 00 00 00 00 00 00 00 00 00   .Z//NN..........

There are some interesting patterns in this data, especially when interpreting it as three 112 bytes chunks. Each chunk consists of 20 bytes < 0x80, 25 bytes >= 0x80, 26 bytes < 0x80, 26 bytes >= 0x80 and finally 15 bytes < 0x80. In other words, there is a pattern in which bytes have the most significant bit set in each of these chunks. This pattern could be the result of each chunk being XOR-encoded using the same algorithm as has been used in the previous code, with some specific start and step values. I decided not to waste any more time on this track unless the challenge seemed to lead to a dead end though. After the second decryption loop it will reveal the following string at 0x01b0:

GET /da75370fe15c4148bd4ceec861fbdaa5.exe HTTP/1.0

Yet another URL, this time to an x86 Windows/cygwin executable.

To get this string I could just have dumped the virtual machine memory after execution, since I had already reverse-engineered the code I decided to get each character after it has been decoded by reading r0 at ip = 0x010c (0x010c == 268), using the code available here.

Note that I have not bothered to implement any bounds checking when reading values from memory, so running the VM on arbitrary/random code will most likely result in a crash. :) In this case I had already seen that the code was “well behaved” and just wanted to go on to the next stage of the challenge though.