The musl allocator was too slow, so we went to a company known for 🚀 Blazing Fast 🚀 software, Microsoft!
- Surg
This challenge was built using musl and uses a preloaded allocator called mimalloc, which is an allocator maintained by Microsoft that is compatible with libc. From the docker file we can see that it is version 2.2.4. The source code for this version is available on GitHub.
The challenge is a heap-notes style program, which means we can create, delete, look at, and update objects on the heap. All created notes are 128 bytes large. Looking and updating are done through the read and write functions which aren't vulnerable to buffer overwrites (they take in a size parameter, in this case 127). Looking closer, it becomes obvious there is a use-after-free: the delete function does nothing to stop other functions from using the note.
void delete() {
unsigned int number = get_index();
free(notes[number]); // No clean up done here
printf("Done!\n");
return;
}
void update() {
unsigned int number = get_index();
printf("Content? (127 max): ");
read(STDIN_FILENO, notes[number], NOTE_SIZE-1); // No check for freed block here
printf("Done!\n");
return;
}
Thus, we can write to the internal values of blocks in the freelist if the mimalloc allocator stores information in freed memory (hint: it does). The rest of this writeup will look at how the mimalloc allocator works, how to exploit it to get arbitrary read and write, and then finally how to pop a shell.
How do we set this up locally? After some futzing, what I did was:
libc.musl stored in the container.LD_PRELOAD and LD_LIBRARY_PATH environment variables, as well as the musl ld (ld-musl-x86_64.so.1), to run the binary locallygdb understands using the --args option and the env command (which allows setting environment variables per run)
the final command looked like this:gdb chal --args env LD_PRELOAD=./libmimalloc.so.2.2 LD_LIBRARY_PATH=$PWD ./ld-musl-x86_64.so.1 ./chal
If you install musl and use that instead of the container's library you can get symbols (including some musl and mimalloc symbols). When I moved to using the containers library, I could get symbols by breaking after it starts and then loading the file (file chal) and using vmmap to get the base offset of said symbols (in pwndbg).
The mimalloc source code is very messy (due to things like debug statements, ifdef sections, and similar), so the first thing we want to do is test it locally to see what it does. Running it in GDB, i did this:
note 0:
0x4cd9e010080: 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000...
note 1:
0x4cd9e010100: 0x000004cd9e010080 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000...
It lookes like the first long of note 1 points to note 0, which was freed first. It seems like the freelist appends to the front, so note 1 should be at the top of the list and we can write an address to note 1, make a note (putting what is in note 1 to the top) and make another note (putting the address we put into note 1 into the notes array). If this works, we will get arbitrary read and write at the address we put in note 1. What should our target be? Well, (I guessed with no real reason), the start of the mapped space probably has something from libmimalloc, so lets look there:
beginning of the heap's page
0x4cd9e000000: 0x0000000000000000 0x0000000000000001
0x0000000600010100 0x0000000000000101
0x0000000002000000 0x00007ffff7f49d40
0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000
Looking at this (installing musl with apt because it gave better symbols), we find that the library pointer looking value (0x00007ffff7f49d40) is mi_subproc_default from libmalloc. This allows us to find the base of libmalloc in memory.
If you try to exploit this in its current form, it won't work. Lets say you try this exploit:
0x4cd9e000000, create 2 blocks (block 2 and block 3)0x4cd9e010180, which is 128 bytes after note 1. Not only this, 0x4cd9e010200 is the value stored in note 3 instead...There are 2 structures that go into allocating a block for mimalloc: mi_heap_t and mi_page_t. Of these, mi_page_t is the one with the free list, called free. It also has another linked list called local_free, which we will talk about later:
struct mi_heap_s {
... // Unimportant for us
mi_page_t* pages_free_direct[MI_PAGES_DIRECT];.
mi_page_queue_t pages[MI_BIN_FULL + 1]; // queue of pages for each size class (or "bin")
};
typedef struct mi_page_queue_s {
mi_page_t* first;
mi_page_t* last;
size_t block_size;
} mi_page_queue_t;
typedef struct mi_page_s {
...
mi_block_t* free; // list of available free blocks (`malloc` allocates from this list)
mi_block_t* local_free; // list of deferred free blocks by this thread (migrates to `free`)
...
} mi_page_t;
Just looking at local_free's comment (which I did not do until it was too late) should give you an idea of what will happen.
Looking into mallocs definition in mimalloc, we are sent to the function _mi_page_malloc_zero, which uses page->free as the new chunk if its available and otherwise calls _mi_malloc_generic. I'm going to gloss over what I actually did to find this due to it being confusing and somewhat boring, but for a gist of the struggles it boiled down to flaky symbols with my GDB setup and mimalloc being heavily optimized (functions seemingly being rearranged by the compiler with jumps) making it confusing to find what function I was in. But, after some toying, this is what I found.
As stated before, the newly allocated note 3 (from the previous section) points to 0x4cd9e010200. If you were to look at the memory at that location, you would find it points to 0x4cd9e010280, another 128 bytes after. You can continue following this chain until you reach 0x4cd9e010f80, which will point to NULL. This looks like a linked list. This creates a hypothesis: local_free is where note 1 and note 0 live, and free is where all of these chunks which are allocated live.
Looking further into mimalloc for where local_free is used, this seems to be the case:
void _mi_page_free_collect(mi_page_t* page, bool force) {
...
if mi_likely(page->free == NULL) {
// usual case
page->free = page->local_free;
page->local_free = NULL;
page->free_is_zero = false;
}
...
}
Looking in _mi_malloc_generic for where this is called, we find the function mi_find_page calls the function mi_find_free_page, which calls _mi_page_free_collect if the page already exists. This happens every allocation, so there's nothing we need to do to cause it beside getting page->free == NULL to be true.
From the fact the chunk 0x4cd9e010f80 points to null, we can deduce that page->free == NULL will be true after 0x1000 / 0x80 = 32 allocations, therefore we can get this sequence to leak libmimalloc:
0xFFFFF to get our heap page baseWe run this, and it succeeds!
[+] Opening connection to doremi.chal.uiuc.tf on port 1337: Done
# Creating and freeing chunks...
# Getting heap leak...
# Writing target address...
# Setting free to local_free...
# Creating target notes...
# Reading chunk 3 (mimalloc leak note)...
00000000000000000100000000000000000101000600000001010000000000000000000200000000403dc29f367f
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000ffffffffffffff
'mi_subproc_default`: 0x7f369fc23d40
libmimalloc base: 0x7f369fbf5000
After all this, all we have is leak to mimalloc, which doesn't have any obvious virtual tables or other targets. If we an find musl's offset, then we have a better chance of finding something exploitable. How do we find musl> As with a lot of the rest of this challenge, testing, guessing, and checking will help. First, in pwndbg, we can use vmmap to check if libmimalloc is at a constant offset to libc.musl. You can disable ASLR by using set disable-randomization off. You will find that it is a constant offset. In GDB this was 0x33000. For some reason, running it outside of GDB and using proc/<pid>/maps gave a different offset, but if we check it on remote and it gives us the expected value (say, a stack location) then we know which one is correct. Running it with offset 0x33000:
# Reading environ...
c8e28719ff7f00006900bf6ded0433dd000000000000000000000000000000000000000000000000000000000000
00000000000000000000afff8719ff7f0000baff8719ff7f0000fffb8b1700000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0x7fff1987e2c8 looks like a stack address (which we expect environ to be, and for most other things to not be). So, on remote 0x33000 is the correct offset.
Now we need a target. As me using environ as a leak target may suggest, I was originally planning on using environ and then creating a ROP chain to get a shell. However, the way you do this is by doing the ROP on the update functions stack frame (which I forgot you could do somehow LOL). At the time, I was thinking you were blocked from doing this as main never returned, but instead used exit. I did not think about this further, because I decided to exploit exit handlers instead.
Looking through some previous CTF writeups, the struct that governs the atexit functionallity of musl is struct fl in the file atexit.c. Its quite simple:
static struct fl
{
struct fl *next;
void (*f[COUNT])(void *);
void *a[COUNT];
} builtin, *head;
To understand it fully, lets look at what is called when the program exits (abridged without LOCk and UNLOCK noise):
static int slot;
...
void __funcs_on_exit()
{
void (*func)(void *), *arg;
for (; head; head=head->next, slot=COUNT) while(slot-->0) {
func = head->f[slot];
arg = head->a[slot];
func(arg);
}
}
If slot is greater than zero, it wil go through f and a, paring each function with its arguement. If we can overwrite the first func and the first arg, and also overwrite slot to 1, then we can call system("/bin/sh"). The only annoying problem is that struct fl is 520 bytes long and bigger than a single chunk, but this can be solved by making multiple chunks in a row (remember, each chunk is ox80 large and 0x80 apart from each other, so they are contiguous without any meta-data).
Here is the general outline:
/bin/sh in it and will be put into the a array in the struct fl. Chunks 1 through 3 will be used as a fake struct fl and will be filled with a pointer to system and a pointer to chunk 4. Chunk 0 will contain the address for chunk one, and chunk 3 will contain the address for chunk 4./bin/sh is written to chunk 4, system is written to chunk 1, and chunk 4's address is written to chunk 3.head global variable in atexit.c (this can be found by disassembling the __funcs_on_exit function in GDB), using the exploit we used to gain a libmimalloc leak
head - 24. This is because we want free to be empty after our note is created, otherwise we won't be able to allocate again. head - 24 is the first 8 bytes which are all 0x00 and won't cause a crashslots variableheadexit by sending an invalid indexUsing this, we properly pop a shell and can get the flag:
# Overwriting head and slots...
[*] Switching to interactive mode
$ a
Invalid Input.
$ ls
chal
flag
libmimalloc.so.2.2
The solve script is a little too long for this write up, but its hosted on Github Gists: solve.py