Implementing syscall hooks in Rust

Hooking syscalls

Injecting our DLL on process creation

You can keep tabs on my progress by starring / watching my repo.

I have moved the Ghost Hunting detection into the kernel - you can see the merge of the refactor here, or to see the specific source file, check here.

The information in this post on syscall hooking remains the same; it’s only the detection mechanism that has moved.

The first step of this process is to inject our EDR’s DLL into all new processes. Well, that would be nice, but whilst testing I don’t want to cause mass system instability, so instead we will just inject into notepad. Later, this will be expanded to also inject into our test / dummy malware.

As we have our kernel driver component, we notify the Usermode Engine of a new process being created through IOCTL requests made every 60 ms. I use my wdk-mutex crate (which you can check out here) to handle concurrent access to the underlying structures. To be honest, this process, and how I achieved new process callbacks, deserves its own post.

Until then, I’ll provide a short summary of what I did.

In the driver component, we register a callback routine with the kernel, so anytime a new process is created our callback function is executed. Microsoft have provided a handy function we can call to make this registration, and it’s actually really easy to do.

In my driver, that looks like:

let res = PsSetCreateProcessNotifyRoutineEx(Some(core_callback_notify_ps), FALSE as u8);

As you may expect, core_callback_notify_ps is a function which essentially gathers information from the PS_CREATE_NOTIFY_INFO struct, and adds it to an internal cache protected by a mutex from wdk-mutex.
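To make the shape of this concrete, here is a user-mode model of the same pattern. The notify callback pushes process details into a mutex-protected cache, and the IOCTL handler drains it. This is only a sketch. It assumes std::sync::Mutex as a stand-in for wdk-mutex, which runs in the kernel. The ProcessStarted struct and its fields are hypothetical, not the driver’s real types.

```rust
use std::sync::Mutex;

/// Hypothetical record of a new process; the real driver pulls these details
/// out of PS_CREATE_NOTIFY_INFO.
#[derive(Debug, Clone, PartialEq)]
pub struct ProcessStarted {
    pub pid: u32,
    pub parent_pid: u32,
    pub image_name: String,
}

/// Cache shared between the process-creation callback and the IOCTL handler.
/// std::sync::Mutex stands in for wdk-mutex in this model.
pub struct ProcessCache {
    pending: Mutex<Vec<ProcessStarted>>,
}

impl ProcessCache {
    pub const fn new() -> Self {
        Self { pending: Mutex::new(Vec::new()) }
    }

    /// Called from the process-creation callback: record the new process.
    pub fn push(&self, p: ProcessStarted) {
        self.pending.lock().unwrap().push(p);
    }

    /// Called from the IOCTL handler: take everything recorded since the last
    /// poll, leaving the cache empty for the next 60 ms window.
    pub fn drain(&self) -> Vec<ProcessStarted> {
        std::mem::take(&mut *self.pending.lock().unwrap())
    }
}
```

Draining with std::mem::take swaps in an empty Vec while the lock is held, so the callback is only blocked for the duration of a pointer swap.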

Every 60 ms an IOCTL call is made from the usermode engine down to the driver; in response the driver will send information about newly created processes. I use a two-call style IOCTL where the first call gets the size of the required buffer for the response, and the second call fills the buffer with the data.
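The two-call pattern can be modelled without a driver. This sketch makes two assumptions. First, the driver’s pending data is already serialised into a byte slice. Second, handle_ioctl is a hypothetical stand-in for the driver’s dispatch routine. The first call passes no output buffer and learns the required size. The second call passes a buffer of that size to be filled.

```rust
/// Result of a (simulated) IOCTL request.
#[derive(Debug, PartialEq)]
pub enum IoctlResult {
    /// No buffer, or one too small, was supplied: this many bytes are required.
    NeedSize(usize),
    /// The buffer was filled with this many bytes.
    Filled(usize),
}

/// Hypothetical stand-in for the driver's dispatch routine. `payload` is the
/// serialised data the driver wants to send up to the usermode engine.
pub fn handle_ioctl(payload: &[u8], out: Option<&mut [u8]>) -> IoctlResult {
    match out {
        Some(buf) if buf.len() >= payload.len() => {
            buf[..payload.len()].copy_from_slice(payload);
            IoctlResult::Filled(payload.len())
        }
        // First call (or an undersized buffer): report how much space is needed
        _ => IoctlResult::NeedSize(payload.len()),
    }
}

/// Usermode side: ask for the size, allocate, then ask again to receive the data.
pub fn fetch(payload: &[u8]) -> Vec<u8> {
    let size = match handle_ioctl(payload, None) {
        IoctlResult::NeedSize(n) => n,
        IoctlResult::Filled(_) => return Vec::new(),
    };
    let mut buf = vec![0u8; size];
    match handle_ioctl(payload, Some(buf.as_mut_slice())) {
        IoctlResult::Filled(n) => {
            buf.truncate(n);
            buf
        }
        // The data grew between the two calls; try again on the next poll
        IoctlResult::NeedSize(_) => Vec::new(),
    }
}
```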

Once the usermode engine receives the data, it can then start processing it. A check is done to determine whether the driver has told us a new process has been created, and if it has, we go ahead and call the onboard_new_process function.

let process_creations = driver_messages.process_creations;
if !process_creations.is_empty() {
    for p in process_creations {
        if self.process_monitor.write().await.onboard_new_process(&p).await.is_err() {
            logger.log(LogLevel::Error, &format!("Failed to add new process to live processes. Process: {:?}", p));
        }
    }
}

Within the onboard_new_process function, we inject our EDR’s DLL into the newly created process. There are no fancy tricks here beyond the technique from my write-up in this post.

Resolving syscall callback function addresses

So, our DLL has now been injected into the target process. From the DLL’s perspective, we need to spawn a new thread from DllMain to set itself up. Doing significant work inside DllMain itself is risky, as it runs while the loader lock is held. This initialisation routine should resolve the virtual addresses of the callback functions to be executed when a syscall we wish to intercept is called.

We will use those addresses to then make an unconditional jump via assembly to the address within our DLL, where we can then start assessing parameters and the environment.

We will save the address as a usize, which is pointer-sized on both 32-bit and 64-bit targets (although our use of the rax register below is not 32-bit friendly!).

This initialisation looks like the following:

use std::arch::asm;
use std::ffi::c_void;
use windows::core::s;
use windows::Win32::Foundation::HANDLE;
use windows::Win32::System::LibraryLoader::{GetModuleHandleA, GetProcAddress};
use windows::Win32::UI::WindowsAndMessaging::{MessageBoxA, MB_OK};

/// A structure to hold the stub addresses for each callback function we wish to have for syscalls.
///
/// The address of each function within the DLL will be used to overwrite memory in the syscall, allowing us to jmp
/// to the address.
pub struct StubAddresses {
    open_process: usize,
}

impl StubAddresses {
    /// Retrieve the virtual addresses of all callback functions for the DLL
    fn new() -> Self {
        // Get a handle to our own module (sanctum.dll), which is already loaded in this process
        let h_sanctum = match unsafe { GetModuleHandleA(s!("sanctum.dll")) } {
            Ok(h) => h,
            Err(_) => todo!(),
        };

        //
        // Get function pointers to our callback symbols
        //

        // Get a function pointer to our exported open_process callback
        let open_process_fn_addr = match unsafe { GetProcAddress(h_sanctum, s!("open_process")) } {
            None => {
                unsafe { MessageBoxA(None, s!("Could not get fn addr"), s!("Could not get fn addr"), MB_OK) };
                todo!();
            }
            Some(address) => address as *const () as usize,
        };

        Self {
            open_process: open_process_fn_addr,
        }
    }
}

/// Injected DLL routine for examining the arguments passed to ZwOpenProcess and NtOpenProcess from 
/// any process this DLL is injected into.
#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    process_handle: HANDLE,
    desired_access: u32,
    // We do not care for now about the OA
    _: *mut c_void,
    // We do not  care for now about the client id
    _: *mut c_void,
) {
    // start off by causing a break in the injected process indicating we successfully called our function!
    unsafe {asm!("int3")};
}

The variable open_process_fn_addr holds the resolved address of our open_process symbol. In theory, we can load this address into a register and jmp to it, moving execution to our function.

To check that we did in fact resolve the address correctly, we can print the address out via a MessageBoxA (debugging this DLL is going to be nasty in the future and I can feel it!):

Function pointer

As you can see, the method worked!

Redirecting execution to the callback

Now that we have our callback, we need to alter the flow of execution so that we can move the instruction pointer to our callback function.

First, to test this, let’s try a manual assembly jmp from somewhere we easily control to the function’s virtual address.

After calling StubAddresses::new(), which is where we resolve the pointer to the open_process function, we will use some inline assembly to jump to that memory address. Note there is a commented-out int3, which we will use shortly to verify our assembly.

/// Initialise the DLL by resolving function pointers to our syscall hook callbacks.
unsafe extern "system" fn initialise_injected_dll(_: *mut c_void) -> u32 {

    let stub_addresses = StubAddresses::new();
    let x = format!("Addr: {}\0", stub_addresses.open_process);

    unsafe {
        MessageBoxA(None, PCSTR::from_raw(x.as_ptr()), PCSTR::from_raw(x.as_ptr()), MB_OK);
    }

    // test jump to open_process
    unsafe {
        asm!(
            // "int3",
            // push whatever is in rax onto the stack
            "push rax",
            // move our VA into rax
            "mov rax, {x}",
            // unconditional jump
            "jmp rax",
            // bring rax back from the stack
            "pop rax",
            x = in(reg) stub_addresses.open_process
        );
    }

    STATUS_SUCCESS.0 as _
}

As a reminder, our open_process function is as follows (4 args in, and an int3 instruction to break execution):

#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    process_handle: HANDLE,
    desired_access: u32,
    // We do not care for now about the OA
    _: *mut c_void,
    // We do not  care for now about the client id
    _: *mut c_void,
) {
    // start off by causing a break in the injected process indicating we successfully called our function!
    unsafe {asm!("int3")};
}

Let’s run this, attach a debugger to Notepad.exe and see what happens:

open_process stub

Lo! As you can see we did successfully jump to the address of open_process in our DLL! Awesome!!

Now, let’s uncomment the int3 instruction in the initialise_injected_dll function to see what the generated assembly looks like:

Comparing assembly to Rust source code

You can quite clearly see how the inline assembly is transposed into the binary. In fact, this has exposed a redundant instruction. The address of open_process is already in the rax register, which makes the mov rax, rax pointless. Personally, I wouldn’t have faith in the compiler making its own optimisations around things like this, so I would rather be explicit to ensure it is included in the binary. Perhaps that is a minor optimisation for the future!

Finally, to take the safety wheels off, let’s remove the int3 from both the callback and the inline asm in initialise_injected_dll, and make sure we don’t crash notepad…

The function now looks as follows. If we get a popup saying “Bazinga” and nothing crashes then, in the words of Shania Twain, looks like we made it.

/// Initialise the DLL by resolving function pointers to our syscall hook callbacks.
unsafe extern "system" fn initialise_injected_dll(_: *mut c_void) -> u32 {

    let stub_addresses = StubAddresses::new();

    // test jump to open_process
    unsafe {
        asm!(
            // push whatever is in rax onto the stack
            "push rax",
            // move our VA into rax
            "mov rax, {x}",
            // unconditional jump
            "jmp rax",
            // bring rax back from the stack
            "pop rax",
            x = in(reg) stub_addresses.open_process
        );
    }

    unsafe {
        MessageBoxA(None, PCSTR::from_raw("Bazinga!\0".as_ptr()), PCSTR::from_raw("Bazinga!\0".as_ptr()), MB_OK);
    }

    STATUS_SUCCESS.0 as _
}

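That didn’t work, for two reasons. First, we need to replace the jmp instruction with a call. A jmp does not push a return address. So when open_process executes its ret, it pops whatever happens to be on top of the stack and jumps to it; here, that is the rax value we just pushed. A call pushes the return address for us, so the callback’s ret comes back to the instruction after the call without us having to manage that ourselves.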

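Secondly, the push rax instruction misaligns the stack. The Windows x64 calling convention expects rsp to be 16-byte aligned at the point of a call, and a single 8-byte push breaks that. So we will remove the push and pop instructions, and worry about saving state properly later.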

And now:

Assembly correct
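The final version of this block isn’t reproduced in the text, so here is a minimal sketch of the same idea that runs on any x86_64 target: a plain call, with no push or pop. It also adds clobber_abi("system"), which tells the compiler the callee may overwrite every volatile register; the original block declared no clobbers. The names demo_callback and call_via_asm are hypothetical. This illustrates the technique and is not the project’s code.

```rust
use std::arch::asm;
use std::sync::atomic::{AtomicBool, Ordering};

// Records that the callback actually ran.
static CALLBACK_HIT: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for the DLL's open_process callback.
extern "system" fn demo_callback() {
    CALLBACK_HIT.store(true, Ordering::SeqCst);
}

/// Call the function at `addr` from inline assembly (x86_64 only).
///
/// Safety: `addr` must be the address of an `extern "system" fn()`.
unsafe fn call_via_asm(addr: usize) {
    unsafe {
        asm!(
            // call pushes the return address, so the callee's ret comes back here.
            // Without `nostack`, Rust guarantees rsp is call-aligned on entry.
            "call {x}",
            x = in(reg) addr,
            // The callee may trash any register the system ABI treats as volatile
            clobber_abi("system"),
        );
    }
}
```

Usage: `unsafe { call_via_asm(demo_callback as *const () as usize) }` runs the callback and sets CALLBACK_HIT.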

Why are we using assembly

At this point you may be wondering why we are using assembly to call the function pointer instead of just calling it through Rust. In the next section we will hook the syscall, and to do that we have to overwrite the machine code in ntdll.dll directly with our own machine code, which redirects execution to our hook. At that level we can’t work in a ‘mid-level’ language such as Rust, C or C++.

By making a simple proof of concept in a function we can control with inline assembly, we can then take the bytes (machine code) this outputs and use that as our template for patching the syscall.

Hooking the syscall

Before we get into the meat, here is a video of this section showing the working POC. I’d suggest giving it a watch before (or after) reading, as it always helps to see some visual context around theory-heavy topics like this.

So, thinking this through logically, in order to hook the syscall we need to do several things.

One - We will need to suspend all threads except our own thread of execution. Seeing as we are messing with the memory of the process, we don’t want other threads accessing partially modified memory until our modifications are complete.

Two - At runtime we will need to convert the virtual address of the ‘callback’ function (in this case, open_process) into 8 bytes in little endian (LE) form, and read each byte into a buffer.

Three - We will need to overwrite the hooked function’s instructions with our own machine code (including the LE address of our callback function).

Four - When all memory patches are done, resume the suspended threads.

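Above, we changed our jmp instruction to a call to make it work in the context of the POC. Here, we need to go back to using a jmp. Why? We don’t want to change the registers or stack frame at all; we are simply redirecting execution. Loading our callback’s address into rax does overwrite it, but that loses nothing: rax carries no argument in the x64 calling convention, and the original stub loads the SSN into eax itself. This may change later once we start inspecting parameters, adding new stack variables etc., but I don’t think it will matter. Because a jmp pushes no return address, the ret at the end of our callback takes the instruction pointer straight back to wherever the call to ZwOpenProcess originated, rather than back into NTDLL. That is what we want, as I do not want execution to resume in NTDLL after we make the syscall ourselves. At least, that is the theory for this step of the process.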

So, first, let’s enumerate all threads in the current process and suspend them. To do this we can write a few functions as follows:

use windows::core::PCSTR;
use windows::Win32::Foundation::{CloseHandle, HANDLE};
use windows::Win32::System::Diagnostics::ToolHelp::{
    CreateToolhelp32Snapshot, Thread32First, Thread32Next, TH32CS_SNAPTHREAD, THREADENTRY32,
};
use windows::Win32::System::Threading::{
    GetCurrentProcessId, GetCurrentThreadId, OpenThread, ResumeThread, SuspendThread,
    THREAD_SUSPEND_RESUME,
};
use windows::Win32::UI::WindowsAndMessaging::{MessageBoxA, MB_OK};

/// Enumerate all threads in the current process
///
/// # Returns
/// A vector of thread IDs, excluding the calling thread
fn get_thread_ids() -> Result<Vec<u32>, ()> {
    let pid = unsafe { GetCurrentProcessId() };
    // A TH32CS_SNAPTHREAD snapshot contains every thread on the system (the pid argument is
    // ignored for thread snapshots), so we filter on the owning process below
    let snapshot = match unsafe { CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, pid) } {
        Ok(s) => s,
        Err(_) => return Err(()),
    };

    let mut thread_ids: Vec<u32> = vec![];
    let current_thread = unsafe { GetCurrentThreadId() };

    let mut thread_entry = THREADENTRY32 {
        dwSize: std::mem::size_of::<THREADENTRY32>() as u32,
        ..Default::default()
    };

    if unsafe { Thread32First(snapshot, &mut thread_entry) }.is_ok() {
        loop {
            // Only threads in our process, and never our own thread
            if thread_entry.th32OwnerProcessID == pid && thread_entry.th32ThreadID != current_thread {
                thread_ids.push(thread_entry.th32ThreadID);
            }

            if unsafe { Thread32Next(snapshot, &mut thread_entry) }.is_err() {
                break;
            }
        }
    }

    // Don't leak the snapshot handle
    let _ = unsafe { CloseHandle(snapshot) };

    Ok(thread_ids)
}

/// Suspend all threads in the current process except for the thread executing our EDR setup (i.e. the current thread)
///
/// # Returns
/// A vector of the suspended handles
fn suspend_all_threads(thread_ids: Vec<u32>) -> Vec<HANDLE> {
    let mut suspended_handles: Vec<HANDLE> = vec![];
    for id in thread_ids {
        match unsafe { OpenThread(THREAD_SUSPEND_RESUME, false, id) } {
            Ok(handle) => {
                unsafe { SuspendThread(handle) };
                suspended_handles.push(handle);
            }
            Err(e) => {
                // Note: allocating here while other threads are suspended can deadlock
                // if one of them holds the heap lock; fine for debugging only
                let x = format!("Error with handle: {:?}\0", e);
                unsafe { MessageBoxA(None, PCSTR::from_raw(x.as_ptr()), PCSTR::from_raw(x.as_ptr()), MB_OK) };
            }
        }
    }

    suspended_handles
}

/// Resume all threads in the process
fn resume_all_threads(thread_handles: Vec<HANDLE>) {
    for handle in thread_handles {
        unsafe { ResumeThread(handle) };
        let _ = unsafe { CloseHandle(handle) };
    }
}

And to call them, from our thread we can do:

unsafe extern "system" fn initialise_injected_dll(_: *mut c_void) -> u32 {

    // get all thread IDs except the current thread
    let thread_ids = match get_thread_ids() {
        Ok(ids) => ids,
        Err(_) => todo!(),
    };
    let suspended_handles = suspend_all_threads(thread_ids);
    // ...
}

Next, to patch NTDLL, we need to make a series of memory writes.

First, we can clear out the current instructions from ZwOpenProcess and replace them with NOPs. I do this to ensure a clean segment of memory to work with. It’s nice to be able to see exactly what we are doing and what we are writing, but it isn’t strictly necessary.

To clean out the memory, we can simply write a run of NOPs (0x90) to the address of ZwOpenProcess. I calculated the number of NOPs required (32), so in code this looks as follows:

fn patch_ntdll(addresses: &StubAddresses) {
    // 32 NOPs (0x90) to blank out the ZwOpenProcess stub
    let buffer = [0x90u8; 32];

    let proc_hand = unsafe { GetCurrentProcess() };
    let mut bytes_written: usize = 0;
    let _ = unsafe {
        WriteProcessMemory(
            proc_hand,
            addresses.ntdll.zw_open_process as *const _,
            buffer.as_ptr() as *const _,
            buffer.len(),
            Some(&mut bytes_written),
        )
    };
}

Here is what this looks like in the target process:

Nop instructions in memory

Next, we need to move the virtual address of our callback function into rax, as we did in the previous section. This involves a little bit manipulation to get the relevant bytes out of the integer and order them LE style. Because we are moving an 8-byte pointer into rax, we use movabs rather than a plain mov. movabs is the GAS/AT&T name for the mov r64, imm64 form, encoded as 48 B8 followed by the 8-byte immediate, and it loads a full 64-bit constant into a register. It’s worth noting that at this point our EDR will not work on 32-bit machines: according to the x86 assembly manual, this instruction is only valid under -xarch=amd64.

// write movabs rax, _ (thanks to https://defuse.ca/online-x86-assembler.htm#disassembly)
let mov_abs_rax: [u8; 2] = [0x48, 0xB8];
let _ = unsafe {
    WriteProcessMemory(
        proc_hand,
        addresses.ntdll.zw_open_process as *const _,
        mov_abs_rax.as_ptr() as *const _,
        mov_abs_rax.len(),
        None,
    )
};

//
// convert the address of the function to little endian (8) bytes and write them at the correct offset
//
let mut addr_bytes = [0u8; 8]; // 8 bytes for the 64-bit pointer
let addr64 = addresses.edr.open_process as u64; // make sure we are working with a full 8-byte value
for (i, b) in addr_bytes.iter_mut().enumerate() {
    *b = ((addr64 >> (i * 8)) & 0xFF) as u8;
}

// write it
let _ = unsafe {
    WriteProcessMemory(
        proc_hand, 
        (addresses.ntdll.zw_open_process + 2) as *const _,
        addr_bytes.as_ptr() as *const _, 
        addr_bytes.len(), 
        None,
    )
};

With that done, all that is left is to write the jmp instruction:

let jmp_bytes: &[u8] = &[0xFF, 0xE0];
let _ = unsafe {
    WriteProcessMemory(
        proc_hand,
        (addresses.ntdll.zw_open_process + 10) as *const _,
        jmp_bytes.as_ptr() as *const _, 
        jmp_bytes.len(), 
        None,
    )
};

And, bon appetit, we have successfully hooked a syscall!

If you haven’t watched the YouTube demo of this in action linked above, please go watch it or click here.

So, our ZwOpenProcess hooked syscall stub now looks like:

Hooked syscall malware evasion
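The manual shift-and-mask loop produces the same bytes as u64::to_le_bytes. That means the three writes can also be built as one 12-byte buffer: movabs rax, imm64 followed by jmp rax. Below is a minimal sketch, assuming you would build the stub first and then hand it to a single WriteProcessMemory call. build_hook_stub and le_bytes_manual are hypothetical helpers, not functions from the project.

```rust
/// Build the 12-byte hook stub: `movabs rax, imm64` (48 B8 + 8-byte
/// little-endian address) followed by `jmp rax` (FF E0).
pub fn build_hook_stub(callback: u64) -> [u8; 12] {
    let mut stub = [0u8; 12];
    stub[0..2].copy_from_slice(&[0x48, 0xB8]); // REX.W + mov rax, imm64
    stub[2..10].copy_from_slice(&callback.to_le_bytes()); // callback address, LE
    stub[10..12].copy_from_slice(&[0xFF, 0xE0]); // jmp rax
    stub
}

/// The manual shift-and-mask conversion, for comparison with to_le_bytes.
pub fn le_bytes_manual(addr64: u64) -> [u8; 8] {
    let mut addr_bytes = [0u8; 8];
    for (i, b) in addr_bytes.iter_mut().enumerate() {
        *b = ((addr64 >> (i * 8)) & 0xFF) as u8;
    }
    addr_bytes
}
```

A single buffer also avoids any moment where the stub holds a movabs with no jmp after it, although with the other threads suspended that window is already closed.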

Interacting with the environment

As before, I have made a POC video for this section:

Okay, for this next section, we need to start making sense of what we hooked.

First things first, we need a strategy to ensure that whatever we do up until the syscall, we aren’t going to clobber any data or registers. Ideally we also don’t want to be messing around too much with our own assembly instructions as we may end up working against the compiler. Thankfully, Rust’s asm!() macro can help us out here.

Let’s start by organising our syscall. Instead of moving values into registers by hand (e.g. mov rcx, {value}), we can use the asm!() macro’s in operands, such as in("rcx") value, to have the compiler place a value in a given register before our assembly runs. Compared with writing those moves ourselves, this tells the compiler exactly which registers the asm block depends on. It won’t put anything else in them, and our data lands where it should. We use this principle for the function parameters, and also for the SSN (System Service Number) that goes into rax.

So, lets add this to our function:

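/// Injected DLL routine for examining the arguments passed to ZwOpenProcess and NtOpenProcess from
/// any process this DLL is injected into.
#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    // NtOpenProcess's first parameter is really a PHANDLE (an out pointer); HANDLE wraps a raw
    // pointer, so the value is passed through to the kernel unchanged
    process_handle: HANDLE,
    desired_access: u32,
    object_attrs: *mut c_void,
    client_id: *mut CLIENT_ID,
) -> i32 {
    let ssn: u32 = 0x26; // hardcoded SSN for NtOpenProcess; goes in rax
    let status: u32;

    unsafe {
        asm!(
            "mov r10, rcx",
            "syscall",
            // Use the asm macro to load our registers so that the Rust compiler has awareness of the
            // use of the registers. Loading these by hand caused some instability
            inout("rax") ssn => status, // SSN in, NTSTATUS out
            in("rcx") process_handle.0,
            in("rdx") desired_access,
            in("r8") object_attrs,
            in("r9") client_id,
            // syscall overwrites rcx and r11, and we write r10; declaring the system ABI's volatile
            // registers as clobbered stops the compiler assuming any of them survive
            clobber_abi("system"),
            options(nostack)
        );
    }

    // Return the kernel's NTSTATUS to whoever called ZwOpenProcess
    status as i32
}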
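The SSN is hardcoded as 0x26 above, and SSNs can change between Windows builds. One way to avoid hardcoding it is to pull the immediate out of the unhooked stub’s opening bytes. This sketch assumes the stub is read before we patch it, since our hook overwrites the mov eax instruction. ssn_from_stub is a hypothetical helper. The expected prologue is the common x64 ntdll layout, which this post doesn’t verify.

```rust
/// Read the SSN out of an unhooked x64 ntdll syscall stub, assumed to begin
/// with `mov r10, rcx` (4C 8B D1) followed by `mov eax, imm32` (B8 + 4-byte LE SSN).
/// Returns None if the bytes don't match that prologue (e.g. the stub is already hooked).
pub fn ssn_from_stub(stub: &[u8]) -> Option<u32> {
    match stub {
        [0x4C, 0x8B, 0xD1, 0xB8, a, b, c, d, ..] => Some(u32::from_le_bytes([*a, *b, *c, *d])),
        _ => None,
    }
}
```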

Great, with that done and our parameters safe (i.e. we can now guarantee we won’t mess up our arguments), we can finally interact with the environment. As a proof of concept at this stage, we will simply open a message box showing the value of the process_handle parameter, along with the UniqueProcess handle found in the CLIENT_ID structure.

All together, this looks like:

/// Injected DLL routine for examining the arguments passed to ZwOpenProcess and NtOpenProcess from
/// any process this DLL is injected into.
#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    // NtOpenProcess's first parameter is really a PHANDLE (an out pointer); HANDLE wraps a raw
    // pointer, so the value is passed through to the kernel unchanged
    process_handle: HANDLE,
    desired_access: u32,
    object_attrs: *mut c_void,
    client_id: *mut CLIENT_ID,
) -> i32 {
    if !client_id.is_null() {
        let unique_proc = unsafe { (*client_id).UniqueProcess };
        let x = format!("UniqueProcess: {:?}, proc hand: {:?}\0", unique_proc, process_handle);
        unsafe { MessageBoxA(None, PCSTR::from_raw(x.as_ptr()), PCSTR::from_raw(x.as_ptr()), MB_OK) };
    }

    let ssn: u32 = 0x26; // hardcoded SSN for NtOpenProcess; goes in rax
    let status: u32;

    unsafe {
        asm!(
            "mov r10, rcx",
            "syscall",
            // Use the asm macro to load our registers so that the Rust compiler has awareness of the
            // use of the registers. Loading these by hand caused some instability
            inout("rax") ssn => status, // SSN in, NTSTATUS out
            in("rcx") process_handle.0,
            in("rdx") desired_access,
            in("r8") object_attrs,
            in("r9") client_id,
            // syscall overwrites rcx and r11, and we write r10; declaring the system ABI's volatile
            // registers as clobbered stops the compiler assuming any of them survive
            clobber_abi("system"),
            options(nostack)
        );
    }

    // Return the kernel's NTSTATUS to whoever called ZwOpenProcess
    status as i32
}

Slight correction on CLIENT_ID

I’m adding this as its own section, seeing as it is baked into the video now: if the CLIENT_ID.UniqueProcess field is cast as a u32, this does actually give you the PID:

Pid from CLIENT_ID
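The cast works because UniqueProcess is a HANDLE-typed field that holds the process ID rather than a real handle. Below is a minimal sketch of the conversion. It assumes a hand-written ClientId struct that mirrors CLIENT_ID’s two pointer-sized fields. In the real code you would use the CLIENT_ID type from your bindings, where unique_proc.0 is the pointer.

```rust
use std::ffi::c_void;

/// Hypothetical mirror of the CLIENT_ID layout: two pointer-sized HANDLE fields.
#[repr(C)]
pub struct ClientId {
    pub unique_process: *mut c_void,
    pub unique_thread: *mut c_void,
}

/// UniqueProcess carries the PID in a pointer-sized field. PIDs are 32-bit
/// values, so truncating the pointer's address to u32 recovers the PID.
pub fn pid_from_client_id(cid: &ClientId) -> u32 {
    cid.unique_process as usize as u32
}
```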

Wrapping up

I think that is probably enough for this post as it was quite heavy going. Make sure you watch the included videos as I think seeing this visually adds a lot of good context.