Implementing syscall hooks in Rust

Hooking syscalls


Intro

This blog post is a work in progress - and I’ll continually update it as I make progress with the functionality of syscall hooking, so check back if it isn’t yet complete! You can keep tabs on my progress by starring / watching my repo.

Injecting our DLL on process creation

The first step of this process is to inject our EDR’s DLL into all new processes. Well, that would be nice, but whilst testing I don’t want to cause mass system instability, so instead we will just inject into notepad. Once that works, this will be expanded to inject into our test / dummy malware.

As we have our kernel driver component, we notify the Usermode Engine of a new process being created through IOCTL requests made every 60 ms. I use my wdk-mutex crate (which you can check out here) to handle asynchronous access to the underlying structures. To be honest, this process, and how I achieved new process callbacks, deserves its own post.

Until then, I’ll provide a short summary of what I did.

In the driver component, we register a callback routine with the kernel, so any time a new process is created our callback function is executed. Microsoft provide a handy function for this registration, and it’s really easy to do.

In my driver, that looks like:

let res = PsSetCreateProcessNotifyRoutineEx(Some(core_callback_notify_ps), FALSE as u8);

As you may expect, core_callback_notify_ps is a function which essentially gathers information from the PS_CREATE_NOTIFY_INFO struct, and adds it to an internal cache protected by a mutex from wdk-mutex.
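As a rough illustration of that cache, here is a minimal sketch that runs in ordinary usermode Rust. It uses std::sync::Mutex as a stand-in for the wdk-mutex type, and the ProcessStarted and ProcessCache names and fields are hypothetical, not taken from the driver:

```rust
use std::sync::Mutex;

/// Hypothetical subset of what the callback might record for each new process.
#[derive(Debug, Clone, PartialEq)]
pub struct ProcessStarted {
    pub pid: u64,
    pub image_name: String,
}

/// Cache of process creations waiting to be collected by the usermode engine.
/// `std::sync::Mutex` stands in for the wdk-mutex type used in the real driver.
pub struct ProcessCache {
    pending: Mutex<Vec<ProcessStarted>>,
}

impl ProcessCache {
    pub const fn new() -> Self {
        Self { pending: Mutex::new(Vec::new()) }
    }

    /// Would be called from the process-creation callback.
    pub fn record(&self, p: ProcessStarted) {
        self.pending.lock().unwrap().push(p);
    }

    /// Would be called when the IOCTL arrives: hand over everything seen so
    /// far and leave the cache empty for the next polling interval.
    pub fn drain(&self) -> Vec<ProcessStarted> {
        std::mem::take(&mut *self.pending.lock().unwrap())
    }
}
```

Draining with std::mem::take keeps the lock held only for the swap, so the callback is never blocked while the response is being serialised.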

Every 60 ms an IOCTL call is made from the usermode engine down to the driver; in response the driver will send information about newly created processes. I use a two-call style IOCTL where the first call gets the size of the required buffer for the response, and the second call fills the buffer with the data.
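The two-call style can be sketched without touching a real driver. In the sketch below, MockDriver, query_size, fill and poll_driver are all hypothetical names that stand in for the driver and the IOCTL plumbing; only the shape of the exchange (ask for the size, then fetch into a buffer of that size) comes from the description above:

```rust
/// Error a fill request can report (hypothetical).
#[derive(Debug, PartialEq)]
pub enum IoctlError {
    BufferTooSmall { required: usize },
}

/// Stand-in for the driver side of the IOCTL.
pub struct MockDriver {
    pending: Vec<u8>,
}

impl MockDriver {
    pub fn new(pending: Vec<u8>) -> Self {
        Self { pending }
    }

    /// First call: report how many bytes the response needs.
    pub fn query_size(&self) -> usize {
        self.pending.len()
    }

    /// Second call: copy the response into the caller's buffer and clear it.
    pub fn fill(&mut self, out: &mut [u8]) -> Result<usize, IoctlError> {
        let required = self.pending.len();
        if out.len() < required {
            return Err(IoctlError::BufferTooSmall { required });
        }
        out[..required].copy_from_slice(&self.pending);
        self.pending.clear();
        Ok(required)
    }
}

/// Usermode side: size the buffer with the first call, fill it with the second.
pub fn poll_driver(driver: &mut MockDriver) -> Result<Vec<u8>, IoctlError> {
    let required = driver.query_size();
    let mut buf = vec![0u8; required];
    let written = driver.fill(&mut buf)?;
    buf.truncate(written);
    Ok(buf)
}
```

One thing this shape has to cope with in practice is the data growing between the two calls, which is why the fill step reports the required size again rather than truncating silently.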

Once the usermode engine receives the data, it can then start processing it. A check is done to determine whether the driver has told us a new process has been created, and if it has, we go ahead and call the onboard_new_process function.

let process_creations = driver_messages.process_creations;
if !process_creations.is_empty() {
    for p in process_creations {
        if self.process_monitor.write().await.onboard_new_process(&p).await.is_err() {
            logger.log(LogLevel::Error, &format!("Failed to add new process to live processes. Process: {:?}", p));
        }
    }
}

Within the onboard_new_process function, we inject our EDR’s DLL into the newly created process using no fancy tricks beyond those covered in my write up in this post.

Resolving syscall callback function addresses

So, our DLL has now been injected into the target process. From the DLL’s perspective, we need to create a new thread out of DllMain to set itself up. This initialisation routine should resolve the virtual addresses of the callback functions to be executed when a syscall we wish to intercept is called.

We will use those addresses to then make an unconditional jump via assembly to the address within our DLL, where we can then start assessing parameters and the environment.

We will save the address as a usize, which should help with 32/64-bit compatibility (although our use of the rax register below is not 32-bit friendly!!).
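To make the usize idea concrete, here is a small portable sketch of storing a function's address as a usize and calling through it again later. The example_callback function is a hypothetical stand-in for a hook callback, not part of the DLL:

```rust
/// Hypothetical callback standing in for one of the hook functions.
extern "C" fn example_callback(x: u32) -> u32 {
    x + 1
}

/// Store the function's address as a plain integer.
pub fn callback_address() -> usize {
    let f: extern "C" fn(u32) -> u32 = example_callback;
    f as usize
}

/// Turn the integer back into a function pointer and call it.
///
/// # Safety
/// `addr` must be the address of a function with exactly this signature.
pub unsafe fn call_by_address(addr: usize, arg: u32) -> u32 {
    // usize and a function pointer have the same size, so transmute accepts this.
    let f: extern "C" fn(u32) -> u32 = unsafe { std::mem::transmute(addr) };
    f(arg)
}
```

Because usize is pointer-sized on every target, the stored value is 4 bytes on 32-bit and 8 bytes on 64-bit without any code changes.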

This initialisation looks like the following:

/// A structure to hold the stub addresses for each callback function we wish to have for syscalls.
/// 
/// The address of each function within the DLL will be used to overwrite memory in the syscall, allowing us to jmp
/// to the address.
pub struct StubAddresses {
    open_process: usize,
}

impl StubAddresses {
    /// Retrieve the virtual addresses of all callback functions for the DLL
    fn new() -> Self {

        // Get a handle to ourself (the injected DLL)
        let h_module = unsafe { GetModuleHandleA(s!("sanctum.dll")) };
        let h_module = match h_module {
            Ok(h) => h,
            Err(_) => todo!(),
        };

        //
        // Get function pointers to our callback symbols
        //

        // Get a function pointer to the exported open_process callback in our own DLL
        let open_process_fn_addr = unsafe { GetProcAddress(h_module, s!("open_process")) };
        let open_process_fn_addr = match open_process_fn_addr {
            None => {
                unsafe { MessageBoxA(None, s!("Could not get fn addr"), s!("Could not get fn addr"), MB_OK) };
                todo!();
            },
            Some(address) => address as *const (),
        } as usize;

        Self {
            open_process: open_process_fn_addr,
        }
    }
}

/// Injected DLL routine for examining the arguments passed to ZwOpenProcess and NtOpenProcess from 
/// any process this DLL is injected into.
#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    process_handle: HANDLE,
    desired_access: u32,
    // We do not care for now about the OA
    _: *mut c_void,
    // We do not care for now about the client id
    _: *mut c_void,
) {
    // start off by causing a break in the injected process indicating we successfully called our function!
    unsafe {asm!("int3")};
}

The variable open_process_fn_addr will hold the resolved function address of our open_process symbol, and in theory we can then load this address into a register and execute a jmp, moving execution to our function.

To check that we did in fact resolve the address correctly, we can print the address out via a MessageBoxA (debugging this DLL is going to be nasty in the future and I can feel it!):

Function pointer

As you can see, the method worked!
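When comparing the printed address against what a debugger shows, hexadecimal is usually easier to read than decimal. A minimal sketch of building such a message as a NUL-terminated string follows; address_message is a hypothetical helper, not a function from the DLL:

```rust
use std::ffi::CString;

/// Format an address as hex in a NUL-terminated string, ready to be handed to
/// an ANSI API such as MessageBoxA via `as_ptr()`.
pub fn address_message(addr: usize) -> CString {
    CString::new(format!("Addr: {addr:#x}"))
        .expect("formatted text contains no interior NUL")
}
```

CString appends the terminating NUL itself, so there is no need to write a "\0" into the format string by hand.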

Redirecting execution to the callback

Now that we have our callback created, we need to alter the flow of execution so that the instruction pointer moves to our callback function.

First, to test this, let’s try a manual assembly jmp from somewhere we easily control to the function’s virtual address.

After calling StubAddresses::new() (which is where we resolve the function pointer to the function open_process), we will use some inline assembly to jump to the memory address. Note there is a commented int3 which will be used shortly to verify our assembly.

/// Initialise the DLL by resolving function pointers to our syscall hook callbacks.
unsafe extern "system" fn initialise_injected_dll(_: *mut c_void) -> u32 {

    let stub_addresses = StubAddresses::new();
    let x = format!("Addr: {}\0", stub_addresses.open_process);

    unsafe {
        MessageBoxA(None, PCSTR::from_raw(x.as_ptr()), PCSTR::from_raw(x.as_ptr()), MB_OK);
    }

    // test jump to open_process
    unsafe {
        asm!(
            // "int3",
            // push whatever is in rax onto the stack
            "push rax",
            // move our VA into rax
            "mov rax, {x}",
            // unconditional jump
            "jmp rax",
            // bring rax back from the stack
            "pop rax",
            x = in(reg) stub_addresses.open_process
        );
    }

    STATUS_SUCCESS.0 as _
}

As a reminder, our open_process function is as follows (4 args in, and an int3 instruction to break execution):

#[unsafe(no_mangle)]
unsafe extern "system" fn open_process(
    process_handle: HANDLE,
    desired_access: u32,
    // We do not care for now about the OA
    _: *mut c_void,
    // We do not care for now about the client id
    _: *mut c_void,
) {
    // start off by causing a break in the injected process indicating we successfully called our function!
    unsafe {asm!("int3")};
}

Let’s run this, attach a debugger to Notepad.exe and see what happens:

open_process stub

Lo! As you can see we did successfully jump to the address of open_process in our DLL! Awesome!!

Now, to uncomment the int3 instruction in the initialise_injected_dll function to see what that looks like in terms of the generated assembly:

Comparing assembly to Rust source code

You can quite clearly see how the inline assembly maps onto the binary. In fact, this has exposed a redundant instruction: the address of open_process is already in the rax register, so the mov rax, rax does nothing. Personally, I wouldn’t rely on the compiler making its own optimisations around things like this, so I would rather be explicit to ensure it is included in the binary. Perhaps that is a minor optimisation for the future!

Finally, to take the safety wheels off, let’s remove the int3 from both the callback and the inline asm in initialise_injected_dll and make sure we don’t crash notepad…

The function now looks as follows. If we get a popup saying “Bazinga” and nothing crashes, then in the words of Shania Twain, looks like we made it.

/// Initialise the DLL by resolving function pointers to our syscall hook callbacks.
unsafe extern "system" fn initialise_injected_dll(_: *mut c_void) -> u32 {

    let stub_addresses = StubAddresses::new();

    // test jump to open_process
    unsafe {
        asm!(
            // push whatever is in rax onto the stack
            "push rax",
            // move our VA into rax
            "mov rax, {x}",
            // unconditional jump
            "jmp rax",
            // bring rax back from the stack
            "pop rax",
            x = in(reg) stub_addresses.open_process
        );
    }

    unsafe {
        MessageBoxA(None, PCSTR::from_raw("Bazinga!\0".as_ptr()), PCSTR::from_raw("Bazinga!\0".as_ptr()), MB_OK);
    }

    STATUS_SUCCESS.0 as _
}

That didn’t work, for two reasons. First, we need to replace the jmp instruction with a call so that the return from our callback lands in the correct location. A jmp does not push a return address, so the ret at the end of open_process pops whatever is on top of the stack (here, the rax value we pushed) and jumps there. call pushes the return address to the stack for us, so we don’t have to worry about that ourselves.

Secondly, the push rax instruction is potentially misaligning the stack, so we will remove the push and pop instructions and worry about saving state properly later.

And now:

Assembly correct
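The two fixes above can be sketched as follows. This is not the DLL's actual code: example_callback is a hypothetical zero-argument stand-in for open_process, and the sketch adds two things the listings above do not have. clobber_abi tells the compiler which registers the callee may trash, and the 32 bytes reserved around the call are the shadow space the Windows x64 calling convention expects (harmless on other x86_64 targets).

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static CALLED: AtomicBool = AtomicBool::new(false);

/// Hypothetical callback: records that it ran, then returns normally.
extern "C" fn example_callback() {
    CALLED.store(true, Ordering::SeqCst);
}

pub fn callback_address() -> usize {
    let f: extern "C" fn() = example_callback;
    f as usize
}

/// Call the function at `addr` using `call` rather than `jmp`.
///
/// # Safety
/// `addr` must be the address of an `extern "C" fn()`.
#[cfg(target_arch = "x86_64")]
pub unsafe fn call_via_asm(addr: usize) {
    unsafe {
        std::arch::asm!(
            // reserve shadow space; a multiple of 16 keeps the stack aligned
            "sub rsp, 32",
            // call pushes the return address, so the callee's ret comes back here
            "call {f}",
            // release the shadow space again
            "add rsp, 32",
            f = in(reg) addr,
            // the callee may overwrite any caller-saved register
            clobber_abi("C"),
        );
    }
}

/// Fallback for other architectures so the sketch still builds: a plain call.
#[cfg(not(target_arch = "x86_64"))]
pub unsafe fn call_via_asm(addr: usize) {
    let f: extern "C" fn() = unsafe { std::mem::transmute(addr) };
    f();
}
```

Because the block is not marked nostack, the compiler guarantees the stack pointer is suitably aligned for a function call on entry, which is what makes it safe to drop the push and pop.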

Why are we using assembly

At this point you may be wondering why we are using assembly to call the function pointer instead of just calling it through Rust. In the next section, where we will look to hook the syscall, we will have to directly overwrite the machine code in Ntdll.dll with our own machine code which performs the call to our hook. At this low level, we can no longer work in a ‘mid-level’ language such as Rust, C or C++.

By making a simple proof of concept in a function we can control with inline assembly, we can then take the bytes (machine code) this outputs and use that as our template for patching the syscall.
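As a hedged sketch of what such a template could look like, the function below builds the x86-64 bytes for mov rax, imm64 followed by call rax, with the target address filled in. The build_call_stub name and the choice of rax are assumptions for illustration; the bytes the real hook writes into Ntdll.dll may differ:

```rust
/// Length of the stub: 10 bytes for `mov rax, imm64` plus 2 for `call rax`.
pub const STUB_LEN: usize = 12;

/// Build the machine code for:
///     mov rax, <target>
///     call rax
pub fn build_call_stub(target: u64) -> [u8; STUB_LEN] {
    let mut stub = [0u8; STUB_LEN];
    stub[0] = 0x48; // REX.W prefix: 64-bit operand size
    stub[1] = 0xB8; // mov rax, imm64
    stub[2..10].copy_from_slice(&target.to_le_bytes()); // immediate, little-endian
    stub[10] = 0xFF; // opcode group containing the indirect call and jmp
    stub[11] = 0xD0; // ModRM selecting `call rax` (0xE0 here would be `jmp rax`)
    stub
}
```

Only the eight address bytes change from one hook to the next, so the rest of the array can be treated as a fixed template.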

Hooking the syscall

This part of the post is coming soon.. :-) I will get around to writing this section up; but for now - here is a video demo of it working!