Reflective DLL injection and bootstrapping in C

Achieving fileless malware with reflective DLL injection.

Intro

If you want to get straight to it, the full project can be found on GitHub; this blog post only looks at the key snippets and the theory behind them.

Reflective DLL Injection is a technique used not only by threat actors, but also by red teams and penetration testers (Cobalt Strike’s famous ‘Beacon’ is a commonly cited example). It offers a covert method of executing code within the address space of a target process and, unlike conventional methods, it avoids the standard and heavily monitored Windows API calls associated with DLL loading. Reflective DLL injection also lets us achieve ‘fileless malware’, where the actual payload never needs to sit on disk. For a threat actor this has several advantages: increased protection against static scanning, and a final payload that is harder for a blue team to find on disk and reverse engineer to work out what it is doing.

A legal disclaimer applies, and by reading on you acknowledge it; see the legal disclaimer here. In short, you must not use the information below for any criminal or unethical purpose. It is intended only for security professionals, and for those interested in cyber security who want to deepen their knowledge.

To see this project in action, it is featured in my first YouTube video, where I explore some up-to-date Open Source Intelligence around new attack techniques!

Traditional DLL Loading vs. Reflective Injection

Traditional DLL (Dynamic Link Library) loading in Windows operating systems is a straightforward process that typically involves loading a DLL file from a disk location into a process’s address space. This method relies on built-in Windows API functions and is commonly used for extending the functionality of a process at runtime. The standard approach uses functions like LoadLibrary to load the DLL from disk and GetProcAddress to find the address of exported functions within the loaded DLL. Doing all this in another process (with no obfuscation, see my post here on evasion) is a straight up red flag for any antivirus / EDR.

Reflective DLL Injection is a more sophisticated technique that bypasses the standard Windows loader altogether. Instead of loading a DLL from a disk location, it involves directly mapping the DLL into memory from a binary blob. This doesn’t use typical Windows APIs like LoadLibrary or GetProcAddress in the same manner, making it slightly harder to detect.

High-Level Overview of the Bootstrapper

The bootstrapper in Reflective DLL Injection is responsible for:

Embedding the DLL: The DLL is embedded as a raw binary blob within the executable, not as a standard resource. Alternatively, the bootstrapper may request the DLL from a URL, or read it from somewhere on disk. At this stage, the bootstrapper may also decrypt the blob back into the raw DLL bytes.
Manual Mapping: The bootstrapper manually allocates memory the size of the DLL, then maps the DLL into the target process’s memory. This involves copying the DLL’s sections into the allocated memory in the target process.
Relocation and Imports Resolution: It then performs necessary relocations and resolves imports without using the standard Windows APIs. Instead, it manually parses the PE structure to adjust addresses and link imported functions.
Execution: Finally, it invokes execution within the target process, often through CreateRemoteThread, starting at a specific entry point in the injected DLL.

The bootstrapper serves as a custom loader, preparing and executing the DLL within another process’s memory space, all while maintaining a low profile to evade detection.

To show the finished product, and hopefully help some of this make sense, here is a screenshot of my reflective injector. You can see the magic bytes (MZ) at the start of the .data section of the PE (see the explanation below for what this section is for). This shows that the DLL is ‘baked’ into the binary and can be accessed through an ordinary pointer.

DLL reflection loaded into a PE data section

If you wanted to build this as a “Stage-0” loader, you may omit storing the DLL into the binary like this and opt instead to download it straight into a buffer from a command and control server, or download the encrypted DLL to disk, then load that in as a “Stage-1”.

The DLL embedded in the PE file is 125 KB in size, and the .data section is approximately 128 KB, which makes sense, since the section holds the blob alongside the injector’s other initialised data.

To compare the memory ratios of the embedded injector PE and the DLL we are injecting, check out the images in the section below.

Reflective DLL Injection

The project, which can be found on my GitHub, is a simple reflective DLL injector written in C. It can serve as a base for a more complex injector that adds obfuscation, encryption, packing, downloading from a URL, additional staging, and different methods of storing the DLL within the executable.

The project is structured to execute a series of operations: locating a target process, loading a DLL directly from a byte array, adjusting the PE file in the target process’s memory, and executing the DLL.

Reflective loader structure

Before looking at the code, one thing worth explaining is the layout of the Injector PE (Portable Executable) which comes pre-packaged with our implant DLL.

Headers: At the beginning of the PE file, the headers (DOS, NT, and Section Headers) describe the file’s layout and characteristics. The DOS header leads to the more relevant NT headers, which contain information like the entry point, image base, and size of the image.

Here is a good image from Wikipedia of the header format of a PE (exe):

Stealer process

Sections: Following the headers are the sections, which include .text, .data, .rdata, .bss, and others. Each section serves a specific purpose:

.text: Contains the executable code.
.data: Stores initialised global and static data.
.rdata: Holds read-only data, like constants and import/export directories.
.bss: Contains uninitialised data.

Data Directories: Part of the NT headers, data directories provide information for essential operations, such as imports, exports, and resource management.
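To make the header layout described above a little more concrete, here is a minimal, portable sketch that walks from the DOS header to the NT headers in a byte buffer and works out where the section table starts. It is an illustration only and not code from the project: the pe_summary struct and the pe_summarise function are hypothetical names, and the offsets used are the standard ones from the PE format (e_lfanew at 0x3C, a 4-byte "PE\0\0" signature, then the 20-byte COFF file header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the fields discussed above (illustrative only). */
typedef struct {
    uint32_t nt_offset;            /* e_lfanew: file offset of the NT headers */
    uint16_t number_of_sections;   /* from the COFF file header */
    uint16_t optional_header_size; /* SizeOfOptionalHeader */
    uint32_t section_table_offset; /* first section header */
} pe_summary;

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer does not look like a PE file.
 * Note: the section table offset is computed but not bounds-checked. */
static int pe_summarise(const uint8_t *buf, size_t len, pe_summary *out)
{
    assert(buf != NULL && out != NULL);

    /* DOS header: 'MZ' magic, and e_lfanew stored at offset 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    uint32_t nt = read_u32(buf + 0x3C);

    /* Need room for the signature (4 bytes) and COFF file header (20 bytes). */
    if (nt > len || len - nt < 24)
        return -1;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return -1;

    out->nt_offset = nt;
    out->number_of_sections = read_u16(buf + nt + 6);
    out->optional_header_size = read_u16(buf + nt + 20);
    /* Section headers follow immediately after the optional header. */
    out->section_table_offset = nt + 24 + out->optional_header_size;
    return 0;
}
```

On Windows you would normally use the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS structures from the SDK instead of raw offsets; the byte-level version is used here only so the sketch compiles anywhere.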

In our reflective DLL injection, the DLL is not a separate file but a byte array (a blob) embedded within the PE loader. This embedding raises the question: In which section does this blob reside?

Typical Placement: In most cases, such a blob is placed in sections like .data or .rdata. The choice between these sections depends on the desired characteristics: .data for writable storage or .rdata for read-only.
Custom Sections: Custom sections can be created to store the blob. This approach allows for more control over the blob’s attributes and visibility within the PE structure.
Accessing the Blob: Regardless of its location, the blob is accessed via pointers and treated like any other data in the program. The loader reads this blob, performs necessary adjustments (like realigning the PE), and injects it into the target process.
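As a small illustration of the placement point above, the sketch below declares two tiny placeholder arrays: one non-const and one const. With typical Windows toolchains the first would usually end up in .data and the second in .rdata, although that is a toolchain default rather than a guarantee. The array names, the four placeholder bytes and the starts_with_mz helper are all hypothetical and are not taken from the project.

```c
#include <assert.h>
#include <stddef.h>

/* Non-const initialised data: usually placed in the writable .data section. */
static unsigned char writable_blob[] = { 0x4D, 0x5A, 0x90, 0x00 };

/* Const data: usually placed in the read-only .rdata section. */
static const unsigned char readonly_blob[] = { 0x4D, 0x5A, 0x90, 0x00 };

/* Either array is reached through an ordinary pointer; this helper just
 * checks for the 'MZ' magic bytes visible in the earlier screenshot. */
static int starts_with_mz(const unsigned char *blob, size_t len)
{
    assert(blob != NULL);
    return len >= 2 && blob[0] == 'M' && blob[1] == 'Z';
}
```

You can confirm which section a given array landed in by opening the built binary in a PE viewer and searching for its bytes.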

If you would like to read more, a good example of different approaches to building custom sections is outlined in a Cobalt Strike blog post.

Nuance with API calls

In the reflective loader, LoadLibrary and GetProcAddress are used, but in a unique context:

Not for Direct Disk Loading: Instead of loading a DLL from disk, we are embedding the DLL as a binary blob within the loader. This embedded DLL is not loaded using LoadLibrary; rather, it’s manually mapped into memory.
Function Pointers in the Target Process: The references to LoadLibraryA and GetProcAddress in our code (dll.load_library_a_addr and dll.get_process_addr) are function pointers. These pointers are used not to load the DLL (as it’s already embedded) but to resolve addresses of functions inside the target process during manual mapping and realignment.

The meat of memory copying

In the reflective process, we manually copy the implant PE’s headers and sections into the memory we have allocated. In simple terms, these are the key parts of the code responsible for copying them into memory:

// Allocating memory in the target process with the size equal to the DLL's image size.
// This space is reserved for the entire content of the DLL including headers, sections, and other data.
target_base_addr = VirtualAllocEx(target_process_handle, NULL, nt->OptionalHeader.SizeOfImage, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

// Writing the PE headers of the DLL into the allocated memory in the target process.
// This includes the DOS header, NT headers, and optional headers which define the structure and execution information of the PE file.
WriteProcessMemory(target_process_handle, target_base_addr, local_dll_base, nt->OptionalHeader.SizeOfHeaders, NULL);

Next we iterate through the sections of the DLL and copy those to our new memory block, as evidenced here in some debugging output:

DLL sections being copied to memory

// Iterating through the sections of the PE (like .text, .data, .rdata) and writing them to the allocated memory.
// Each section is copied to its respective virtual address offset within the allocated space.
for (int i = 0; i < nt->FileHeader.NumberOfSections; i++) {
    WriteProcessMemory(target_process_handle, target_base_addr + section->VirtualAddress, local_dll_base + section->PointerToRawData, section->SizeOfRawData, NULL);
    section++;
}
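The loop above relies on the relationship between a section’s file offset (PointerToRawData) and its relative virtual address (VirtualAddress). The same relationship, seen from the analysis side, can be sketched as a small helper that translates an RVA back to a file offset. This is not code from the project: section_view is a hypothetical, trimmed-down stand-in for the section header, and the values in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a section header (illustrative only). */
typedef struct {
    uint32_t virtual_address;     /* RVA at which the section is mapped */
    uint32_t size_of_raw_data;    /* number of bytes present in the file */
    uint32_t pointer_to_raw_data; /* file offset of those bytes */
} section_view;

/* Translates an RVA to a file offset.
 * Returns 0 on success, -1 if no section's raw data covers the RVA. */
static int rva_to_file_offset(const section_view *sections, size_t count,
                              uint32_t rva, uint32_t *offset_out)
{
    assert(sections != NULL && offset_out != NULL);

    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;

        /* The RVA must fall inside the part of the section backed by the file. */
        if (rva >= start && rva - start < sections[i].size_of_raw_data) {
            *offset_out = sections[i].pointer_to_raw_data + (rva - start);
            return 0;
        }
    }
    return -1;
}
```

For example, with a section mapped at RVA 0x1000 whose raw data starts at file offset 0x400, the RVA 0x1010 corresponds to file offset 0x410.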

Next, we perform some important calculations to determine the size of the bootstrapper code. This is crucial because we need to write the bootstrapper to a newly allocated segment of memory within the target process. The bootstrapper code is contained in bootstrapper.c. To calculate its size accurately, we demarcate the start and end of the bootstrapper segment using two marker functions. Any additional functions or code we wish to include in the bootstrapper should be placed between these markers.

The bootstrap_code_size calculation measures the size of the bootstrapper segment by subtracting the address of the start_of_injectable_code from the address of end_of_injectable_code. This size is used when allocating memory in the target process for the bootstrapper.

The dll_info_struct is then set up with important addresses. The base member is assigned the base address of the memory allocated for the DLL in the target process. This address is used by the realign_pe function during the bootstrapping process to locate and modify the DLL. The structure is also given pointers to the GetProcAddress and LoadLibraryA functions, which are used during the bootstrapping process for resolving function addresses and loading libraries.

bootstrap_code_size = (DWORD)((ULONGLONG)end_of_injectable_code - (ULONGLONG)start_of_injectable_code);
dll_info_struct.base = target_base_addr;    // the base address of the DLL inside the target process. When realign_pe is executed, 
                                            // it uses this information from dll_info_struct to locate and interact with the DLL.
dll_info_struct.get_process_addr = GetProcAddress;
dll_info_struct.load_library_a_addr = LoadLibraryA;

// Allocating memory in the target process for the bootstrapping code.
// This memory will host the custom code responsible for properly loading and aligning the DLL in the process's memory.
bootstrap_memory_base = VirtualAllocEx(target_process_handle, NULL, bootstrap_code_size + sizeof(dll_info_struct), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

// Writing the DLL_INFO structure to the beginning of the allocated bootstrapping memory.
// This structure contains crucial information like base addresses and function pointers needed by the bootstrapping code.
WriteProcessMemory(target_process_handle, bootstrap_memory_base, &dll_info_struct, sizeof(dll_info_struct), NULL);

// Writing the bootstrapping code (realign_pe function) immediately after the DLL_INFO structure in the allocated memory.
// This code will perform tasks like base relocation and import address table resolution.
WriteProcessMemory(target_process_handle, bootstrap_memory_base + sizeof(dll_info_struct), realign_pe, bootstrap_code_size, NULL);

To explain the final two WriteProcessMemory calls, the first is writing dll_info_struct into the beginning of the memory allocated for the bootstrapper, as explained above.

In the second call, we are writing the realign_pe function into the target process, placed right after the above custom DLL structure we wrote. The realign_pe function acts as a custom bootloader function which will be executed in the context of the target process, adjusting memory addresses and resolving dependencies needed by the implant DLL.

By writing these into the target process’s memory, the injector sets the stage for the DLL to be loaded and executed as if it were a natural part of the target process.

Comparing the embedded PE with the DLL

You saw the memory layout of the embedded PE earlier in this post, and the debug output above shows 92672 bytes of data being copied into the remote process from the DLL we are injecting. Opening that DLL in PE-bear, you can see the difference in memory layout on the right-hand side, with the majority of the space being taken up by the .text section (blue).

PE-bear shows us that the .text section of the PE is 0x16A00 bytes long; converted to decimal, this gives us 92672, which matches the debug output. We can be confident that the process is working as expected! Awesome!

DLL that's loaded into the reflective loader.
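The hex-to-decimal conversion above is easy to double-check in a few lines of C. This is just a sanity-check sketch; hex_size_to_decimal is a hypothetical helper name, and it simply wraps the standard strtoul function.

```c
#include <assert.h>
#include <stdlib.h>

/* Converts a hexadecimal size, as shown by a PE viewer, to a byte count. */
static unsigned long hex_size_to_decimal(const char *hex)
{
    assert(hex != NULL);
    return strtoul(hex, NULL, 16);
}
```

Calling hex_size_to_decimal("16A00") returns 92672, the same figure reported in the debug output.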

Taking it further

As stated earlier, this is only a basic version of the reflective injector. To take it further, you could add additional obfuscation, encryption, packing, downloading from a URL, additional staging, and different methods of storing the DLL within the executable.

My next step here is to refactor this into a Rust-based reflective injector; I’d be quite curious to see what the detection difference is between this C example and Rust out of the box.

As an extra bit of fun, the project can be built either as a standalone exe to be run, or as a standalone DLL. To run the DLL, you would want to use something like rundll32.exe bad.dll,runMain; however, if you wanted to make this more advanced, you could use DLL Side-Loading or DLL Search Order Hijacking to be extra sneaky! All you would need to do is make a few minor modifications.

I also plan on showcasing some sophisticated build pipelines in what I am calling “The Factory”, which should be a one-stop-shop for all things related to CI/CD (continuous integration, continuous deployment) for red team operations. This would give the ability to craft various payload methodologies on the fly, and have the C2 automatically rotate encryption, endpoints, servers, etc. with minimal operator interaction. To see my basic post on CI/CD for red teams, check my blog post here.