DLL Injection EDR Evasion 1: Hiding an elephant in the closet

Sometimes the best way to get into a process is to crash the party!

Intro

Project code: https://github.com/0xflux/GoSneak

A legal disclaimer applies, and by reading on you acknowledge it; see the legal disclaimer here. In short, you must not use the information below for any criminal or unethical purposes. It is intended only for security professionals, or for those interested in cyber security who want to deepen their knowledge.

Note: When we talk about EDR evasion with techniques such as ETW bypasses, APC Queue Injection, and Process Injection, these can STILL be detected by more sophisticated EDRs. By modern standards these techniques are outdated, but they are still worth learning, as they teach us methods which can still work.

If you are interested in learning about how modern EDRs can detect this type of behaviour, I have a blog series where I am building an EDR from scratch, and you can check the specifics about detecting these bypass techniques here.

I started this section intending to ‘open source’ the DLL injector I have written for my red team framework: an injector written in C++, wrapped with Go, with a Go-based DLL. I was going to talk about how CGO works and how you can do low-level things in C++ and return out to your Go function, but instead this turned into a series of deeper learning about Windows internals and, in turn, some upgrades to my injector.

Whilst there is still a wrapper for this loader in Go contained in the project, for now I am just working on the C++ implementation (mostly from scratch), based on some recent research I have done from other blogs and security content creators. At the end of the post, I’ll introduce the Go wrapper for the DLL injector, and show how it can be used as part of a Go binary.

With this basic iteration of the injector, we will be dealing with an unencrypted payload written to disk. This of course is far from the ideal way of executing some stealthy ops, but that may (will) come later :)

This is not a tutorial on DLL injection. If you are looking to learn the basics of such things, there are plenty of resources for that.

As per my ethical disclaimers; this is not a tutorial to be a 1337 haxx0r. It is intended to document my learning and growth, and to showcase some theory to security professionals, or those interested in cyber security. Do not use this information, code or technique for unethical / illegal purposes. I do not condone in any way, shape, or form malicious or illegal use of computing.

EDR Hooking

EDR Hooking refers to the methods used by Endpoint Detection and Response (EDR) systems to monitor the behavior of software on a computer, particularly for identifying and mitigating potential threats. These systems are designed to detect malicious activities by observing interactions between software processes and the operating system.

There are different ways in which EDRs perform hooking. A few of the more common:

Inline Hooking:

The EDR modifies the actual binary code of a function in memory. It typically replaces the first few bytes of the function with a jump to its own monitoring code. When the hooked function is called, execution is diverted to the EDR’s code first, allowing it to monitor or modify the behaviour of the function. Here is a great resource to read more about detecting inline hooking: https://www.ired.team/offensive-security/defense-evasion/detecting-hooked-syscall-functions.

Import Address Table (IAT) Hooking:

IAT hooking involves modifying a program’s import table, which lists the API functions used by the program. This means when the program runs, instead of calling the actual API function, it calls the EDR’s monitoring function.
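To make the redirection idea concrete, here is a toy model in plain C++. It is only a sketch: ToyImportTable, realApi and monitorApi are names I have made up for illustration, and a single function-pointer slot stands in for a real PE import table. It shows the principle that the program calls through a table, so whoever rewrites the table entry gets to run first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for one import table entry. This is NOT a real PE
// structure; it only models "the program calls through a pointer slot".
using ApiFn = int (*)(int);

// Records which functions ran, so the redirection is observable.
static std::vector<std::string> g_callLog;

// The function the program believes it is importing.
static int realApi(int value) {
    g_callLog.push_back("realApi");
    return value * 2;
}

// Saved original target, so the monitor can forward the call.
static ApiFn g_original = nullptr;

// The "monitoring" function: logs the call, then forwards to the original.
static int monitorApi(int value) {
    g_callLog.push_back("monitorApi");
    assert(g_original != nullptr);
    return g_original(value);
}

struct ToyImportTable {
    ApiFn slot = realApi;
};

// Overwrite the slot with the monitor, remembering the previous target.
void hookSlot(ToyImportTable& table) {
    g_original = table.slot;
    table.slot = monitorApi;
}

// The program only ever calls through the table, never realApi directly.
int callThroughTable(const ToyImportTable& table, int value) {
    return table.slot(value);
}
```

Before hookSlot is called, callThroughTable runs only realApi. Afterwards the same call site runs monitorApi first and then realApi, without the caller changing a single line, which is the property that makes table-based hooking attractive for monitoring.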

Unrelated to the topic here of the loader, but taken from ired.team:

This lab shows how it's still possible to dump the process memory (lsass) and bypass Cylance (or any other Antivirus/Endpoint Detection & Response solution) that uses userland API hooking to determine if a program is malicious during its execution.

I haven't yet had chance to try this, but this looks really cool, and I want to implement this into my red team framework as a capability.

Back to loaders……

Going down the syscall rabbit hole

Now we know what EDRs are looking for, let’s come up with a plan. Well, actually, we don’t need to come up with a plan, as the security community already has this well documented. Since we know that EDRs can hook API calls, researchers such as the author of https://alice.climent-pommeret.red/posts/a-syscall-journey-in-the-windows-kernel/ (and many, many more) have documented the journey of API calls and how we can call syscalls directly and get data into the kernel in a way that bypasses EDR hooking.

I’ll provide an example of what this looks like. Below are a series of screenshots from x64dbg, in which I find the main function of the program, then look for a VirtualAllocEx() call to the Windows API (which eventually resolves to an EDR-hooked function). You can see that, by following the call instructions, we move through kernel32.dll into kernelbase.dll, which gives us the call to NtAllocateVirtualMemory in ntdll.dll. Inspecting NtAllocateVirtualMemory (part of the undocumented Windows NTAPI), we see the assembly moving 18h into eax and then making a syscall. On my architecture and Windows version (as explained below), 18h is the syscall number that tells the kernel to perform the work of NtAllocateVirtualMemory, showing the transition at the low level from user-land to kernel-land.

Here’s a visual representation of this:

Syscalls in Windows

Here’s examining the chain in a debugger:

Diving into the syscalls

Having followed the rabbit hole this deep, we are now left with the assembly making the syscall. On my particular version of Windows (11, I know, it makes me as sick as you are reading that), you can see the syscall number is 0x18. To learn about syscalls, go check out this excellent blog post: https://alice.climent-pommeret.red/posts/a-syscall-journey-in-the-windows-kernel/

Diving into the syscalls

We can then put this into our assembly file like so, remember to publicly export the procedure:

public NtAllocateVirtualMemory
NtAllocateVirtualMemory PROC
  mov r10, rcx
  mov eax, 18h
  syscall
  ret
NtAllocateVirtualMemory ENDP

Remember, MASM requires PROC and ENDP to mark the start and end of a procedure’s code.

Then, we can repeat this process over and over until all hooked syscalls are resolved to the syscall number.

Pro tip: pressing Ctrl+G in x64dbg allows quick searching for where a symbol can be found in the disassembly (so you don’t have to keep manually clicking all the way down as in the above process).

This is only half of the story. The next step is defining the undocumented NTAPI functions and providing an abstraction over them. The best resource I have found for documentation of the lower-level Windows APIs is the winsiderss GitHub: https://github.com/winsiderss/phnt. For example, search it for NtAllocateVirtualMemory.

Remember, we know to look for NtAllocateVirtualMemory because of how we followed the 'proxy chain' in the disassembly step.

In combining (or linking) the assembly with the C++ project, we create a new header file packed with function prototypes, wrapping them in an extern "C" linkage specification.

This bit is crucial to make sure the C++ linker susses out and properly links these functions.

Here’s the thing with C++: function names get a bit of a twist, known as ‘name mangling’, which encodes each function’s parameter types into its symbol so that every name is unique for the linker; this is what makes function overloading possible. By flagging these functions with extern "C", we’re telling the C++ compiler to stick to C-style linkage, thus dodging any name mangling bulls#*%. This guarantees that the function names we’ve defined in our assembly match those the C++ linker is looking for, enabling a smooth integration. For example, we declare the prototype for NtAllocateVirtualMemory (an undocumented NTAPI function we found through the winsiderss GitHub) in our header file to ensure it’s in step with its assembly equivalent.

extern "C" {
    NTSTATUS NtAllocateVirtualMemory(
        _In_        HANDLE              ProcessHandle,
        _Inout_ _At_ (*BaseAddress, _Readable_bytes_(*RegionSize) _Writable_bytes_(*RegionSize) _Post_readable_byte_size_(*RegionSize)) PVOID *BaseAddress,
        _In_        ULONG_PTR           ZeroBits,
        _Inout_     PSIZE_T             RegionSize,
        _In_        ULONG               AllocationType,
        _In_        ULONG               Protect
    );
}

The assembler does not mangle names; it emits each symbol exactly as written, which is why extern "C" is important not to overlook.
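As a quick illustration of the difference, here is a minimal sketch, unrelated to the injector itself. The function names scale and scale_c are made up for the example. The mangled names in the comments are what GCC and Clang produce under the Itanium C++ ABI; MSVC uses its own, different decoration scheme, so treat the exact spellings as compiler-specific.

```cpp
// Two C++ overloads sharing one source-level name. This is legal because
// the compiler encodes the parameter types into each symbol: on GCC/Clang
// these become _Z5scalei and _Z5scaled respectively.
int scale(int value) { return value * 2; }
double scale(double value) { return value * 2.0; }

// C linkage: the symbol is emitted under its plain, unmangled name. That is
// the form an assembly file defines, so a prototype declared this way can be
// matched by the linker against a PROC of the same name.
extern "C" int scale_c(int value) { return value * 2; }

// Overloading is not possible with C linkage; uncommenting the line below
// fails to compile because both functions would need the same symbol.
// extern "C" double scale_c(double value);
```

If you compile this to an object file and list its symbols with your toolchain's symbol dumper, you should see two decorated names for scale and one plain scale_c.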

Let’s quickly talk about why we’re defining these function prototypes.

Normally in C, we declare function prototypes to inform the compiler about the function’s signature: what arguments it takes and what it returns. This is crucial because it tells the compiler how to set up the call to the function, including how to arrange data in memory or in registers. The same logic applies here, but with a twist. We’re defining our functions in assembly, not C. This means that while the C++ part of our code knows what to expect thanks to the prototypes, the actual work, the nitty-gritty of the operations, is done in the assembly code.

This becomes particularly important when dealing with system calls. System calls are like special requests to the kernel, and they expect data to be presented in a very specific way. On x64 Windows, the calling convention passes the first four integer or pointer arguments in rcx, rdx, r8 and r9 and the rest on the stack; the syscall instruction itself overwrites rcx, which is why each stub begins with mov r10, rcx. By defining a function prototype in our C++ code, we ensure that when we call this function, the compiler arranges all the necessary data (arguments) in the way that our assembly code, and in turn the kernel, expects. These function prototypes serve as a bridge between the high-level structure of C++ and the low-level operations of assembly and system calls.

Let’s take a breath

Before we move on, let’s recap and take a look at what’s going on under the hood. What we have done so far:

Defined the prototypes and structures needed to actually complete each syscall (so that data is placed in registers and on the stack as the kernel expects)
Replaced calls to the EDR-hooked Windows API functions with our own assembly stubs, which make the syscalls into the Windows kernel directly
This circumvents the EDR’s user-land hooks

Let’s check that what we think is going on is actually going on. On my version of Windows, as we have seen, 0x18 is the syscall number for NtAllocateVirtualMemory (the NTAPI function that sits beneath VirtualAllocEx in the Windows API). Looking at the new disassembly (which is easy to find thanks to the call to GetProcAddress), we can see a call to inj.xxxxxxxxxx; hovering over this shows a pop-up of what it relates to: our assembly!

The assembler and linker have done their job, incorporating our assembly instructions into the binary!

Diving into the syscalls

Just for fun, let’s replace syscall with the instruction int 2Eh (for more info on 2E check this link) and rebuild:

Diving into the syscalls

Unexpectedly, this doesn’t function as a syscall: I observe no completed DLL injection (and no errors are returned). Given the description from codemachine linked above:

"int 2e" is the legacy way of performing user to kernel mode transitions and is supported by all x86 CPUs existing today. The call to "int 2e" results in the interrupt service routine registered in the interrupt descriptor table (IDT) for vector 0x2e (i.e. nt!KiSystemService) being invoked.

I did expect this to execute into a syscall on x64, but lo, no! I would be interested to debug this on x86 and see it working there.. but that is not a challenge for today.

Automating this

Whilst this is nice, we are depending on knowing the target architecture, build number etc., which isn’t exactly favourable. So, thanks to the information in https://redops.at/en/blog/direct-syscalls-vs-indirect-syscalls (and thanks to cr0w for providing a base function), we can automate the process of finding the SSNs (system service numbers, i.e. the syscall numbers).

Here is my implementation of automating this:

DWORD getSSN(IN HMODULE dllModule, IN LPCSTR NtFunction) {

    FARPROC NtFunctionAddress = GetProcAddress(dllModule, NtFunction);

    if (NtFunctionAddress == NULL) {
        char logBuffer[256];
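        // snprintf bounds the write so a long function name cannot overflow logBuffer.
        snprintf(logBuffer, sizeof(logBuffer), "Failed to get the address of %s", NtFunction);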
        printError(logBuffer);
        return 0;
    }

    /**
     * Layout of a syscall stub, using NtOpenProcess as the example:
     *
     *  public NtOpenProcess
     *  NtOpenProcess PROC
     *      mov r10, rcx                ; 3 bytes
     *      mov eax, wNtOpenProcess     ; mov opcode (1 byte) + SSN (4 bytes) = 5 bytes
     *      syscall
     *      ret
     *  NtOpenProcess ENDP
     *
     * With the below, take the byte pointer of the NT function, then add 4 bytes
     * to the memory location we are pointing to. Here we will find the SSN
     * (see the byte counts above).
     * Cast this location as a pointer to a double word (i.e. 4 bytes).
     * Dereference that pointer to get the underlying value from where we were pointing.
     */
    DWORD NtFunctionSSN = *((PDWORD)((PBYTE)NtFunctionAddress + 4));

    return NtFunctionSSN;
}

I have tried to provide an explanation for what is going on with the line:

DWORD NtFunctionSSN = *((PDWORD)((PBYTE)NtFunctionAddress + 4));

So, in case it doesn’t make sense: the encoding of mov r10, rcx is 3 bytes long, the fourth byte is the opcode for mov eax, and the next 4 bytes (a DWORD) hold the actual SSN.
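To see that arithmetic in isolation, here is a small sketch that runs against a hard-coded byte array rather than a live module. The array kSampleStub is simply the stub from earlier in this post, encoded by hand with the 18h value from my machine, and readSsnFromStub is a helper name I have made up for the example. It also checks the first four bytes before trusting offset 4, which the getSSN function above does not do: if the start of a function has been overwritten (for example by an inline hook’s jump), whatever sits at offset 4 is not an SSN.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-encoded copy of the stub shown earlier (illustrative data only):
//   4C 8B D1        mov r10, rcx
//   B8 18 00 00 00  mov eax, 18h
//   0F 05           syscall
//   C3              ret
constexpr std::uint8_t kSampleStub[] = {
    0x4C, 0x8B, 0xD1, 0xB8, 0x18, 0x00, 0x00, 0x00, 0x0F, 0x05, 0xC3};

// Reads the 4-byte immediate of "mov eax, imm32" from a stub-shaped buffer.
// Returns false if the buffer is too short or does not start with the
// expected "mov r10, rcx; mov eax," bytes. Assumes a little-endian host,
// which holds on x64.
bool readSsnFromStub(const std::uint8_t* stub, std::size_t length,
                     std::uint32_t& ssnOut) {
    static const std::uint8_t kExpectedPrefix[] = {0x4C, 0x8B, 0xD1, 0xB8};
    const std::size_t ssnOffset = sizeof(kExpectedPrefix);  // 3 + 1 = 4

    if (stub == nullptr || length < ssnOffset + sizeof(std::uint32_t)) {
        return false;
    }
    if (std::memcmp(stub, kExpectedPrefix, ssnOffset) != 0) {
        return false;  // prologue differs from the expected stub bytes
    }
    // memcpy avoids the unaligned, type-punned read of a pointer cast.
    std::memcpy(&ssnOut, stub + ssnOffset, sizeof(ssnOut));
    return true;
}
```

Called with kSampleStub, the helper succeeds and yields 0x18. If the first byte is changed to 0xE9 (a relative jmp), it returns false instead of handing back four bytes of a jump target as if they were a syscall number.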

By this point, using the dumpbin utility:

dumpbin /imports .\inj.exe

I can see that calls to things such as OpenProcess don’t appear (good) but clearly there is potential room for improvement (if any of these functions cause EDR to get suspicious):

Looking at dumpbin.
