In Part 2, our driver gained the ability to capture process creation and exit events and store them in a spinlock-protected queue, which our agent pulls as a packed stream over the IOCTL channel.

By the end of this post, the driver will also report thread starts/exits and image loads.

Same caveat as before: this is a test-signed kernel driver. A single bug can bluescreen the system. Run it on a VM or a machine you’re willing to break, with bcdedit /set testsigning on and a reboot.
Source for this post is on GitHub, tagged blog-03. Diff it against Part 2 to see exactly what moved: blog-02...blog-03.

New contracts

In Part 2, we built the event format so that adding a source means adding a type and a payload struct. Let’s add the event types and payload formats for thread and image notifications.

typedef enum _AEGIS_EVENT_TYPE {
    AegisEvtProcessCreate = 1,
    AegisEvtProcessExit   = 2,
    AegisEvtThreadCreate  = 3,
    AegisEvtThreadExit    = 4,
    AegisEvtImageLoad     = 5,
    AegisEvtFileOp        = 6,   /* reserved - minifilter  */
    AegisEvtNetConn       = 7,   /* reserved - WFP callout */
} AEGIS_EVENT_TYPE;

_{common/AegisDriverProtocol.h L35–L43}

Next, we define the payload for thread events. A thread belongs to a target process and is started by a creator process; usually the two are the same process.

typedef struct _AEGIS_THREAD_EVENT {
    unsigned long  ProcessId;
    unsigned long  ThreadId;
    unsigned long  CreatingProcessId;
    unsigned char  Remote;            /* 1 if CreatingProcessId != ProcessId   */
    unsigned char  Reserved[3];
} AEGIS_THREAD_EVENT, *PAEGIS_THREAD_EVENT;

_{common/AegisDriverProtocol.h L70–L76}
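The driver and the agent compile this header separately, so it can be worth pinning the wire layout at compile time. That way a careless edit cannot silently shift a field. What follows is a minimal sketch and is not code from the Aegis repository. THREAD_EVENT_MIRROR and thread_event_wire_size are hypothetical names. uint32_t stands in for unsigned long, which is 32 bits on Windows (LLP64) but 64 bits on LP64 platforms such as Linux. The Reserved[3] bytes are what bring the size to a round 16 without any compiler-inserted padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of AEGIS_THREAD_EVENT. uint32_t replaces
 * "unsigned long" so the layout matches Windows on any compiler. */
typedef struct {
    uint32_t ProcessId;
    uint32_t ThreadId;
    uint32_t CreatingProcessId;
    uint8_t  Remote;
    uint8_t  Reserved[3];   /* explicit padding: no hidden holes */
} THREAD_EVENT_MIRROR;

/* Break the build if the wire layout ever drifts. */
_Static_assert(offsetof(THREAD_EVENT_MIRROR, ThreadId) == 4, "ThreadId offset");
_Static_assert(offsetof(THREAD_EVENT_MIRROR, CreatingProcessId) == 8, "creator offset");
_Static_assert(offsetof(THREAD_EVENT_MIRROR, Remote) == 12, "Remote offset");
_Static_assert(sizeof(THREAD_EVENT_MIRROR) == 16, "payload must stay 16 bytes");

/* Size the agent should expect after the event header. */
size_t thread_event_wire_size(void)
{
    return sizeof(THREAD_EVENT_MIRROR);
}
```

In the real shared header, the same checks could sit directly under the typedef, using either _Static_assert or the C_ASSERT macro from the Windows headers.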

Finally, we define the payload for image loads, tracking which image mapped into which process, where it was loaded in memory, and whether it loaded into the kernel:

typedef struct _AEGIS_IMAGE_EVENT {
    unsigned long    ProcessId;
    unsigned char    SystemModeImage; /* 1 if loaded into the kernel, not a process */
    unsigned char    ImagePathExact;  /* 1 if ImagePath is complete, not truncated */
    unsigned short   ImagePathLength; /* WCHAR count in ImagePath, no NUL       */
    unsigned __int64 ImageBase;       /* where it mapped                        */
    unsigned __int64 ImageSize;
    wchar_t          ImagePath[AEGIS_MAX_PATH];
} AEGIS_IMAGE_EVENT, *PAEGIS_IMAGE_EVENT;

_{common/AegisDriverProtocol.h L80–L88}

That is the entire change to our communication contract. The rest of our work involves writing two new modules to feed these events into the queue.

Watching threads

Thread monitoring follows the same pattern as process monitoring: the kernel maintains a list of registered callbacks and invokes them whenever a thread is created or destroyed. We register our callback using PsSetCreateThreadNotifyRoutine:

NTSTATUS
ThreadMonStart(void)
{
    NTSTATUS status = PsSetCreateThreadNotifyRoutine(ThreadNotify);
    if (NT_SUCCESS(status)) {
        g_Registered = TRUE;
    } else {
        DbgPrint("[AegisMon] ThreadMon register failed 0x%08X\n", status);
    }
    return status;
}

_{driver/modules/ThreadMon.c L42–L52}

The callback itself is straightforward:

static void
ThreadNotify(_In_ HANDLE ProcessId, _In_ HANDLE ThreadId, _In_ BOOLEAN Create)
{
    AEGIS_THREAD_EVENT ev;

    RtlZeroMemory(&ev, sizeof(ev));
    ev.ProcessId = (ULONG)(ULONG_PTR)ProcessId;
    ev.ThreadId  = (ULONG)(ULONG_PTR)ThreadId;

    if (!Create) {
        /* Thread teardown: only the identity is meaningful. */
        AegisPublish(AegisEvtThreadExit, &ev, sizeof(ev));
        return;
    }

    /* On creation this routine runs in the context of the thread doing the
     * creating, so the current process is the creator. If that isn't the
     * process the new thread will run in, something reached across a process
     * boundary to start it - the CreateRemoteThread shape. It also happens
     * benignly: a parent starts its child's first thread the same way. So the
     * flag is a lead to correlate, not a verdict. */
    ev.CreatingProcessId = (ULONG)(ULONG_PTR)PsGetCurrentProcessId();
    ev.Remote = (ev.CreatingProcessId != ev.ProcessId) ? 1 : 0;

    AegisPublish(AegisEvtThreadCreate, &ev, sizeof(ev));
}

_{driver/modules/ThreadMon.c L15–L40}

The kernel passes three arguments to our callback: the ID of the process the thread belongs to (ProcessId), the thread's own ID (ThreadId), and a boolean (Create) indicating whether the thread is being created or destroyed.

Who started this thread?

The create path has one extra line that earns its keep:

ev.CreatingProcessId = (ULONG)(ULONG_PTR)PsGetCurrentProcessId();

When the thread-creation callback fires, it executes in the context of the thread that initiates the creation: the creator. Consequently, PsGetCurrentProcessId returns the ID of the process performing the creation, not the ID of the target process where the new thread will run.

Normally, these are the same process, such as a program spawning a worker thread inside itself. When they differ, one process has reached across a process boundary to start a thread inside another. This is exactly the mechanism of CreateRemoteThread, a classic building block of code injection: map shellcode into a target process, then create a remote thread to execute it.

flowchart LR
    subgraph normal["Normal: in-process thread"]
        direction TB
        A1["process A"] -->|"creates thread in A"| A2["thread (runs in A)"]
    end

    subgraph remote["Remote: cross-process thread"]
        direction TB
        B1["process B<br/>(injector)"] -->|"creates thread in C"| C1["thread (runs in C)"]
    end

    %% Force horizontal placement
    normal ~~~ remote

We set a Remote flag in the event when the creating process and the target process differ. When the agent receives this, it prints [remote].

However, similar to the PPID-spoofing discussion in Part 2, cross-process thread creation is not always malicious. When a parent launches a child, the child’s very first thread is created by the parent, meaning the creator and owner differ legitimately. Therefore, you will see the [remote] flag on benign process launches. The true detection signal comes from correlation: a remote thread suddenly appearing in a long-running process, initiated by a process that has no logical reason to interact with it.
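The correlation described above can be done in the agent. Here is a minimal sketch, assuming the agent remembers the creator= field from each CREATE event. The first thread of a freshly created process is expected to come from that process's creator, so the sketch does not treat it as suspicious. Any later cross-process thread is treated as suspicious. Matching against the creator rather than the reported parent should also keep the rule sane under PPID spoofing, because the real creator is the one that starts the first thread. TRACKED_PROCESS, track_process_create, and remote_thread_is_suspicious are hypothetical names. The fixed-size table stands in for the hash map a real agent would need, with eviction on process exit and handling for PID reuse.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64   /* hypothetical fixed capacity for the sketch */

/* What the agent remembers from each CREATE event. */
typedef struct {
    uint32_t pid;
    uint32_t creator;            /* creator= field of the CREATE event */
    bool     first_thread_seen;
} TRACKED_PROCESS;

static TRACKED_PROCESS g_procs[MAX_TRACKED];
static size_t g_count;

/* Call for every CREATE event; drops entries once the table is full. */
void track_process_create(uint32_t pid, uint32_t creator)
{
    if (g_count < MAX_TRACKED) {
        g_procs[g_count].pid = pid;
        g_procs[g_count].creator = creator;
        g_procs[g_count].first_thread_seen = false;
        g_count++;
    }
}

static TRACKED_PROCESS *find_process(uint32_t pid)
{
    for (size_t i = 0; i < g_count; i++) {
        if (g_procs[i].pid == pid) {
            return &g_procs[i];
        }
    }
    return NULL;
}

/* Call for every THREAD (create) event. Returns true when a
 * cross-process thread is not explained by a process launch. */
bool remote_thread_is_suspicious(uint32_t pid, uint32_t creator)
{
    TRACKED_PROCESS *p = find_process(pid);
    bool first = (p != NULL && !p->first_thread_seen);

    if (p != NULL) {
        p->first_thread_seen = true;
    }
    if (creator == pid) {
        return false;            /* ordinary in-process thread */
    }
    if (first && creator == p->creator) {
        return false;            /* launcher starting the child's first thread */
    }
    return true;                 /* remote thread in a process already running */
}
```

Given the two logs in this post, the sketch should stay quiet for Notepad's first thread (creator 3792 launched it). It should fire for the injector's thread (creator 9492 is unrelated).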

Watching images

Following the same pattern, PsSetLoadImageNotifyRoutine registers a callback that the kernel invokes every time an image is mapped, whether it is a user-mode DLL loaded into a process or a kernel driver loaded into the system. The callback is slightly more complex because the kernel provides more metadata:

static void
ImageNotify(_In_opt_ PUNICODE_STRING FullImageName, _In_ HANDLE ProcessId,
            _In_ PIMAGE_INFO ImageInfo)
{
    AEGIS_IMAGE_EVENT ev;

    RtlZeroMemory(&ev, sizeof(ev));
    /* ProcessId is zero when the image is a kernel-mode driver rather than a
     * user-mode module mapped into a process. */
    ev.ProcessId       = (ULONG)(ULONG_PTR)ProcessId;
    ev.SystemModeImage = ImageInfo->SystemModeImage ? 1 : 0;
    ev.ImageBase       = (unsigned __int64)(ULONG_PTR)ImageInfo->ImageBase;
    ev.ImageSize       = (unsigned __int64)ImageInfo->ImageSize;

    /* FullImageName is optional and not guaranteed NUL-terminated; copy by
     * Length and terminate ourselves, truncating overlong paths. */
    if (FullImageName != NULL && FullImageName->Buffer != NULL) {
        USHORT chars = FullImageName->Length / sizeof(WCHAR);
        ev.ImagePathExact = 1;
        if (chars > AEGIS_MAX_PATH - 1) {
            chars = AEGIS_MAX_PATH - 1;
            ev.ImagePathExact = 0;
        }
        RtlCopyMemory(ev.ImagePath, FullImageName->Buffer, chars * sizeof(WCHAR));
        ev.ImagePath[chars] = L'\0';
        ev.ImagePathLength = chars;
    }

    AegisPublish(AegisEvtImageLoad, &ev, sizeof(ev));
}

_{driver/modules/ImageMon.c L16–L45}

The arguments:

FullImageName specifies the file path of the loading image. It is optional (the kernel does not always supply it) and is not guaranteed to be null-terminated. We therefore copy it by length and null-terminate it manually, truncating the path if it exceeds our buffer size.
ProcessId is the ID of the process into which the image is being loaded, or zero when the image is a kernel-mode driver.
ImageInfo contains information about the image being loaded, such as its base address and size.
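The copy-by-length rule is easy to get wrong, so here is the same logic as a user-mode sketch that can be exercised outside the kernel. COUNTED_WSTR and copy_counted_path are hypothetical names. COUNTED_WSTR mimics UNICODE_STRING: Length is in bytes and the buffer is not necessarily NUL-terminated. uint16_t stands in for WCHAR because wchar_t is 32 bits on some non-Windows compilers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t        Length;          /* in BYTES, excluding any NUL */
    uint16_t        MaximumLength;
    const uint16_t *Buffer;          /* not guaranteed NUL-terminated */
} COUNTED_WSTR;

/* Copies src into dst (capacity cap UTF-16 units, cap >= 1), always
 * NUL-terminates, and sets *exact to false if the name was missing or
 * truncated. Returns the number of units copied, excluding the NUL. */
uint16_t copy_counted_path(const COUNTED_WSTR *src, uint16_t *dst,
                           size_t cap, bool *exact)
{
    size_t chars;

    *exact = false;
    dst[0] = 0;
    if (src == NULL || src->Buffer == NULL) {
        return 0;                     /* name not supplied */
    }
    chars = src->Length / sizeof(uint16_t);
    *exact = true;
    if (chars > cap - 1) {            /* keep one slot for the NUL */
        chars = cap - 1;
        *exact = false;
    }
    memcpy(dst, src->Buffer, chars * sizeof(uint16_t));
    dst[chars] = 0;
    return (uint16_t)chars;
}
```

The key point is that the code never scans for a terminator in the source. Only Length decides how much is read.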

Image loads are critical for detecting malicious activity. A DLL loading into a process from a temporary folder, or an unsigned driver loading into the kernel, will surface here with its file path and load address.
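To illustrate how that path can become a signal, here is a deliberately naive agent-side rule that flags images mapped from temporary folders. It is a hypothetical sketch: the hint list and the function names are invented for the example. It also works on narrow ASCII strings for brevity, whereas the real ImagePath is UTF-16. A rule like this also fires on plenty of legitimate installers, so treat it as a lead to correlate rather than a verdict.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive (ASCII-only) substring search. */
static bool contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
                   tolower((unsigned char)needle[i])) {
            i++;
        }
        if (i == n) {
            return true;
        }
    }
    return n == 0;
}

/* Hypothetical rule: true if the image was mapped from a temp folder. */
bool image_path_worth_a_look(const char *path)
{
    static const char *const hints[] = {
        "\\AppData\\Local\\Temp\\",   /* per-user temp */
        "\\Windows\\Temp\\",          /* system temp */
    };

    for (size_t i = 0; i < sizeof(hints) / sizeof(hints[0]); i++) {
        if (contains_ci(path, hints[i])) {
            return true;
        }
    }
    return false;
}
```

Matching on a folder fragment rather than a full prefix keeps the rule independent of the \Device\HarddiskVolumeN part, which varies between machines.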

Thread and image load events are far noisier than process creations. A single application launch can trigger dozens of image-load callbacks (one per DLL) and several thread-creation events.

Reflective DLL injection

While PsSetLoadImageNotifyRoutine is highly effective for tracking standard module loading, it has a significant blind spot: it only catches images mapped through the standard Windows loader.

When a legitimate application loads a DLL (e.g., via LoadLibrary), the request goes through the user-mode loader (ntdll!LdrLoadDll). The kernel then maps the file from disk as an image section (SEC_IMAGE). It is this kernel-level mapping operation that triggers our ImageNotify callback.

Attackers bypass this entirely using reflective DLL injection:

Allocate Memory: The malware allocates raw, unformatted memory in the target process (e.g., using VirtualAllocEx with read/write/execute permissions).
Copy Raw Bytes: The malware copies the entire DLL file directly into this allocated space (WriteProcessMemory) as a raw data blob, rather than mapping it as an image section.
Execution: The injector starts a thread in the target process pointing to a custom function embedded within the DLL, known as the Reflective Loader.
Self-Relocation: This embedded loader executes in user space, resolving its own relocations, loading its import dependencies, and manually calling DllMain.

Because the DLL lives in anonymous private memory instead of a mapped image section, the kernel never raises an image-load notification for it, and our ImageNotify callback never sees the injected module.

To catch reflective loading, a production EDR cannot rely on load-image callbacks alone. It must correlate this data with memory scanning (looking for unbacked executable memory and PE headers in private space) and behavior monitoring (detecting suspicious API patterns like VirtualAllocEx immediately followed by CreateRemoteThread).
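One of the memory-scanning checks mentioned above is looking for a PE header in memory that no image file backs. Once a region has been read, that check reduces to a few byte comparisons. The sketch below covers only this test. Finding candidate regions (private, executable memory) and reading them out of the target process are left out. looks_like_pe is a hypothetical name. The offsets are the standard PE ones: a DOS header starts with "MZ", and its e_lfanew field at offset 0x3C gives the offset of the "PE\0\0" signature. Some reflective loaders erase or corrupt their headers after loading, so a miss here proves little.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* True if buf starts with "MZ" and e_lfanew (at 0x3C) points at a
 * "PE\0\0" signature that lies entirely inside buf. */
bool looks_like_pe(const uint8_t *buf, size_t len)
{
    uint32_t lfanew;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
        return false;                 /* too short for a DOS header */
    }
    lfanew = read_le32(buf + 0x3C);
    if (lfanew > len - 4) {
        return false;                 /* signature would run off the end */
    }
    return buf[lfanew] == 'P' && buf[lfanew + 1] == 'E' &&
           buf[lfanew + 2] == 0 && buf[lfanew + 3] == 0;
}
```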

Wiring it in

The last step is to wire these modules into the driver. We initialize them in DriverEntry:

    /* Start monitor modules. Each new source is one Start call here and its
     * matching Stop in AegisUnload; the queue and IOCTL underneath never change. */
    status = ProcessMonStart();
    if (NT_SUCCESS(status)) { status = ThreadMonStart(); }
    if (NT_SUCCESS(status)) { status = ImageMonStart(); }
    if (!NT_SUCCESS(status)) {
        /* Each Stop is a no-op unless that module registered, so unwinding all
         * three is safe regardless of which one failed. */
        ImageMonStop();
        ThreadMonStop();
        ProcessMonStop();
        IoDeleteSymbolicLink(&g_SymLink);
        IoDeleteDevice(g_DeviceObject);
        g_DeviceObject = NULL;
        AegisQueueDrain();
        return status;
    }

_{driver/core/Driver.c L129–L145}

Remember to stop them in AegisUnload as well:

    /* Stop modules in reverse order. */
    ImageMonStop();
    ThreadMonStop();
    ProcessMonStop();

_{driver/core/Driver.c L78–L80}

ImageMonStop calls PsRemoveLoadImageNotifyRoutine to unregister its callback, and ThreadMonStop calls PsRemoveCreateThreadNotifyRoutine.
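The unwind logic in DriverEntry relies on each Stop being safe to call whether or not its Start succeeded. Here is a minimal user-mode model of that contract, with the kernel registration replaced by a flag. MODULE, module_start, module_stop, and start_all are hypothetical names. In the driver, Start and Stop would instead call the register and remove routines, for example PsSetCreateThreadNotifyRoutine and PsRemoveCreateThreadNotifyRoutine.

```c
#include <stdbool.h>

/* Hypothetical model of one monitor module. */
typedef struct {
    bool registered;      /* set only when Start succeeded */
    bool fail_start;      /* test knob: simulate a registration failure */
} MODULE;

static int module_start(MODULE *m)
{
    if (m->fail_start) {
        return -1;
    }
    m->registered = true;
    return 0;
}

/* No-op unless Start succeeded, so callers may stop unconditionally. */
static void module_stop(MODULE *m)
{
    if (m->registered) {
        m->registered = false;
    }
}

/* Same shape as DriverEntry: start in order; on any failure,
 * stop every module in reverse order. */
int start_all(MODULE *proc, MODULE *thread, MODULE *image)
{
    int status = module_start(proc);

    if (status == 0) { status = module_start(thread); }
    if (status == 0) { status = module_start(image); }
    if (status != 0) {
        module_stop(image);
        module_stop(thread);
        module_stop(proc);
    }
    return status;
}
```

Because the flag lives in the module, DriverEntry does not need to track how far startup got before unwinding.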

The agent gets a little more code, mainly to format the new event types:

    case AegisEvtThreadCreate: {
        const AEGIS_THREAD_EVENT *t = (const AEGIS_THREAD_EVENT *)(evt + 1);
        if (evt->Size < sizeof(*evt) + sizeof(*t)) {
            fprintf(stderr, "[%s] malformed thread-create event (%u bytes)\n",
                    ts, evt->Size);
            break;
        }
        printf("[%s] #%-5lu THREAD  pid=%-6lu tid=%-6lu creator=%-6lu%s\n",
               ts, evt->Sequence, t->ProcessId, t->ThreadId,
               t->CreatingProcessId, t->Remote ? "  [remote]" : "");
        break;
    }

_{agent/Agent.c L68–L90 (thread cases), L91–L105 (image case)}

The event-pulling loop in the agent, including the DeviceIoControl call and the size-driven parsing of each packed batch, is identical to the implementation in Part 2: it walks the event headers and hands each one to PrintEvent.
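The framing rule is worth seeing in isolation. Every event starts with a header whose Size covers the header plus the payload. Before trusting anything, the parser must reject sizes that are too small or that run past the end of the batch. The sketch below uses a hypothetical DEMO_EVENT_HEADER, not the real header from Part 2, and a hypothetical count_events function. The per-type payload check in the thread case above still belongs in PrintEvent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for this sketch only; Part 2 defines the real one. */
typedef struct {
    uint32_t Size;       /* total bytes: header + payload */
    uint32_t Type;
    uint64_t Sequence;
} DEMO_EVENT_HEADER;

/* Walks a packed batch. Returns the number of events, or -1 if any
 * Size is smaller than a header or runs past the end of the batch. */
long count_events(const uint8_t *batch, size_t len)
{
    size_t off = 0;
    long count = 0;

    while (off < len) {
        DEMO_EVENT_HEADER hdr;

        if (len - off < sizeof(hdr)) {
            return -1;                         /* truncated header */
        }
        memcpy(&hdr, batch + off, sizeof(hdr)); /* no unaligned access */
        if (hdr.Size < sizeof(hdr) || hdr.Size > len - off) {
            return -1;                         /* bogus Size field */
        }
        off += hdr.Size;                       /* always advances */
        count++;
    }
    return count;
}
```

Rejecting a Size smaller than the header is what keeps the loop from spinning forever on a zero-length event.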

Running it

The build script already compiles all modules under driver/modules, so no changes are needed there. Rebuild the project using build.cmd, unload the previous driver using uninstall.ps1, install the new version with install.ps1, and run the agent.

[20:47:46.565] #177   CREATE  pid=9980   ppid=3792   creator=3792   \??\C:\Windows\system32\notepad.exe
[20:47:46.565] #178   THREAD  pid=9980   tid=4240   creator=3792    [remote]
[20:47:46.581] #179   IMAGE   pid=5628   base=0x00007ffa8f1c0000 \Device\HarddiskVolume2\Windows\System32\rasadhlp.dll
[20:47:46.581] #180   IMAGE   pid=9980   base=0x00007ff610860000 \Device\HarddiskVolume2\Windows\System32\notepad.exe
[20:47:46.581] #181   IMAGE   pid=9980   base=0x00007ffa99cd0000 \Device\HarddiskVolume2\Windows\System32\ntdll.dll
[20:47:46.581] #182   IMAGE   pid=9980   base=0x00007ffa98dc0000 \Device\HarddiskVolume2\Windows\System32\kernel32.dll
[20:47:46.596] #183   IMAGE   pid=9980   base=0x00007ffa978c0000 \Device\HarddiskVolume2\Windows\System32\KernelBase.dll
[20:47:46.596] #184   IMAGE   pid=9980   base=0x00007ffa98360000 \Device\HarddiskVolume2\Windows\System32\gdi32.dll

AegisAgent streaming thread creation and image load events

One process launch triggers multiple events: process creation, DLL image loads, and the initial thread creation.

Look at event #178: the initial thread of notepad.exe is flagged as [remote]. This is the benign remote-thread case in action: Notepad’s first thread (TID 4240) runs in PID 9980, but was created by PID 3792 (Explorer, the parent process launching Notepad). Although the creator and target process differ, this behavior is entirely legitimate.

Catching real injection

The first Notepad thread above was flagged because Explorer created the process. Now we use a small injector to create another thread inside it.

The payload here is deliberately inert: one ret instruction. The example still performs the real allocation, write, protection change, and remote-thread creation sequence, but the new thread returns immediately instead of doing anything useful.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

static void
PrintError(const char *operation)
{
    fprintf(stderr, "%s failed: error %lu\n", operation, GetLastError());
}

int
wmain(int argc, wchar_t **argv)
{
    static const unsigned char payload[] = { 0xC3 }; /* ret */
    wchar_t *end = NULL;
    unsigned long parsedPid;
    DWORD pid;
    DWORD threadId = 0;
    DWORD oldProtect = 0;
    SIZE_T bytesWritten = 0;
    LPVOID remoteAddress = NULL;
    HANDLE process = NULL;
    HANDLE thread = NULL;
    int result = 1;

    if (argc != 2) {
        fwprintf(stderr, L"usage: %ls <pid>\n", argv[0]);
        return 2;
    }

    parsedPid = wcstoul(argv[1], &end, 10);
    if (argv[1][0] == L'\0' || *end != L'\0' || parsedPid == 0) {
        fwprintf(stderr, L"invalid pid: %ls\n", argv[1]);
        return 2;
    }
    pid = (DWORD)parsedPid;

    process = OpenProcess(
        PROCESS_CREATE_THREAD |
        PROCESS_QUERY_INFORMATION |
        PROCESS_VM_OPERATION |
        PROCESS_VM_WRITE |
        PROCESS_VM_READ,
        FALSE,
        pid);
    if (process == NULL) {
        PrintError("OpenProcess");
        goto Cleanup;
    }

    remoteAddress = VirtualAllocEx(
        process, NULL, sizeof(payload), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (remoteAddress == NULL) {
        PrintError("VirtualAllocEx");
        goto Cleanup;
    }

    if (!WriteProcessMemory(
            process, remoteAddress, payload, sizeof(payload), &bytesWritten) ||
        bytesWritten != sizeof(payload)) {
        PrintError("WriteProcessMemory");
        goto Cleanup;
    }

    if (!VirtualProtectEx(
            process, remoteAddress, sizeof(payload), PAGE_EXECUTE_READ,
            &oldProtect)) {
        PrintError("VirtualProtectEx");
        goto Cleanup;
    }

    if (!FlushInstructionCache(process, remoteAddress, sizeof(payload))) {
        PrintError("FlushInstructionCache");
        goto Cleanup;
    }

    thread = CreateRemoteThread(
        process,
        NULL,
        0,
        (LPTHREAD_START_ROUTINE)remoteAddress,
        NULL,
        0,
        &threadId);
    if (thread == NULL) {
        PrintError("CreateRemoteThread");
        goto Cleanup;
    }

    printf("injector=%lu target=%lu remote-thread=%lu\n",
           GetCurrentProcessId(), pid, threadId);

    if (WaitForSingleObject(thread, INFINITE) != WAIT_OBJECT_0) {
        PrintError("WaitForSingleObject");
        goto Cleanup;
    }

    result = 0;

Cleanup:
    if (thread != NULL) {
        CloseHandle(thread);
    }
    if (remoteAddress != NULL && process != NULL) {
        VirtualFreeEx(process, remoteAddress, 0, MEM_RELEASE);
    }
    if (process != NULL) {
        CloseHandle(process);
    }
    return result;
}

Compile the injector and execute it, passing the PID of the Notepad process:

.\remote-thread-demo.exe 9980

Run the injector at the same integrity level as Notepad, or higher. A lower-integrity process cannot open a higher-integrity one with these access rights. With AegisAgent already running and pulling events, the output should look similar to this:

[21:20:54.997] #10658 THREAD  pid=4988   tid=4532   creator=4988
[21:20:55.763] #10659 CREATE  pid=9492   ppid=8208   creator=8208   \??\C:\Users\sonx\AppData\Local\Temp\remote-thread-demo.exe
[21:20:55.763] #10660 THREAD  pid=9492   tid=4504   creator=8208    [remote]
[21:20:55.763] #10661 IMAGE   pid=8208   base=0x00007ffa94c50000 \Device\HarddiskVolume2\Windows\System32\apphelp.dll
[21:20:55.763] #10662 IMAGE   pid=9492   base=0x00007ff677570000 \Device\HarddiskVolume2\Users\sonx\AppData\Local\Temp\remote-thread-demo.exe
[21:20:55.763] #10663 IMAGE   pid=9492   base=0x00007ffa99cd0000 \Device\HarddiskVolume2\Windows\System32\ntdll.dll
[21:20:55.763] #10664 IMAGE   pid=9492   base=0x00007ffa98dc0000 \Device\HarddiskVolume2\Windows\System32\kernel32.dll
[21:20:55.763] #10665 IMAGE   pid=9492   base=0x00007ffa978c0000 \Device\HarddiskVolume2\Windows\System32\KernelBase.dll
[21:20:55.763] #10666 IMAGE   pid=9492   base=0x00007ffa94c50000 \Device\HarddiskVolume2\Windows\System32\apphelp.dll
[21:20:55.763] #10667 THREAD  pid=9980   tid=5252   creator=9492    [remote]
[21:20:55.763] #10668 TEXIT   pid=9980   tid=5252
[21:20:55.763] #10669 IMAGE   pid=9492   base=0x00007ffa95310000 \Device\HarddiskVolume2\Windows\System32\kernel.appcore.dll

AegisAgent capturing a remote thread injection event

The agent log capturing a remote thread creation and immediate exit in the target process (notepad.exe, PID 9980) initiated by the injector (remote-thread-demo.exe, PID 9492).

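Event #10667 is the remote thread created by our injector. It runs in the Notepad process (PID 9980) but was created by the injector (PID 9492), so the driver flags it as [remote]. Event #10668 is the same thread (TID 5252) exiting immediately after executing its single ret. Unlike event #178, this thread is not explained by a process launch. Notepad was already running, and the creator is an unrelated process. This is exactly the injection shape the [remote] flag exists to surface.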

Thread and image load monitoring are straightforward additions: they are simply more notify callbacks executing at PASSIVE_LEVEL, the same as process creation. The next two event sources break this pattern, which is what makes them interesting:

Filesystem visibility comes from a minifilter, a different type of driver entirely, with its own registration model, altitudes, and pre/post-operation callbacks that can fire at DISPATCH_LEVEL. This is the level the queue’s spinlock was designed for, and we will finally exercise it.
Network visibility comes from a WFP callout: a Windows Filtering Platform callout hooking the connect/accept path.

Both mechanisms are heavier than simple notify callbacks, and the minifilter even changes how the driver is packaged. However, the queue they publish into and the IOCTL the agent pulls from remain identical to the ones we built in Part 2.