In Part 2, our driver gained the ability to capture process creation and exit events, storing them in a spinlock-protected queue, while our agent pulls a packed stream over the IOCTL channel.
By the end of this post, the driver will also report thread starts/exits and image loads.
Same caveat as before: this is a test-signed kernel driver. A single bug can bluescreen the system. Run it on a VM or a machine you’re willing to break, with bcdedit /set testsigning on and a reboot.
Source for this post is on GitHub, tagged blog-03. Diff it against Part 2 to see exactly what moved: blog-02...blog-03.
New contracts
In Part 2, we built the event format so that adding a source means adding a type and a payload struct. Let’s add the event types and payload formats for thread and image notifications.
typedef enum _AEGIS_EVENT_TYPE {
    AegisEvtProcessCreate = 1,
    AegisEvtProcessExit   = 2,
    AegisEvtThreadCreate  = 3,
    AegisEvtThreadExit    = 4,
    AegisEvtImageLoad     = 5,
    AegisEvtFileOp        = 6, /* reserved - minifilter */
    AegisEvtNetConn       = 7, /* reserved - WFP callout */
} AEGIS_EVENT_TYPE;
common/AegisDriverProtocol.h L35–L43
Next, we define the payload for thread events. A thread belongs to a target process and is started by a creator process (which are usually the same).
typedef struct _AEGIS_THREAD_EVENT {
    unsigned long ProcessId;
    unsigned long ThreadId;
    unsigned long CreatingProcessId;
    unsigned char Remote;       /* 1 if CreatingProcessId != ProcessId */
    unsigned char Reserved[3];
} AEGIS_THREAD_EVENT, *PAEGIS_THREAD_EVENT;
common/AegisDriverProtocol.h L70–L76
Finally, we define the payload for image loads, tracking which image mapped into which process, where it was loaded in memory, and whether it loaded into the kernel:
typedef struct _AEGIS_IMAGE_EVENT {
    unsigned long    ProcessId;
    unsigned char    SystemModeImage;  /* 1 if loaded into the kernel, not a process */
    unsigned char    ImagePathExact;   /* exact source name and not truncated */
    unsigned short   ImagePathLength;  /* WCHAR count in ImagePath, no NUL */
    unsigned __int64 ImageBase;        /* where it mapped */
    unsigned __int64 ImageSize;
    wchar_t          ImagePath[AEGIS_MAX_PATH];
} AEGIS_IMAGE_EVENT, *PAEGIS_IMAGE_EVENT;
common/AegisDriverProtocol.h L80–L88
That is the entire change to our communication contract. The rest of our work involves writing two new modules to feed these events into the queue.
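Because the driver and the agent share these structs byte for byte, it can help to pin the layout down. The sketch below mirrors both payloads with fixed-width types and checks the offsets an agent would rely on. The Mirror* names are hypothetical, and the capacity of 260 is an assumption for illustration (the real value is whatever AEGIS_MAX_PATH is in the shared header). uint32_t stands in for the 32-bit Windows unsigned long, and uint16_t for the 16-bit Windows wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIRROR_MAX_PATH 260 /* assumed capacity; the real value is AEGIS_MAX_PATH */

/* Mirror of AEGIS_THREAD_EVENT using fixed-width types. */
typedef struct MirrorThreadEvent {
    uint32_t ProcessId;
    uint32_t ThreadId;
    uint32_t CreatingProcessId;
    uint8_t  Remote;
    uint8_t  Reserved[3];
} MirrorThreadEvent;

/* Mirror of AEGIS_IMAGE_EVENT using fixed-width types. */
typedef struct MirrorImageEvent {
    uint32_t ProcessId;
    uint8_t  SystemModeImage;
    uint8_t  ImagePathExact;
    uint16_t ImagePathLength;
    uint64_t ImageBase;
    uint64_t ImageSize;
    uint16_t ImagePath[MIRROR_MAX_PATH];
} MirrorImageEvent;

/* Returns 1 when every field sits where the field sizes alone predict,
 * i.e. the compiler inserted no hidden padding between members. */
static int MirrorLayoutIsTight(void)
{
    return sizeof(MirrorThreadEvent) == 16
        && offsetof(MirrorImageEvent, SystemModeImage) == 4
        && offsetof(MirrorImageEvent, ImagePathLength) == 6
        && offsetof(MirrorImageEvent, ImageBase) == 8
        && offsetof(MirrorImageEvent, ImageSize) == 16
        && offsetof(MirrorImageEvent, ImagePath) == 24;
}
```

The field order does the work here: the two flag bytes and the 16-bit length fill the gap after ProcessId, so ImageBase lands on an 8-byte boundary without padding, and the layout comes out the same whether or not the header is compiled with packing.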
Watching threads
Thread monitoring follows the same pattern as process monitoring: the kernel maintains a list of registered callbacks and invokes them whenever a thread is created or destroyed. We register our callback using PsSetCreateThreadNotifyRoutine:
NTSTATUS
ThreadMonStart(void)
{
    NTSTATUS status = PsSetCreateThreadNotifyRoutine(ThreadNotify);
    if (NT_SUCCESS(status)) {
        g_Registered = TRUE;
    } else {
        DbgPrint("[AegisMon] ThreadMon register failed 0x%08X\n", status);
    }
    return status;
}
driver/modules/ThreadMon.c L42–L52
The callback itself is straightforward:
static void
ThreadNotify(_In_ HANDLE ProcessId, _In_ HANDLE ThreadId, _In_ BOOLEAN Create)
{
    AEGIS_THREAD_EVENT ev;
    RtlZeroMemory(&ev, sizeof(ev));
    ev.ProcessId = (ULONG)(ULONG_PTR)ProcessId;
    ev.ThreadId = (ULONG)(ULONG_PTR)ThreadId;
    if (!Create) {
        /* Thread teardown: only the identity is meaningful. */
        AegisPublish(AegisEvtThreadExit, &ev, sizeof(ev));
        return;
    }
    /* On creation this routine runs in the context of the thread doing the
     * creating, so the current process is the creator. If that isn't the
     * process the new thread will run in, something reached across a process
     * boundary to start it - the CreateRemoteThread shape. It also happens
     * benignly: a parent starts its child's first thread the same way. So the
     * flag is a lead to correlate, not a verdict. */
    ev.CreatingProcessId = (ULONG)(ULONG_PTR)PsGetCurrentProcessId();
    ev.Remote = (ev.CreatingProcessId != ev.ProcessId) ? 1 : 0;
    AegisPublish(AegisEvtThreadCreate, &ev, sizeof(ev));
}
driver/modules/ThreadMon.c L15–L40
The kernel passes three arguments to our callback: the target process ID (ProcessId), the new thread ID (ThreadId), and a boolean (Create) indicating whether the thread is being created or destroyed.
Who started this thread?
The create path has one extra line that earns its keep:
ev.CreatingProcessId = (ULONG)(ULONG_PTR)PsGetCurrentProcessId();
When the thread-creation callback fires, it executes in the context of the thread that initiated the creation, that is, the creator. Consequently, PsGetCurrentProcessId returns the ID of the process performing the creation, rather than the target process where the new thread will run.
Normally, these are the same process, such as a program spawning a worker thread inside itself. When they differ, it indicates that one process has reached across boundaries to start a thread inside another process. This is the exact mechanism of CreateRemoteThread, a classic building block of code injection (mapping shellcode into a target process and creating a remote thread to execute it).
flowchart LR
subgraph normal["Normal: in-process thread"]
direction TB
A1["process A"] -->|"creates thread in A"| A2["thread (runs in A)"]
end
subgraph remote["Remote: cross-process thread"]
direction TB
B1["process B<br/>(injector)"] -->|"creates thread in C"| C1["thread (runs in C)"]
end
%% Force horizontal placement
normal ~~~ remote
We set a Remote flag in the event when the creating process and the target process differ. When the agent receives this, it prints [remote].
However, similar to the PPID-spoofing discussion in Part 2, cross-process thread creation is not always malicious. When a parent launches a child, the child’s very first thread is created by the parent, meaning the creator and owner differ legitimately. Therefore, you will see the [remote] flag on benign process launches. The true detection signal comes from correlation: a remote thread suddenly appearing in a long-running process, initiated by a process that has no logical reason to interact with it.
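One way an agent might separate the benign launch case from a suspicious remote thread is to remember which process created each new PID and forgive exactly one cross-process thread from that creator. The sketch below is an assumption about how such correlation could look, not part of the AegisMon source: the RemoteThreadFilter type, its function names, and the fixed table size are all hypothetical, and it deliberately ignores process exit and PID reuse.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRACKED_MAX 64 /* arbitrary table size for this sketch */

/* A process that was created but whose first thread has not been seen yet. */
typedef struct PendingProcess {
    uint32_t Pid;
    uint32_t CreatorPid;
    int      InUse;
} PendingProcess;

typedef struct RemoteThreadFilter {
    PendingProcess Slots[TRACKED_MAX];
} RemoteThreadFilter;

static void FilterInit(RemoteThreadFilter *f)
{
    memset(f, 0, sizeof(*f));
}

/* Call for every process-create event. */
static void FilterOnProcessCreate(RemoteThreadFilter *f, uint32_t pid,
                                  uint32_t creatorPid)
{
    for (int i = 0; i < TRACKED_MAX; i++) {
        if (!f->Slots[i].InUse) {
            f->Slots[i].Pid = pid;
            f->Slots[i].CreatorPid = creatorPid;
            f->Slots[i].InUse = 1;
            return;
        }
    }
    /* Table full: the launch is not tracked, so its first thread will be
     * reported as suspicious. Acceptable for a sketch, not for production. */
}

/* Call for every thread-create event. Returns 1 when the thread is a
 * cross-process creation that a normal process launch does not explain. */
static int FilterOnThreadCreate(RemoteThreadFilter *f, uint32_t pid,
                                uint32_t creatorPid)
{
    int explainedByLaunch = 0;
    for (int i = 0; i < TRACKED_MAX; i++) {
        if (f->Slots[i].InUse && f->Slots[i].Pid == pid) {
            /* First thread seen in this process: consume the entry so that
             * only this one thread can be excused by the launch. */
            explainedByLaunch = (f->Slots[i].CreatorPid == creatorPid);
            f->Slots[i].InUse = 0;
            break;
        }
    }
    if (creatorPid == pid) {
        return 0; /* ordinary in-process thread */
    }
    return explainedByLaunch ? 0 : 1;
}
```

Fed a process-create event followed by the first thread from the same creator, the filter stays quiet; a later thread started in that process by an unrelated PID returns 1. A real agent would also need to expire entries on process exit.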
Watching images
Following the same pattern, PsSetLoadImageNotifyRoutine registers a callback that the kernel invokes every time an image is mapped, whether it is a user-mode DLL loaded into a process or a kernel driver loaded into the system. The callback is slightly more complex because the kernel provides more metadata:
static void
ImageNotify(_In_opt_ PUNICODE_STRING FullImageName, _In_ HANDLE ProcessId,
            _In_ PIMAGE_INFO ImageInfo)
{
    AEGIS_IMAGE_EVENT ev;
    RtlZeroMemory(&ev, sizeof(ev));
    /* ProcessId is zero when the image is a kernel-mode driver rather than a
     * user-mode module mapped into a process. */
    ev.ProcessId = (ULONG)(ULONG_PTR)ProcessId;
    ev.SystemModeImage = ImageInfo->SystemModeImage ? 1 : 0;
    ev.ImageBase = (unsigned __int64)(ULONG_PTR)ImageInfo->ImageBase;
    ev.ImageSize = (unsigned __int64)ImageInfo->ImageSize;
    /* FullImageName is optional and not guaranteed NUL-terminated; copy by
     * Length and terminate ourselves, truncating overlong paths. */
    if (FullImageName != NULL && FullImageName->Buffer != NULL) {
        USHORT chars = FullImageName->Length / sizeof(WCHAR);
        ev.ImagePathExact = 1;
        if (chars > AEGIS_MAX_PATH - 1) {
            chars = AEGIS_MAX_PATH - 1;
            ev.ImagePathExact = 0;
        }
        RtlCopyMemory(ev.ImagePath, FullImageName->Buffer, chars * sizeof(WCHAR));
        ev.ImagePath[chars] = L'\0';
        ev.ImagePathLength = chars;
    }
    AegisPublish(AegisEvtImageLoad, &ev, sizeof(ev));
}
driver/modules/ImageMon.c L16–L45
The arguments:
- FullImageName specifies the file path of the loading image. It is optional (the kernel does not always supply it) and is not guaranteed to be null-terminated. We therefore copy it by length and null-terminate it manually, truncating the path if it exceeds our buffer size.
- ProcessId is the ID of the process into which the image is being loaded.
- ImageInfo contains information about the image being loaded, such as its base address and size.
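The copy-by-length rule is easy to get subtly wrong, so here is the same logic isolated in portable C where it can be exercised outside the kernel. This is a sketch under stated assumptions: CountedName is a hypothetical stand-in for UNICODE_STRING, PathField stands in for the path fields of the event, and the capacity is deliberately tiny so truncation is easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PATH 8 /* tiny on purpose; the driver uses AEGIS_MAX_PATH */

/* Stand-in for UNICODE_STRING: a byte length plus an unterminated buffer. */
typedef struct CountedName {
    uint16_t        Length; /* in bytes, not characters */
    const uint16_t *Buffer; /* may be NULL; not NUL-terminated */
} CountedName;

/* Stand-in for the ImagePath* fields of the image event. */
typedef struct PathField {
    uint8_t  Exact;                 /* 1 if the whole name fit */
    uint16_t Length;                /* characters stored, excluding the NUL */
    uint16_t Text[SKETCH_MAX_PATH];
} PathField;

/* Copies by length, truncates to capacity minus one, and always terminates. */
static void CopyCountedName(PathField *out, const CountedName *name)
{
    memset(out, 0, sizeof(*out));
    if (name == NULL || name->Buffer == NULL) {
        return; /* no name supplied: leave the field empty */
    }
    uint16_t chars = (uint16_t)(name->Length / sizeof(uint16_t));
    out->Exact = 1;
    if (chars > SKETCH_MAX_PATH - 1) {
        chars = SKETCH_MAX_PATH - 1; /* reserve one slot for the terminator */
        out->Exact = 0;
    }
    memcpy(out->Text, name->Buffer, chars * sizeof(uint16_t));
    out->Text[chars] = 0;
    out->Length = chars;
}
```

Clamping before the copy, rather than after, is what keeps the terminator write inside the buffer even when the source name is longer than the field.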
Image loads are critical for detecting malicious activity. A DLL loading into a process from a temporary folder, or an unsigned driver loading into the kernel, will surface here with its file path and load address.
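As a rough illustration of the temporary-folder case, an agent could test whether any directory component of the reported path matches a name of interest. The helper below is a hypothetical sketch, not part of the agent source, and the choice of directory names to flag is left to the caller. It matches whole components case-insensitively, so Templates does not count as Temp.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Returns 1 if `path` contains a directory component equal to `dir`,
 * compared case-insensitively. Only components followed by a backslash
 * count, so the final file name is never matched. */
static int PathHasDirectory(const wchar_t *path, const wchar_t *dir)
{
    size_t dirLen = wcslen(dir);
    const wchar_t *p = path;

    while (*p != L'\0') {
        if (*p == L'\\') {
            p++; /* skip separators */
            continue;
        }
        /* Find the end of the current component. */
        const wchar_t *end = p;
        while (*end != L'\0' && *end != L'\\') {
            end++;
        }
        if (*end == L'\\' && (size_t)(end - p) == dirLen) {
            size_t i = 0;
            while (i < dirLen &&
                   towlower((wint_t)p[i]) == towlower((wint_t)dir[i])) {
                i++;
            }
            if (i == dirLen) {
                return 1;
            }
        }
        p = end;
    }
    return 0;
}
```

A path match like this is only a lead: installers and updaters legitimately run code from temporary folders, so the result is something to correlate rather than alert on directly.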
Thread and image load events are far noisier than process creations. A single application launch can trigger dozens of image-load callbacks (for every DLL loaded) and multiple thread creation events.
Reflective DLL Injection
While PsSetLoadImageNotifyRoutine is highly effective for tracking standard module loading, it has a significant blind spot: it only catches images mapped through the standard Windows loader.
When a legitimate application loads a DLL (e.g., via LoadLibrary), the request goes through the user-mode loader (ntdll!LdrLoadDll). The kernel then maps the file from disk as an image section (SEC_IMAGE). It is this kernel-level mapping operation that triggers our ImageNotify callback.
Attackers bypass this entirely using reflective DLL injection:
- Allocate Memory: The malware allocates raw, unformatted memory in the target process (e.g., using VirtualAllocEx with read/write/execute permissions).
- Copy Raw Bytes: The malware copies the entire DLL file directly into this allocated space (WriteProcessMemory) as a raw data blob, rather than mapping it as an image section.
- Execution: The injector starts a thread in the target process pointing to a custom function embedded within the DLL, known as the Reflective Loader.
- Self-Relocation: This embedded loader executes in user space, resolving its own relocations, loading its import dependencies, and manually calling DllMain.
Because the DLL is loaded as anonymous private memory instead of a mapped image file, the kernel never generates an image-mapping event, and our callback remains completely blind to the execution.
To catch reflective loading, a production EDR cannot rely on load-image callbacks alone. It must correlate this data with memory scanning (looking for unbacked executable memory and PE headers in private space) and behavior monitoring (detecting suspicious API patterns like VirtualAllocEx immediately followed by CreateRemoteThread).
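The "PE headers in private space" check reduces to a small test on a buffer that has already been read out of the target. The sketch below shows only that test, under the assumption that some other code has enumerated private executable regions and copied their first bytes; LooksLikePeHeader is a hypothetical helper name. It relies on two fixed facts of the PE format: the file starts with the bytes MZ, and the 32-bit little-endian field at offset 0x3C points to the PE\0\0 signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if `buf` starts with a plausible PE image header:
 * "MZ" at offset 0 and "PE\0\0" at the offset stored at 0x3C. */
static int LooksLikePeHeader(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < 0x40) {
        return 0; /* too small to hold the DOS header */
    }
    if (buf[0] != 'M' || buf[1] != 'Z') {
        return 0;
    }
    /* e_lfanew: little-endian offset of the PE signature. */
    uint32_t peOffset = (uint32_t)buf[0x3C]
                      | ((uint32_t)buf[0x3D] << 8)
                      | ((uint32_t)buf[0x3E] << 16)
                      | ((uint32_t)buf[0x3F] << 24);
    if (peOffset > len || len - peOffset < 4) {
        return 0; /* signature would fall outside the buffer */
    }
    return memcmp(buf + peOffset, "PE\0\0", 4) == 0;
}
```

A hit in private, executable memory is a strong lead. A miss proves little, since a loader can erase its own headers once it has finished relocating, which is why this check is paired with behavior monitoring rather than used alone.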
Wiring it in
The last step is to wire these modules into the driver. We initialize them in DriverEntry:
/* Start monitor modules. Each new source is one Start call here and its
 * matching Stop in AegisUnload; the queue and IOCTL underneath never change. */
status = ProcessMonStart();
if (NT_SUCCESS(status)) { status = ThreadMonStart(); }
if (NT_SUCCESS(status)) { status = ImageMonStart(); }
if (!NT_SUCCESS(status)) {
    /* Each Stop is a no-op unless that module registered, so unwinding all
     * three is safe regardless of which one failed. */
    ImageMonStop();
    ThreadMonStop();
    ProcessMonStop();
    IoDeleteSymbolicLink(&g_SymLink);
    IoDeleteDevice(g_DeviceObject);
    g_DeviceObject = NULL;
    AegisQueueDrain();
    return status;
}
driver/core/Driver.c L129–L145
Remember to stop them in AegisUnload as well:
/* Stop modules in reverse order. */
ImageMonStop();
ThreadMonStop();
ProcessMonStop();
ImageMonStop calls PsRemoveLoadImageNotifyRoutine to unregister its callback, and ThreadMonStop calls PsRemoveCreateThreadNotifyRoutine.
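The property that makes the failure path in DriverEntry safe is that each Stop checks its own registration flag before unregistering. The sketch below models that pattern in portable C with stand-in modules so it can be run outside the kernel. SketchModule and the Sketch* functions are hypothetical, and the counter marks the spot where a real module would call its PsRemove... routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for one monitor module and its g_Registered flag. */
typedef struct SketchModule {
    bool Registered;
    int  UnregisterCalls; /* counts how often we would call PsRemove... */
} SketchModule;

/* Returns 0 on success, -1 on failure; sets the flag only on success. */
static int SketchStart(SketchModule *m, bool registrationSucceeds)
{
    if (!registrationSucceeds) {
        return -1;
    }
    m->Registered = true;
    return 0;
}

/* Safe to call at any time: does nothing unless Start succeeded. */
static void SketchStop(SketchModule *m)
{
    if (!m->Registered) {
        return;
    }
    m->UnregisterCalls++; /* the real module unregisters its callback here */
    m->Registered = false;
}

/* Same shape as the DriverEntry block: start in order, unwind all on failure. */
static int SketchStartAll(SketchModule *process, SketchModule *thread,
                          SketchModule *image, bool imageSucceeds)
{
    int status = SketchStart(process, true);
    if (status == 0) { status = SketchStart(thread, true); }
    if (status == 0) { status = SketchStart(image, imageSucceeds); }
    if (status != 0) {
        SketchStop(image);
        SketchStop(thread);
        SketchStop(process);
    }
    return status;
}
```

When the third module fails to register, the first two are unregistered exactly once and the third is left untouched, which is the behavior the unwinding comment in DriverEntry depends on.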
The agent gets a little more code, mainly to format the new event types:
case AegisEvtThreadCreate: {
    const AEGIS_THREAD_EVENT *t = (const AEGIS_THREAD_EVENT *)(evt + 1);
    if (evt->Size < sizeof(*evt) + sizeof(*t)) {
        fprintf(stderr, "[%s] malformed thread-create event (%u bytes)\n",
                ts, evt->Size);
        break;
    }
    printf("[%s] #%-5lu THREAD pid=%-6lu tid=%-6lu creator=%-6lu%s\n",
           ts, evt->Sequence, t->ProcessId, t->ThreadId,
           t->CreatingProcessId, t->Remote ? " [remote]" : "");
    break;
}
agent/Agent.c L68–L90 (thread cases), L91–L105 (image case)
The event-pulling loop in the agent is identical to the implementation in Part 2: it issues the DeviceIoControl call, uses each event’s size to step through the packed batch, and hands each event header to PrintEvent.
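For readers who skipped Part 2, stepping through a packed batch by event size can be sketched as a small iterator. The header layout here is an assumption for illustration only: SketchHeader and its three fields are hypothetical, chosen because the agent code reads a Size and a Sequence from the header, and the real header in the shared protocol file may carry more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event header; the real one is defined in the shared header. */
typedef struct SketchHeader {
    uint32_t Size;     /* header plus payload, in bytes */
    uint32_t Type;
    uint32_t Sequence;
} SketchHeader;

/* Reads the event at *offset. On success returns 1, fills *hdr, points
 * *payload just past the header, and advances *offset to the next event.
 * Returns 0 at the end of the batch or on a malformed record. */
static int NextEvent(const uint8_t *buf, size_t len, size_t *offset,
                     SketchHeader *hdr, const uint8_t **payload)
{
    if (*offset > len || len - *offset < sizeof(*hdr)) {
        return 0; /* not enough bytes left for a header */
    }
    /* Copy the header out instead of casting, to avoid unaligned reads. */
    memcpy(hdr, buf + *offset, sizeof(*hdr));
    if (hdr->Size < sizeof(*hdr) || hdr->Size > len - *offset) {
        return 0; /* size is smaller than a header or runs past the batch */
    }
    *payload = buf + *offset + sizeof(*hdr);
    *offset += hdr->Size;
    return 1;
}
```

Validating Size against the bytes remaining before advancing is what stops a corrupt or truncated batch from walking the agent off the end of its buffer.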
Running it
The build script already compiles all modules under driver/modules, so no changes are needed there. Rebuild the project using build.cmd, unload the previous driver using uninstall.ps1, install the new version with install.ps1, and run the agent.
[20:47:46.565] #177 CREATE pid=9980 ppid=3792 creator=3792 \??\C:\Windows\system32\notepad.exe
[20:47:46.565] #178 THREAD pid=9980 tid=4240 creator=3792 [remote]
[20:47:46.581] #179 IMAGE pid=5628 base=0x00007ffa8f1c0000 \Device\HarddiskVolume2\Windows\System32\rasadhlp.dll
[20:47:46.581] #180 IMAGE pid=9980 base=0x00007ff610860000 \Device\HarddiskVolume2\Windows\System32\notepad.exe
[20:47:46.581] #181 IMAGE pid=9980 base=0x00007ffa99cd0000 \Device\HarddiskVolume2\Windows\System32\ntdll.dll
[20:47:46.581] #182 IMAGE pid=9980 base=0x00007ffa98dc0000 \Device\HarddiskVolume2\Windows\System32\kernel32.dll
[20:47:46.596] #183 IMAGE pid=9980 base=0x00007ffa978c0000 \Device\HarddiskVolume2\Windows\System32\KernelBase.dll
[20:47:46.596] #184 IMAGE pid=9980 base=0x00007ffa98360000 \Device\HarddiskVolume2\Windows\System32\gdi32.dll

One process launch triggers multiple events: process creation, DLL image loads, and the initial thread creation.
Look at event #178: the initial thread of notepad.exe is flagged as [remote]. This is the benign remote-thread case in action: Notepad’s first thread (TID 4240) runs in PID 9980, but was created by PID 3792 (Explorer, the parent process launching Notepad). Although the creator and target process differ, this behavior is entirely legitimate.
Catching real injection
The first Notepad thread above was flagged because Explorer created the process. Now we use a small injector that creates another thread in it.
The payload here is deliberately inert: one ret instruction. The example still performs the real allocation, write, protection change, and remote-thread creation sequence, but the new thread returns immediately instead of doing anything useful.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

static void
PrintError(const char *operation)
{
    fprintf(stderr, "%s failed: error %lu\n", operation, GetLastError());
}

int
wmain(int argc, wchar_t **argv)
{
    static const unsigned char payload[] = { 0xC3 }; /* ret */
    wchar_t *end = NULL;
    unsigned long parsedPid;
    DWORD pid;
    DWORD threadId = 0;
    DWORD oldProtect = 0;
    SIZE_T bytesWritten = 0;
    LPVOID remoteAddress = NULL;
    HANDLE process = NULL;
    HANDLE thread = NULL;
    int result = 1;

    if (argc != 2) {
        fwprintf(stderr, L"usage: %ls <pid>\n", argv[0]);
        return 2;
    }
    parsedPid = wcstoul(argv[1], &end, 10);
    if (argv[1][0] == L'\0' || *end != L'\0' || parsedPid == 0) {
        fwprintf(stderr, L"invalid pid: %ls\n", argv[1]);
        return 2;
    }
    pid = (DWORD)parsedPid;

    process = OpenProcess(
        PROCESS_CREATE_THREAD |
        PROCESS_QUERY_INFORMATION |
        PROCESS_VM_OPERATION |
        PROCESS_VM_WRITE |
        PROCESS_VM_READ,
        FALSE,
        pid);
    if (process == NULL) {
        PrintError("OpenProcess");
        goto Cleanup;
    }

    remoteAddress = VirtualAllocEx(
        process, NULL, sizeof(payload), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (remoteAddress == NULL) {
        PrintError("VirtualAllocEx");
        goto Cleanup;
    }

    if (!WriteProcessMemory(
            process, remoteAddress, payload, sizeof(payload), &bytesWritten) ||
        bytesWritten != sizeof(payload)) {
        PrintError("WriteProcessMemory");
        goto Cleanup;
    }

    if (!VirtualProtectEx(
            process, remoteAddress, sizeof(payload), PAGE_EXECUTE_READ,
            &oldProtect)) {
        PrintError("VirtualProtectEx");
        goto Cleanup;
    }

    if (!FlushInstructionCache(process, remoteAddress, sizeof(payload))) {
        PrintError("FlushInstructionCache");
        goto Cleanup;
    }

    thread = CreateRemoteThread(
        process,
        NULL,
        0,
        (LPTHREAD_START_ROUTINE)remoteAddress,
        NULL,
        0,
        &threadId);
    if (thread == NULL) {
        PrintError("CreateRemoteThread");
        goto Cleanup;
    }

    printf("injector=%lu target=%lu remote-thread=%lu\n",
           GetCurrentProcessId(), pid, threadId);

    if (WaitForSingleObject(thread, INFINITE) != WAIT_OBJECT_0) {
        PrintError("WaitForSingleObject");
        goto Cleanup;
    }
    result = 0;

Cleanup:
    if (thread != NULL) {
        CloseHandle(thread);
    }
    if (remoteAddress != NULL && process != NULL) {
        VirtualFreeEx(process, remoteAddress, 0, MEM_RELEASE);
    }
    if (process != NULL) {
        CloseHandle(process);
    }
    return result;
}
Compile the injector and execute it, passing the PID of the Notepad process:
.\remote-thread-demo.exe 9980
Ensure the injector is run at the same integrity level as Notepad. With AegisAgent already running and pulling events, the output should look similar to this:
[21:20:54.997] #10658 THREAD pid=4988 tid=4532 creator=4988
[21:20:55.763] #10659 CREATE pid=9492 ppid=8208 creator=8208 \??\C:\Users\sonx\AppData\Local\Temp\remote-thread-demo.exe
[21:20:55.763] #10660 THREAD pid=9492 tid=4504 creator=8208 [remote]
[21:20:55.763] #10661 IMAGE pid=8208 base=0x00007ffa94c50000 \Device\HarddiskVolume2\Windows\System32\apphelp.dll
[21:20:55.763] #10662 IMAGE pid=9492 base=0x00007ff677570000 \Device\HarddiskVolume2\Users\sonx\AppData\Local\Temp\remote-thread-demo.exe
[21:20:55.763] #10663 IMAGE pid=9492 base=0x00007ffa99cd0000 \Device\HarddiskVolume2\Windows\System32\ntdll.dll
[21:20:55.763] #10664 IMAGE pid=9492 base=0x00007ffa98dc0000 \Device\HarddiskVolume2\Windows\System32\kernel32.dll
[21:20:55.763] #10665 IMAGE pid=9492 base=0x00007ffa978c0000 \Device\HarddiskVolume2\Windows\System32\KernelBase.dll
[21:20:55.763] #10666 IMAGE pid=9492 base=0x00007ffa94c50000 \Device\HarddiskVolume2\Windows\System32\apphelp.dll
[21:20:55.763] #10667 THREAD pid=9980 tid=5252 creator=9492 [remote]
[21:20:55.763] #10668 TEXIT pid=9980 tid=5252
[21:20:55.763] #10669 IMAGE pid=9492 base=0x00007ffa95310000 \Device\HarddiskVolume2\Windows\System32\kernel.appcore.dll

The agent log capturing a remote thread creation and immediate exit in the target process (notepad.exe, PID 9980) initiated by the injector (remote-thread-demo.exe, PID 9492).
Event #10667 is the remote thread created by our injector. It runs in the Notepad process (PID 9980) but was created by the injector (PID 9492), which flags it as [remote]. Unlike the launch case, PID 9492 did not create Notepad, so nothing benign explains the flag. Event #10668 shows the thread exiting immediately, as expected for a payload that is a single ret.
Next
Thread and image load monitoring are straightforward additions: they are simply more notify callbacks of the same kind as process creation, running at PASSIVE_LEVEL (thread notifications may also arrive at APC_LEVEL). The next two event sources break this pattern, which is what makes them interesting:
- Filesystem visibility comes from a minifilter, a different type of driver entirely, with its own registration model, altitudes, and pre- and post-operation callbacks, of which the post-operation callbacks can fire at DISPATCH_LEVEL. This is the level the queue’s spinlock was designed for, and we will finally exercise it.
- Network visibility comes from a WFP callout: the Windows Filtering Platform, hooking the connect/accept path.
Both mechanisms are heavier than simple notify callbacks, and the minifilter even changes how the driver is packaged. However, the queue they publish into and the IOCTL the agent pulls from remain identical to the ones we built in Part 2.