Saturday, August 15, 2009

Windows and UNICODE

I'll leave it to the reader to get some background on Unicode from the Internet, and will jump directly to its effect on Win32 programming.

One may have noticed new data types everywhere in Win32 programs: LPCTSTR, LPWSTR, TCHAR and so on.
Accompanying them are functions that have an A, W or T added as a prefix or suffix.

A char is a char. One byte, everywhere. Wide characters are defined as wchar_t. If you look in WCHAR.H, you'll find something like :

typedef unsigned short wchar_t ;       // 2 bytes


When you are dealing with wide strings, you have to prefix the literal with an L, e.g.:

const wchar_t* str = L"Hello";
wchar_t ch = L'v';
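To make the size difference concrete, here is a minimal sketch in portable C. The helper names wide_length and wide_storage_bytes are made up for this illustration. Note that wchar_t is 2 bytes with the Microsoft compiler but 4 bytes on most Unix toolchains, so the sketch uses sizeof(wchar_t) instead of a hard-coded 2.

```c
#include <stddef.h>
#include <wchar.h>

/* Number of characters in a wide string, not counting the terminator. */
size_t wide_length(const wchar_t *s)
{
    return wcslen(s);
}

/* Number of bytes the string occupies in memory, terminator included. */
size_t wide_storage_bytes(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For L"Hello" the length is 5 characters, but the storage is 6 * sizeof(wchar_t) bytes, which is what any allocation for a copy has to account for.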

The functions one is used to for ASCII strings won't work properly with wide characters, since each character is 2 bytes long. So separate functions exist for wide strings:

printf - wprintf
strcpy - wcscpy

and so on.
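Why the narrow functions fail can be shown with a small sketch. It assumes the UTF-16 little-endian layout that Windows uses for wide strings and spells the bytes out by hand, so it behaves the same on any platform. The names hi_utf16le, narrow_view_length and wide_view_length are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* "Hi" as UTF-16LE: two bytes per character, low byte first,
   followed by a two-byte terminator. */
static const char hi_utf16le[] = { 'H', 0, 'i', 0, 0, 0 };

/* strlen stops at the first zero byte, which is the high byte of 'H'. */
size_t narrow_view_length(void)
{
    return strlen(hi_utf16le);
}

/* wcslen counts whole wide characters up to the wide terminator. */
size_t wide_view_length(void)
{
    return wcslen(L"Hi");
}
```

The narrow function reports a length of 1 for a two-character string, while the wide function reports 2.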

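However, if you want your app to build as either ANSI or Unicode, you must use the 'T' flavoured functions and data types. These resolve to the narrow or the wide variant depending on whether the _UNICODE identifier is defined (the Win32 headers check UNICODE, the C runtime's tchar.h checks _UNICODE; a Unicode build normally defines both). For example: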

TCHAR tch;

Now, TCHAR is defined as:

typedef wchar_t TCHAR ;   // if _UNICODE is defined
typedef char TCHAR ;    // Otherwise
 
Also, there exist the macros _T(x), __T(x) and TEXT(x), which translate to L##x if _UNICODE is defined, and to plain x otherwise. So a string literal that is intended to work in both ANSI and Unicode builds should be wrapped in the TEXT macro, like:

TEXT("Hello");

To support this, there are functions and APIs that accept "T" flavoured arguments, e.g.

_tcscpy, which is wcscpy if _UNICODE is defined, and strcpy otherwise.
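The switching mechanism itself is just the preprocessor. The sketch below imitates it in portable C. DEMO_UNICODE, demo_tchar, DEMO_TEXT, demo_tcscpy, demo_tcslen and copy_greeting are all hypothetical stand-ins invented here; the real names live in tchar.h and the Windows headers and are not reproduced.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for _UNICODE, TCHAR, TEXT and the _tcs* names. */
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(x) L##x
#define demo_tcscpy  wcscpy
#define demo_tcslen  wcslen
#else
typedef char demo_tchar;
#define DEMO_TEXT(x) x
#define demo_tcscpy  strcpy
#define demo_tcslen  strlen
#endif

/* Copies a greeting into dest if it fits (capacity is counted in characters).
   Returns the number of characters copied, or 0 if the buffer is too small. */
size_t copy_greeting(demo_tchar *dest, size_t capacity)
{
    const demo_tchar *src = DEMO_TEXT("Hello");
    size_t len = demo_tcslen(src);

    if (capacity < len + 1)
        return 0;

    demo_tcscpy(dest, src);
    return len;
}
```

The same source compiles to the strcpy path by default and to the wcscpy path when DEMO_UNICODE is defined on the compiler command line.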

Disclaimer : The information in this weblog is provided “AS IS” with no warranties, and confers no rights.

This is a personal blog, and does not represent the thoughts, intentions or plans of my employer/peers. It is solely my opinion.

Heap Corruption

Corrupted the heap and caused an app crash? No problem: we'll see exactly how you can deal with such issues and debug them. We'll work on a simple example of heap corruption, caused by writing beyond the memory allocated to you.

Here is a simple program that demonstrates this :

#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <conio.h>
#define SIZE 10

int _tmain(int argc, _TCHAR* argv[])
{
    LPCTSTR lpSource = TEXT("This is the source of the ominious string which will overflow the heap");
    LPTSTR lpDest = NULL;

    // sizeof(SIZE) is sizeof(int), so only 4 * sizeof(TCHAR) = 8 bytes are requested here
    lpDest = (LPTSTR) HeapAlloc( GetProcessHeap(), 0, sizeof(SIZE) * sizeof(TCHAR) );

    if( lpDest )
    {
        _tcscpy( lpDest, lpSource );   // writes far past the end of the 8-byte block
        _tprintf( TEXT("Destination : %s\n"), lpDest );
        HeapFree( GetProcessHeap(), 0, lpDest );
    }

    getch();
    return 0;
}
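How badly the buffer is undersized can be worked out with a few lines of portable C. This is only a sketch: tchar16 is a made-up stand-in for the 2-byte TCHAR of a Unicode build, and the three function names are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 10

typedef uint16_t tchar16;   /* stand-in for a 2-byte TCHAR (Unicode build) */

/* What the program asks for: sizeof(SIZE) is the size of an int, not 10. */
size_t allocated_bytes(void)
{
    return sizeof(SIZE) * sizeof(tchar16);
}

/* What "room for SIZE characters" would have been. */
size_t intended_bytes(void)
{
    return SIZE * sizeof(tchar16);
}

/* What a string of the given length needs, terminator included. */
size_t required_bytes(size_t length_in_chars)
{
    return (length_in_chars + 1) * sizeof(tchar16);
}
```

With a 4-byte int the allocation is 8 bytes, the intended size would have been 20 bytes, and the 70-character source string needs 142 bytes.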



Well, let's execute this program:

Ahh... there you go!

Let's see in detail what has happened:

0:000> !peb
PEB at 7ffdf000
... ProcessHeap: 00150000


Single stepping, we reach the HeapAlloc call :

0:000> p
eax=00152e40 ebx=7ffdf000 ecx=7c9101bb edx=00150608 esi=0012fe84 edi=0012fe7c
eip=00411412 esp=0012fe84 ebp=0012ff68 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
Heap!wmain+0x52:
00411412 8bf4 mov esi,esp
0:000> dv /V
0012ff70 @ebp+0x08 argc = 1
0012ff74 @ebp+0x0c argv = 0x00342a10
0012ff60 @ebp-0x08 lpSource = 0x00415768 "This is the source of the ominious string which will overflow the heap"
0012ff54 @ebp-0x14 lpDest = 0x00152e40 ""

0:000> dt _HEAP_ENTRY 0x00152e40-8
ntdll!_HEAP_ENTRY
+0x000 Size : 4
+0x002 PreviousSize : 0x46
+0x000 SubSegmentCode : 0x00460004
+0x004 SmallTagIndex : 0x37 '7'
+0x005 Flags : 0x7 ''
+0x006 UnusedBytes : 0x18 ''
+0x007 SegmentIndex : 0 ''


0:000> !heap -a 00150000
Index Address Name Debugging options enabled
1: 00150000
Segment at 00150000 to 00250000 (00003000 bytes committed)
Flags: 50000062
ForceFlags: 40000060
Granularity: 8 bytes
Segment Reserve: 00100000
Segment Commit: 00002000
DeCommit Block Thres: 00000200
DeCommit Total Thres: 00002000
Total Free Size: 00000035
Max. Allocation Size: 7ffdefff
Lock Variable at: 00150608
Next TagIndex: 0000
Maximum TagIndex: 0000
Tag Entries: 00000000
PsuedoTag Entries: 00000000
Virtual Alloc List: 00150050
UCR FreeList: 00150598
FreeList Usage: 00000000 00200000 00000000 00000000
FreeList[ 35 ] at 00150320: 00152e60 . 00152e60
00152e58: 00020 . 001a8 [14] - free
Segment00 at 00150640:
Flags: 00000000
Base: 00150000
First Entry: 00150680
Last Entry: 00250000
Total Pages: 00000100
Total UnCommit: 000000fd
Largest UnCommit:000fd000
UnCommitted Ranges: (1)
00153000: 000fd000

Heap entries for Segment00 in Heap 00150000
00150000: 00000 . 00640 [01] - busy (640)
00150640: 00640 . 00040 [01] - busy (40)
00150680: 00040 . 01818 [07] - busy (1800), tail fill - unable to read heap entry extra at 00151e90
00151e98: 01818 . 00040 [07] - busy (22), tail fill - unable to read heap entry extra at 00151ed0
00151ed8: 00040 . 00068 [07] - busy (4e), tail fill - unable to read heap entry extra at 00151f38
00151f40: 00068 . 002f0 [07] - busy (2d8), tail fill - unable to read heap entry extra at 00152228
00152230: 002f0 . 00330 [07] - busy (314), tail fill - unable to read heap entry extra at 00152558
00152560: 00330 . 00330 [07] - busy (314), tail fill - unable to read heap entry extra at 00152888
00152890: 00330 . 00040 [07] - busy (24), tail fill - unable to read heap entry extra at 001528c8
001528d0: 00040 . 00040 [07] - busy (24), tail fill - unable to read heap entry extra at 00152908
00152910: 00040 . 00028 [07] - busy (10), tail fill - unable to read heap entry extra at 00152930
00152938: 00028 . 00058 [07] - busy (40), tail fill - unable to read heap entry extra at 00152988
00152990: 00058 . 00058 [07] - busy (40), tail fill - unable to read heap entry extra at 001529e0
001529e8: 00058 . 00030 [07] - busy (18), tail fill - unable to read heap entry extra at 00152a10
00152a18: 00030 . 000e0 [07] - busy (c4), tail fill - unable to read heap entry extra at 00152af0
00152af8: 000e0 . 00060 [07] - busy (44), tail fill - unable to read heap entry extra at 00152b50
00152b58: 00060 . 00020 [07] - busy (1), tail fill - unable to read heap entry extra at 00152b70
00152b78: 00020 . 00028 [07] - busy (10), tail fill - unable to read heap entry extra at 00152b98
00152ba0: 00028 . 00068 [07] - busy (4c), tail fill - unable to read heap entry extra at 00152c00
00152c08: 00068 . 00230 [07] - busy (214), tail fill - unable to read heap entry extra at 00152e30
00152e38: 00230 . 00020 [07] - busy (8), tail fill - unable to read heap entry extra at 00152e50 //Our Block
00152e58: 00020 . 001a8 [14] free fill
00153000: 000fd000 - uncommitted bytes.
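The dt output and the "Our Block" line describe the same 8-byte header, once in heap units and once in bytes. The arithmetic can be sketched as follows. The assumptions are read off the output above: an 8-byte header placed directly before the user pointer, and the 8-byte granularity reported by !heap. The function names are invented for this sketch.

```c
#include <stdint.h>

enum {
    HEAP_GRANULARITY = 8,   /* "Granularity: 8 bytes" in the !heap output */
    HEAP_ENTRY_BYTES = 8    /* header sits just before the user pointer */
};

/* Address of the _HEAP_ENTRY that describes a user pointer. */
uint32_t entry_address(uint32_t user_pointer)
{
    return user_pointer - HEAP_ENTRY_BYTES;
}

/* Size and PreviousSize are stored in granularity units. */
uint32_t block_bytes(uint16_t size_units)
{
    return (uint32_t)size_units * HEAP_GRANULARITY;
}

/* Bytes the caller asked for: block size minus the unused part. */
uint32_t requested_bytes(uint16_t size_units, uint8_t unused_bytes)
{
    return block_bytes(size_units) - unused_bytes;
}
```

Plugging in the values from the dump: 0x00152e40 - 8 = 0x00152e38, Size 4 gives 0x20 bytes, PreviousSize 0x46 gives 0x230 bytes, and 0x20 - 0x18 leaves the 8 bytes shown as "busy (8)".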

Let's step over the string copy (the _tcscpy call, which is wcscpy in this Unicode build):

0:000> p

And check the Heap now...

0:000> !heap -a 00150000
....
00152e38: 00230 . 00020 [07] - busy (8), tail fill - unable to read heap entry extra at 00152e50
00152e58: 00308 . 00308 [00]
unable to read heap entry at 00153160


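See what has happened: the entry following our block has become corrupt. In this listing, the value after the entry address is the size of the previous block, and the value after the dot is the size of the block itself, so each entry's previous size should equal the size of the entry before it. Our block is 00020 bytes, but the next entry now reports a previous size of 00308. These don't match!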
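The consistency rule that the listing violates can be written down as a small check. This is a sketch of the idea only, not the heap manager's actual validation code; heap_line, next_entry_address and entries_consistent are names invented here, and the fields mirror the three numbers printed on each line of the listing.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;         /* address of the heap entry */
    uint32_t previous_bytes;  /* value printed before the dot */
    uint32_t size_bytes;      /* value printed after the dot */
} heap_line;

/* Entries are laid out back to back, so the next one starts here. */
uint32_t next_entry_address(heap_line e)
{
    return e.address + e.size_bytes;
}

/* Two neighbouring entries agree if the second starts where the first
   ends and remembers the first one's size as its previous size. */
bool entries_consistent(heap_line current, heap_line next)
{
    return next.address == next_entry_address(current)
        && next.previous_bytes == current.size_bytes;
}
```

Before the copy, the pair (00152e38: 00230 . 00020) and (00152e58: 00020 . 001a8) passes this check. After the copy, the second entry reads 00308 . 00308 and the check fails.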

Let's dump the heap entry:

0:000> dt _HEAP_ENTRY 00152e58
ntdll!_HEAP_ENTRY
+0x000 Size : 0x73
+0x002 PreviousSize : 0x6f
+0x000 SubSegmentCode : 0x006f0073
+0x004 SmallTagIndex : 0x75 'u'
+0x005 Flags : 0 ''
+0x006 UnusedBytes : 0x72 'r'
+0x007 SegmentIndex : 0 ''


Ohh... see the 'u' and the 'r'? Let's confirm:

0:000> du 00152e58
00152e58 "source of the ominious string wh"
00152e98 "ich will overflow the heap"

The string overwrote the Heap Metadata.
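The odd field values are simply the first four characters of "source" read through the header layout. The sketch below decodes them by hand. It assumes the 8-byte field layout shown by dt above and UTF-16 little-endian text; heap_entry_view, decode_entry and entry_from_text are made-up names.

```c
#include <stdint.h>

/* The fields dt printed, in the order they appear in memory. */
typedef struct {
    uint16_t size;
    uint16_t previous_size;
    uint8_t  small_tag_index;
    uint8_t  flags;
    uint8_t  unused_bytes;
    uint8_t  segment_index;
} heap_entry_view;

/* Interpret 8 raw bytes as a header, reading 16-bit fields low byte first. */
heap_entry_view decode_entry(const uint8_t raw[8])
{
    heap_entry_view e;
    e.size            = (uint16_t)(raw[0] | (raw[1] << 8));
    e.previous_size   = (uint16_t)(raw[2] | (raw[3] << 8));
    e.small_tag_index = raw[4];
    e.flags           = raw[5];
    e.unused_bytes    = raw[6];
    e.segment_index   = raw[7];
    return e;
}

/* The bytes that landed on the header: "sour" as UTF-16LE. */
heap_entry_view entry_from_text(void)
{
    static const uint8_t raw[8] = { 's', 0, 'o', 0, 'u', 0, 'r', 0 };
    return decode_entry(raw);
}
```

Decoding "sour" gives Size 0x73 ('s'), PreviousSize 0x6f ('o'), SmallTagIndex 0x75 ('u'), Flags 0, UnusedBytes 0x72 ('r') and SegmentIndex 0, exactly the values in the dt output.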

One should use page heap to catch the corruption. This can be enabled with the !gflag +hpa command in the debugger:

0:000> !gflag
Current NtGlobalFlag contents: 0x02000000
hpa - Place heap allocations at ends of pages

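Or by the following command on the command line: gflags -r +hpa (the -r switch changes the system-wide registry setting; gflags -i Heap.exe +hpa would limit it to a single image).

When we run the program with these settings: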

0:000> g


===========================================================
VERIFIER STOP 00000008: pid 0xB90: corrupted suffix pattern

00151000 : Heap handle
00255140 : Heap block
00000008 : Block size
00000000 :
===========================================================

(b90.b94): Break instruction exception - code 80000003 (first chance)

ChildEBP RetAddr
0012fb3c 7c954a15 ntdll!DbgBreakPoint
0012fb54 7c96943e ntdll!RtlApplicationVerifierStop+0x160
0012fbd0 7c96a517 ntdll!RtlpDphReportCorruptedBlock+0x17c
0012fbf4 7c96a71a ntdll!RtlpDphNormalHeapFree+0x2e
0012fc44 7c96d7bb ntdll!RtlpDebugPageHeapFree+0x79
0012fcb8 7c949e1c ntdll!RtlDebugFreeHeap+0x2c
0012fda0 7c927553 ntdll!RtlFreeHeapSlowly+0x37
*** WARNING: Unable to verify checksum for Heap.exe
0012fe70 0041146b ntdll!RtlFreeHeap+0xf9
0012ff68 00411a58 Heap!wmain+0xab [c:\users\admin\my documents\visual studio 2008\projects\heap\heap\heap.cpp @ 19]
0012ffb8 0041189f Heap!__tmainCRTStartup+0x1a8 [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 583]
0012ffc0 7c817067 Heap!wmainCRTStartup+0xf [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 403]
0012fff0 00000000 kernel32!BaseProcessStart+0x23


Note that the corruption is detected only when the block is freed, because we are using normal page heap, which checks the pattern after the block at free time. If we use full page heap, we can detect the corruption during the overwrite itself.
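Why a suffix pattern can only report the damage at free time is easy to see in a toy version. The following is a conceptual sketch on top of malloc, not how ntdll implements page heap. The pattern byte, the padding size and all names (checked_block, checked_alloc, suffix_intact, checked_free) are invented for this illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum {
    SUFFIX_BYTES = 8,     /* hypothetical amount of padding after the block */
    SUFFIX_FILL  = 0xAB   /* hypothetical pattern byte */
};

typedef struct {
    unsigned char *user;  /* pointer handed to the caller */
    size_t size;          /* bytes the caller asked for */
} checked_block;

/* Allocate size bytes plus a padding area filled with a known pattern. */
checked_block checked_alloc(size_t size)
{
    checked_block b = { NULL, size };
    unsigned char *raw = (unsigned char *)malloc(size + SUFFIX_BYTES);

    if (raw != NULL) {
        memset(raw + size, SUFFIX_FILL, SUFFIX_BYTES);
        b.user = raw;
    }
    return b;
}

/* True while nobody has written past the end of the user area. */
bool suffix_intact(checked_block b)
{
    for (size_t i = 0; i < SUFFIX_BYTES; ++i) {
        if (b.user[b.size + i] != SUFFIX_FILL)
            return false;
    }
    return true;
}

/* The check runs here, long after the bad write has happened. */
bool checked_free(checked_block b)
{
    bool ok = suffix_intact(b);
    free(b.user);
    return ok;
}
```

Nothing stops the overrun when it happens; the pattern is only compared in checked_free. Catching the write itself needs hardware help, such as an inaccessible page right after the block, which is the idea behind full page heap.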

Also, one can use the following flags to aid in testing an app:

0:000> !gflag
Current NtGlobalFlag contents: 0x00000070
htc - Enable heap tail checking
hfc - Enable heap free checking
hpc - Enable heap parameter checking
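The NtGlobalFlag value is a plain bitmask, so the two values printed by !gflag in this post can be decoded by hand. The sketch below uses the bit values for these four flags. The enum names and the flag_set helper are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values of the flags mentioned in this post. */
enum {
    FLAG_HTC = 0x00000010,  /* htc - heap tail checking */
    FLAG_HFC = 0x00000020,  /* hfc - heap free checking */
    FLAG_HPC = 0x00000040,  /* hpc - heap parameter checking */
    FLAG_HPA = 0x02000000   /* hpa - page heap */
};

/* True if the given flag bit is present in an NtGlobalFlag value. */
bool flag_set(uint32_t nt_global_flag, uint32_t flag)
{
    return (nt_global_flag & flag) != 0;
}
```

Combining htc, hfc and hpc gives 0x10 + 0x20 + 0x40 = 0x70, the value shown above, while the earlier value 0x02000000 has only the hpa bit set.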

In the next blog, we will check out some of the options with a utility called Application Verifier.
