Wednesday, January 29, 2014

x86 instruction encoding examples

/* To follow up from earlier, let's go through some examples of x86 instruction encoding, focusing on the "modrm" and "SIB" bytes.

  These examples use the Windows x64 calling convention: the first four integer/pointer arguments are passed in rcx, rdx, r8, r9.  
  cl -Zi -GS- -GL -O1t  2.c -FAsc -LD -link -nod -noentry && link /dump /symbols /disasm 2.dll | more
 */

  #include <stddef.h>
  typedef unsigned UINT; 
  #define EXPORT __declspec(dllexport)  /* to reduce line length */

  UINT b; 
  EXPORT UINT register_direct(UINT a) { return a+b; } 
  /* 
    03 C1 add eax, ecx   
    03 is add (there are other add opcodes, keep reading)   
    C1 is 11000001 = 11 000 001 
    11 is register direct 
    000 is eax (the reg field)   
    001 is ecx (the r/m field)   
  */ 


  EXPORT void register_indirect(UINT a, UINT * b) { *b += a; } 
  /* 
   01 0A add dword ptr[rdx], ecx  
   01 is add (there are other add opcodes; in this case, direction is reversed)  
   0A is 00001010 = 00 001 010  
   00 is register indirect  
   001 is ecx/rcx  
   010 is edx/rdx  
  */ 


  EXPORT void register_indirect_displacement8(UINT a, UINT * b) { b[0x78/4] += a; } 
  /* 
  01 4A 78 add dword ptr[rdx+78],ecx 
  01 is add again 
  4A is 01001010 = 01 001 010 
  01 is register indirect with 8 bit displacement 
  001 is ecx/rcx 
  010 is edx/rdx 
  78 is the displacement 
  */


  EXPORT void register_indirect_displacement32(UINT a, UINT * b) { b[0x1234/4] += a; }   
  /*   
  01 8A 34 12 00 00 add dword ptr[rdx+1234], ecx 
  01 is add 
  8A is 10001010 = 10 001 010 
  10 is register indirect with 32 bit displacement 
  001 is ecx/rcx 
  010 is edx/rdx   
  34 12 00 00 is the displacement, stored little-endian 
  */ 


  EXPORT UINT sib_without_displacement(UINT a, UINT * b) { return b[a]; } 
/*
   mov eax, ecx
 8B 04 82  mov eax, dword ptr[rdx+rax*4]
 8B is mov
 04 is the modrm byte; (modrm & 7) == 4 means there is a SIB byte
  82 is the SIB byte
  82 is 10000010 = 10 000 010
  10 is scale = 1 << 2 == 4 (binary 10 is 2)
  000 is index = rax, index is the one multiplied by scale
  010 is base = rdx
 It seems to me the compiler should have generated just one instruction:
  mov eax, dword ptr[rdx + rcx*4]
  However the extra mov eax, ecx could be the compiler zero extending the lower 32 bits
  (writing eax clears the upper 32 bits of rax). 
  We'll see in the next example. 
 */


  EXPORT UINT sib_without_displacement_size(size_t a, UINT * b) { return b[a]; }   
 /* Yes. Here we get: 
  8B 04 8A mov eax, dword ptr[rdx+rcx*4]
  8B is mov 
  modrm 04 = 00 000 100
  00 means register indirect with no displacement
  000 is the destination register eax
  100 means there is a SIB byte
  8A is the SIB byte, 10001010 = 10 001 010: 10 is scale = 1 << 2 = 4, 001 is index rcx, 010 is base rdx
  */ 



  EXPORT UINT sib_displacement8(UINT a, UINT * b) { return (b+0x78/4)[a]; }
  /* 
  Again we have the mov eax, ecx, ok.
  8B 44 82  78 mov eax, dword ptr[rdx+rax*4+78]
 8B is mov
 modrm 44 = 01000100 = 01 000 100
  01 is register indirect with 8 bit displacement
  000 is the destination register eax
  100 for r/m means there is a SIB byte
 the SIB byte is 82 = 10000010 = 10 000 010
  10 is again scale = 4
  000 is the index = rax
  010 is the base = rdx
  78 is the displacement (or offset)
  */


  EXPORT UINT sib_displacement32(UINT a, UINT * b) { return (b+0x1234/4)[a]; }
/*
 again the mov eax, ecx
 8B 84 82 34 12 00 00 mov eax, dword ptr[rdx+rax*4+1234]
 8B is mov
 modrm 84 = 10000100 = 10 000 100
 10 is register indirect with 32 bit displacement (or offset)
 000 is destination register eax
 100 means there is a SIB byte
 SIB = 82 = 10000010 = 10 000 010
  10 is scale = 4
  000 is index rax
  010 is base rdx
  34 12 00 00 are the displacement bytes
 */


  #if defined(_AMD64_) || defined(_M_AMD64)   
  UINT a[100];   
  // rip relative is very limited -- no scale/index/base/displacement   
  // just rip + offset   
  EXPORT UINT rip_relative() { return a[0]; }   
  /*   
    8B 05 .. .. .. ..  mov eax, dword ptr[a]  
    8B is mov   
    modrm 05 = 00 000 101  
    00 is register indirect with no displacement 
    000 is the destination register eax  
    101 means RIP relative, but only when mode == 00; with mode 01 or 10 it is the ordinary rbp plus displacement  
     There is no need for an additional constant 8 or 32 bit displacement: it could just be folded into the RIP-relative offset.  
     All a separate displacement would buy is a little more reach (8 bit + 32 bit) or double the reach (32 bit + 32 bit).  
    Then there are 4 bytes for the offset.  
  */ 


   EXPORT void rip_relative2(UINT b) { a[0] += b; }    
   /* Almost the same, but I wanted to avoid a field of zeros for rax.  
   01 0D .. .. .. .. add dword ptr[a], ecx  
   modrm = 0D = 00001101 = 00 001 101  
    00 mode register indirect  
    001 ecx  
    101 RIP relative  
   
   Notice that sometimes in these examples add is 01 and sometimes it is 03.  
   There are even more options. 
   Some opcodes have a "direction" bit in them. From these examples, we can see that it is the second bit, the value 2:   
   with 01 the r/m operand is the destination, with 03 the reg operand is the destination.   
   */
  #endif 


   /* Now let's demonstrate register numbering. Here I am limited to a 32 bit system. 
   A good way to see how some bytes decode is to enter them in arbitrary memory in a debugger, a debugger 
    you started just for this. 
   I do this: 
     \bin\x86\cdb cmd  
     to start up the Windows console debugger on a new dummy command line process.  
      Then I use "eb" for edit bytes, "." for current instruction pointer (EIP or RIP), and "u" for unassemble (disassemble) and "L1" for length 1.   
      I suppose this is what people used to use MS-DOS "debug.exe" for. 
      Like this:   
    \bin\x86\cdb cmd   
   0:000> eb . 1 2 3 4 ; u . l1 
   0102            add     dword ptr [edx],eax
 
   1 is add 
   modrm 2 = 00000010 = 00 000 010  
   mode 00 register indirect with no displacement
   000 = eax
   010 is edx
 
 
   So let's see exactly how all the registers are numbered.  
  
    0:000> eb . 1 0<<3  ; u . l1 
     0100            add     dword ptr [eax],eax   
   
    0:000> eb . 1 1<<3  ; u . l1   
     0108            add     dword ptr [eax],ecx   
   
    0:000> eb . 1 2<<3  ; u . l1   
     0110            add     dword ptr [eax],edx   
  
    0:000> eb . 1 3<<3  ; u . l1 
     0118            add     dword ptr [eax],ebx   
   
    0:000> eb . 1 4<<3  ; u . l1   
     0120            add     dword ptr [eax],esp   
   
    0:000> eb . 1 5<<3  ; u . l1   
     0128            add     dword ptr [eax],ebp   
   
    0:000> eb . 1 6<<3  ; u . l1   
     0130            add     dword ptr [eax],esi   
   
    0:000> eb . 1 7<<3  ; u . l1   
     0138            add     dword ptr [eax],edi   
   
   
    Remember not to take this as the entire truth, because there are special cases to indicate RIP relative or SIB byte presence.  
    The special cases involve esp/rsp/4 (a SIB byte follows) and ebp/rbp/5 (with mode 00, RIP relative or displacement only).  
    Those registers are not quite as general as the others.  
 
 
    And yet I still haven't covered the 64 bit changes.  


 If you are really interested in this stuff, I encourage you to go through it all in complete detail and probably change the samples or write your own. A nice change is to reorder the parameters or add extra "dummy" parameters to push the values into other registers. I suggest no more than 4 parameters per function for learning purposes, otherwise you'll get extra instructions reading the values off of the stack and lose predictability as to which register is used.


   */  
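
As a rough cross-check of the hand decoding above, here is a minimal sketch in C that splits a modrm byte into its fields and counts the SIB and displacement bytes that follow it. The names split_modrm and bytes_after_modrm are made up for this illustration. It assumes 32 or 64 bit addressing (not 16 bit) and ignores prefixes, opcodes and immediates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three fields of a modrm byte (a SIB byte splits the same way:
   mod holds the scale, reg holds the index, rm holds the base). */
struct modrm_fields {
    unsigned mod; /* top two bits */
    unsigned reg; /* middle three bits */
    unsigned rm;  /* low three bits */
};

static struct modrm_fields split_modrm(uint8_t byte)
{
    struct modrm_fields f;
    f.mod = byte >> 6;
    f.reg = (byte >> 3) & 7;
    f.rm  = byte & 7;
    return f;
}

/* p points at the modrm byte.
   Returns how many SIB and displacement bytes follow it. */
static unsigned bytes_after_modrm(const uint8_t *p)
{
    struct modrm_fields m;
    unsigned count = 0;

    assert(p != NULL);
    m = split_modrm(p[0]);

    if (m.mod == 3)
        return 0;       /* register direct: nothing follows */

    if (m.rm == 4)
        count += 1;     /* r/m 4 means a SIB byte follows */

    if (m.mod == 1)
        count += 1;     /* 8 bit displacement */
    else if (m.mod == 2)
        count += 4;     /* 32 bit displacement */
    else if (m.rm == 5)
        count += 4;     /* mod 0, r/m 5: displacement only or RIP relative */
    else if (m.rm == 4 && split_modrm(p[1]).rm == 5)
        count += 4;     /* mod 0, SIB base 5: no base register, 32 bit displacement */

    return count;
}
```

For the bytes 44 82 78 from sib_displacement8, split_modrm(0x44) gives mod 1, reg 0, r/m 4, and bytes_after_modrm returns 2: one SIB byte plus one displacement byte.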

Sunday, January 12, 2014

x86 instruction encoding

This is well documented in the manuals.
x86 instructions look like this:



 optional prefix bytes
 opcode bytes 
 modrm/sib
 displacement
 immediate



The maximum size is 15 bytes.
Prefix bytes include segment overrides, size overrides, lock prefix, repe/repne.
The presence of modrm and immediate is dependent on the opcode.
The presence of displacement depends on opcode and/or modrm.



Some opcodes are one byte, like push/pop.
Some opcodes have implied register use, like push/pop.
Some opcodes have no modrm/sib but do have displacement, like jmp/call.
Many opcodes have no immediate. An example that does have immediate is add.


Let's dig into modrm/sib.
modrm is a byte with three fields.
 two bit mode, let's call it mod.
 three bit reg, let's call it r
 three bit reg or memory, let's call it r/m


The fields are laid out left to right starting at the most significant bit, so mod occupies the top two bits and its values 0-3 show up in the byte as 0, 0x40, 0x80, 0xC0.


0 is register indirect with no displacement
0x40 is register indirect with an 8 bit displacement
0x80 is register indirect with a 32bit displacement
0xC0 is register direct.



The three bit fields of course take on 8 values 0-7.
The registers are numbered, eax=0, ecx=1, edx=2, etc.


For example, let's suppose "add" is 0. (It sort of is.)


Let's use "0b" for binary.

add edx, [ecx]
would be 0 0b00 010 001

add ecx, [edx+4]
would be 0 0b01 001 010 04

add ecx, [edx+0x12345678]

would be 0 0b10 001 010 78 56 34 12

add ecx, edx
would be 0 0b11 001 010
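
To make the bit packing concrete, here is a minimal sketch in C that builds a modrm byte from the three fields, using the mod values and register numbers listed earlier. The function name make_modrm and the enum names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in the reg and r/m fields. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

/* The four values of the two bit mod field. */
enum {
    MOD_INDIRECT        = 0, /* [reg] */
    MOD_INDIRECT_DISP8  = 1, /* [reg + 8 bit displacement] */
    MOD_INDIRECT_DISP32 = 2, /* [reg + 32 bit displacement] */
    MOD_DIRECT          = 3  /* reg */
};

/* Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into one modrm byte. */
static uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod <= 3 && reg <= 7 && rm <= 7);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

With these, make_modrm(MOD_INDIRECT, EDX, ECX) is 0x11 for add edx, [ecx]. make_modrm(MOD_INDIRECT_DISP8, ECX, EDX) is 0x4A for add ecx, [edx+4], followed by the displacement byte 04. make_modrm(MOD_DIRECT, ECX, EDX) is 0xCA for add ecx, edx.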


If r/m is 4 and mod is not 0b11 (register direct), the rules change slightly. (r/m of 5 is a separate special case that only applies when mod is 0.)
Instead of that being a register in the normal scheme, it means there is a "SIB" byte.
"SIB" is scale-index-base.

You can say things like:
 add eax, [4*edx+ecx] 
 where 4 is scale
 edx is index
 ecx is base


Imagine a function like:

int get_array_element(int * array, int index)
{
  return array[index];
}


Let's pretend array is in ebx, index is in ecx.


This would look like
 mov eax, [4 * ecx + ebx]


The SIB byte, similar to the modrm byte, has three fields:
  2 bit scale
  3 bit index
  3 bit base


scale 0: 1
scale 1: 2
scale 2: 4
scale 3: 8
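
Putting the modrm and SIB bytes together for the get_array_element example, here is a minimal sketch in C that encodes mov eax, [scale*index + base]. The names make_sib and encode_mov_eax_sib are made up for this illustration. It assumes 8B as the opcode for mov with a register destination, and it does not handle the field values that mean "no register".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the index and base fields. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Pack scale (1, 2, 4 or 8), index and base into a SIB byte. */
static uint8_t make_sib(unsigned scale, unsigned index, unsigned base)
{
    unsigned scale_bits = 0;

    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(index <= 7 && base <= 7);

    /* The two bit field stores log2(scale): 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3. */
    while (scale_bits < 3 && (1u << scale_bits) != scale)
        scale_bits++;

    return (uint8_t)((scale_bits << 6) | (index << 3) | base);
}

/* Write the bytes of "mov eax, [scale*index + base]" to out.
   Returns the number of bytes written. */
static size_t encode_mov_eax_sib(uint8_t out[3], unsigned scale,
                                 unsigned index, unsigned base)
{
    /* Index 4 (esp) means "no index" and base 5 (ebp) with mod 0 means
       "no base"; this sketch rejects both rather than encode them. */
    assert(index != ESP && base != EBP);

    out[0] = 0x8B;                                /* mov reg, r/m */
    out[1] = (uint8_t)((0u << 6) | (EAX << 3) | 4); /* mod 0, reg eax, r/m 4: SIB follows */
    out[2] = make_sib(scale, index, base);
    return 3;
}
```

For mov eax, [4 * ecx + ebx], encode_mov_eax_sib(buf, 4, ECX, EBX) writes 8B 04 8B: the SIB byte is 10 001 011, which happens to have the same value as the opcode.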


There is a little more to this but I have to run for now.
There are values for the SIB fields that mean no register.


There is also extending this to 64bits and providing RIP-relative addressing therein.