Sunday, January 12, 2014

x86 instruction encoding

This is well documented in the Intel and AMD manuals.
x86 instructions look like this:



 optional prefix bytes
 opcode bytes 
 modrm/sib
 displacement
 immediate



The maximum size is 15 bytes.
Prefix bytes include segment overrides, operand-size and address-size overrides, the lock prefix, and rep/repe/repne.
The presence of modrm and immediate is dependent on the opcode.
The presence of displacement depends on opcode and/or modrm.
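
To make the layout concrete, here is a rough sketch in C that adds up the pieces and checks the 15 byte limit. The struct and function names are made up for illustration, and the size comments assume 32 bit code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one instruction's parts, sizes in bytes. */
struct insn_layout {
    size_t prefix_bytes;  /* zero or more prefix bytes */
    size_t opcode_bytes;  /* 1 to 3 opcode bytes */
    size_t modrm_bytes;   /* 0 or 1 */
    size_t sib_bytes;     /* 0 or 1, only possible when modrm is present */
    size_t disp_bytes;    /* typically 0, 1, or 4 with 32 bit addressing */
    size_t imm_bytes;     /* typically 0, 1, 2, or 4 */
};

/* Returns the total length, or 0 if it exceeds the 15 byte limit. */
size_t insn_length(const struct insn_layout *in)
{
    size_t total;

    /* A SIB byte never appears without a modrm byte. */
    assert(in->sib_bytes == 0 || in->modrm_bytes == 1);

    total = in->prefix_bytes + in->opcode_bytes + in->modrm_bytes
          + in->sib_bytes + in->disp_bytes + in->imm_bytes;
    return total <= 15 ? total : 0;
}
```

For example, one opcode byte plus modrm plus a 32 bit displacement comes to 6 bytes.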



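Some opcodes are one byte, like push/pop of a register.
Some opcodes have implied register use, again like push/pop: the register number sits in the low bits of the opcode byte, so there is no modrm.
Some opcodes have no modrm/sib but do have displacement, like the relative forms of jmp/call.
Many opcodes have no immediate. An example that can have an immediate is add, as in add eax, 4.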


Let's dig into modrm/sib.
modrm is a byte with three fields.
 two bit mode, let's call it mod.
 three bit reg, let's call it r
 three bit reg or memory, let's call it r/m


The layout is left to right, most significant bits first, so the two bit mode 0-3 sits in the top two bits of the byte and looks like 0, 0x40, 0x80, 0xC0.


0 is register indirect with no displacement
0x40 is register indirect with an 8 bit displacement
0x80 is register indirect with a 32 bit displacement
0xC0 is register direct.
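
Pulling the three fields back out of a modrm byte is just shifting and masking. Here is a small sketch in C; the struct and function names are my own, and the displacement lookup only covers the four plain modes listed above, not the special cases.

```c
#include <stdint.h>

/* The three fields of a modrm byte. Names are illustrative only. */
struct modrm {
    unsigned mod;  /* top two bits */
    unsigned reg;  /* middle three bits */
    unsigned rm;   /* low three bits */
};

struct modrm modrm_split(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3;
    m.reg = (byte >> 3) & 7;
    m.rm  = byte & 7;
    return m;
}

/* Displacement size in bytes implied by mod alone:
   0 = none, 1 = 8 bit, 2 = 32 bit, 3 = register direct (none). */
unsigned modrm_disp_bytes(unsigned mod)
{
    static const unsigned sizes[4] = { 0, 1, 4, 0 };
    return sizes[mod & 3];
}
```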



The three bit fields of course take on 8 values 0-7.
The registers are numbered eax=0, ecx=1, edx=2, ebx=3, esp=4, ebp=5, esi=6, edi=7.
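
The numbering is easy to keep in a table indexed by register number. A sketch in C, with made-up helper names:

```c
#include <string.h>

/* 32 bit register names, indexed by their 3 bit encoding. */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Name for a 3 bit register number. */
const char *reg32_name(unsigned n)
{
    return reg32_names[n & 7];
}

/* Register number for a name, or -1 if the name is not known. */
int reg32_number(const char *name)
{
    int i;
    for (i = 0; i < 8; i++) {
        if (strcmp(reg32_names[i], name) == 0)
            return i;
    }
    return -1;
}
```

Note the order is not alphabetical: ecx and edx come before ebx.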


For example, let's suppose "add" is 0. (It sort of is: the real add opcodes are 0x00 through 0x05.)


Let's use "0b" for binary.

add edx, [ecx]
would be 0b00 010 001

add ecx, [edx+4]
would be 0b01 001 010 04

add ecx, [edx+0x12345678]
would be 0b10 001 010 78 56 34 12
(the displacement bytes are little endian)

add ecx, edx
would be 0b11 001 010
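
Here is a minimal sketch in C of how the modrm byte and displacement bytes for examples like these could be produced. It leaves the opcode out, picks mod from the four modes listed earlier (0, 0x40, 0x80, 0xC0), and skips the esp/ebp special cases. All names are my own, not from any manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Pack mod (2 bits), reg (3 bits), r/m (3 bits) into one byte. */
static uint8_t modrm_pack(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Encode "reg, [base + disp]". Writes modrm plus displacement to out
   (up to 5 bytes) and returns the number of bytes written. */
size_t encode_reg_mem(uint8_t *out, unsigned reg, unsigned base, int32_t disp)
{
    size_t n = 0;

    /* esp as base needs a SIB byte, and ebp with no displacement is a
       special case; both are out of scope for this sketch. */
    assert(base != ESP);
    assert(!(base == EBP && disp == 0));

    if (disp == 0) {
        out[n++] = modrm_pack(0, reg, base);      /* no displacement */
    } else if (disp >= -128 && disp <= 127) {
        out[n++] = modrm_pack(1, reg, base);      /* 8 bit displacement */
        out[n++] = (uint8_t)disp;
    } else {
        uint32_t u = (uint32_t)disp;
        out[n++] = modrm_pack(2, reg, base);      /* 32 bit displacement */
        for (int i = 0; i < 4; i++)
            out[n++] = (uint8_t)(u >> (8 * i));   /* little endian */
    }
    return n;
}

/* Encode "reg, rm" with both operands in registers (mod 3). */
size_t encode_reg_reg(uint8_t *out, unsigned reg, unsigned rm)
{
    out[0] = modrm_pack(3, reg, rm);
    return 1;
}
```

With this, encode_reg_mem(buf, EDX, ECX, 0) writes 0x11, encode_reg_mem(buf, ECX, EDX, 4) writes 0x4A 0x04, and encode_reg_reg(buf, ECX, EDX) writes 0xCA.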


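If r/m is 4 and the mode is not register direct, the rules change slightly.
Instead of that being a register in the normal scheme (it would be esp), it means there is a "SIB" byte after the modrm byte.
"SIB" is scale-index-base.
(r/m of 5 is the other special case: with mod 0 it means a 32 bit displacement and no register at all.)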

You can say things like:
 add eax, [4*edx+ecx] 
 where 4 is scale
 edx is index
 ecx is base


Imagine a function like:

int get_array_element(int * array, int index)
{
  return array[index];
}


Let's pretend array is in ebx, index is in ecx.


This would look like
 mov eax, [4 * ecx + ebx]


The SIB byte, similar to the modrm byte, has three fields:
  2 bit scale
  3 bit index
  3 bit base


scale 0: 1
scale 1: 2
scale 2: 4
scale 3: 8
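
Packing a SIB byte works just like modrm, except the top field holds the log2 of the scale factor. A sketch in C, with my own function names:

```c
#include <assert.h>
#include <stdint.h>

/* Map a scale factor of 1, 2, 4, or 8 to the two bit scale field. */
static unsigned scale_bits(unsigned factor)
{
    switch (factor) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default:
        assert(!"scale must be 1, 2, 4, or 8");
        return 0;
    }
}

/* Pack scale (2 bits), index (3 bits), base (3 bits) into one byte.
   index and base are register numbers, eax=0, ecx=1, and so on. */
uint8_t sib_pack(unsigned factor, unsigned index, unsigned base)
{
    assert(index < 8 && base < 8);
    return (uint8_t)((scale_bits(factor) << 6) | (index << 3) | base);
}
```

For [4 * ecx + ebx], sib_pack(4, 1, 3) gives 0b10 001 011, which is 0x8B. The modrm byte in front of it has r/m of 4 to announce the SIB byte.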


There is a little more to this but I have to run for now.
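There are values for the SIB fields that mean no register: index 4 means no index (esp cannot be an index), and base 5 with mod 0 means no base, just a 32 bit displacement.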


There is also the extension of all this to 64 bits, which adds RIP-relative addressing.
