Friday, December 1, 2017

Epilogues and esp. prologues are probably not what you think.

  The NT/amd64 ABI speaks of function prologues and epilogues, and the rest of the function. 
  Epilogues might not be what you think and prologues almost definitely are not what you think. 


  First, the easier clarification: a function can have any number of epilogues. 
  It can have zero epilogues, one epilogue at the end, 
  one epilogue not at the end, or several epilogues. 

  An epilogue is not the code located at the end of the function; 
  it is the last code a function runs. It is about dynamic execution, 
  not static location. 

  A function will have zero epilogues if it never returns: 

  C:\> type no_epilogue.c
        cl /LD /O2 /GL /GS- no_epilogue.c /link /incremental:no /export:no_epilogue /nod /noentry
        link /dump /disasm no_epilogue.dll


   void no_epilogue(void (*f)()) { while (1) f(); } 

  Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64 

    0000000180001000: 40 53              push        rbx 
    0000000180001002: 48 83 EC 20        sub         rsp,20h 
    0000000180001006: 48 8B D9           mov         rbx,rcx 
    0000000180001009: 0F 1F 80 00 00 00  nop         dword ptr [rax+0000000000000000h] 
                      00 
    0000000180001010: FF D3              call        rbx 
    0000000180001012: EB FC              jmp         0000000180001010 


   A function can have multiple epilogues if it has an "early return":  
    
  C:\> type multiple_epilogues.c 
       cl /LD /O2 /GL /GS- multiple_epilogues.c /link /incremental:no /export:multiple_epilogues /nod /noentry
       link /dump /disasm multiple_epilogues.dll 

   int multiple_epilogues(int i, int (*f)(void), int (*g)(void)) 
   { 
     if (i) 
      return f(); 
     return g() + g() + g(); 
   } 

  Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64 
    0000000180001000:   push        rdi 
    0000000180001002:   sub         rsp,20h 
    0000000180001006:   mov         rdi,r8 
    0000000180001009:   test        ecx,ecx 
    000000018000100B:   je          0000000180001015 
    000000018000100D:   add         rsp,20h            <== possibly epilog  
    0000000180001011:   pop         rdi                <== epilog 
    0000000180001012:   jmp         rdx                <== epilog  
    0000000180001015:   mov         qword ptr [rsp+30h],rbx  
    000000018000101A:   call        rdi  
    000000018000101C:   mov         ebx,eax  
    000000018000101E:   call        rdi  
    0000000180001020:   add         ebx,eax  
    0000000180001022:   call        rdi  
    0000000180001024:   add         eax,ebx  
    0000000180001026:   mov         rbx,qword ptr [rsp+30h]  
    000000018000102B:   add         rsp,20h               <== possibly epilog  
    000000018000102F:   pop         rdi                   <== epilog 
    0000000180001030:   ret                               <== epilog
 

   This multiple-epilogue case also happens to demonstrate the next point. 

   Just as an epilogue is not the instructions located at the end of a function, 
   a prologue is not the instructions located at the start of a function. 

   The prologue *instructions* are the instructions that save nonvolatile 
   registers, or adjust rsp (prior to frame pointer establishment -- not alloca), 
   or establish the frame pointer (mov x, rsp). 

   The prologue instructions can be, and are, interleaved with somewhat arbitrary 
   other instructions. The critical requirement is that a nonvolatile register 
   be saved before it is changed, and that rsp adjustments and frame pointer 
   establishment be recorded. An action counts as recorded once execution 
   reaches the instruction offset marked for it in the "xdata" (the unwind info). 

   The multiple-epilogue example above has such a "dispersed" prologue. 
   Let's look at it again in more detail: 

  C:\> link /dump /unwindinfo /disasm multiple_epilogues.dll 

    Microsoft (R) COFF/PE Dumper Version 14.00.24215.1 

    0180001000:  push rdi               <=== prologue instruction, unsurprising  
    0180001002:  sub  rsp,20h           <=== prologue instruction, unsurprising  
    0180001006:  mov  rdi,r8 
    0180001009:  test ecx,ecx 
    018000100B:  je   0000000180001015 
    018000100D:  add  rsp,20h 
    0180001011:  pop  rdi 
    0180001012:  jmp  rdx 
    0180001015:  mov  qword ptr [rsp+30h],rbx  <=== also a prologue instruction  
    018000101A:  call rdi               <=== offset 1A in the unwind info below  
    018000101C:  mov  ebx,eax 
    018000101E:  call rdi 
    0180001020:  add  ebx,eax 
    0180001022:  call rdi 
    0180001024:  add  eax,ebx 
    0180001026:  mov  rbx,qword ptr [rsp+30h] 
    018000102B:  add  rsp,20h 
    018000102F:  pop  rdi 
    0180001030:  ret

  Function Table (1) 


             Begin    End      Info      Function Name 
    00000000 00001000 00001031 0000208C 
      Unwind version: 1 
      Unwind flags: None 
      Size of prologue: 0x1A     <== This is also telling.
      Count of codes: 4 
      Unwind codes: 
        1A: SAVE_NONVOL, register=rbx offset=0x30 
        06: ALLOC_SMALL, size=0x20 
        02: PUSH_NONVOL, register=rdi 
       /*\ 
        * 
        * 
        * look here 


   The critical information is the leftmost column of the unwind codes. 
   These are the offsets just after the prologue instructions. 
   They are sorted in descending order, and the underlying data is not a fixed 
   size per line shown: SAVE_NONVOL, for example, occupies two slots, which is 
   why "Count of codes" is 4 for three lines. You must always walk them 
   linearly from the start. 

   Offsets 2 and 6 are what you expect: they follow the first two instructions. 
   But offset 1A is well into the function, which is surprising when you first see it.

 And another thing. While the specification says the offsets are just after the instruction that does the nonvolatile save (or rsp adjustment, etc.), the actual requirement, and the reality, are looser. The offset can be later than the save, as long as it is not after the register is first changed. Also, the location of the saved value relative to rsp can change between the save and the recorded offset. For example, the compiler will store a nonvolatile into home space, then adjust rsp, and record that the nonvolatile was saved at (or after) the rsp adjustment.

This can be seen with the multiple_epilogues.c example just by compiling with /O1 instead of /O2. Let's see:


  cl /LD /O1 /GL /GS- multiple_epilogues.c /link /incremental:no /export:multiple_epilogues /nod /noentry 
  link /dump /disasm /unwindinfo multiple_epilogues.dll
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Microsoft (R) COFF/PE Dumper Version 14.00.24215.1

  0180001000:   mov   qword ptr [rsp+8],rbx   <==== rbx saved here
  0180001005:   push  rdi
  0180001006:   sub   rsp,20h                 <==== but recorded here
  018000100A:   mov   rdi,r8
  018000100D:   test  ecx,ecx
  018000100F:   je    0000000180001015
  0180001011:   call  rdx
  0180001013:   jmp   0000000180001021
  0180001015:   call  rdi
  0180001017:   mov   ebx,eax
  0180001019:   call  rdi
  018000101B:   add   ebx,eax
  018000101D:   call  rdi
  018000101F:   add   eax,ebx
  0180001021:   mov   rbx,qword ptr [rsp+30h]
  0180001026:   add   rsp,20h
  018000102A:   pop   rdi
  018000102B:   ret

Function Table (1)
           Begin    End      Info      Function Name
  00000000 00001000 0000102C 0000208C
    Unwind version: 1
    Unwind flags: None
    Size of prologue: 0x0A
    Count of codes: 4
    Unwind codes:
      0A: SAVE_NONVOL, register=rbx offset=0x30     <=== rbx save
      0A: ALLOC_SMALL, size=0x20                    <=== two unwind codes with same offset
      06: PUSH_NONVOL, register=rdi



And see how rbx is saved at rsp+8 but recorded as rsp+30h, because
rsp changes by 0x28 (8 for push rdi plus 0x20 for sub rsp,20h) between
the save and the recorded position.

And this is all fine. If an exception occurs between the save and the recorded
position of the save, rbx has not yet been changed, so it does not need to be restored.
Such an exception is rare (a stack overflow, perhaps), but the ABI accounts
for exceptions and stack walks from arbitrary instructions.