Thursday, October 26, 2017

Windows AMD64 ABI nuances part 1 -- the point of pdata/xdata.

The Windows AMD64 ABI has several surprising nuances.

The ABI documentation is required reading for this blog entry.

I am focusing here not on the calling convention -- where
in registers/stack to place parameters, or sizeof(long) --
but on exception handling and "pdata" and "xdata".

"p" means procedure, or what most programmers now call functions.
Pascal, for example, calls them "procedures".

"x" presumably means "exception", or "arbitrary but not p".

So, what is the point of all the pdata/xdata?

There are one or two or three basic purposes, depending
on what you consider the same thing.

pdata/xdata lets debuggers walk the stack.
If that were the only point, this data could be relegated to symbols,
provided symbol-less debugging were allowed to degrade so much
that stack walking breaks.

Keep in mind that you usually have only some symbols, such as those
for your own code, and not the symbols for every function
on the stack. So carrying around a small amount of metadata
at runtime can greatly improve the debugging experience.

As well, pdata/xdata lets other components walk the stack at runtime,
such as profilers and sampling profilers (ETW).
It is not particularly practical to expect ETW to find and read
symbols while profiling, let alone for all code on the stack.

And pdata/xdata lets exception dispatch walk the stack.

Now, "walk the stack" -- is that just retrieving return addresses?
For strictly stack walking, no, not exactly, and for exception dispatch,
definitely not.


Other than return addresses, stack walking must recover non-volatile registers
in order to retrieve frame pointers, in order to recover return addresses.


The basic stack walk method is "recover RSP and then dereference and increment it".
However "recover RSP" is not trivial.
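Here is a toy C sketch of that basic method, under heavy simplifying assumptions: every function has a fixed frame size and no frame pointer, and a pdata-like table, sorted by start address as the real .pdata array is, maps an instruction address to that size. The stack is simulated as an array of 64-bit slots. The names toy_function, toy_lookup, read64 and toy_walk are hypothetical, not Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pdata entry plus its xdata:
   [begin, end) is the function's code range, frame_size is how far the
   prologue moved RSP (pushes plus stack allocation). */
typedef struct {
    uint64_t begin;
    uint64_t end;
    uint64_t frame_size;
} toy_function;

/* Binary search; works because the table is sorted by begin address, like .pdata. */
static const toy_function *toy_lookup(const toy_function *table, size_t count,
                                      uint64_t rip)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rip < table[mid].begin)
            hi = mid;
        else if (rip >= table[mid].end)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL; /* not in the table: this toy walker stops here */
}

/* Simulated stack: byte address addr lives in stack[(addr - base) / 8]. */
static uint64_t read64(const uint64_t *stack, uint64_t base, uint64_t addr)
{
    return stack[(addr - base) / 8];
}

/* Walk from (rip, rsp), storing up to max return addresses in out.
   Returns how many were found. */
static size_t toy_walk(const toy_function *table, size_t count,
                       const uint64_t *stack, uint64_t base,
                       uint64_t rip, uint64_t rsp,
                       uint64_t *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        const toy_function *f = toy_lookup(table, count, rip);
        if (f == NULL)
            break;
        rsp += f->frame_size;            /* recover RSP: undo the prologue */
        rip = read64(stack, base, rsp);  /* dereference: the return address */
        rsp += 8;                        /* increment: pop it, as ret would */
        out[n++] = rip;
    }
    return n;
}
```

Real functions do not reduce to one frame size each, and the reasons why are the subject of the rest of this entry.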


This point about nonvolatile restoration feeding into frame/stack/return restoration
is left as kind of a "hint" and not fully elaborated here.


Think about it. A function can copy rsp into some frame pointer register
(what we might think of as rbp, but it can be any nonvolatile), then
alloca() freely, and then call other functions that save and change
arbitrary nonvolatiles. How do you walk that stack? You must
restore all nonvolatiles, iteratively, to restore frame pointers, to discover
stack pointers, to discover return addresses.
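A toy C sketch of that dependency, with hypothetical names and made-up addresses: F does push r13, mov r13, rsp, then an alloca of unknown size, and calls G, which does push r13, sub rsp, 16, and then trashes r13. F's frame can only be found through r13, and F's r13 survives only in G's save slot, so G must be unwound first. Real code would describe F's frame pointer with a UWOP_SET_FPREG unwind code; this toy simply hardcodes both unwinds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine state: just the registers this example needs. */
typedef struct {
    uint64_t rip, rsp, r13;
} toy_regs;

/* Simulated stack memory: byte address addr is slots[(addr - base) / 8]. */
typedef struct {
    const uint64_t *slots;
    uint64_t base;
} toy_stack;

static uint64_t load(toy_stack s, uint64_t addr)
{
    return s.slots[(addr - s.base) / 8];
}

/* G's prologue was: push r13 ; sub rsp, 16. Undo it in reverse. */
static toy_regs unwind_g(toy_stack s, toy_regs r)
{
    r.rsp += 16;
    r.r13 = load(s, r.rsp);   /* restore the caller's r13 from G's save slot */
    r.rsp += 8;
    r.rip = load(s, r.rsp);   /* return address into F */
    r.rsp += 8;
    return r;
}

/* F's prologue was: push r13 ; mov r13, rsp ; then alloca of any size.
   F's code alone cannot tell us its RSP; only r13 locates its frame. */
static toy_regs unwind_f(toy_stack s, toy_regs r)
{
    r.rsp = r.r13;            /* discard the alloca'd space, however large */
    r.r13 = load(s, r.rsp);   /* pop the r13 of F's caller */
    r.rsp += 8;
    r.rip = load(s, r.rsp);   /* return address into F's caller */
    r.rsp += 8;
    return r;
}
```

Skipping unwind_g and calling unwind_f with G's trashed r13 of 0 would look for F's frame at address 0.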


As well, when an exception is dispatched, handlers are called, and
execution resumes somewhere ("the exception is caught"), code needs more than
a correct stack pointer: it needs non-volatiles restored,
because locals can live in non-volatiles and are expected to survive exceptions.

I claim these use-cases are all really one slightly general thing -- restore nonvolatile registers.
RSP and RIP can be considered essentially non-volatile.

When you return "normally" from a function, RIP, RSP, and all non-volatiles are restored to what they were before you were called. (You can quibble about off-by-oneness.) Likewise, a debugger or the exception dispatcher simulates returning from a function, which is referred to as "unwinding", without running any of the "remaining" code in the function that would normally restore the registers. Both do this via the pdata/xdata.

pdata describes the start/end of a function, and refers to the xdata.
xdata holds the "unwind codes", which describe how to undo the effects of the function's prologue, restoring all non-volatile registers, including RSP, and therefore RIP (the return address) as well.
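To make "unwind codes" concrete, here is a C sketch that undoes the prologue of entry from the listing later in this entry (push r12 through r15, then alloc_stack 028h). It uses the documented UNWIND_CODE operation numbers UWOP_PUSH_NONVOL (0) and UWOP_ALLOC_SMALL (2), the x64 register numbering (rsp = 4, r12 to r15 = 12 to 15), and the rule that codes are stored in reverse order of the prologue. Everything else is a simplification: the struct is not packed like the real one, only those two operations are handled, the prologue is assumed to have run to completion (so code_offset is ignored), and the offsets in entry_codes are my own reckoning from instruction lengths, not checked against ml64 output. Memory is simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Operation numbers from the documented x64 UNWIND_CODE format. */
enum { UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_SMALL = 2 };

/* Simplified UNWIND_CODE: the real one packs op and info into 4-bit fields. */
typedef struct {
    uint8_t code_offset;  /* prologue offset just past the instruction */
    uint8_t unwind_op;
    uint8_t op_info;      /* register number, or (size - 8) / 8 for ALLOC_SMALL */
} unwind_code;

/* regs[] in x64 encoding order: rax = 0, rcx = 1, ..., rsp = 4, ..., r15 = 15. */
typedef struct {
    uint64_t regs[16];
    uint64_t rip;
} toy_context;

enum { RSP = 4 };

/* entry's prologue, last operation first (offsets are assumptions). */
static const unwind_code entry_codes[] = {
    { 12, UWOP_ALLOC_SMALL, (0x28 - 8) / 8 },  /* alloc_stack 028h */
    {  8, UWOP_PUSH_NONVOL, 15 },              /* push_reg r15 */
    {  6, UWOP_PUSH_NONVOL, 14 },              /* push_reg r14 */
    {  4, UWOP_PUSH_NONVOL, 13 },              /* push_reg r13 */
    {  2, UWOP_PUSH_NONVOL, 12 },              /* push_reg r12 */
};

/* Apply codes in array order to reverse a fully executed prologue, then pop
   the return address. mem[(addr - base) / 8] simulates memory at addr.
   Returns 0, or -1 for an operation this sketch does not handle (the real
   format has more, e.g. UWOP_ALLOC_LARGE and UWOP_SET_FPREG). */
static int virtual_unwind(toy_context *ctx, const unwind_code *codes, size_t count,
                          const uint64_t *mem, uint64_t base)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t *rsp = &ctx->regs[RSP];
        switch (codes[i].unwind_op) {
        case UWOP_ALLOC_SMALL:
            *rsp += (uint64_t)codes[i].op_info * 8 + 8;
            break;
        case UWOP_PUSH_NONVOL:
            ctx->regs[codes[i].op_info] = mem[(*rsp - base) / 8];
            *rsp += 8;
            break;
        default:
            return -1;
        }
    }
    ctx->rip = mem[(ctx->regs[RSP] - base) / 8];  /* RSP now points at the return address */
    ctx->regs[RSP] += 8;
    return 0;
}
```

Note that RSP and RIP fall out of the same loop that restores r12 through r15, which is the sense in which they are "essentially non-volatile".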

Let's see an example where exception handling can be seen to restore non-volatile registers. Well, I was unable to get the C compiler to do it,
so I wrote it in assembly, which took a while and will end this first installment.
I do have more planned.

First, let's provide a minimal C runtime for our assembly.
We are only building an import library, so we just need the function names.
Calling printf in the modern C runtime is more involved, so we will use the old one, msvcrt.dll.

msvcrt.c:

/* Stubs: only the exported names matter, because only the import library is kept. */
void printf() { }
void exit() { }
void __C_specific_handler() { }

msvcrt.def:

EXPORTS
printf
exit
__C_specific_handler

To build this (only the import library, msvcrt.lib, is needed, so the stub DLL is deleted):
cl /LD msvcrt.c /link /def:msvcrt.def /noentry /nod
del msvcrt.dll


And now the assembly nvlocala.asm:

include ksamd64.inc

    extern printf:proc
    extern exit:proc
    extern RtlUnwindEx:proc
    altentry resume

.const
str1 db "hello %X %X %X %X", 10, 0

.code

; int handler(ExceptionRecord, EstablisherFrame, ContextRecord, DispatcherContext)
; RtlUnwindEx(TargetFrame, TargetIp, ExceptionRecord, ReturnValue, OriginalContext, HistoryTable)
;                 0            8          10           18           20                28
  nested_entry handler, _text
  ;int 3
  ; Save nonvolatiles just so we can trash them, to demonstrate the point.
  ; Note that even when we call RtlUnwindEx, our frame is properly unwound.
  push_reg r12  ; the last 4 registers are nonvolatile -- easy rule to remember
  push_reg r13
  push_reg r14
  push_reg r15
  alloc_stack 038h  ; establish room for 6 parameters and align

  end_prologue

; Trash nonvolatiles to help demonstrate the point.
  xor r12, r12
  xor r13, r13
  xor r14, r14
  xor r15, r15

; Dispatch or unwind?
  mov eax, ErExceptionFlags[rcx]
  test eax, EXCEPTION_UNWIND
  jne unwind

; dispatch -- always handle it, resuming at hardcoded location
  xor eax, eax
  mov [rsp + 028h], rax     ; HistoryTable is optional
  mov [rsp + 020h], r8      ; OriginalContext = ContextRecord
  mov r8, rcx               ; ExceptionRecord = ExceptionRecord
  mov rcx, rdx              ; TargetFrame = EstablisherFrame
  lea rdx, resume           ; TargetIp = resume
  mov r9, 05678h            ; ReturnValue, just to demonstrate the feature
  call RtlUnwindEx
  int 3 ; should never get here

unwind: ; We are called for unwind as nearly the last thing RtlUnwindEx
        ; does before restoring context to the Rip we specify.
  mov eax, ExceptionContinueSearch
  add rsp, 038h
  begin_epilogue
  pop r15
  pop r14
  pop r13
  pop r12
  ret

  nested_end handler, _text

  nested_entry entry, _text, handler
  ;int 3
  push_reg r12  ; the last 4 registers are nonvolatile -- easy rule to remember
  push_reg r13
  push_reg r14
  push_reg r15
  alloc_stack 028h  ; room for 5 parameters and align
  end_prologue

  ; Cache some values in nonvolatiles -- the point of this exercise.
  mov r12, 0123h
  mov r13, 0234h
  mov r14, 0456h
  mov r15, 0789h

  lea rcx, str1 ; 0
  mov rdx, r12  ; 8
  mov r8,  r13  ; 10
  mov r9,  r14  ; 18
  mov [rsp + 020h], r15
  call printf

  lea rcx, str1
  mov rdx, r12
  mov r8,  r13
  mov r9,  r14
  mov [rsp + 020h], r15
  call printf

; Produce an access violation, which will be caught.
  call qword ptr[0]
  int 3 ; should never get here

; Exception will resume here, because this is hardcoded in the handler.
 resume:
  lea rcx, str1
  mov rdx, r12
  mov r8,  r13
  mov r9,  rax      ; ReturnValue to RtlUnwindEx
  mov [rsp + 020h], r15
  call printf

  lea rcx, str1
  mov rdx, r13
  mov r8,  r12
  mov r9,  r14      ; again with the other nonvolatile
  mov [rsp + 020h], r15
  call printf

  mov ecx, 3
  call exit
  int 3 ; should not get here

  add rsp, 028h     ; must match alloc_stack 028h above
  begin_epilogue
  pop r15
  pop r14
  pop r13
  pop r12
  ret
  nested_end entry, _text

end
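The alloc_stack sizes in the listing come from two rules: the caller provides 8 bytes per outgoing parameter with a minimum of four slots (32 bytes of home space), and RSP must be 16-byte aligned at each call, while on entry it is off by 8 because of the pushed return address. A small C sketch of that arithmetic (frame_alloc is a hypothetical helper) reproduces 038h for handler (4 pushes, 6 parameters to RtlUnwindEx) and 028h for entry (4 pushes, 5 parameters to printf).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest stack allocation for a non-leaf x64 function that pushes
   `pushes` nonvolatile registers and calls functions taking at most
   `params` arguments. */
static uint32_t frame_alloc(uint32_t pushes, uint32_t params)
{
    uint32_t slots = params < 4 ? 4 : params;  /* home space is always 4 slots */
    uint32_t alloc = slots * 8;
    uint32_t above = 8 + pushes * 8;           /* return address + pushes */
    if ((above + alloc) % 16 != 0)
        alloc += 8;                            /* pad so RSP is 16-byte aligned at calls */
    return alloc;
}
```

For example, frame_alloc(0, 0) gives 0x28, the familiar "sub rsp, 28h" of a function that saves nothing and calls something.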

Build and run:

ml64 nvlocala.asm /link /entry:entry /subsystem:console .\msvcrt.lib kernel32.lib
nvlocala.exe
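A toy C model of the two passes may help explain why handler checks EXCEPTION_UNWIND and why it runs twice. The flag values are the ones in winnt.h; everything else (toy_dispatch, toy_handler, always_handle, call_record) is hypothetical and much simpler than the real dispatcher. In particular, the real handler calls RtlUnwindEx itself, while this toy has the handler merely return 1 in the search pass and lets the dispatcher do the unwind pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in winnt.h. */
#define EXCEPTION_UNWINDING       0x02
#define EXCEPTION_EXIT_UNWIND     0x04
#define EXCEPTION_TARGET_UNWIND   0x20
#define EXCEPTION_COLLIDED_UNWIND 0x40
#define EXCEPTION_UNWIND (EXCEPTION_UNWINDING | EXCEPTION_EXIT_UNWIND | \
                          EXCEPTION_TARGET_UNWIND | EXCEPTION_COLLIDED_UNWIND)

typedef struct {
    int frame;
    uint32_t flags;
} call_record;

/* In the search pass, a toy handler returns 1 to claim the exception
   (meaning "unwind to my frame"), else 0. */
typedef int (*toy_handler)(uint32_t flags, int frame);

/* handlers[0] belongs to the faulting (innermost) frame; NULL means no
   handler. Records every handler call in calls[]. Returns the frame that
   execution resumes in, or -1 if nobody claimed the exception. */
static int toy_dispatch(const toy_handler *handlers, int nframes,
                        call_record *calls, size_t *ncalls)
{
    int target = -1;
    *ncalls = 0;
    /* Pass 1: search outward from the faulting frame. */
    for (int i = 0; i < nframes && target < 0; i++) {
        if (handlers[i] == NULL)
            continue;
        calls[(*ncalls)++] = (call_record){ i, 0 };
        if (handlers[i](0, i))
            target = i;
    }
    if (target < 0)
        return -1;
    /* Pass 2: unwind up to and including the target frame. */
    for (int i = 0; i <= target; i++) {
        if (handlers[i] == NULL)
            continue;
        uint32_t flags = EXCEPTION_UNWINDING;
        if (i == target)
            flags |= EXCEPTION_TARGET_UNWIND;
        calls[(*ncalls)++] = (call_record){ i, flags };
        handlers[i](flags, i);  /* return value ignored in this toy */
    }
    return target;
}

/* Like the assembly's handler: claim on dispatch, decline on unwind. */
static int always_handle(uint32_t flags, int frame)
{
    (void)frame;
    return (flags & EXCEPTION_UNWIND) == 0;
}
```

With a single frame that owns the handler, the toy records one search call with flags 0 and one unwind call with EXCEPTION_UNWINDING | EXCEPTION_TARGET_UNWIND (0x22).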


 - Jay
