
64-bit _alloca. How to use from FPC and Delphi?

The C/C++ _alloca function allocates size bytes of space from the stack. The variant of _alloca presented here additionally aligns the allocated block to a requested boundary, a power of 2 between 16 and 4096 bytes. Although this _alloca can be used to advantage from C/C++ and most other programming languages, including assembly language itself, it was developed with Delphi and the Free Pascal Compiler in mind, two compilers that have no similar feature.
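The alignment step described above can be sketched in C as plain address arithmetic. This is only an illustration under stated assumptions: the helper names `clamp_alignment` and `aligned_block_start` are hypothetical and not part of the downloadable sources, and the addresses are treated as ordinary integers rather than a real stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp the requested alignment to the 16..4096 range. */
static uint64_t clamp_alignment(uint64_t align)
{
    if (align < 16)   return 16;
    if (align > 4096) return 4096;
    return align;
}

/* Hypothetical helper: the stack grows downward, so the block starts
   'size' bytes below 'top', rounded down to a multiple of 'align'. */
static uint64_t aligned_block_start(uint64_t top, uint64_t size, uint64_t align)
{
    align = clamp_alignment(align);
    assert((align & (align - 1)) == 0);   /* must be a power of 2 */
    return (top - size) & ~(align - 1);   /* clear the low bits */
}
```

For example, with a top of 0x10008, a size of 100 bytes and an alignment of 64, the block would start at 0xFF80. Rounding down rather than up means the block may be slightly larger than requested, but it never overlaps the memory above it.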

Stack reservation-to-commitment mechanism

Each new thread of an application receives its own contiguous stack space. By default, 1 MB is reserved for the thread, but only the first page (each page is 4096 bytes) is initially committed, and the next contiguous page is marked as a guard page. When the application reads from or writes to the guard page, an exception is raised; the OS handles it by committing the guard page and marking the next page further down as the new guard page. This is how reserved stack memory becomes committed.

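When using _alloca to allocate stack memory, we need to keep this reservation-to-commitment mechanism in mind: an allocation that reaches below the committed area must touch every page on the way down, one at a time, so that the guard page is never skipped.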
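To make the page-by-page walk concrete, here is a small C simulation of which page addresses would have to be touched. It is a sketch, not the routine itself: `pages_to_probe` is a hypothetical name, the addresses are plain integers, and nothing is actually read or written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u   /* 4096-byte pages, as described above */

/* Hypothetical helper: list the page addresses that must be touched, in
   order, when the stack top moves from inside the committed area
   (lowest committed page = stack_limit) down to new_top.
   Returns the number of pages; stores at most max_probes of them. */
static size_t pages_to_probe(uint64_t stack_limit, uint64_t new_top,
                             uint64_t *probes, size_t max_probes)
{
    size_t n = 0;
    assert(stack_limit % PAGE_SIZE == 0);  /* the limit is page aligned */
    if (new_top >= stack_limit)
        return 0;                          /* still in committed memory */

    uint64_t target = new_top & ~(uint64_t)(PAGE_SIZE - 1);
    uint64_t page = stack_limit;
    while (page != target) {
        page -= PAGE_SIZE;                 /* one page further down */
        if (n < max_probes)
            probes[n] = page;              /* a real probe would touch it */
        n++;
    }
    return n;
}
```

With a stack limit of 0x7000 and a new top of 0x4F80, the pages at 0x6000, 0x5000 and 0x4000 would be touched in that order. Going strictly downward matters because each touch moves the guard page exactly one page lower.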

The ASM Code

option casemap:none
IFDEF __JWASM__
; compile with uasm64 -c -win64 -Zp8 -archSSE allocafunc64.asm
option frame:auto ;generate SEH-compatible prologues and epilogues
OPTION STACKBASE:RBP
ELSE
; Microsoft MASM: compile with ml64 -c -Zp8 allocafunc64.asm
ENDIF

.code

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; rcx=thesize
; rdx=alignm
; r8=accum - optional
_alloca proc public thesize:dword, alignm:dword, accum : ptr
  mov r9, [rsp] ; return address
  mov ecx, ecx ; zero-extend
  mov edx, edx ; zero-extend
 
  cmp rdx, 16
  jge @F
  mov rdx, 16 ; Minimum alignment to consider in Win 64 is 16 bytes
@@:
  cmp rdx, 4096
  jle @F
  mov rdx, 4096
@@:
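  lea rax, [rcx] ; rax = requested size

  lea r10, [rsp+8] ; r10 = caller's stack top, just above the return address
  sub r10, rax ; make room for thesize bytes below it
  neg rdx ; -alignm is the mask that clears the low bits (power of 2 assumed)
  and r10, rdx ; round the block start down to the alignment
 
  xor r11, r11
  lea rax, [rsp+8h]
  sub rax, r10 ; rax = total number of bytes to take from the stack
 
  cmovb r10,r11 ; on unsigned wrap-around, clamp the target address to 0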
  mov r11,qword ptr gs:[10h] ; Register gs points to the TEB in Windows 64-bit.
           ; TEB's StackLimit is in gs:[10h]. See below.
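  cmp  r10,r11
  jae @exit ; target is not below StackLimit: no probing needed
  and r10w,0F000h ; round the target down to its page boundary
@@:
  lea  r11,[r11-1000h] ; step one page down
  mov byte ptr [r11],0 ; touch the page so that it gets committed
  cmp r10,r11
  jne @B ; repeat until the target page has been touched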
@exit:
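  sub rsp, rax ; move the stack pointer down past the new block
 
  cmp r8, 0
  jz @F ; no accumulator supplied
  add dword ptr [r8], eax ; record the bytes taken, for _dealloca
@@:
 
  mov [rsp], r9 ; put the return address back on top of the stack
  mov rax, rsp
  add rax, 8 ; return the address of the aligned block
  ret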
_alloca endp

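_dealloca proc public accum : ptr
  mov rdx, [rsp] ; return address
  mov r8d, dword ptr [rcx] ; bytes accumulated by _alloca (zero-extended)
  mov dword ptr [rcx], 0 ; reset the accumulator
  add rsp, r8 ; release the accumulated bytes
  mov [rsp], rdx ; put the return address back on top of the stack
  ret
_dealloca endp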

end

TEB/TIB seen with WinDbg, showing StackLimit at offset 0x10.

ntdll!_NT_TIB
 +0x000 ExceptionList : Ptr64 _EXCEPTION_REGISTRATION_RECORD
 +0x008 StackBase : Ptr64 Void
 +0x010 StackLimit : Ptr64 Void
 +0x018 SubSystemTib : Ptr64 Void
 +0x020 FiberData : Ptr64 Void
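The offsets in the listing above can be cross-checked with a small C sketch. The struct `NtTib64Sketch` is a hypothetical stand-in that mirrors only the five fields shown, with `uint64_t` in place of each Ptr64; it is not the real Windows definition and should not be used to access the TEB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of the 64-bit NT_TIB.
   Each Ptr64 is modelled as a uint64_t so the layout is the same
   on any compiler, not just a Win64 one. */
typedef struct {
    uint64_t ExceptionList;  /* +0x000 */
    uint64_t StackBase;      /* +0x008 */
    uint64_t StackLimit;     /* +0x010 */
    uint64_t SubSystemTib;   /* +0x018 */
    uint64_t FiberData;      /* +0x020 */
} NtTib64Sketch;

/* Offset of StackLimit within the sketch; this is the displacement
   the assembly routine uses with the gs segment register. */
static size_t stack_limit_offset(void)
{
    return offsetof(NtTib64Sketch, StackLimit);
}
```

Two 8-byte fields precede StackLimit, which is why it lands at offset 0x10.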

 

Download Sources and FPC Demo

Download Sources and Delphi Demo