64-bit _alloca. How to use from FPC and Delphi?
The C/C++ _alloca function allocates size bytes of space on the stack. The variation of _alloca presented here also aligns the returned block to a requested boundary between 16 and 4096 bytes (a power of 2). Although this _alloca can be used to advantage from C/C++ and most other programming languages, including assembly language itself, it was developed with Delphi and the Free Pascal Compiler in mind, two compilers that have no similar feature.
Stack reservation-to-commitment mechanism
Each new thread of an application receives its own contiguous region of stack address space. By default, 1 MB is reserved for the thread. Only the first page (each page being 4096 bytes) is initially committed, and the next page down is marked as a guard page. When the thread reads from or writes to the guard page, the resulting guard-page exception causes the OS to do three things: commit that page, mark the page below it as the new guard page, and lower the StackLimit field in the thread's TEB. This is how reserved stack memory becomes committed. Only the guard page triggers this. An access that skips past the guard page into memory that is still only reserved raises an access violation instead.
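When using _alloca to allocate stack memory, we need to keep this reservation-to-commitment mechanism in mind. Every page between the current StackLimit and the start of the new block must be touched in order, from the top down, before the block is used.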
The ASM Code
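; compile with uasm64 -c -win64 -Zp8 -archSSE allocafunc64.asm
option frame:auto ; generate SEH-compatible prologues and epilogues (UASM-specific)
; Microsoft MASM: remove the OPTION line above and compile with ml64 -c -Zp8 allocafunc64.asm

.code

; rcx = thesize, edx = alignm, r8 = accum (optional, may be 0)
; returns rax = pointer to the aligned block
_alloca proc public thesize:dword, alignm:dword, accum : ptr
    mov r9, [rsp]               ; return address
    mov ecx, ecx                ; zero-extend size
    mov edx, edx                ; zero-extend alignment
    cmp rdx, 16
    jae @F
    mov rdx, 16                 ; Minimum alignment to consider in Win 64 is 16 bytes
@@:
    cmp rdx, 4096
    jbe @F
    mov rdx, 4096               ; maximum alignment is one page
@@:
    neg rdx                     ; rdx = -alignm, the alignment mask
    lea rax, [rcx]              ; rax = requested size
    lea r10, [rsp+8]            ; caller's rsp before the call
    sub r10, rax
    and r10, rdx                ; r10 = aligned start of the block
    lea rax, [rsp+8h]
    sub rax, r10                ; rax = bytes to take from the stack
probe:
    mov r11, qword ptr gs:[10h] ; Register gs points to the TEB in Windows 64-bit.
                                ; TEB's StackLimit is in gs:[10h]. See below.
    cmp r10, r11
    jae probed                  ; block already lies in committed memory
    mov byte ptr [r11-1], 0     ; touch the guard page so the OS commits it
    jmp probe
probed:
    sub rsp, rax
    cmp r8, 0
    je @F
    add dword ptr [r8], eax     ; add the bytes taken to the accumulator
@@:
    mov [rsp], r9               ; put the return address back on top
    mov rax, rsp
    add rax, 8                  ; rax = aligned block
    ret
_alloca endp

; rcx = accum, the accumulator filled by _alloca
_dealloca proc public accum : ptr
    mov rdx, [rsp]              ; return address
    mov r8d, dword ptr [rcx]    ; total bytes taken by _alloca
    mov dword ptr [rcx], 0      ; reset the accumulator
    add rsp, r8                 ; release the stack space
    mov [rsp], rdx              ; put the return address back on top
    ret
_dealloca endp

end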
The TEB/TIB as seen in WinDbg, showing StackLimit at offset 0x10:
+0x000 ExceptionList : Ptr64 _EXCEPTION_REGISTRATION_RECORD
+0x008 StackBase : Ptr64 Void
+0x010 StackLimit : Ptr64 Void
+0x018 SubSystemTib : Ptr64 Void
+0x020 FiberData : Ptr64 Void