The ARP32 CPU has a 32-bit stack pointer register (SP) whose content is always treated as a byte address. Any address computation that uses the SP treats its value as a byte address when forming the effective address of a byte, halfword, or word access.
The ARP32 CPU is a descending stack machine: the stack grows from higher addresses toward lower addresses. The ARP32 CPU programming model requires that the programmer and/or the compiler/toolchain maintain the following conventions:
- The stack pointer (SP) is initialized to a high address and moves to lower addresses as function calls nest during program execution.
- The stack pointer (SP) must always be aligned to a 4-byte word address. Software is responsible for keeping the SP aligned to a 4-byte word address when allocating and de-allocating stack space; there is no hardware support for performing this alignment automatically. Because the ARP32 CPU does not support unaligned accesses, an SP value that is not aligned to a 4-byte boundary may cause accesses to unaligned addresses and thus produce unexpected results.
- Whenever the CPU itself modifies the SP (call, return, LDRF/STRF), it increments or decrements the SP by 4, which preserves word alignment.
- The stack pointer (SP) always points to the next free location, that is, the location where the next entry will be stacked. To push an entry onto the stack, the CPU writes to the location the SP currently points to and then post-decrements the SP. To pop an entry from the stack, the CPU pre-increments the SP and then reads from the location the updated SP points to. For example:
- Every CALL saves the return address to the location the SP currently points to and then post-decrements the SP.
- Every RET pre-increments the SP and then loads the return address from the location the updated SP points to.
- To allocate stack space in a function (in the function prolog), decrease the SP by the size of the local stack frame.
- Local variables, function input arguments, and spilled variables are all accessed with SP-relative load/store instructions that use positive offsets.
- To de-allocate stack space in a function (in the function epilog), increase the SP by the size of the local stack frame.
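
The conventions above can be illustrated with a small software model. This is only a sketch in C, not ARP32 code: the `StackModel` type, the 64-byte memory size, and all function names are hypothetical and chosen for illustration. It models a descending stack in which the SP points to the next free word, so a push writes and then decrements, a pop increments and then reads, and frame data is reached at positive offsets from the SP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64u  /* hypothetical size of the toy memory */

/* Toy model: byte-addressed memory plus a 32-bit SP (a byte address). */
typedef struct {
    uint8_t  mem[MEM_BYTES];
    uint32_t sp;  /* address of the next free word on the stack */
} StackModel;

/* Start with the SP at the highest word-aligned address. */
void stack_init(StackModel *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->sp = MEM_BYTES - 4u;
}

/* Push (as CALL does for the return address): write, then post-decrement. */
void push_word(StackModel *m, uint32_t value) {
    assert(m->sp % 4u == 0u && m->sp < MEM_BYTES);
    memcpy(&m->mem[m->sp], &value, 4);  /* host byte order; model only */
    m->sp -= 4u;
}

/* Pop (as RET does for the return address): pre-increment, then read. */
uint32_t pop_word(StackModel *m) {
    uint32_t value;
    m->sp += 4u;
    assert(m->sp % 4u == 0u && m->sp <= MEM_BYTES - 4u);
    memcpy(&value, &m->mem[m->sp], 4);
    return value;
}

/* Prolog: allocate a frame by decreasing the SP (size must be a multiple of 4). */
void frame_alloc(StackModel *m, uint32_t frame_bytes) {
    assert(frame_bytes % 4u == 0u && frame_bytes <= m->sp);
    m->sp -= frame_bytes;
}

/* Epilog: release the frame by increasing the SP by the same amount. */
void frame_free(StackModel *m, uint32_t frame_bytes) {
    assert(frame_bytes % 4u == 0u);
    m->sp += frame_bytes;
}

/* Frame data sits at positive offsets; in this model the first word is at SP + 4
   because SP itself addresses the next free location. */
void store_local(StackModel *m, uint32_t offset, uint32_t value) {
    assert(offset % 4u == 0u && offset >= 4u);
    assert(m->sp + offset <= MEM_BYTES - 4u);
    memcpy(&m->mem[m->sp + offset], &value, 4);
}

uint32_t load_local(const StackModel *m, uint32_t offset) {
    uint32_t value;
    assert(offset % 4u == 0u && offset >= 4u);
    assert(m->sp + offset <= MEM_BYTES - 4u);
    memcpy(&value, &m->mem[m->sp + offset], 4);
    return value;
}
```

In this model, a call followed by a prolog is `push_word` then `frame_alloc`, and the matching epilog and return is `frame_free` then `pop_word`; after the sequence the SP is back at its starting value.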
The ARP32 core does not save any context on the stack during interrupt processing. The same stack layout and conventions apply to an interrupt handler function/procedure, except for its additional register save requirements.
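
Because keeping the SP aligned to a 4-byte word address is left entirely to software, a toolchain or hand-written prolog typically rounds each frame size up to a multiple of 4 before adjusting the SP. The following C helpers are a minimal sketch of that arithmetic; the function names are hypothetical and are not part of any ARP32 toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Round a frame size in bytes up to the next multiple of 4. */
uint32_t align_frame_size(uint32_t bytes) {
    return (bytes + 3u) & ~(uint32_t)3;
}

/* Return nonzero if sp is a valid 4-byte word address. */
int sp_is_word_aligned(uint32_t sp) {
    return (sp & 3u) == 0u;
}

/* Compute the SP after allocating a frame; the result stays word aligned
   as long as the incoming SP is word aligned. */
uint32_t sp_after_alloc(uint32_t sp, uint32_t bytes) {
    assert(sp_is_word_aligned(sp));
    return sp - align_frame_size(bytes);
}
```

For example, a function that needs 10 bytes of local storage would reserve 12 bytes under this scheme, so every SP-relative word access remains aligned.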