SPRAB89 Application note

SPRAB89A September 2011 – March 2014

1 Introduction
1. 1.1 ABIs for the C6000
2. 1.2 Scope
3. 1.3 ABI Variants
4. 1.4 Toolchains and Interoperability
5. 1.5 Libraries
6. 1.6 Types of Object Files
7. 1.7 Segments
8. 1.8 C6000 Architecture Overview
9. 1.9 Reference Documents
10. 1.10 Code Fragment Notation
2 Data Representation
1. 2.1 Basic Types
2. 2.2 Data in Registers
3. 2.3 Data in Memory
4. 2.4 Complex Types
5. 2.5 Structures and Unions
6. 2.6 Arrays
7. 2.7 Bit Fields
  1. 2.7.1 Volatile Bit Fields
8. 2.8 Enumeration Types
3 Calling Conventions
1. 3.1 Call and Return
2. 3.2 Register Conventions
3. 3.3 Argument Passing
4. 3.4 Return Values
5. 3.5 Structures and Unions Passed and Returned by Reference
6. 3.6 Conventions for Compiler Helper Functions
7. 3.7 Scratch Registers for Inter-Section Calls
8. 3.8 Setting Up DP
4 Data Allocation and Addressing
1. 4.1 Data Sections and Segments
2. 4.2 Allocation and Addressing of Static Data
3. 4.3 Automatic Variables
4. 4.4 Frame Layout
5. 4.5 Heap-Allocated Objects
5 Code Allocation and Addressing
1. 5.1 Computing the Address of a Code Label
2. 5.2 Branching
3. 5.3 Calls
4. 5.4 Addressing Compact Instructions
6 Addressing Model for Dynamic Linking
1. 6.1 Terms and Concepts
2. 6.2 Overview of Dynamic Linking Mechanisms
3. 6.3 DSOs and DLLs
4. 6.4 Preemption
5. 6.5 PLT Entries
6. 6.6 The Global Offset Table
  1. 6.6.1 GOT-Based Reference Using Near DP-Relative Addressing
  2. 6.6.2 GOT-Based Reference Using Far DP-Relative Addressing
7. 6.7 The DSBT Model
8. 6.8 Performance Implications of Dynamic Linking
7 Thread-Local Storage Allocation and Addressing
1. 7.1 About Multi-Threading and Thread-Local Storage
2. 7.2 Terms and Concepts
3. 7.3 User Interface
4. 7.4 ELF Object File Representation
5. 7.5 TLS Access Models
6. 7.6 Thread-Local Symbol Resolution and Weak References
8 Helper Function API
1. 8.1 Floating-Point Behavior
2. 8.2 C Helper Function API
3. 8.3 Special Register Conventions for Helper Functions
4. 8.4 Helper Functions for Complex Types
5. 8.5 Floating-Point Helper Functions for C99
9 Standard C Library API
1. 9.1 Reserved Symbols
2. 9.2 <assert.h> Implementation
3. 9.3 <complex.h> Implementation
4. 9.4 <ctype.h> Implementation
5. 9.5 <errno.h> Implementation
6. 9.6 <float.h> Implementation
7. 9.7 <inttypes.h> Implementation
8. 9.8 <iso646.h> Implementation
9. 9.9 <limits.h> Implementation
10. 9.10 <locale.h> Implementation
11. 9.11 <math.h> Implementation
12. 9.12 <setjmp.h> Implementation
13. 9.13 <signal.h> Implementation
14. 9.14 <stdarg.h> Implementation
15. 9.15 <stdbool.h> Implementation
16. 9.16 <stddef.h> Implementation
17. 9.17 <stdint.h> Implementation
18. 9.18 <stdio.h> Implementation
19. 9.19 <stdlib.h> Implementation
20. 9.20 <string.h> Implementation
21. 9.21 <tgmath.h> Implementation
22. 9.22 <time.h> Implementation
23. 9.23 <wchar.h> Implementation
24. 9.24 <wctype.h> Implementation
10 C++ ABI
1. 10.1 Limits (GC++ABI 1.2)
2. 10.2 Export Template (GC++ABI 1.4.2)
3. 10.3 Data Layout (GC++ABI Chapter 2)
4. 10.4 Initialization Guard Variables (GC++ABI 2.8)
5. 10.5 Constructor Return Value (GC++ABI 3.1.5)
6. 10.6 One-Time Construction API (GC++ABI 3.3.2)
7. 10.7 Controlling Object Construction Order (GC++ ABI 3.3.4)
8. 10.8 Demangler API (GC++ABI 3.4)
9. 10.9 Static Data (GC++ ABI 5.2.2)
10. 10.10 Virtual Tables and the Key function (GC++ABI 5.2.3)
11. 10.11 Unwind Table Location (GC++ABI 5.3)
11 Exception Handling
1. 11.1 Overview
2. 11.2 PREL31 Encoding
3. 11.3 The Exception Index Table (EXIDX)
4. 11.4 The Exception Handling Instruction Table (EXTAB)
5. 11.5 Unwinding Instructions
6. 11.6 Descriptors
7. 11.7 Special Sections
8. 11.8 Interaction With Non-C++ Code
  1. 11.8.1 Automatic EXIDX Entry Generation
  2. 11.8.2 Hand-Coded Assembly Functions
9. 11.9 Interaction With System Features
10. 11.10 Assembly Language Operators in the TI Toolchain
12 DWARF
1. 12.1 DWARF Register Names
2. 12.2 Call Frame Information
3. 12.3 Vendor Names
4. 12.4 Vendor Extensions
13 ELF Object Files (Processor Supplement)
1. 13.1 Registered Vendor Names
2. 13.2 ELF Header
3. 13.3 Sections
4. 13.4 Symbol Table
5. 13.5 Relocation
14 ELF Program Loading and Dynamic Linking (Processor Supplement)
1. 14.1 Program Header
2. 14.2 Program Loading
3. 14.3 Dynamic Linking
4. 14.4 Bare-Metal Dynamic Linking Model
15 Linux ABI
1. 15.1 File Types
2. 15.2 ELF Identification
3. 15.3 Program Headers and Segments
4. 15.4 Data Addressing
  1. 15.4.1 Data Segment Base Table (DSBT)
  2. 15.4.2 Global Offset Table (GOT)
5. 15.5 Code Addressing
6. 15.6 Lazy Binding
7. 15.7 Visibility
8. 15.8 Preemption
9. 15.9 Import-as-Own Preemption
10. 15.10 Program Loading
11. 15.11 Dynamic Information
12. 15.12 Initialization and Termination Functions
13. 15.13 Summary of the Linux Model
16 Symbol Versioning
1. 16.1 ELF Symbol Versioning Overview
2. 16.2 Version Section Identification
17 Build Attributes
1. 17.1 C6000 ABI Build Attribute Subsection
2. 17.2 C6000 Build Attribute Tags
18 Copy Tables and Variable Initialization
1. 18.1 Copy Table Format
2. 18.2 Compressed Data Formats
  1. 18.2.1 RLE
  2. 18.2.2 LZSS Format
3. 18.3 Variable Initialization
19 Extended Program Header Attributes
1. 19.1 Encoding
2. 19.2 Attribute Tag Definitions
3. 19.3 Extended Program Header Attributes Section Format
20 Revision History

8.2 C Helper Function API

The compiler generates calls to helper functions for operations that the language requires but the architecture does not support directly, such as floating-point operations on devices that lack dedicated hardware. Any toolchain that conforms to the ABI must implement these helper functions in its RTS library.

Helper functions are named using the prefix _ _C6000_. Any identifier with this prefix is reserved for the ABI. In addition, the _ _tls_get_addr() helper function is needed to support dynamic linking access to thread-local storage.

The helper functions adhere to the standard calling conventions, except as indicated in Section 8.3.

The following tables specify the helper functions using C notation and syntax. The types in the tables correspond to the generic data types specified in Section 2.1.

The functions in Table 8-1 convert floating-point values to integer values, in accordance with C's conversion rules and the floating-point behavior specified by Section 8.1.

Table 8-1 C6000 Floating Point to Integer Conversions

Signature	Description
int32 _ _C6000_fixdi(float64 x);	Convert float64 to int32
int40 _ _C6000_fixdli(float64 x);	Convert float64 to int40
int64 _ _C6000_fixdlli(float64 x);	Convert float64 to int64
uint32 _ _C6000_fixdu(float64 x);	Convert float64 to uint32
uint40 _ _C6000_fixdul(float64 x);	Convert float64 to uint40
uint64 _ _C6000_fixdull(float64 x);	Convert float64 to uint64
int32 _ _C6000_fixfi(float32 x);	Convert float32 to int32
int40 _ _C6000_fixfli(float32 x);	Convert float32 to int40
int64 _ _C6000_fixflli(float32 x);	Convert float32 to int64
uint32 _ _C6000_fixfu(float32 x);	Convert float32 to uint32
uint40 _ _C6000_fixful(float32 x);	Convert float32 to uint40
uint64 _ _C6000_fixfull(float32 x);	Convert float32 to uint64
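
As an illustration of the C conversion rule that Table 8-1 refers to, the sketch below models two of the conversions in portable C. The names ref_fixdi and ref_fixfu are hypothetical stand-ins invented for this example; they are not the RTS implementations, and the range checks are an assumption added because C leaves out-of-range conversions undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a float64 -> int32 conversion.
   It only restates the C rule: the fractional part is discarded. */
int32_t ref_fixdi(double x)
{
    /* Assumption: reject NaN and out-of-range inputs, which C leaves undefined. */
    assert(x > -2147483649.0 && x < 2147483648.0);
    return (int32_t)x;            /* truncates toward zero */
}

/* Hypothetical reference model of a float32 -> uint32 conversion. */
uint32_t ref_fixfu(float x)
{
    /* Values in (-1, 0) truncate to 0 and are therefore still in range. */
    assert(x > -1.0f && x < 4294967296.0f);
    return (uint32_t)x;           /* truncates toward zero */
}
```

Under this model, both 2.9 and -2.9 lose their fractional part rather than being rounded to the nearest integer.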

The functions in Table 8-2 convert integer values to floating-point values, in accordance with C's conversion rules and the floating-point behavior specified by Section 8.1.

Table 8-2 C6000 Integer to Floating Point Conversions

Signature	Description
float64 _ _C6000_fltid(int32 x);	Convert int32 to double-precision float
float64 _ _C6000_fltlid(int40 x);	Convert int40 to double-precision float
float64 _ _C6000_fltllid(int64 x);	Convert int64 to double-precision float
float64 _ _C6000_fltud(uint32 x);	Convert uint32 to double-precision float
float64 _ _C6000_fltuld(uint40 x);	Convert uint40 to double-precision float
float64 _ _C6000_fltulld(uint64 x);	Convert uint64 to double-precision float
float32 _ _C6000_fltif(int32 x);	Convert int32 to single-precision float
float32 _ _C6000_fltlif(int40 x);	Convert int40 to single-precision float
float32 _ _C6000_fltllif(int64 x);	Convert int64 to single-precision float
float32 _ _C6000_fltuf(uint32 x);	Convert uint32 to single-precision float
float32 _ _C6000_fltulf(uint40 x);	Convert uint40 to single-precision float
float32 _ _C6000_fltullf(uint64 x);	Convert uint64 to single-precision float

The functions in Table 8-3 convert floating-point values from one format to another, in accordance with C's conversion rules and the floating-point behavior specified by Section 8.1.

Table 8-3 C6000 Floating-Point Format Conversions

Signature	Description
float32 _ _C6000_cvtdf(float64 x);	Convert double-precision float to single-precision
float64 _ _C6000_cvtfd(float32 x);	Convert single-precision float to double-precision

The functions in Table 8-4 perform floating-point arithmetic, in accordance with C semantics and the floating-point behavior specified by Section 8.1.

Table 8-4 C6000 Floating-Point Arithmetic

Signature	Description
float64 _ _C6000_absd(float64 x);	Return absolute value of double-precision float
float32 _ _C6000_absf(float32 x);	Return absolute value of single-precision float
float64 _ _C6000_addd(float64 x, float64 y);	Add two double-precision floats (x+y)
float32 _ _C6000_addf(float32 x, float32 y);	Add two single-precision floats (x+y)
float64 _ _C6000_divd(float64 x, float64 y);	Divide two double-precision floats (x/y)
float32 _ _C6000_divf(float32 x, float32 y);	Divide two single-precision floats (x/y)
float64 _ _C6000_mpyd(float64 x, float64 y);	Multiply two double-precision floats (x*y)
float32 _ _C6000_mpyf(float32 x, float32 y);	Multiply two single-precision floats (x*y)
float64 _ _C6000_negd(float64 x);	Return negated double-precision float (-x)
float32 _ _C6000_negf(float32 x);	Return negated single-precision float (-x)
float64 _ _C6000_subd(float64 x, float64 y);	Subtract two double-precision floats (x-y)
float32 _ _C6000_subf(float32 x, float32 y);	Subtract two single-precision floats (x-y)
int64 _ _C6000_trunc(float64 x);	Truncate double-precision float toward zero
int32 _ _C6000_truncf(float32 x);	Truncate single-precision float toward zero

The functions in Table 8-5 perform floating-point comparisons in accordance with C semantics and the floating-point behavior specified by Section 8.1.

The _ _C6000_cmp* functions return an integer less than 0 if x is less than y, 0 if the values are equal, or an integer greater than 0 if x is greater than y. If either operand is NaN, the result is undefined.

The explicit comparison functions operate correctly with unordered (NaN) operands. That is, they return non-zero if the comparison is true even if one of the operands is NaN, or 0 otherwise.

Table 8-5 Floating-Point Comparisons

Signature	Description
int32 _ _C6000_cmpd(float64 x, float64 y);	Double-precision comparison
int32 _ _C6000_cmpf(float32 x, float32 y);	Single-precision comparison
int32 _ _C6000_unordd(float64 x, float64 y);	Double-precision check for unordered operands
int32 _ _C6000_unordf(float32 x, float32 y);	Single-precision check for unordered operands
int32 _ _C6000_eqd(float64 x, float64 y);	Double-precision comparison: x == y
int32 _ _C6000_eqf(float32 x, float32 y);	Single-precision comparison: x == y
int32 _ _C6000_neqd(float64 x, float64 y);	Double-precision comparison: x != y
int32 _ _C6000_neqf(float32 x, float32 y);	Single-precision comparison: x != y
int32 _ _C6000_ltd(float64 x, float64 y);	Double-precision comparison: x < y
int32 _ _C6000_ltf(float32 x, float32 y);	Single-precision comparison: x < y
int32 _ _C6000_gtd(float64 x, float64 y);	Double-precision comparison: x > y
int32 _ _C6000_gtf(float32 x, float32 y);	Single-precision comparison: x > y
int32 _ _C6000_led(float64 x, float64 y);	Double-precision comparison: x <= y
int32 _ _C6000_lef(float32 x, float32 y);	Single-precision comparison: x <= y
int32 _ _C6000_ged(float64 x, float64 y);	Double-precision comparison: x >= y
int32 _ _C6000_gef(float32 x, float32 y);	Single-precision comparison: x >= y
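
The difference between the three-way comparison and the explicit comparisons can be sketched in portable C. The ref_* names below are hypothetical models written for this example, not the library routines. The assertion in ref_cmpd is an assumption that makes the "undefined for NaN" case visible instead of silently returning a value.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Non-zero when at least one operand is NaN (the operands are unordered). */
int32_t ref_unordd(double x, double y) { return isnan(x) || isnan(y); }

/* Three-way comparison: negative, zero, or positive.
   The result is undefined for NaN operands, so this model rejects them. */
int32_t ref_cmpd(double x, double y)
{
    assert(!ref_unordd(x, y));
    return (x > y) - (x < y);
}

/* Explicit comparisons stay well defined for NaN: every ordered
   relation is false, and only "not equal" is true. */
int32_t ref_eqd(double x, double y)  { return x == y; }
int32_t ref_neqd(double x, double y) { return x != y; }
int32_t ref_ltd(double x, double y)  { return x <  y; }
int32_t ref_ged(double x, double y)  { return x >= y; }
```

In this model, a caller that may see NaN operands uses the explicit comparisons, or checks ref_unordd first, rather than relying on ref_cmpd.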

The integer divide and remainder functions in Table 8-6 operate according to C semantics.

The _ _C6000_divremi and _ _C6000_divremu functions compute both a quotient (x/y) and remainder (x%y). The quotient is returned in A4 and the remainder in A5. The _ _C6000_divremll and _ _C6000_divremull functions compute the quotient (x/y) and remainder (x%y) of 64-bit integers. The quotient is returned in A5:A4 and the remainder in B5:B4.

Table 8-6 C6000 Integer Divide and Remainder

Signature	Description
int32 _ _C6000_divi(int32 x, int32 y);	32-bit signed integer division (x/y)
int40 _ _C6000_divli(int40 x, int40 y);	40-bit signed integer division (x/y)
int64 _ _C6000_divlli(int64 x, int64 y);	64-bit signed integer division (x/y)
uint32 _ _C6000_divu(uint32 x, uint32 y);	32-bit unsigned integer division (x/y)
uint40 _ _C6000_divlu(uint40 x, uint40 y);	40-bit unsigned integer division (x/y)
uint64 _ _C6000_divllu(uint64 x, uint64 y);	64-bit unsigned integer division (x/y)
int32 _ _C6000_remi(int32 x, int32 y);	32-bit signed integer modulo (x%y)
int40 _ _C6000_remli(int40 x, int40 y);	40-bit signed integer modulo (x%y)
int64 _ _C6000_remlli(int64 x, int64 y);	64-bit signed integer modulo (x%y)
uint32 _ _C6000_remu(uint32 x, uint32 y);	32-bit unsigned integer modulo (x%y)
uint40 _ _C6000_remul(uint40 x, uint40 y);	40-bit unsigned integer modulo (x%y)
uint64 _ _C6000_remull(uint64 x, uint64 y);	64-bit unsigned integer modulo (x%y)
_ _C6000_divremi(int32 x, int32 y);	32-bit combined divide and modulo
_ _C6000_divremu(uint32 x, uint32 y);	32-bit unsigned combined divide and modulo
_ _C6000_divremull(uint64 x, uint64 y);	64-bit unsigned combined divide and modulo
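
The combined divide-and-remainder contract can be modeled in C as follows. This is only a sketch: the struct divrem32 and the name ref_divremi are hypothetical, since the real helper returns its two results in registers A4 and A5 rather than in a C structure. The precondition checks are assumptions covering the cases that C itself leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two results; the actual helper
   returns the quotient in A4 and the remainder in A5. */
typedef struct {
    int32_t quot;
    int32_t rem;
} divrem32;

divrem32 ref_divremi(int32_t x, int32_t y)
{
    assert(y != 0);                          /* division by zero is undefined in C */
    assert(!(x == INT32_MIN && y == -1));    /* quotient would overflow int32 */

    /* C semantics: the quotient truncates toward zero and the remainder
       takes the sign of the dividend, so quot * y + rem == x. */
    divrem32 r = { x / y, x % y };
    return r;
}
```

For example, dividing -7 by 2 under these rules gives a quotient of -3 and a remainder of -1.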

The wide integer arithmetic functions in Table 8-7 operate according to C semantics.

Table 8-7 C6000 Wide Integer Arithmetic

Signature	Description
int64 _ _C6000_negll(int64 x);	64-bit integer negate
uint64 _ _C6000_mpyll(uint64 x, uint64 y);	64x64 bit multiply
int64 _ _C6000_mpyiill(int32 x, int32 y);	32x32 bit multiply
uint64 _ _C6000_mpyuiill(uint32 x, uint32 y);	32x32 bit unsigned multiply
int64 _ _C6000_llshr(int64 x, uint32 y);	64-bit signed right shift (x>>y)
uint64 _ _C6000_llshru(uint64 x, uint32 y);	64-bit unsigned right shift (x>>y)
uint64 _ _C6000_llshl(uint64 x, uint32 y);	64-bit left shift (x<<y)
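
A minimal C model of some of the widening multiplies and 64-bit shifts in Table 8-7 is shown below. The ref_* names are hypothetical. The shift-count checks are an assumption, because C does not define shifts by 64 or more on a 64-bit operand.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64 signed multiply: widen both operands before multiplying
   so the full product is kept. */
int64_t ref_mpyiill(int32_t x, int32_t y)
{
    return (int64_t)x * (int64_t)y;
}

/* 32x32 -> 64 unsigned multiply. */
uint64_t ref_mpyuiill(uint32_t x, uint32_t y)
{
    return (uint64_t)x * (uint64_t)y;
}

/* 64-bit left shift. */
uint64_t ref_llshl(uint64_t x, uint32_t y)
{
    assert(y < 64u);    /* larger counts are undefined in C */
    return x << y;
}

/* 64-bit unsigned (logical) right shift. */
uint64_t ref_llshru(uint64_t x, uint32_t y)
{
    assert(y < 64u);
    return x >> y;
}
```

Widening before the multiply is the point of the 32x32 variants: multiplying two int32 values directly in C would overflow for large operands.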

The miscellaneous helper functions in Table 8-8 are described in the sections that follow.

Table 8-8 C6000 Miscellaneous Helper Functions

Signature	Description
void _ _C6000_strasgi(int32 *dst, const int32 *src, uint32 cnt);	Interrupt safe block copy; cnt >= 28
void _ _C6000_strasgi_64plus(int32 *dst, const int32 *src, uint32 cnt);	Interrupt safe block copy; cnt >= 28
void _ _C6000_abort_msg(const char *string);	Report failed assertion
void _ _C6000_push_rts(void);	Push all callee-saved registers
void _ _C6000_pop_rts(void);	Pop all callee-saved registers
void _ _C6000_call_stub(void);	Save caller-save registers; call B31
void _ _C6000_weak_return(void);	Resolution target for imported weak calls
void * _ _C6000_get_addr(ptrdiff_t TPR_offst);	Get the address of a thread-local variable from its thread-pointer register (TPR) offset.
void * _ _C6000_get_tp(void);	Get the thread pointer value of the current thread.
void * _ _tls_get_addr(struct TLS_descriptor);	Get the address of a thread-local variable.

_ _C6000_strasgi

The function _ _C6000_strasgi is generated by the compiler for efficient out-of-line structure or array copy operations. The cnt argument is the size in bytes, which must be a multiple of 4 greater than or equal to 28 (7 words). It makes the following assumptions:

The src and dst addresses are word-aligned.
The source and destination objects do not overlap.

The 7-word minimum is the threshold that allows a software-pipelined loop to be used on C64x+. For smaller objects, the compiler typically generates an inline sequence of load/store instructions. _ _C6000_strasgi does not disable interrupts and can be safely interrupted.
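
The contract described above can be expressed as a plain C reference model. The name ref_strasgi is hypothetical, and the word-by-word loop is only a sketch of the observable behavior, not the software-pipelined library routine. The assertions restate the stated preconditions, except that the no-overlap requirement is not checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of the block copy helper. */
void ref_strasgi(int32_t *dst, const int32_t *src, uint32_t cnt)
{
    /* cnt is a byte count: a multiple of 4, at least 28 (7 words). */
    assert(cnt >= 28u && cnt % 4u == 0u);
    /* Both addresses must be word-aligned. */
    assert((uintptr_t)dst % 4u == 0u && (uintptr_t)src % 4u == 0u);

    /* The objects are assumed not to overlap, so a forward copy is safe. */
    for (uint32_t i = 0; i < cnt / 4u; i++)
        dst[i] = src[i];
}
```

Copying a 7-word array, for example, corresponds to a call with cnt equal to 28.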

The function _ _C6000_strasgi_64plus is a version of _ _C6000_strasgi optimized for C64x+ architectures.

_ _C6000_abort_msg

The function _ _C6000_abort_msg is generated to print a diagnostic message when a run-time assertion (for example, the C assert macro) fails. It must not return. That is, it must call abort or terminate the program by other means.

_ _C6000_push_rts and _ _C6000_pop_rts

The function _ _C6000_push_rts is used on C64x+ architectures when optimizing for code size. Many functions save and restore most or all of the callee-saved registers. To avoid duplicating the save code in the prolog and restore code in the epilog of each such function, the compiler can employ this library function instead. The function pushes all 13 callee-saved registers on the stack, decrementing SP by 56 bytes, according to the protocol in Section 4.4.4.

The function _ _C6000_push_rts is implemented as shown:

   __c6xabi_push_rts:
           STW       B14, *B15--[2]
           STDW  A15:A14, *B15--
           STDW  B13:B12, *B15--
           STDW  A13:A12, *B15--
           STDW  B11:B10, *B15--
           STDW  A11:A10, *B15--
           STDW  B3:B2,   *B15--
           B     A3

(This is a serial, unscheduled representation. Refer to the source code in the TI run-time library for the actual implementation.)

The function _ _C6000_pop_rts restores the callee-saved registers as pushed by _ _C6000_push_rts and increments (pops) the stack by 56 bytes.
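
The 56-byte figure can be cross-checked with a little arithmetic, sketched here in C. The function names are hypothetical, and two assumptions are made: that the frame is rounded up to a doubleword (8-byte) multiple, and that the bracketed offset on STW is scaled by the word size while each STDW post-decrement moves SP by one doubleword.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD_BYTES = 4, DWORD_BYTES = 8 };

/* 13 saved words, rounded up to a doubleword multiple (assumption). */
uint32_t push_rts_frame_bytes(void)
{
    uint32_t raw = 13u * WORD_BYTES;   /* 52 bytes of register data */
    return (raw + DWORD_BYTES - 1u) & ~(uint32_t)(DWORD_BYTES - 1);
}

/* SP after the listed store sequence: one STW that steps SP by two
   words, then six STDW that each step SP by one doubleword. */
uint32_t sp_after_push_rts(uint32_t sp)
{
    assert(sp >= 56u);                 /* keep the model from wrapping */
    sp -= 2u * WORD_BYTES;             /* STW  B14, *B15--[2] */
    for (int i = 0; i < 6; i++)
        sp -= DWORD_BYTES;             /* STDW pair, *B15--   */
    return sp;
}
```

Both views agree: 52 bytes of register data plus one padding word, for a total SP movement of 56 bytes.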

_ _C6000_call_stub

The function _ _C6000_call_stub is also used to help optimize C64x+ functions for code size. Many call sites have several caller-save registers that are live across the call. These registers are not preserved by the call and therefore must be saved and restored by the caller. The compiler can route the call through _ _C6000_call_stub, which performs the following sequence of operations:

1. Save selected caller-save registers on the stack
2. Call the function
3. Restore the saved registers
4. Return

In this way the selected registers are preserved across the call without the caller having to save and restore them. The registers preserved by _ _C6000_call_stub are: A0, A1, A2, A6, A7, B0, B1, B2, B4, B5, B6, B7.
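
The save, call, restore, return sequence can be illustrated with a small C simulation. Everything here is a hypothetical model built for this example: regfile stands in for the A and B register files, model_call_stub for the stub, and clobber_everything for a callee that overwrites every register. It does not model the stack, B3, or B31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two register files. */
typedef struct {
    uint32_t a[32];
    uint32_t b[32];
} regfile;

typedef void (*target_fn)(regfile *);

/* Indices of the registers the stub preserves: A0-A2, A6, A7, B0-B2, B4-B7. */
static const int preserved_a[] = { 0, 1, 2, 6, 7 };
static const int preserved_b[] = { 0, 1, 2, 4, 5, 6, 7 };
enum { NUM_A = 5, NUM_B = 7 };

void model_call_stub(regfile *rf, target_fn target)
{
    uint32_t save_a[NUM_A], save_b[NUM_B];
    assert(rf != NULL && target != NULL);

    for (int i = 0; i < NUM_A; i++) save_a[i] = rf->a[preserved_a[i]];  /* save    */
    for (int i = 0; i < NUM_B; i++) save_b[i] = rf->b[preserved_b[i]];

    target(rf);                                                          /* call    */

    for (int i = 0; i < NUM_A; i++) rf->a[preserved_a[i]] = save_a[i];  /* restore */
    for (int i = 0; i < NUM_B; i++) rf->b[preserved_b[i]] = save_b[i];
}                                                                        /* return  */

/* A callee that destroys every register in the model. */
void clobber_everything(regfile *rf)
{
    for (int i = 0; i < 32; i++)
        rf->a[i] = rf->b[i] = 0xDEADBEEFu;
}
```

After a call routed through the model, registers such as A6 and B4 keep their values, while a register outside the preserved set, such as A3, reflects whatever the callee wrote.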

The caller invokes _ _C6000_call_stub by placing the address of the function to be called in B31, then branching to _ _C6000_call_stub. (The return address is in B3 as usual.)

The function _ _C6000_call_stub is implemented as shown:

   __c6xabi_call_stub:
           STW    A2, *B15--[2]
           STDW   A7:A6, *B15--
           STDW   A1:A0, *B15--
           STDW   B7:B6, *B15--
           STDW   B5:B4, *B15--
           STDW   B1:B0, *B15--
           STDW   B3:B2, *B15--
           ADDKPC __STUB_RET, B3, 0
           CALL   B31
   __STUB_RET:
           LDDW   *++B15, B3:B2
           LDDW   *++B15, B1:B0
           LDDW   *++B15, B5:B4
           LDDW   *++B15, B7:B6
           LDDW   *++B15, A1:A0
           LDDW   *++B15, A7:A6
           LDW    *++B15[2], A2
           B      B3

(This is a serial, unscheduled representation. Refer to the source code in the TI run-time library for the actual implementation.)

Since _ _C6000_call_stub uses non-standard conventions, it cannot be called via a PLT entry. Its definition in the library must be marked as STV_INTERNAL or STV_HIDDEN to prevent it from being importable from a shared library.

_ _C6000_weak_return

The function _ _C6000_weak_return simply returns. The linker shall include it in a dynamic executable or shared object that contains any unresolved calls to imported weak symbols. The dynamic linker can use it to resolve those calls if they remain unresolved at dynamic load time.

_ _C6000_get_addr

The function _ _C6000_get_addr accepts a 32-bit TPR offset and returns the address of the corresponding thread-local variable. A special value of -1 indicates a weak undefined reference; in that case the function returns zero. This function is used when compiling for the Static Executable and Bare Metal Dynamic TLS access models. See Chapter 7 for details about thread-local storage.
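
The behavior described above can be sketched in C under one stated assumption: that the returned address is the current thread pointer plus the TPR offset. The name model_get_addr is hypothetical, and the thread pointer is passed as an explicit parameter here, whereas the real helper takes only the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: thread_pointer plays the role of the current
   thread's thread pointer value. */
void *model_get_addr(char *thread_pointer, ptrdiff_t tpr_offset)
{
    assert(thread_pointer != NULL);

    /* -1 marks a weak undefined reference: the result is a null address. */
    if (tpr_offset == -1)
        return NULL;

    /* Assumption: the variable lives at thread pointer + offset. */
    return thread_pointer + tpr_offset;
}
```

A caller that references a weak thread-local symbol would therefore test the result against a null pointer before dereferencing it.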

_ _C6000_get_tp

The function _ _C6000_get_tp returns the thread pointer value for the current thread. This function does not modify any register other than the return register A4. This function can be called via PLT and hence the caller should assume B30 and B31 are modified by the call to this function. See Chapter 7 and Section 14.1.4 for details about thread-local storage.

_ _tls_get_addr

The function _ _tls_get_addr returns the address of a thread-local variable. See Section 7.5.1.1 for details about this function and the TLS_descriptor structure passed to it to specify the offset of a thread-local variable. This function is used when compiling for all access models other than the Static Executable and Bare Metal Dynamic TLS access models. See Chapter 7 for details about thread-local storage.