SPRUI04F July 2015 – April 2023

 

  1.   Read This First
    1.     About This Manual
    2.     Notational Conventions
    3.     Related Documentation
    4.     Related Documentation From Texas Instruments
    5.     Trademarks
  2. Introduction to the Software Development Tools
    1. 1.1 Software Development Tools Overview
    2. 1.2 Compiler Interface
    3. 1.3 ANSI/ISO Standard
    4. 1.4 Output Files
    5. 1.5 Utilities
  3. Getting Started with the Code Generation Tools
    1. 2.1 How Code Composer Studio Projects Use the Compiler
    2. 2.2 Compiling from the Command Line
  4. Using the C/C++ Compiler
    1. 3.1  About the Compiler
    2. 3.2  Invoking the C/C++ Compiler
    3. 3.3  Changing the Compiler's Behavior with Options
      1. 3.3.1  Linker Options
      2. 3.3.2  Frequently Used Options
      3. 3.3.3  Miscellaneous Useful Options
      4. 3.3.4  Run-Time Model Options
      5. 3.3.5  Selecting Target CPU Version (--silicon_version Option)
      6. 3.3.6  Symbolic Debugging and Profiling Options
      7. 3.3.7  Specifying Filenames
      8. 3.3.8  Changing How the Compiler Interprets Filenames
      9. 3.3.9  Changing How the Compiler Processes C Files
      10. 3.3.10 Changing How the Compiler Interprets and Names Extensions
      11. 3.3.11 Specifying Directories
      12. 3.3.12 Assembler Options
    4. 3.4  Controlling the Compiler Through Environment Variables
      1. 3.4.1 Setting Default Compiler Options (C6X_C_OPTION)
      2. 3.4.2 Naming One or More Alternate Directories (C6X_C_DIR)
    5. 3.5  Controlling the Preprocessor
      1. 3.5.1  Predefined Macro Names
      2. 3.5.2  The Search Path for #include Files
        1. 3.5.2.1 Adding a Directory to the #include File Search Path (--include_path Option)
      3. 3.5.3  Support for the #warning and #warn Directives
      4. 3.5.4  Generating a Preprocessed Listing File (--preproc_only Option)
      5. 3.5.5  Continuing Compilation After Preprocessing (--preproc_with_compile Option)
      6. 3.5.6  Generating a Preprocessed Listing File with Comments (--preproc_with_comment Option)
      7. 3.5.7  Generating Preprocessed Listing with Line-Control Details (--preproc_with_line Option)
      8. 3.5.8  Generating Preprocessed Output for a Make Utility (--preproc_dependency Option)
      9. 3.5.9  Generating a List of Files Included with #include (--preproc_includes Option)
      10. 3.5.10 Generating a List of Macros in a File (--preproc_macros Option)
    6. 3.6  Passing Arguments to main()
    7. 3.7  Understanding Diagnostic Messages
      1. 3.7.1 Controlling Diagnostic Messages
      2. 3.7.2 How You Can Use Diagnostic Suppression Options
    8. 3.8  Other Messages
    9. 3.9  Generating Cross-Reference Listing Information (--gen_cross_reference_listing Option)
    10. 3.10 Generating a Raw Listing File (--gen_preprocessor_listing Option)
    11. 3.11 Using Inline Function Expansion
      1. 3.11.1 Inlining Intrinsic Operators
      2. 3.11.2 Inlining Restrictions
      3. 3.11.3 Unguarded Definition-Controlled Inlining
        1. 3.11.3.1 Using the Inline Keyword
      4. 3.11.4 Guarded Inlining and the _INLINE Preprocessor Symbol
        1. 3.11.4.1 Header File string.h
        2. 3.11.4.2 Library Definition File
    12. 3.12 Interrupt Flexibility Options (--interrupt_threshold Option)
    13. 3.13 Using Interlist
    14. 3.14 Generating and Using Performance Advice
    15. 3.15 About the Application Binary Interface
    16. 3.16 Enabling Entry Hook and Exit Hook Functions
  5. Optimizing Your Code
    1. 4.1  Invoking Optimization
    2. 4.2  Controlling Code Size Versus Speed
    3. 4.3  Performing File-Level Optimization (--opt_level=3 option)
      1. 4.3.1 Creating an Optimization Information File (--gen_opt_info Option)
    4. 4.4  Program-Level Optimization (--program_level_compile and --opt_level=3 options)
      1. 4.4.1 Controlling Program-Level Optimization (--call_assumptions Option)
      2. 4.4.2 Optimization Considerations When Mixing C/C++ and Assembly
    5. 4.5  Automatic Inline Expansion (--auto_inline Option)
    6. 4.6  Optimizing Software Pipelining
      1. 4.6.1 Turn Off Software Pipelining (--disable_software_pipeline Option)
      2. 4.6.2 Software Pipelining Information
        1. 4.6.2.1 Software Pipelining Information
        2. 4.6.2.2 Software Pipelining Information Terms
        3. 4.6.2.3 Loop Disqualified for Software Pipelining Messages
        4. 4.6.2.4 Pipeline Failure Messages
        5. 4.6.2.5 Register Usage Table Generated by the --debug_software_pipeline Option
      3. 4.6.3 Collapsing Prologs and Epilogs for Improved Performance and Code Size
        1. 4.6.3.1 Speculative Execution
        2. 4.6.3.2 Selecting the Best Threshold Value
    7. 4.7  Redundant Loops
    8. 4.8  Utilizing the Loop Buffer Using SPLOOP
    9. 4.9  Reducing Code Size (--opt_for_space (or -ms) Option)
    10. 4.10 Using Feedback Directed Optimization
      1. 4.10.1 Feedback Directed Optimization
        1. 4.10.1.1 Phase 1 -- Collect Program Profile Information
        2. 4.10.1.2 Phase 2 -- Use Application Profile Information for Optimization
        3. 4.10.1.3 Generating and Using Profile Information
        4. 4.10.1.4 Example Use of Feedback Directed Optimization
        5. 4.10.1.5 The .ppdata Section
        6. 4.10.1.6 Feedback Directed Optimization and Code Size Tune
        7. 4.10.1.7 Instrumented Program Execution Overhead
        8. 4.10.1.8 Invalid Profile Data
      2. 4.10.2 Profile Data Decoder
      3. 4.10.3 Feedback Directed Optimization API
      4. 4.10.4 Feedback Directed Optimization Summary
    11. 4.11 Using Profile Information to Get Better Program Cache Layout and Analyze Code Coverage
      1. 4.11.1 Background and Motivation
      2. 4.11.2 Code Coverage
        1. 4.11.2.1 Phase 1 -- Collect Program Profile Information
        2. 4.11.2.2 Phase 2 -- Generate Code Coverage Reports
      3. 4.11.3 What Performance Improvements Can You Expect to See?
        1. 4.11.3.1 Evaluating L1P Cache Performance
      4. 4.11.4 Program Cache Layout Related Features and Capabilities
        1. 4.11.4.1 Path Profiler
        2. 4.11.4.2 Analysis Options
        3. 4.11.4.3 Environment Variables
        4. 4.11.4.4 Program Cache Layout Tool, clt6x
        5. 4.11.4.5 Linker
        6. 4.11.4.6 Linker Command File Operator unordered()
      5. 4.11.5 Program Instruction Cache Layout Development Flow
        1. 4.11.5.1 Gather Dynamic Profile Information
        2. 4.11.5.2 Generate Preferred Function Order from Dynamic Profile Information
        3. 4.11.5.3 Utilize Preferred Function Order in Re-Build of Application
      6. 4.11.6 Comma-Separated Values (CSV) Files with Weighted Call Graph (WCG) Information
      7. 4.11.7 Linker Command File Operator - unordered()
        1. 4.11.7.1 Output Section for unordered() Operator
        3. 4.11.7.2 Generated Linker Map File for
        4. 4.11.7.3 About Dot (.) Expressions in the Presence of unordered()
          1. 4.11.7.3.1 Respecting Position of a . Expression
        5. 4.11.7.4 GROUPs and UNIONs
          1. 4.11.7.4.1 Applying unordered() to GROUPs
      8. 4.11.8 Things to be Aware of
    12. 4.12 Indicating Whether Certain Aliasing Techniques Are Used
      1. 4.12.1 Use the --aliased_variables Option When Certain Aliases are Used
      2. 4.12.2 Use the --no_bad_aliases Option to Indicate That These Techniques Are Not Used
      3. 4.12.3 Using the --no_bad_aliases Option With the Assembly Optimizer
    13. 4.13 Prevent Reordering of Associative Floating-Point Operations
    14. 4.14 Use Caution With asm Statements in Optimized Code
    15. 4.15 Using Performance Advice to Optimize Your Code
      1. 4.15.1  Advice #27000
      2. 4.15.2  Advice #27001 Increase Optimization Level
      3. 4.15.3  Advice #27002 Do not turn off software pipelining
      4. 4.15.4  Advice #27003 Avoid compiling with debug options
      5. 4.15.5  Advice #27004 No Performance Advice generated
      6. 4.15.6  Advice #30000 Prevent Loop Disqualification due to call
      7. 4.15.7  Advice #30001 Prevent Loop Disqualification due to rts-call
      8. 4.15.8  Advice #30002 Prevent Loop Disqualification due to asm statement
      9. 4.15.9  Advice #30003 Prevent Loop Disqualification due to complex condition
      10. 4.15.10 Advice #30004 Prevent Loop Disqualification due to switch statement
      11. 4.15.11 Advice #30005 Prevent Loop Disqualification due to arithmetic operation
      12. 4.15.12 Advice #30006 Prevent Loop Disqualification due to call(2)
      13. 4.15.13 Advice #30007 Prevent Loop Disqualification due to rts-call(2)
      14. 4.15.14 Advice #30008 Improve Loop; Qualify with restrict
      15. 4.15.15 Advice #30009 Improve Loop; Add MUST_ITERATE pragma
      16. 4.15.16 Advice #30010 Improve Loop; Add MUST_ITERATE pragma(2)
      17. 4.15.17 Advice #30011 Improve Loop; Add _nassert()
    16. 4.16 Using the Interlist Feature With Optimization
    17. 4.17 Debugging and Profiling Optimized Code
      1. 4.17.1 Profiling Optimized Code
    18. 4.18 What Kind of Optimization Is Being Performed?
      1. 4.18.1  Cost-Based Register Allocation
      2. 4.18.2  Alias Disambiguation
      3. 4.18.3  Branch Optimizations and Control-Flow Simplification
      4. 4.18.4  Data Flow Optimizations
      5. 4.18.5  Expression Simplification
      6. 4.18.6  Inline Expansion of Functions
      7. 4.18.7  Function Symbol Aliasing
      8. 4.18.8  Induction Variables and Strength Reduction
      9. 4.18.9  Loop-Invariant Code Motion
      10. 4.18.10 Loop Rotation
      11. 4.18.11 Vectorization (SIMD)
      12. 4.18.12 Instruction Scheduling
      13. 4.18.13 Register Variables
      14. 4.18.14 Register Tracking/Targeting
      15. 4.18.15 Software Pipelining
  6. Using the Assembly Optimizer
    1. 5.1 Code Development Flow to Increase Performance
    2. 5.2 About the Assembly Optimizer
    3. 5.3 What You Need to Know to Write Linear Assembly
      1. 5.3.1 Linear Assembly Source Statement Format
      2. 5.3.2 Register Specification for Linear Assembly
        1. 5.3.2.1 Linear Assembly Code for Computing a Dot Product
        3. 5.3.2.2 C Code for Computing a Dot Product
        5. 5.3.2.3 Specifying a Register Pair
        7. 5.3.2.4 Specifying a Register Quad (C6600 Only)
      3. 5.3.3 Functional Unit Specification for Linear Assembly
      4. 5.3.4 Using Linear Assembly Source Comments
        1. 5.3.4.1 Lmac Function Code Showing Comments
      5. 5.3.5 Assembly File Retains Your Symbolic Register Names
    4. 5.4 Assembly Optimizer Directives
      1.      .call
      2.      .circ
      3.      .cproc/.endproc
      4.      .map
      5.      .mdep
      6.      .mptr
      7.      .no_mdep
      8.      .pref
      9.      .proc/.endproc
      10.      .reg
      11.      .rega/.regb
      12.      .reserve
      13.      .return
      14.      .trip
      15.      .volatile
      16. 5.4.1 Instructions That Are Not Allowed in Procedures
    5. 5.5 Avoiding Memory Bank Conflicts With the Assembly Optimizer
      1. 5.5.1 Preventing Memory Bank Conflicts
        1. 5.5.1.1 Load and Store Instructions That Specify Memory Bank Information
      2. 5.5.2 A Dot Product Example That Avoids Memory Bank Conflicts
        1. 5.5.2.1 C Code for Dot Product
        2. 5.5.2.2 Linear Assembly for Dot Product
        3. 5.5.2.3 Dot Product Software-Pipelined Kernel
        5. 5.5.2.4 Dot Product From Unrolled to Prevent Memory Bank Conflicts
        7. 5.5.2.5 Unrolled Dot Product Kernel From
      3. 5.5.3 Memory Bank Conflicts for Indexed Pointers
        1. 5.5.3.1 Using .mptr for Indexed Pointers
      4. 5.5.4 Memory Bank Conflict Algorithm
    6. 5.6 Memory Alias Disambiguation
      1. 5.6.1 How the Assembly Optimizer Handles Memory References (Default)
      2. 5.6.2 Using the --no_bad_aliases Option to Handle Memory References
      3. 5.6.3 Using the .no_mdep Directive
      4. 5.6.4 Using the .mdep Directive to Identify Specific Memory Dependencies
        1. 5.6.4.1 Annotating a Memory Reference
        3. 5.6.4.2 Software Pipeline Using .mdep ld1, st1
        5. 5.6.4.3 Software Pipeline Using .mdep st1, ld1 and .mdep ld1, st1
      5. 5.6.5 Memory Alias Examples
  7. Linking C/C++ Code
    1. 6.1 Invoking the Linker Through the Compiler (-z Option)
      1. 6.1.1 Invoking the Linker Separately
      2. 6.1.2 Invoking the Linker as Part of the Compile Step
      3. 6.1.3 Disabling the Linker (--compile_only Compiler Option)
    2. 6.2 Linker Code Optimizations
      1. 6.2.1 Conditional Linking
      2. 6.2.2 Generating Function Subsections (--gen_func_subsections Compiler Option)
      3. 6.2.3 Generating Aggregate Data Subsections (--gen_data_subsections Compiler Option)
    3. 6.3 Controlling the Linking Process
      1. 6.3.1 Including the Run-Time-Support Library
        1. 6.3.1.1 Automatic Run-Time-Support Library Selection
          1. 6.3.1.1.1 Using the --issue_remarks Option
        2. 6.3.1.2 Manual Run-Time-Support Library Selection
        3. 6.3.1.3 Library Order for Searching for Symbols
      2. 6.3.2 Run-Time Initialization
      3. 6.3.3 Global Object Constructors
      4. 6.3.4 Specifying the Type of Global Variable Initialization
      5. 6.3.5 Specifying Where to Allocate Sections in Memory
      6. 6.3.6 A Sample Linker Command File
  8. C/C++ Language Implementation
    1. 7.1  Characteristics of TMS320C6000 C
      1. 7.1.1 Implementation-Defined Behavior
    2. 7.2  Characteristics of TMS320C6000 C++
    3. 7.3  Data Types
      1. 7.3.1 Size of Enum Types
      2. 7.3.2 Vector Data Types
    4. 7.4  File Encodings and Character Sets
    5. 7.5  Keywords
      1. 7.5.1 The complex Keyword
      2. 7.5.2 The const Keyword
      3. 7.5.3 The __cregister Keyword
        1. 7.5.3.1 Define and Use Control Registers
      4. 7.5.4 The __interrupt Keyword
      5. 7.5.5 The __near and __far Keywords
        1. 7.5.5.1 Near and Far Data Objects
        2. 7.5.5.2 Near and Far Function Calls
      6. 7.5.6 The restrict Keyword
      7. 7.5.7 The volatile Keyword
    6. 7.6  C++ Exception Handling
    7. 7.7  Register Variables and Parameters
    8. 7.8  The __asm Statement
    9. 7.9  Pragma Directives
      1. 7.9.1  The CALLS Pragma
      2. 7.9.2  The CODE_ALIGN Pragma
      3. 7.9.3  The CODE_SECTION Pragma
      4. 7.9.4  The DATA_ALIGN Pragma
      5. 7.9.5  The DATA_MEM_BANK Pragma
        1. 7.9.5.1 Using the DATA_MEM_BANK Pragma
      6. 7.9.6  The DATA_SECTION Pragma
        1. 7.9.6.1 Using the DATA_SECTION Pragma C Source File
        2. 7.9.6.2 Using the DATA_SECTION Pragma C++ Source File
        3. 7.9.6.3 Using the DATA_SECTION Pragma Assembly Source File
      7. 7.9.7  The Diagnostic Message Pragmas
      8. 7.9.8  The FORCEINLINE Pragma
      9. 7.9.9  The FORCEINLINE_RECURSIVE Pragma
      10. 7.9.10 The FUNC_ALWAYS_INLINE Pragma
      11. 7.9.11 The FUNC_CANNOT_INLINE Pragma
      12. 7.9.12 The FUNC_EXT_CALLED Pragma
      13. 7.9.13 The FUNC_INTERRUPT_THRESHOLD Pragma
      14. 7.9.14 The FUNC_IS_PURE Pragma
      15. 7.9.15 The FUNC_IS_SYSTEM Pragma
      16. 7.9.16 The FUNC_NEVER_RETURNS Pragma
      17. 7.9.17 The FUNC_NO_GLOBAL_ASG Pragma
      18. 7.9.18 The FUNC_NO_IND_ASG Pragma
      19. 7.9.19 The FUNCTION_OPTIONS Pragma
      20. 7.9.20 The INTERRUPT Pragma
      21. 7.9.21 The LOCATION Pragma
      22. 7.9.22 The MUST_ITERATE Pragma
        1. 7.9.22.1 The MUST_ITERATE Pragma Syntax
        2. 7.9.22.2 Using MUST_ITERATE to Expand Compiler Knowledge of Loops
      23. 7.9.23 The NMI_INTERRUPT Pragma
      24. 7.9.24 The NOINIT and PERSISTENT Pragmas
      25. 7.9.25 The NOINLINE Pragma
      26. 7.9.26 The NO_HOOKS Pragma
      27. 7.9.27 The once Pragma
      28. 7.9.28 The pack Pragma
      29. 7.9.29 The PROB_ITERATE Pragma
      30. 7.9.30 The RETAIN Pragma
      31. 7.9.31 The SET_CODE_SECTION and SET_DATA_SECTION Pragmas
      32. 7.9.32 The STRUCT_ALIGN Pragma
      33. 7.9.33 The UNROLL Pragma
    10. 7.10 The _Pragma Operator
    11. 7.11 Application Binary Interface
    12. 7.12 Object File Symbol Naming Conventions (Linknames)
    13. 7.13 Changing the ANSI/ISO C/C++ Language Mode
      1. 7.13.1 C99 Support (--c99)
      2. 7.13.2 C11 Support (--c11)
      3. 7.13.3 Strict ANSI Mode and Relaxed ANSI Mode (--strict_ansi and --relaxed_ansi)
    14. 7.14 GNU and Clang Language Extensions
      1. 7.14.1 Extensions
      2. 7.14.2 Function Attributes
      3. 7.14.3 For Loop Attributes
      4. 7.14.4 Variable Attributes
      5. 7.14.5 Type Attributes
      6. 7.14.6 Built-In Functions
    15. 7.15 Operations and Functions for Vector Data Types
      1. 7.15.1 Vector Literals and Concatenation
      2. 7.15.2 Unary and Binary Operators for Vectors
      3. 7.15.3 Swizzle Operators for Vectors
      4. 7.15.4 Conversion Functions for Vectors
      5. 7.15.5 Re-Interpretation Functions for Vectors
      6. 7.15.6 Using printf() with Vectors
      7. 7.15.7 Built-In Vector Functions
  9. Run-Time Environment
    1. 8.1  Memory Model
      1. 8.1.1 Sections
      2. 8.1.2 C/C++ System Stack
      3. 8.1.3 Dynamic Memory Allocation
      4. 8.1.4 Data Memory Models
        1. 8.1.4.1 Determining the Data Address Model
        2. 8.1.4.2 DP-Relative Vs. Absolute Addressing
        3. 8.1.4.3 Const Objects as Far
      5. 8.1.5 Trampoline Generation for Function Calls
      6. 8.1.6 Position Independent Data
    2. 8.2  Object Representation
      1. 8.2.1 Data Type Storage
        1. 8.2.1.1 char and short Data Types (signed and unsigned)
        2. 8.2.1.2 enum, int, and long Data Types (signed and unsigned)
        3. 8.2.1.3 float Data Type
        4. 8.2.1.4 The __int40_t Data Type (signed and unsigned)
        5. 8.2.1.5 long long Data Types (signed and unsigned)
        6. 8.2.1.6 double and long double Data Types
        7. 8.2.1.7 Pointer to Data Member Types
        8. 8.2.1.8 Pointer to Member Function Types
        9. 8.2.1.9 Structures and Arrays
      2. 8.2.2 Bit Fields
      3. 8.2.3 Character String Constants
    3. 8.3  Register Conventions
    4. 8.4  Function Structure and Calling Conventions
      1. 8.4.1 How a Function Makes a Call
      2. 8.4.2 How a Called Function Responds
      3. 8.4.3 Accessing Arguments and Local Variables
    5. 8.5  Accessing Linker Symbols in C and C++
    6. 8.6  Interfacing C and C++ With Assembly Language
      1. 8.6.1  Using Assembly Language Modules With C/C++ Code
      2. 8.6.2  Accessing Assembly Language Functions From C/C++
        1. 8.6.2.1 Calling an Assembly Language Function From a C/C++ Program
        2. 8.6.2.2 Assembly Language Program Called by
      3. 8.6.3  Accessing Assembly Language Variables From C/C++
        1. 8.6.3.1 Accessing Assembly Language Global Variables
          1. 8.6.3.1.1 Assembly Language Variable Program
          2. 8.6.3.1.2 C Program to Access Assembly Language From
        3. 8.6.3.2 Accessing Assembly Language Constants
          1. 8.6.3.2.1 Accessing an Assembly Language Constant From C
          2. 8.6.3.2.2 Assembly Language Program for
      4. 8.6.4  Sharing C/C++ Header Files With Assembly Source
      5. 8.6.5  Using Inline Assembly Language
      6. 8.6.6  Using Intrinsics to Access Assembly Language Statements
      7. 8.6.7  The __x128_t Container Type
        1. 8.6.7.1 The __x128_t Container Type
      8. 8.6.8  The __float2_t Container Type
      9. 8.6.9  Using Intrinsics for Interrupt Control and Atomic Sections
      10. 8.6.10 Using Unaligned Data and 64-Bit Values
        1. 8.6.10.1 Using the _mem8 Intrinsic
      11. 8.6.11 Using MUST_ITERATE and _nassert to Enable SIMD and Expand Compiler Knowledge of Loops
      12. 8.6.12 Methods to Align Data
        1. 8.6.12.1 Base Address of an Array
        2. 8.6.12.2 Offset from the Base of an Array
        3. 8.6.12.3 Dynamic Memory Allocation
        4. 8.6.12.4 Member of a Structure or Class
          1. 8.6.12.4.1 An Array in a Structure
          2. 8.6.12.4.2 An Array in a Class
      13. 8.6.13 SAT Bit Side Effects
      14. 8.6.14 IRP and AMR Conventions
      15. 8.6.15 Floating Point and Saturation Control Register Side Effects
    7. 8.7  Interrupt Handling
      1. 8.7.1 Saving the SGIE Bit
      2. 8.7.2 Saving Registers During Interrupts
      3. 8.7.3 Using C/C++ Interrupt Routines
      4. 8.7.4 Using Assembly Language Interrupt Routines
    8. 8.8  Run-Time-Support Arithmetic Routines
    9. 8.9  System Initialization
      1. 8.9.1 Boot Hook Functions for System Pre-Initialization
      2. 8.9.2 Automatic Initialization of Variables
        1. 8.9.2.1 Zero Initializing Variables
        2. 8.9.2.2 Direct Initialization
        3. 8.9.2.3 Autoinitialization of Variables at Run Time
        4. 8.9.2.4 Autoinitialization Tables
          1. 8.9.2.4.1 Length Followed by Data Format
          2. 8.9.2.4.2 Zero Initialization Format
          3. 8.9.2.4.3 Run Length Encoded (RLE) Format
          4. 8.9.2.4.4 Lempel-Ziv-Storer-Szymanski Compression (LZSS) Format
          5. 8.9.2.4.5 Sample C Code to Process the C Autoinitialization Table
        5. 8.9.2.5 Initialization of Variables at Load Time
        6. 8.9.2.6 Global Constructors
    10. 8.10 Support for Multi-Threaded Applications
      1. 8.10.1 Compiling with OpenMP
      2. 8.10.2 Multi-Threading Runtime Support
        1. 8.10.2.1 Runtime Thread Safety
        2. 8.10.2.2 Thread Creation, Initialization, and Termination
        3. 8.10.2.3 Thread Local Storage (TLS)
        4. 8.10.2.4 Accessing Shared Data
  10. Using Run-Time-Support Functions and Building Libraries
    1. 9.1 C and C++ Run-Time Support Libraries
      1. 9.1.1 Linking Code With the Object Library
      2. 9.1.2 Header Files
      3. 9.1.3 Modifying a Library Function
      4. 9.1.4 Support for String Handling
      5. 9.1.5 Minimal Support for Internationalization
      6. 9.1.6 Support for Time and Clock Functions
      7. 9.1.7 Allowable Number of Open Files
      8. 9.1.8 Library Naming Conventions
    2. 9.2 The C I/O Functions
      1. 9.2.1 High-Level I/O Functions
        1. 9.2.1.1 Formatting and the Format Conversion Buffer
      2. 9.2.2 Overview of Low-Level I/O Implementation
        1.       open
        2.       close
        3.       read
        4.       write
        5.       lseek
        6.       unlink
        7.       rename
      3. 9.2.3 Device-Driver Level I/O Functions
        1.       DEV_open
        2.       DEV_close
        3.       DEV_read
        4.       DEV_write
        5.       DEV_lseek
        6.       DEV_unlink
        7.       DEV_rename
      4. 9.2.4 Adding a User-Defined Device Driver for C I/O
        1. 9.2.4.1 Mapping Default Streams to Device
      5. 9.2.5 The device Prefix
        1.       add_device
        3. 9.2.5.1 Program for C I/O Device
    3. 9.3 Handling Reentrancy (_register_lock() and _register_unlock() Functions)
    4. 9.4 Library-Build Process
      1. 9.4.1 Required Non-Texas Instruments Software
      2. 9.4.2 Using the Library-Build Process
        1. 9.4.2.1 Automatic Standard Library Rebuilding by the Linker
        2. 9.4.2.2 Invoking mklib Manually
          1. 9.4.2.2.1 Building Standard Libraries
          2. 9.4.2.2.2 Shared or Read-Only Library Directory
          3. 9.4.2.2.3 Building Libraries With Custom Options
          4. 9.4.2.2.4 The mklib Program Option Summary
      3. 9.4.3 Extending mklib
        1. 9.4.3.1 Underlying Mechanism
        2. 9.4.3.2 Libraries From Other Vendors
  11. 10 C++ Name Demangler
    1. 10.1 Invoking the C++ Name Demangler
    2. 10.2 Sample Usage of the C++ Name Demangler
  12.   A Glossary
    1.     A.1 Terminology
  13.   B Revision History
  16.   B Earlier Revisions

Using Intrinsics to Access Assembly Language Statements

The C6000 compiler recognizes a number of intrinsic operators. Intrinsics allow you to express the meaning of certain assembly statements that would otherwise be cumbersome or inexpressible in C/C++. Intrinsics are used like functions; you can use C/C++ variables with these intrinsics, just as you would with any normal function.

Intrinsic names begin with a leading underscore, and you invoke an intrinsic with the same syntax you use to call a function. For example:

int x1, x2, y;
y = _sadd(x1, x2);
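
To make the meaning of the example above concrete, the following is a minimal host-side sketch of what a saturating 32-bit add computes. It is an illustration only, not the compiler's implementation: the name sadd_model is hypothetical, and the model assumes that _sadd clamps its result to the signed 32-bit range instead of wrapping on overflow.

```c
#include <stdint.h>

/* Hypothetical reference model of a saturating signed 32-bit add.
 * Not a TI API; it only illustrates the arithmetic. */
static int32_t sadd_model(int32_t a, int32_t b)
{
    /* Add in 64 bits so the intermediate sum cannot overflow. */
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide > INT32_MAX) {
        return INT32_MAX;   /* clamp positive overflow */
    }
    if (wide < INT32_MIN) {
        return INT32_MIN;   /* clamp negative overflow */
    }
    return (int32_t)wide;
}
```

With this model, sadd_model(1, 2) yields 3, while sadd_model(INT32_MAX, 1) stays at INT32_MAX rather than wrapping to a negative value. A model like this can be useful for checking algorithm behavior on a host before building for the target.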

Note: Intrinsic Instructions in C Versus Assembly Language

In some instances, the compiler may not use the exact assembly language instruction that corresponds to an intrinsic. When this is the case, the meaning of the program does not change.

The tables that list intrinsics apply to device families as follows:

Table 8-3 Device Families and Intrinsics Tables

| Family | Table 8-5 | Table 8-6 | Table 8-7 |
|--------|-----------|-----------|-----------|
| C6400+ | Yes       |           |           |
| C6740  | Yes       | Yes       |           |
| C6600  | Yes       | Yes       | Yes       |

Table 8-4 provides a summary of the C6000 intrinsics clarifying which devices support which intrinsics.

Table 8-4 C6000 C/C++ Intrinsics Support by Device
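
| Intrinsic | C6400+ | C6740 | C6600 |
|-----------|--------|-------|-------|
| _abs | Yes | Yes | Yes |
| _abs2 | Yes | Yes | Yes |
| _add2 | Yes | Yes | Yes |
| _add4 | Yes | Yes | Yes |
| _addsub | Yes | Yes | Yes |
| _addsub2 | Yes | Yes | Yes |
| _amem2 | Yes | Yes | Yes |
| _amem2_const | Yes | Yes | Yes |
| _amem4 | Yes | Yes | Yes |
| _amem4_const | Yes | Yes | Yes |
| _amem8 | Yes | Yes | Yes |
| _amem8_const | Yes | Yes | Yes |
| _amem8_f2 | Yes | Yes | Yes |
| _amem8_f2_const | Yes | Yes | Yes |
| _amemd8 | Yes | Yes | Yes |
| _amemd8_const | Yes | Yes | Yes |
| _avg2 | Yes | Yes | Yes |
| _avgu4 | Yes | Yes | Yes |
| _bitc4 | Yes | Yes | Yes |
| _bitr | Yes | Yes | Yes |
| _ccmatmpy |  |  | Yes |
| _ccmatmpyr1 |  |  | Yes |
| _ccmpy32r1 |  |  | Yes |
| _clr | Yes | Yes | Yes |
| _clrr | Yes | Yes | Yes |
| _cmatmpy |  |  | Yes |
| _cmatmpyr1 |  |  | Yes |
| _cmpeq2 | Yes | Yes | Yes |
| _cmpeq4 | Yes | Yes | Yes |
| _cmpgt2 | Yes | Yes | Yes |
| _cmpgtu4 | Yes | Yes | Yes |
| _cmplt2 | Yes | Yes | Yes |
| _cmpltu4 | Yes | Yes | Yes |
| _cmpy | Yes | Yes | Yes |
| _cmpy32r1 |  |  | Yes |
| _cmpyr | Yes | Yes | Yes |
| _cmpyr1 | Yes | Yes | Yes |
| _cmpysp |  |  | Yes |
| _complex_conjugate_mpysp |  |  | Yes |
| _complex_mpysp |  |  | Yes |
| _crot270 |  |  | Yes |
| _crot90 |  |  | Yes |
| _dadd |  |  | Yes |
| _dadd2 |  |  | Yes |
| _daddsp |  |  | Yes |
| _dadd_c |  |  | Yes |
| _dapys2 |  |  | Yes |
| _davg2 |  |  | Yes |
| _davgnr2 |  |  | Yes |
| _davgnru4 |  |  | Yes |
| _davgu4 |  |  | Yes |
| _dccmpyr1 |  |  | Yes |
| _dcmpeq2 |  |  | Yes |
| _dcmpeq4 |  |  | Yes |
| _dcmpgt2 |  |  | Yes |
| _dcmpgtu4 |  |  | Yes |
| _dccmpy |  |  | Yes |
| _dcmpy |  |  | Yes |
| _dcmpyr1 |  |  | Yes |
| _dcrot90 |  |  | Yes |
| _dcrot270 |  |  | Yes |
| _ddotp4 | Yes | Yes | Yes |
| _ddotp4h |  |  | Yes |
| _ddotph2 | Yes | Yes | Yes |
| _ddotph2r | Yes | Yes | Yes |
| _ddotpl2 | Yes | Yes | Yes |
| _ddotpl2r | Yes | Yes | Yes |
| _ddotpsu4h |  |  | Yes |
| _deal | Yes | Yes | Yes |
| _dinthsp |  |  | Yes |
| _dinthspu |  |  | Yes |
| _dintsp |  |  | Yes |
| _dintspu |  |  | Yes |
| _dmax2 |  |  | Yes |
| _dmaxu4 |  |  | Yes |
| _dmin2 |  |  | Yes |
| _dminu4 |  |  | Yes |
| _dmpy2 |  |  | Yes |
| _dmpysp |  |  | Yes |
| _dmpysu4 |  |  | Yes |
| _dmpyu2 |  |  | Yes |
| _dmpyu4 |  |  | Yes |
| _dmv | Yes | Yes | Yes |
| _dmvd |  |  | Yes |
| _dotp2 | Yes | Yes | Yes |
| _dotp4h |  |  | Yes |
| _dotp4hll |  |  | Yes |
| _dotpn2 | Yes | Yes | Yes |
| _dotpnrsu2 | Yes | Yes | Yes |
| _dotpnrus2 | Yes | Yes | Yes |
| _dotprsu2 | Yes | Yes | Yes |
| _dotpsu4 | Yes | Yes | Yes |
| _dotpus4 | Yes | Yes | Yes |
| _dotpsu4h |  |  | Yes |
| _dotpsu4hll |  |  | Yes |
| _dotpu4 | Yes | Yes | Yes |
| _dpack2 | Yes | Yes | Yes |
| _dpackh2 |  |  | Yes |
| _dpackh4 |  |  | Yes |
| _dpacklh2 |  |  | Yes |
| _dpacklh4 |  |  | Yes |
| _dpackl2 |  |  | Yes |
| _dpackl4 |  |  | Yes |
| _dpackx2 | Yes | Yes | Yes |
| _dpint |  | Yes | Yes |
| _dsadd |  |  | Yes |
| _dsadd2 |  |  | Yes |
| _dshl |  |  | Yes |
| _dshl2 |  |  | Yes |
| _dshr |  |  | Yes |
| _dshr2 |  |  | Yes |
| _dshru |  |  | Yes |
| _dshru2 |  |  | Yes |
| _dsmpy2 |  |  | Yes |
| _dspacku4 |  |  | Yes |
| _dspint |  |  | Yes |
| _dspinth |  |  | Yes |
| _dssub |  |  | Yes |
| _dssub2 |  |  | Yes |
| _dsub |  |  | Yes |
| _dsub2 |  |  | Yes |
| _dsubsp |  |  | Yes |
| _dtol | Yes | Yes | Yes |
| _dtoll | Yes | Yes | Yes |
| _dxpnd2 |  |  | Yes |
| _dxpnd4 |  |  | Yes |
| _ext | Yes | Yes | Yes |
| _extr | Yes | Yes | Yes |
| _extu | Yes | Yes | Yes |
| _extur | Yes | Yes | Yes |
| _f2tol |  | Yes | Yes |
| _f2toll |  | Yes | Yes |
| _fabs |  | Yes | Yes |
| _fabsf |  | Yes | Yes |
| _fdmvd_f2 |  |  | Yes |
| _fdmv_f2 | Yes | Yes | Yes |
| _ftoi | Yes | Yes | Yes |
| _gmpy | Yes | Yes | Yes |
| _gmpy4 | Yes | Yes | Yes |
| _hi | Yes | Yes | Yes |
| _hill | Yes | Yes | Yes |
| _itod | Yes | Yes | Yes |
| _itof | Yes | Yes | Yes |
| _itoll | Yes | Yes | Yes |
| _labs | Yes | Yes | Yes |
| _land |  |  | Yes |
| _landn |  |  | Yes |
| _ldotp2 | Yes | Yes | Yes |
| _lmbd | Yes | Yes | Yes |
| _lnorm | Yes | Yes | Yes |
| _lo | Yes | Yes | Yes |
| _loll | Yes | Yes | Yes |
| _lor |  |  | Yes |
| _lsadd | Yes | Yes | Yes |
| _lssub | Yes | Yes | Yes |
| _ltod | Yes | Yes | Yes |
| _lltod | Yes | Yes | Yes |
| _lltof2 |  | Yes | Yes |
| _ltof2 |  | Yes | Yes |
| _max2 | Yes | Yes | Yes |
| _maxu4 | Yes | Yes | Yes |
| _mfence |  |  | Yes |
| _min2 | Yes | Yes | Yes |
| _minu4 | Yes | Yes | Yes |
| _mem2 | Yes | Yes | Yes |
| _mem2_const | Yes | Yes | Yes |
| _mem4 | Yes | Yes | Yes |
| _mem4_const | Yes | Yes | Yes |
| _mem8 | Yes | Yes | Yes |
| _mem8_const | Yes | Yes | Yes |
| _mem8_f2 |  | Yes | Yes |
| _mem8_f2_const |  | Yes | Yes |
| _memd8 | Yes | Yes | Yes |
| _memd8_const | Yes | Yes | Yes |
| _mpy | Yes | Yes | Yes |
| _mpy2ir | Yes | Yes | Yes |
| _mpy2ll | Yes | Yes | Yes |
| _mpy32 | Yes | Yes | Yes |
| _mpy32ll | Yes | Yes | Yes |
| _mpy32su | Yes | Yes | Yes |
| _mpy32u | Yes | Yes | Yes |
| _mpy32us | Yes | Yes | Yes |
| _mpyh | Yes | Yes | Yes |
| _mpyhill | Yes | Yes | Yes |
| _mpyihll | Yes | Yes | Yes |
| _mpyilll | Yes | Yes | Yes |
| _mpyhir | Yes | Yes | Yes |
| _mpyihr | Yes | Yes | Yes |
| _mpyilr | Yes | Yes | Yes |
| _mpyhl | Yes | Yes | Yes |
| _mpyhlu | Yes | Yes | Yes |
| _mpyhslu | Yes | Yes | Yes |
| _mpyhsu | Yes | Yes | Yes |
| _mpyhu | Yes | Yes | Yes |
| _mpyhuls | Yes | Yes | Yes |
| _mpyhus | Yes | Yes | Yes |
| _mpyidll |  | Yes | Yes |
| _mpylh | Yes | Yes | Yes |
| _mpylhu | Yes | Yes | Yes |
| _mpylill | Yes | Yes | Yes |
| _mpylir | Yes | Yes | Yes |
| _mpylshu | Yes | Yes | Yes |
| _mpyluhs | Yes | Yes | Yes |
| _mpysp2dp |  | Yes | Yes |
| _mpyspdp |  | Yes | Yes |
| _mpysu | Yes | Yes | Yes |
| _mpysu4ll | Yes | Yes | Yes |
| _mpyus4ll | Yes | Yes | Yes |
| _mpyu | Yes | Yes | Yes |
| _mpyu2 |  |  | Yes |
| _mpyu4ll | Yes | Yes | Yes |
| _mpyus | Yes | Yes | Yes |
| _mvd | Yes | Yes | Yes |
| _nassert | Yes | Yes | Yes |
| _norm | Yes | Yes | Yes |
| _pack2 | Yes | Yes | Yes |
| _packh2 | Yes | Yes | Yes |
| _packh4 | Yes | Yes | Yes |
| _packhl2 | Yes | Yes | Yes |
| _packl4 | Yes | Yes | Yes |
| _packlh2 | Yes | Yes | Yes |
| _qmpy32 |  |  | Yes |
| _qmpysp |  |  | Yes |
| _qsmpy32r1 |  |  | Yes |
| _rcpdp |  | Yes | Yes |
| _rcpsp |  | Yes | Yes |
| _rsqrdp |  | Yes | Yes |
| _rsqrsp |  | Yes | Yes |
| _rotl | Yes | Yes | Yes |
| _rpack2 | Yes | Yes | Yes |
| _sadd | Yes | Yes | Yes |
| _sadd2 | Yes | Yes | Yes |
| _saddsub | Yes | Yes | Yes |
| _saddsub2 | Yes | Yes | Yes |
| _saddu4 | Yes | Yes | Yes |
| _saddus2 | Yes | Yes | Yes |
| _saddsu2 | Yes | Yes | Yes |
| _sat | Yes | Yes | Yes |
| _set | Yes | Yes | Yes |
| _setr | Yes | Yes | Yes |
| _shfl | Yes | Yes | Yes |
| _shfl3 | Yes | Yes | Yes |
| _shl2 |  |  | Yes |
| _shlmb | Yes | Yes | Yes |
| _shr2 | Yes | Yes | Yes |
| _shrmb | Yes | Yes | Yes |
| _shru2 | Yes | Yes | Yes |
| _smpy | Yes | Yes | Yes |
| _smpy2ll | Yes | Yes | Yes |
| _smpy32 | Yes | Yes | Yes |
| _smpyh | Yes | Yes | Yes |
| _smpyhl | Yes | Yes | Yes |
| _smpylh | Yes | Yes | Yes |
| _spack2 | Yes | Yes | Yes |
| _spacku4 | Yes | Yes | Yes |
| _spint |  | Yes | Yes |
| _sshl | Yes | Yes | Yes |
| _sshvl | Yes | Yes | Yes |
| _sshvr | Yes | Yes | Yes |
| _ssub | Yes | Yes | Yes |
| _ssub2 | Yes | Yes | Yes |
| _sub2 | Yes | Yes | Yes |
| _sub4 | Yes | Yes | Yes |
| _subabs4 | Yes | Yes | Yes |
| _subc | Yes | Yes | Yes |
| _swap2 | Yes | Yes | Yes |
| _swap4 | Yes | Yes | Yes |
| _unpkbu4 |  |  | Yes |
| _unpkh2 |  |  | Yes |
| _unpkhu2 |  |  | Yes |
| _unpkhu4 | Yes | Yes | Yes |
| _unpklu4 | Yes | Yes | Yes |
| _xorll_c |  |  | Yes |
| _xormpy | Yes | Yes | Yes |
| _xpnd2 | Yes | Yes | Yes |
| _xpnd4 | Yes | Yes | Yes |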
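
Because intrinsic support differs by device family, source code that must build for more than one family can guard a family-specific intrinsic with a preprocessor test. The sketch below assumes that _TMS320C6740 and _TMS320C6600 are the device macros predefined by the compiler (see the Predefined Macro Names section), and that _rcpsp takes and returns a float. The names HAVE_RCPSP and approx_recip are hypothetical. Note that the two paths are not bit-identical, since the reciprocal instruction is expected to return an approximation.

```c
/* Assumption: these macros are predefined by the TI compiler when the
 * selected target is a C6740 or C6600 device. On any other compiler or
 * target, neither is defined and the portable path is used. */
#if defined(_TMS320C6740) || defined(_TMS320C6600)
#define HAVE_RCPSP 1
#else
#define HAVE_RCPSP 0
#endif

/* Hypothetical wrapper: returns a reciprocal of x, using the _rcpsp
 * intrinsic where it is supported and plain division elsewhere. */
static float approx_recip(float x)
{
#if HAVE_RCPSP
    return _rcpsp(x);   /* intrinsic path: approximate reciprocal */
#else
    return 1.0f / x;    /* portable path: ordinary division */
#endif
}
```

Keeping the guard in one wrapper means the rest of the program calls approx_recip without repeating the device test.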

The intrinsics listed in Table 8-5 can be used on all C6000 devices. They correspond to the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference Guide for more information.

See Table 8-6 for a list of intrinsics that are specific to C6740 and C6600. See Table 8-7 for a list of C6600-specific intrinsics.
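
Several of the intrinsics in Table 8-5 operate on packed 16-bit or 8-bit fields within a 32-bit value. The following host-side sketch models three of them so the descriptions are easier to follow. The function names are hypothetical and are not part of the compiler. The models use unsigned 32-bit values to avoid signed overflow in C, and the bit assignment in cmpeq2_model (bit 0 for the lower halves, bit 1 for the upper halves) is an assumption that should be checked against the instruction set reference guide.

```c
#include <stdint.h>

/* Model of ADD2: add the upper halves and the lower halves separately.
 * A carry out of the lower half never reaches the upper half. */
static uint32_t add2_model(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;          /* lower 16-bit sum */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* upper 16-bit sum */
    return hi | lo;
}

/* Model of CMPEQ2: compare each pair of 16-bit values and pack the
 * results into the two least-significant bits (assumed bit order). */
static uint32_t cmpeq2_model(uint32_t a, uint32_t b)
{
    uint32_t lo_eq = ((a & 0xFFFFu) == (b & 0xFFFFu)) ? 1u : 0u;
    uint32_t hi_eq = ((a >> 16) == (b >> 16)) ? 1u : 0u;
    return (hi_eq << 1) | lo_eq;
}

/* Model of BITC4: for each 8-bit field, write the number of 1 bits
 * to the corresponding byte of the result. */
static uint32_t bitc4_model(uint32_t src)
{
    uint32_t result = 0;
    for (int byte = 0; byte < 4; ++byte) {
        uint32_t field = (src >> (8 * byte)) & 0xFFu;
        uint32_t count = 0;
        while (field != 0) {
            count += field & 1u;
            field >>= 1;
        }
        result |= count << (8 * byte);
    }
    return result;
}
```

For example, add2_model(0x0001FFFF, 0x00010001) gives 0x00020000, whereas an ordinary 32-bit addition of the same operands gives 0x00030000 because the lower-half carry propagates.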

Some items listed in the following tables are actually defined in the c6x.h header file as macros that point to intrinsics. This header file is provided in the compiler's "include" directory. Your code must include this header file in order to use the noted macros.
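
One way to satisfy the header requirement while keeping the source buildable on a host is to include c6x.h only when the TI compiler is in use. This is a minimal sketch: it assumes _TMS320C6X is the macro the compiler predefines for all C6000 targets, and load_u32_aligned is a hypothetical helper name. The _amem4_const intrinsic is used here only because its signature appears in Table 8-5; the same pattern applies to the macro-defined items such as _amem8_f2.

```c
#include <string.h>

#if defined(_TMS320C6X)
#include <c6x.h>   /* TI compiler only: defines the macro-based intrinsics */
#endif

/* Hypothetical helper: load 4 bytes from a four-byte-aligned address. */
static unsigned load_u32_aligned(const void *ptr)
{
#if defined(_TMS320C6X)
    return _amem4_const(ptr);           /* aligned 4-byte load intrinsic */
#else
    unsigned value;
    memcpy(&value, ptr, sizeof value);  /* portable fallback for host builds */
    return value;
#endif
}
```

The caller remains responsible for passing a pointer that meets the alignment requirement stated for the intrinsic.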

Table 8-5 TMS320C6000 C/C++ Compiler Intrinsics

| C/C++ Compiler Intrinsic | Assembly Instruction | Description |
|--------------------------|----------------------|-------------|
| int _abs (int src); __int40_t _labs (__int40_t src); | ABS | Returns the saturated absolute value of src |
| int _abs2 (int src); | ABS2 | Calculates the absolute value for each 16-bit value |
| int _add2 (int src1, int src2); | ADD2 | Adds the upper and lower halves of src1 to the upper and lower halves of src2 and returns the result. Any overflow from the lower half add does not affect the upper half add. |
| int _add4 (int src1, int src2); | ADD4 | Performs 2s-complement addition to pairs of packed 8-bit numbers |
| long long _addsub (int src1, int src2); | ADDSUB | Performs an addition and subtraction in parallel. |
| long long _addsub2 (int src1, int src2); | ADDSUB2 | Performs an ADD2 and SUB2 in parallel. |
| ushort & _amem2 (void *ptr); | LDHU, STH | Allows aligned loads and stores of 2 bytes to memory. The pointer must be aligned to a two-byte boundary.(1) |
| const ushort & _amem2_const (const void *ptr); | LDHU | Allows aligned loads of 2 bytes from memory. The pointer must be aligned to a two-byte boundary.(1) |
| unsigned & _amem4 (void *ptr); | LDW, STW | Allows aligned loads and stores of 4 bytes to memory. The pointer must be aligned to a four-byte boundary.(1) |
| const unsigned & _amem4_const (const void *ptr); | LDW | Allows aligned loads of 4 bytes from memory. The pointer must be aligned to a four-byte boundary.(1) |
| long long & _amem8 (void *ptr); | LDDW, STDW | Allows aligned loads and stores of 8 bytes to memory. The pointer must be aligned to an eight-byte boundary. An LDDW or STDW instruction will be used. |
| const long long & _amem8_const (const void *ptr); | LDW/LDW, LDDW | Allows aligned loads of 8 bytes from memory. The pointer must be aligned to an eight-byte boundary.(2) |
| __float2_t & _amem8_f2 (void *ptr); | LDDW, STDW | Allows aligned loads and stores of 8 bytes to memory. The pointer must be aligned to an eight-byte boundary. This is defined as a macro. You must include c6x.h. (2)(1) |
| const __float2_t & _amem8_f2_const (void *ptr); | LDDW | Allows aligned loads of 8 bytes from memory. The pointer must be aligned to an eight-byte boundary. This is defined as a macro. You must include c6x.h. (2)(1) |
| double & _amemd8 (void *ptr); | LDDW, STDW | Allows aligned loads and stores of 8 bytes to memory. The pointer must be aligned to an eight-byte boundary.(1)(2) An LDDW or STDW instruction will be used. |
| const double & _amemd8_const (const void *ptr); | LDW/LDW, LDDW | Allows aligned loads of 8 bytes from memory. The pointer must be aligned to an eight-byte boundary.(1)(2) |
| int _avg2 (int src1, int src2); | AVG2 | Calculates the average for each pair of signed 16-bit values |
| unsigned _avgu4 (unsigned src1, unsigned src2); | AVGU4 | Calculates the average for each pair of unsigned 8-bit values |
| unsigned _bitc4 (unsigned src); | BITC4 | For each of the 8-bit quantities in src, the number of 1 bits is written to the corresponding position in the return value |
| unsigned _bitr (unsigned src); | BITR | Reverses the order of the bits |
| unsigned _clr (unsigned src2, unsigned csta, unsigned cstb); | CLR | Clears the specified field in src2. The beginning and ending bits of the field to be cleared are specified by csta and cstb, respectively. |
| unsigned _clrr (unsigned src2, int src1); | CLR | Clears the specified field in src2. The beginning and ending bits of the field to be cleared are specified by the lower 10 bits of src1. |
| int _cmpeq2 (int src1, int src2); | CMPEQ2 | Performs equality comparisons on each pair of 16-bit values. Equality results are packed into the two least-significant bits of the return value. |
| int _cmpeq4 (int src1, int src2); | CMPEQ4 | Performs equality comparisons on each pair of 8-bit values. Equality results are packed into the four least-significant bits of the return value. |
| int _cmpgt2 (int src1, int src2); | CMPGT2 | Compares each pair of signed 16-bit values. Results are packed into the two least-significant bits of the return value. |
| unsigned _cmpgtu4 (unsigned src1, unsigned src2); | CMPGTU4 | Compares each pair of unsigned 8-bit values. Results are packed into the four least-significant bits of the return value. |
int _cmplt2 (int src1, int src2);CMPLT2Swaps operands and calls _cmpgt2. This is defined as a macro. You must include c6x.h.
unsigned _cmpltu4 (unsigned src1, unsigned src2);CMPLTU4Swaps operands and calls _cmpgtu4. This is defined as a macro. You must include c6x.h.
long long _cmpy (unsigned src1, unsigned src2);
unsigned _cmpyr (unsigned src1, unsigned src2);
unsigned _cmpyr1 (unsigned src1, unsigned src2 );
CMPY
CMPYR
CMPYR1
Performs various complex multiply operations.
long long _ddotp4 (unsigned src1, unsigned src2);DDOTP4Performs two DOTP2 operations simultaneously.
long long _ddotph2 (long long src1, unsigned src2);
long long _ddotpl2 (long long src1, unsigned src2);
unsigned _ddotph2r (long long src1, unsigned src2);
unsigned _ddotpl2r (long long src1, unsigned src2);
DDOTPH2
DDOTPL2
DDOTPH2R
DDOTPL2R
Performs various dual dot-product operations between two pairs of signed, packed 16-bit values.
unsigned _deal (unsigned src);DEALThe odd and even bits of src are extracted into two separate 16-bit values.
long long _dmv (int src1, int src2);DMVPlaces src1 in the 32 MSBs of the long long and src2 in the 32 LSBs of the long long. See also _itoll().
int _dotp2 (int src1, int src2);
__int40_t _ldotp2 (int src1, int src2);
DOTP2
DOTP2
The product of the signed lower 16-bit values of src1 and src2 is added to the product of the signed upper 16-bit values of src1 and src2. In the case of _dotp2, the signed result is written to a single 32-bit register. In the case of _ldotp2, the signed result is written to a 64-bit register pair.
int _dotpn2 (int src1, int src2);DOTPN2The product of the signed lower 16-bit values of src1 and src2 is subtracted from the product of the signed upper 16-bit values of src1 and src2.
int _dotpnrsu2 (int src1, unsigned src2);DOTPNRSU2The product of the lower 16-bit values of src1 and src2 is subtracted from the product of the upper 16-bit values of src1 and src2. The values in src1 are treated as signed packed quantities; the values in src2 are treated as unsigned packed quantities. 2^15 is added and the result is sign shifted right by 16.
int _dotpnrus2 (unsigned src1, int src2);DOTPNRUS2Swaps the operands and calls _dotpnrsu2. This is defined as a macro. You must include c6x.h.
int _dotprsu2 (int src1, unsigned src2); DOTPRSU2 The product of the lower 16-bit values of src1 and src2 is added to the product of the upper 16-bit values of src1 and src2. The values in src1 are treated as signed packed quantities; the values in src2 are treated as unsigned packed quantities. 2^15 is added and the result is sign shifted right by 16.
int _dotpsu4 (int src1, unsigned src2);
int _dotpus4 (unsigned src1, int src2);
unsigned _dotpu4 (unsigned src1, unsigned src2);
DOTPSU4
DOTPUS4
DOTPU4
For each pair of 8-bit values in src1 and src2, the 8-bit value from src1 is multiplied with the 8-bit value from src2. The four products are summed together.
_dotpus4 is defined as a macro. You must include c6x.h.
long long _dpack2 (unsigned src1, unsigned src2);DPACK2PACK2 and PACKH2 operations performed in parallel.
long long _dpackx2 (unsigned src1, unsigned src2);DPACKX2PACKLH2 and PACKX2 operations performed in parallel.
__int40_t _dtol (double src);Reinterprets double register pair src as an __int40_t (stored as a register pair).
long long _dtoll (double src);Reinterprets double register pair src as a long long register pair.
int _ext (int src2, unsigned csta, unsigned cstb);EXTExtracts the specified field in src2, sign-extended to 32 bits. The extract is performed by a shift left followed by a signed shift right; csta and cstb are the shift left and shift right amounts, respectively.
int _extr (int src2, int src1);EXTExtracts the specified field in src2, sign-extended to 32 bits. The extract is performed by a shift left followed by a signed shift right; the shift left and shift right amounts are specified by the lower 10 bits of src1.
unsigned _extu (unsigned src2, unsigned csta ,
unsigned cstb);
EXTU Extracts the specified field in src2, zero-extended to 32 bits. The extract is performed by a shift left followed by an unsigned shift right; csta and cstb are the shift left and shift right amounts, respectively.
unsigned _extur (unsigned src2, int src1); EXTU Extracts the specified field in src2, zero-extended to 32 bits. The extract is performed by a shift left followed by an unsigned shift right; the shift left and shift right amounts are specified by the lower 10 bits of src1.
__float2_t _fdmv_f2(float src1, float src2);DMVPlaces src1 in the 32 LSBs of the __float2_t and src2 in the 32 MSBs of the __float2_t. See also _itoll(). This is defined as a macro. You must include c6x.h.
unsigned _ftoi (float src);Reinterprets the bits in the float as an unsigned. For example:
_ftoi (1.0) == 1065353216U
unsigned _gmpy (unsigned src1, unsigned src2);GMPYPerforms the Galois Field multiply.
int _gmpy4 (int src1, int src2);GMPY4Performs the Galois Field multiply on four values in src1 with four parallel values in src2. The four products are packed into the return value.
unsigned _hi (double src);Returns the high (odd) register of a double register pair
unsigned _hill (long long src);Returns the high (odd) register of a long long register pair
double _itod (unsigned src2, unsigned src1);Builds a new double register pair by reinterpreting two unsigned values, where src2 is the high (odd) register and src1 is the low (even) register
float _itof (unsigned src);Reinterprets the bits in the unsigned as a float. For example:
_itof (0x3f800000) == 1.0
long long _itoll (unsigned src2, unsigned src1);Builds a new long long register pair by reinterpreting two unsigned values, where src2 is the high (odd) register and src1 is the low (even) register
unsigned _lmbd (unsigned src1, unsigned src2);LMBDSearches for a leftmost 1 or 0 of src2 determined by the LSB of src1. Returns the number of bits up to the bit change.
unsigned _lo (double src);Returns the low (even) register of a double register pair
unsigned _loll (long long src);Returns the low (even) register of a long long register pair
double _ltod (__int40_t src);Reinterprets an __int40_t register pair src as a double register pair.
double _lltod (long long src);Reinterprets long long register pair src as a double register pair.
int _max2 (int src1, int src2);
int _min2 (int src1, int src2);
unsigned _maxu4 (unsigned src1, unsigned src2);
unsigned _minu4 (unsigned src1, unsigned src2);
MAX2
MIN2
MAXU4
MINU4
Places the larger/smaller of each pair of values in the corresponding position in the return value. Values can be 16-bit signed or 8-bit unsigned.
ushort & _mem2 (void * ptr);LDB/LDB
STB/STB
Allows unaligned loads and stores of 2 bytes to memory(1)
const ushort & _mem2_const (const void * ptr); LDB/LDB Allows unaligned loads of 2 bytes from memory(1)
unsigned & _mem4 (void * ptr);LDNW
STNW
Allows unaligned loads and stores of 4 bytes to memory(1)
const unsigned & _mem4_const (const void * ptr);LDNWAllows unaligned loads of 4 bytes from memory(1)
long long & _mem8 (void * ptr);LDNDW
STNDW
Allows unaligned loads and stores of 8 bytes to memory(1)
const long long & _mem8_const (const void * ptr);LDNDWAllows unaligned loads of 8 bytes from memory(1)
double & _memd8 (void * ptr);LDNDW
STNDW
Allows unaligned loads and stores of 8 bytes to memory(2)(1)
const double & _memd8_const (const void * ptr);LDNDWAllows unaligned loads of 8 bytes from memory(2)(1)
int _mpy (int src1, int src2);
int _mpyus (unsigned src1, int src2);
int _mpysu (int src1, unsigned src2);
unsigned _mpyu (unsigned src1, unsigned src2);
MPY
MPYUS
MPYSU
MPYU
Multiplies the 16 LSBs of src1 by the 16 LSBs of src2 and returns the result. Values can be signed or unsigned.
long long _mpy2ir (int src1, int src2);MPY2IRPerforms two 16 by 32 multiplies. Both results are shifted right by 15 bits to produce a rounded result.
long long _mpy2ll (int src1, int src2);MPY2Returns the products of the lower and higher 16-bit values in src1 and src2
int _mpy32 (int src1, int src2);MPY32Returns the 32 LSBs of a 32 by 32 multiply.
long long _mpy32ll (int src1, int src2);
long long _mpy32su (int src1, int src2);
long long _mpy32us (unsigned src1, int src2);
long long _mpy32u (unsigned src1, unsigned src2);
MPY32
MPY32SU
MPY32US
MPY32U
Returns all 64 bits of a 32 by 32 multiply. Values can be signed or unsigned.
int _mpyh (int src1, int src2);
int _mpyhus (unsigned src1, int src2);
int _mpyhsu (int src1, unsigned src2);
unsigned _mpyhu (unsigned src1, unsigned src2);
MPYH
MPYHUS
MPYHSU
MPYHU
Multiplies the 16 MSBs of src1 by the 16 MSBs of src2 and returns the result. Values can be signed or unsigned.
long long _mpyhill (int src1, int src2);
long long _mpylill (int src1, int src2);

MPYHI
MPYLI
Produces a 16 by 32 multiply. The result is placed into the lower 48 bits of the return type. Can use the upper or lower 16 bits of src1.
int _mpyhir (int src1, int src2);
int _mpylir (int src1, int src2);
MPYHIR
MPYLIR
Produces a signed 16 by 32 multiply. The result is shifted right by 15 bits. Can use the upper or lower 16 bits of src1.
int _mpyhl (int src1, int src2);
int _mpyhuls (unsigned src1, int src2);
int _mpyhslu (int src1, unsigned src2);
unsigned _mpyhlu (unsigned src1, unsigned src2);
MPYHL
MPYHULS
MPYHSLU
MPYHLU
Multiplies the 16 MSBs of src1 by the 16 LSBs of src2 and returns the result. Values can be signed or unsigned.
long long _mpyihll (int src1, int src2);
long long _mpyilll (int src1, int src2);

MPYIH
MPYIL
Swaps operands and calls _mpyhill. This is defined as a macro. You must include c6x.h.
Swaps operands and calls _mpylill. This is defined as a macro. You must include c6x.h.
int _mpyihr (int src1, int src2);
int _mpyilr (int src1, int src2);
MPYIHR
MPYILR
Swaps operands and calls _mpyhir. This is defined as a macro. You must include c6x.h.
Swaps operands and calls _mpylir. This is defined as a macro. You must include c6x.h.
int _mpylh (int src1, int src2);
int _mpyluhs (unsigned src1, int src2);
int _mpylshu (int src1, unsigned src2);
unsigned _mpylhu (unsigned src1, unsigned src2);
MPYLH
MPYLUHS
MPYLSHU
MPYLHU
Multiplies the 16 LSBs of src1 by the 16 MSBs of src2 and returns the result. Values can be signed or unsigned.
long long _mpysu4ll (int src1, unsigned src2);
long long _mpyus4ll (unsigned src1, int src2);
long long _mpyu4ll (unsigned src1, unsigned src2);
MPYSU4
MPYUS4
MPYU4
For each 8-bit quantity in src1 and src2, performs an 8-bit by 8-bit multiply. The four 16-bit results are packed into a 64-bit result. The results can be signed or unsigned.
_mpyus4ll is defined as a macro. You must include c6x.h.
int _mvd (int src2);MVDMoves the data from src2 to the return value over four cycles using the multiplier pipeline
void _nassert (int src);Generates no code. Tells the optimizer that the expression declared with the assert function is true; this gives a hint to the optimizer as to what optimizations might be valid.
unsigned _norm (int src);
unsigned _lnorm (__int40_t src);
NORMReturns the number of bits up to the first nonredundant sign bit of src
unsigned _pack2 (unsigned src1, unsigned src2);
unsigned _packh2 (unsigned src1, unsigned src2);
PACK2
PACKH2
The lower/upper halfwords of src1 and src2 are placed in the return value.
unsigned _packh4 (unsigned src1, unsigned src2);
unsigned _packl4 (unsigned src1, unsigned src2);
PACKH4
PACKL4
Packs alternate bytes into return value. Can pack high or low bytes.
unsigned _packhl2 (unsigned src1, unsigned src2);
unsigned _packlh2 (unsigned src1, unsigned src2);
PACKHL2
PACKLH2
The upper/lower halfword of src1 is placed in the upper halfword of the return value. The lower/upper halfword of src2 is placed in the lower halfword of the return value.
unsigned _rotl (unsigned src1, unsigned src2);ROTLRotates src1 to the left by the amount in src2
int _rpack2 (int src1, int src2); RPACK2 Shifts src1 and src2 left by 1 with saturation. The 16 MSBs of the shifted src1 are placed in the 16 MSBs of the 32-bit output. The 16 MSBs of the shifted src2 are placed in the 16 LSBs of the 32-bit output.
int _sadd (int src1, int src2);
__int40_t _lsadd (int src1, __int40_t src2);
SADDAdds src1 to src2 and saturates the result. Returns the result.
int _sadd2 (int src1, int src2);
int _saddus2 (unsigned src1, int src2);
int _saddsu2 (int src1, unsigned src2);
SADD2
SADDUS2
SADDSU2
Performs saturated addition between pairs of 16-bit values in src1 and src2. Values for src1 can be signed or unsigned.
_saddsu2 is defined as a macro. You must include c6x.h.
long long _saddsub (unsigned src1, unsigned src2);SADDSUBPerforms a saturated addition and a saturated subtraction in parallel.
long long _saddsub2 (unsigned src1, unsigned src2);SADDSUB2Performs a SADD2 and a SSUB2 in parallel.
unsigned _saddu4 (unsigned src1, unsigned src2);SADDU4Performs saturated addition between pairs of 8-bit unsigned values in src1 and src2.
int _sat (__int40_t src2);SATConverts a 40-bit long to a 32-bit signed int and saturates if necessary.
unsigned _set (unsigned src2, unsigned csta ,
unsigned cstb);
SETSets the specified field in src2 to all 1s and returns the src2 value. The beginning and ending bits of the field to be set are specified by csta and cstb, respectively.
unsigned _setr (unsigned src2, int src1); SET Sets the specified field in src2 to all 1s and returns the src2 value. The beginning and ending bits of the field to be set are specified by the lower ten bits of src1.
unsigned _shfl (unsigned src2); SHFL The lower 16 bits of src2 are placed in the even bit positions, and the upper 16 bits of src2 are placed in the odd bit positions.
long long _shfl3 (unsigned src1, unsigned src2);SHFL3Takes two 16-bit values from src1 and 16 LSBs from src2 to perform a 3-way interleave, creating a 48-bit result.
unsigned _shlmb (unsigned src1, unsigned src2);
unsigned _shrmb (unsigned src1, unsigned src2);
SHLMB
SHRMB
Shifts src2 left/right by one byte, and the most/least significant byte of src1 is merged into the least/most significant byte position.
int _shr2 (int src1, unsigned src2);
unsigned _shru2 (unsigned src1, unsigned src2);
SHR2
SHRU2
For each 16-bit quantity in src1, the quantity is arithmetically or logically shifted right by src2 number of bits. src1 can contain signed or unsigned values.
int _smpy (int src1, int src2);
int _smpyh (int src1, int src2);
int _smpyhl (int src1, int src2);
int _smpylh (int src1, int src2);
SMPY
SMPYH
SMPYHL
SMPYLH
Multiplies src1 by src2, left shifts the result by 1, and returns the result. If the result is 0x80000000, saturates the result to 0x7FFFFFFF
long long _smpy2ll (int src1, int src2);SMPY2Performs 16-bit multiplication between pairs of signed packed 16-bit values, with an additional 1 bit left-shift and saturate into a 64-bit result.
int _smpy32 (int src1, int src2);SMPY32Returns the 32 MSBs of a 32 by 32 multiply shifted left by 1.
int _spack2 (int src1, int src2);SPACK2Two signed 32-bit values are saturated to 16-bit values and packed into the return value
unsigned _spacku4 (int src1, int src2);SPACKU4Four signed 16-bit values are saturated to 8-bit values and packed into the return value
int _sshl (int src2, unsigned src1);SSHLShifts src2 left by the contents of src1, saturates the result to 32 bits, and returns the result
int _sshvl (int src2, int src1);
int _sshvr (int src2, int src1);
SSHVL
SSHVR
Shifts src2 to the left/right by src1 bits. Saturates the result if the shifted value is greater than MAX_INT or less than MIN_INT.
int _ssub (int src1, int src2);
__int40_t _lssub (int src1, __int40_t src2);
SSUBSubtracts src2 from src1, saturates the result, and returns the result.
int _ssub2 (int src1, int src2);SSUB2Subtracts the upper and lower halves of src2 from the upper and lower halves of src1 and saturates each result.
int _sub4 (int src1, int src2);SUB4Performs 2s-complement subtraction between pairs of packed 8-bit values
int _subabs4 (int src1, int src2);SUBABS4Calculates the absolute value of the differences for each pair of packed unsigned 8-bit values
unsigned _subc (unsigned src1, unsigned src2);SUBCConditional subtract divide step
int _sub2 (int src1, int src2);SUB2Subtracts the upper and lower halves of src2 from the upper and lower halves of src1, and returns the result. Borrowing in the lower half subtract does not affect the upper half subtract.
unsigned _swap4 (unsigned src);SWAP4Exchanges pairs of bytes (an endian swap) within each 16-bit value.
unsigned _swap2 (unsigned src);SWAP2Calls _packlh2. This is defined as a macro. You must include c6x.h.
unsigned _unpkhu4 (unsigned src);UNPKHU4Unpacks the two high unsigned 8-bit values into unsigned packed 16-bit values
unsigned _unpklu4 (unsigned src);UNPKLU4Unpacks the two low unsigned 8-bit values into unsigned packed 16-bit values
unsigned _xormpy (unsigned src1, unsigned src2);XORMPYPerforms a Galois Field multiply
unsigned _xpnd2 (unsigned src);XPND2Bits 1 and 0 of src are replicated to the upper and lower halfwords of the result, respectively.
unsigned _xpnd4 (unsigned src); XPND4 Bits 3 through 0 of src are replicated to bytes 3 through 0 of the result.
(1) See the TMS320C6000 Programmer's Guide for more information.
(2) See Section 8.6.10 for details on manipulating 8-byte data quantities.
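The packed and saturating behavior described in Table 8-5 can be easier to follow with a portable reference model. The sketch below models _add2, _sadd, and _extu in standard C, based only on the table descriptions. The ref_ names are hypothetical, and masking the shift amounts to 0..31 in ref_extu is an assumption made here to keep the host code well defined. These functions are for reasoning and host-side testing, not a replacement for the intrinsics.

```c
#include <stdint.h>

/* Model of _add2: the two 16-bit halves are added independently,
   so a carry out of the lower half never reaches the upper half. */
uint32_t ref_add2(uint32_t src1, uint32_t src2)
{
    uint32_t lo = (src1 + src2) & 0x0000FFFFu;           /* carry out of bit 15 dropped */
    uint32_t hi = ((src1 >> 16) + (src2 >> 16)) << 16;   /* carry out of bit 31 dropped */
    return hi | lo;
}

/* Model of _sadd: 32-bit signed addition that clamps instead of wrapping. */
int32_t ref_sadd(int32_t src1, int32_t src2)
{
    int64_t sum = (int64_t)src1 + (int64_t)src2;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Model of _extu: shift left by csta, then unsigned shift right by cstb.
   Shift amounts are masked to 0..31 (an assumption for this sketch). */
uint32_t ref_extu(uint32_t src2, unsigned csta, unsigned cstb)
{
    uint32_t shifted = src2 << (csta & 31u);
    return shifted >> (cstb & 31u);
}
```

For instance, ref_add2(0x0001FFFF, 0x00010001) gives 0x00020000, whereas a plain 32-bit addition would give 0x00030000. Extracting bits 15 through 8 of 0x12345678 corresponds to ref_extu(0x12345678, 16, 24), which gives 0x56.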

   

The intrinsics listed in Table 8-6 can be used for C6740 and C6600 devices, but not C6400+ devices. The intrinsics listed correspond to the indicated C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference Guide for more information.

See Table 8-5 for a list of generic C6000 intrinsics. See Table 8-7 for a list of C6600-specific intrinsics.
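Several of the generic intrinsics in Table 8-5 generate no arithmetic at all; they only reinterpret bits or split and join register pairs (_ftoi, _itof, _itoll, _hill, _loll). A rough host-side model, assuming a 32-bit IEEE 754 float and using hypothetical ref_ names with unsigned 64-bit values standing in for the long long register pair, might look like this:

```c
#include <stdint.h>
#include <string.h>

/* Model of _ftoi: return the bit pattern of a float as an unsigned value. */
uint32_t ref_ftoi(float src)
{
    uint32_t bits;
    memcpy(&bits, &src, sizeof bits);   /* well-defined way to reinterpret bits */
    return bits;
}

/* Model of _itof: reinterpret an unsigned bit pattern as a float. */
float ref_itof(uint32_t src)
{
    float value;
    memcpy(&value, &src, sizeof value);
    return value;
}

/* Model of _itoll: src2 becomes the high word, src1 the low word. */
uint64_t ref_itoll(uint32_t src2, uint32_t src1)
{
    return ((uint64_t)src2 << 32) | (uint64_t)src1;
}

/* Models of _hill and _loll: return the high or low word of the pair. */
uint32_t ref_hill(uint64_t src) { return (uint32_t)(src >> 32); }
uint32_t ref_loll(uint64_t src) { return (uint32_t)(src & 0xFFFFFFFFu); }
```

With these definitions, ref_ftoi(1.0f) returns 1065353216 (0x3F800000), matching the example given for _ftoi in Table 8-5.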

Table 8-6 TMS320C6740 and C6600 C/C++ Compiler Intrinsics
C/C++ Compiler Intrinsic   Assembly Instruction   Description
int _dpint (double src);DPINTConverts 64-bit double to 32-bit signed integer, using the rounding mode set by the CSR register.
__int40_t _f2tol(__float2_t src);Reinterprets a __float2_t register pair src as an __int40_t (stored as a register pair). This is defined as a macro. You must include c6x.h.
long long _f2toll(__float2_t src); Reinterprets a __float2_t register pair as a long long register pair. This is defined as a macro. You must include c6x.h.
double _fabs (double src);
float _fabsf (float src);
ABSDP
ABSSP
Returns absolute value of src.
__float2_t _lltof2(long long src);Reinterprets a long long register pair as a __float2_t register pair. This is defined as a macro. You must include c6x.h.
__float2_t _ltof2(__int40_t src);Reinterprets an __int40_t register pair as a __float2_t register pair. This is defined as a macro. You must include c6x.h.
__float2_t & _mem8_f2(void * ptr);LDNDW
STNDW
Allows unaligned loads and stores of 8 bytes to memory.(1) This is defined as a macro. You must include c6x.h.
const __float2_t & _mem8_f2_const(void * ptr);LDNDW
Allows unaligned loads of 8 bytes from memory.(1) This is defined as a macro. You must include c6x.h.
long long _mpyidll (int src1, int src2);MPYIDProduces a signed integer multiply. The result is placed in a register pair.
double _mpysp2dp (float src1, float src2); MPYSP2DP Produces a double-precision floating-point multiply. The result is placed in a register pair.
double _mpyspdp (float src1, double src2); MPYSPDP Produces a double-precision floating-point multiply. The result is placed in a register pair.
double _rcpdp (double src);RCPDPComputes the approximate 64-bit double reciprocal.
float _rcpsp (float src);RCPSPComputes the approximate 32-bit float reciprocal.
double _rsqrdp (double src);RSQRDPComputes the approximate 64-bit double square root reciprocal.
float _rsqrsp (float src);RSQRSPComputes the approximate 32-bit float square root reciprocal.
int _spint (float src);SPINTConverts 32-bit float to 32-bit signed integer, using the rounding mode set by the CSR register.
(1) See Section 8.6.10 for details on manipulating 8-byte data quantities.
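To show why a single-precision by single-precision multiply with a double-precision result (as described for _mpysp2dp in Table 8-6) can be useful, the sketch below compares it with an ordinary float multiply on a host compiler. The function names are hypothetical, and the model assumes IEEE 754 binary32 and binary64 types. Under that assumption the product of two floats always fits exactly in a double, because two 24-bit significands produce at most 48 significant bits.

```c
/* Model of _mpysp2dp: widen both float operands, then multiply in double.
   The result carries the full product with no rounding (IEEE 754 assumed). */
double ref_mpysp2dp(float src1, float src2)
{
    return (double)src1 * (double)src2;
}

/* Ordinary single-precision multiply for comparison. The volatile store
   forces the result to be rounded to float even on hosts that evaluate
   float expressions in a wider format. */
float float_product(float src1, float src2)
{
    volatile float product = src1 * src2;
    return product;
}
```

For src1 = src2 = 0x1.000002p0f (that is, 1 + 2^-23), the double result keeps the low-order term 2^-46 that the float result rounds away.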

    

The intrinsics listed in Table 8-7 are supported only for C6600 devices. These intrinsics are in addition to those listed in Table 8-5 and Table 8-6. The intrinsics listed correspond to the indicated assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference Guide for more information.
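Many entries in Table 8-7 are two-way versions of 32-bit operations that act on a 64-bit register pair. As a rough guide to reading those descriptions, the following host-side sketch models _dadd and _dadd2 with hypothetical ref_ functions. It assumes that each lane wraps on overflow (the saturating forms are listed separately as _dsadd and _dsadd2) and uses unsigned 64-bit values to stand in for the long long register pair.

```c
#include <stdint.h>

/* Packed 16-bit add on one 32-bit word: halves are added independently. */
static uint32_t lane_add2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
}

/* Model of _dadd: two independent 32-bit additions, one per word.
   Wraparound on overflow is an assumption of this sketch. */
uint64_t ref_dadd(uint64_t src1, uint64_t src2)
{
    uint32_t lo = (uint32_t)src1 + (uint32_t)src2;
    uint32_t hi = (uint32_t)(src1 >> 32) + (uint32_t)(src2 >> 32);
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Model of _dadd2: four independent 16-bit additions (a two-way _add2). */
uint64_t ref_dadd2(uint64_t src1, uint64_t src2)
{
    uint32_t lo = lane_add2((uint32_t)src1, (uint32_t)src2);
    uint32_t hi = lane_add2((uint32_t)(src1 >> 32), (uint32_t)(src2 >> 32));
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

In this model a carry never crosses a lane boundary: ref_dadd(0x00000001FFFFFFFF, 0x0000000100000001) gives 0x0000000200000000 rather than 0x0000000300000000.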

Table 8-7 TMS320C6600 C/C++ Compiler Intrinsics
C/C++ Compiler Intrinsic   Assembly Instruction   Description
ADDDP No intrinsic. Use native C: a + b where a and b are doubles.
ADDSP No intrinsic. Use native C: a + b where a and b are floats.
AND No intrinsic. Use native C: a & b where a and b are long longs.
ANDN No intrinsic. Use native C: a & ~b where a and b are long longs.
FMPYDP No intrinsic. Use native C: a * b where a and b are doubles.
OR No intrinsic. Use native C: a | b where a and b are long longs.
SUBDP No intrinsic. Use native C: a - b where a and b are doubles.
SUBSP No intrinsic. Use native C: a - b where a and b are floats.
XOR No intrinsic. Use native C: a ^ b where a and b are long longs. See also _xorll_c().
__x128_t _ccmatmpy (long long src1, __x128_t src2); CCMATMPY Multiply the conjugate of a 1x2 complex vector by a 2x2 complex matrix, producing two 64-bit results. For details on the __x128_t container type see Section 8.6.7.
long long _ccmatmpyr1 (long long src1,
__x128_t src2);
CCMATMPYR1Multiply the complex conjugate of a 1x2 complex vector by a 2x2 complex matrix, producing two 32-bit complex results.
long long _ccmpy32r1 (long long src1, long long src2);CCMPY32R132-bit complex conjugate multiply of Q31 numbers with rounding.
__x128_t _cmatmpy (long long src1, __x128_t src2); CMATMPY Multiply a 1x2 complex vector by a 2x2 complex matrix, producing two 64-bit complex results.
long long _cmatmpyr1 (long long src1, __x128_t src2);CMATMPYR1Multiply a 1x2 complex vector by a 2x2 complex matrix, producing two 32-bit complex results.
long long _cmpy32r1 (long long src1, long long src2);CMPY32R132-bit complex multiply of Q31 numbers with rounding.
__x128_t _cmpysp (__float2_t src1, __float2_t src2);CMPYSPPerform the multiply operations for a complex multiply of two complex numbers (See also _complex_mpysp and _complex_conjugate_mpysp.)
double _complex_conjugate_mpysp (double src1,
double src2);
CMPYSP
DSUBSP
Performs a complex conjugate multiply by performing a CMPYSP and DSUBSP.
double _complex_mpysp (double src1, double src2);CMPYSP
DADDSP
Performs a complex multiply by performing a CMPYSP and DADDSP.
int _crot90 (int src);CROT90Rotate complex number by 90 degrees.
int _crot270 (int src);CROT270Rotate complex number by 270 degrees.
long long _dadd (long long src1, long long src2);DADDTwo-way SIMD addition of signed 32-bit values producing two signed 32-bit results.
long long _dadd2 (long long src1, long long src2);DADD2Four-way SIMD addition of packed signed 16-bit values producing four signed 16-bit results. (Two-way _add2)
__float2_t _daddsp (__float2_t src1, __float2_t src2);DADDSPTwo-way SIMD addition of 32-bit single precision numbers.
long long _dadd_c (scst5 immediate src1,
long long src2);
DADD Adds a single constant in src1 (-16 to 15) to each of the two signed 32-bit values in src2, producing two signed 32-bit results.
long long _dapys2 (long long src1, long long src2);DAPYS2Use the sign bit of src1 to determine whether to multiply the four 16-bit values in src2 by 1 or -1. Yields four signed 16-bit results. (If src1 and src2 are the same register pair, it is equivalent to a two-way _abs2).
long long _davg2 (long long src1, long long src2);DAVG2Four-way SIMD average of signed 16-bit values, with rounding. (Two-way _avg2)
long long _davgnr2 (long long src1, long long src2);DAVGNR2Four-way SIMD average of signed 16-bit values, without rounding.
long long _davgnru4 (long long src1, long long src2);DAVGNRU4Eight-way SIMD average of unsigned 8-bit values, without rounding.
long long _davgu4 (long long src1, long long src2);DAVGU4Eight-way SIMD average of unsigned 8-bit values, with rounding. (Two-way _avgu4)
long long _dccmpyr1 (long long src1, long long src2);DCCMPYR1Two-way SIMD complex multiply with rounding (_cmpyr1) with complex conjugate of src2.
unsigned _dcmpeq2 (long long src1, long long src2);DCMPEQ2Four-way SIMD comparison of signed 16-bit values. Results are packed into the four least-significant bits of the return value. (Two-way _cmpeq2)
unsigned _dcmpeq4 (long long src1, long long src2);DCMPEQ4Eight-way SIMD comparison of unsigned 8-bit values. Results are packed into the eight least-significant bits of the return value. (Two-way _cmpeq4)
unsigned _dcmpgt2 (long long src1, long long src2);DCMPGT2Four-way SIMD comparison of signed 16-bit values. Results are packed into the four least-significant bits of the return value. (Two-way _cmpgt2)
unsigned _dcmpgtu4 (long long src1, long long src2);DCMPGTU4Eight-way SIMD comparison of unsigned 8-bit values. Results are packed into the eight least-significant bits of the return value. (Two-way _cmpgtu4)
__x128_t _dccmpy (long long src1, long long src2);DCCMPYTwo complex multiply operations on two sets of packed complex numbers, with complex conjugate of src2.
__x128_t _dcmpy (long long src1, long long src2);DCMPYPerforms two complex multiply operations on two sets of packed complex numbers. (Two-way SIMD _cmpy)
long long _dcmpyr1 (long long src1, long long src2);DCMPYR1Two-way SIMD complex multiply with rounding (_cmpyr1).
long long _dcrot90 (long long src);DCROT90Two-way SIMD version of _crot90.
long long _dcrot270 (long long src);DCROT270Two-way SIMD version of _crot270.
long long _ddotp4h (__x128_t src1, __x128_t src2 );DDOTP4HPerforms two dot-products between four sets of packed 16-bit values. (Two-way _dotp4h)
long long _ddotpsu4h (__x128_t src1, __x128_t src2 );DDOTPSU4HPerforms two dot-products between four sets of packed 16-bit values. (Two-way _dotpsu4h)
__float2_t _dinthsp (int src);DINTHSPConverts two packed signed 16-bit values into two single-precision floating point values.
__float2_t _dinthspu (unsigned src); DINTHSPU Converts two packed unsigned 16-bit values into two single-precision floating point values.
__float2_t _dintsp(long long src); DINTSP Converts two 32-bit signed integers to two single-precision floating point values.
__float2_t _dintspu(long long src); DINTSPU Converts two 32-bit unsigned integers to two single-precision floating point values.
long long _dmax2 (long long src1, long long src2);DMAX2Four-way SIMD maximum of 16-bit signed values producing four signed 16-bit results. (Two-way _max2)
long long _dmaxu4 (long long src1, long long src2); DMAXU4 Eight-way SIMD maximum of unsigned 8-bit values producing eight unsigned 8-bit results. (Two-way _maxu4)
long long _dmin2 (long long src1, long long src2); DMIN2 Four-way SIMD minimum of signed 16-bit values producing four signed 16-bit results. (Two-way _min2)
long long _dminu4 (long long src1, long long src2); DMINU4 Eight-way SIMD minimum of unsigned 8-bit values producing eight unsigned 8-bit results. (Two-way _minu4)
__x128_t _dmpy2 (long long src1, long long src2);DMPY2Four-way SIMD multiply of signed 16-bit values producing four signed 32-bit results. (Two-way _mpy2)
__float2_t _dmpysp (__float2_t src1, __float2_t src2);DMPYSPTwo-way single precision floating point multiply producing two single-precision results.
__x128_t _dmpysu4 (long long src1, long long src2);DMPYSU4Eight-way SIMD multiply of signed 8-bit values by unsigned 8-bit values producing eight signed 16-bit results. (Two-way _mpysu4)
__x128_t _dmpyu2 (long long src1, long long src2);DMPYU2Four-way SIMD multiply of unsigned 16-bit values producing four unsigned 32-bit results. (Two-way _mpyu2)
__x128_t _dmpyu4 (long long src1, long long src2); DMPYU4 Eight-way SIMD multiply of unsigned 8-bit values producing eight unsigned 16-bit results. (Two-way _mpyu4)
long long _dmvd (int src1, int src2 );DMVDPlaces src1 in the low register of the long long and src2 in the high register of the long long. Takes four cycles. See also _dmv(), _fdmv_f2, and _itoll().
int _dotp4h (long long src1, long long src2 );DOTP4HMultiply two sets of four signed 16-bit values and return the 32-bit sum.
long long _dotp4hll (long long src1, long long src2 );DOTP4HMultiply two sets of four signed 16-bit values and return the 64-bit sum.
int _dotpsu4h (long long src1, long long src2);DOTPSU4HMultiply four signed 16-bit values by four unsigned 16-bit values and return the 32-bit sum.
long long _dotpsu4hll (long long src1, long long src2);DOTPSU4HMultiply four signed 16-bit values by four unsigned 16-bit values and return the 64-bit sum.
long long _dpackh2 (long long src1, long long src2);DPACKH2Two-way _packh2.
long long _dpackh4 (long long src1, long long src2);DPACKH4Two-way _packh4.
long long _dpacklh2 (long long src1, long long src2);DPACKLH2Two-way _packlh2.
long long _dpacklh4 (unsigned src1, unsigned src2);DPACKLH4Performs a _packl4 and a _packh4. The output of the _packl4 is in the low register of the result and the output of the _packh4 is in the high register of the result.
long long _dpackl2 (long long src1, long long src2);DPACKL2Two-way _packl2.
long long _dpackl4 (long long src1, long long src2);DPACKL4Two-way _packl4.
long long _dsadd (long long src1, long long src2);DSADDTwo-way SIMD saturated addition of signed 32-bit values producing two signed 32-bit results. (Two-way _sadd)
long long _dsadd2 (long long src1, long long src2);DSADD2Four-way SIMD saturated addition of signed 16-bit values producing four signed 16-bit results. (Two-way _sadd2)
long long _dshl (long long src1, unsigned src2); DSHL: Shift-left of two signed 32-bit values by a single value in the src2 argument.
long long _dshl2 (long long src1, unsigned src2); DSHL2: Shift-left of four signed 16-bit values by a single value in the src2 argument. (Two-way _shl2)
long long _dshr (long long src1, unsigned src2); DSHR: Shift-right of two signed 32-bit values by a single value in the src2 argument.
long long _dshr2 (long long src1, unsigned src2); DSHR2: Shift-right of four signed 16-bit values by a single value in the src2 argument. (Two-way _shr2)
long long _dshru (long long src1, unsigned src2); DSHRU: Shift-right of two unsigned 32-bit values by a single value in the src2 argument.
long long _dshru2 (long long src1, unsigned src2); DSHRU2: Shift-right of four unsigned 16-bit values by a single value in the src2 argument. (Two-way _shru2)
__x128_t _dsmpy2 (long long src1, long long src2); DSMPY2: Four-way SIMD multiply of signed 16-bit values with 1-bit left-shift and saturate producing four signed 32-bit results. (Two-way _smpy2)
long long _dspacku4 (long long src1, long long src2); DSPACKU4: Two-way _spacku4.
long long _dspint (__float2_t src); DSPINT: Converts two packed single-precision floating-point values to two signed 32-bit values.
unsigned _dspinth (__float2_t src); DSPINTH: Converts two packed single-precision floating-point values to two packed signed 16-bit values.
long long _dssub (long long src1, long long src2); DSSUB: Two-way SIMD saturated subtraction of signed 32-bit values producing two signed 32-bit results.
long long _dssub2 (long long src1, long long src2); DSSUB2: Four-way SIMD saturated subtraction of signed 16-bit values producing four signed 16-bit results. (Two-way _ssub2)
long long _dsub (long long src1, long long src2); DSUB: Two-way SIMD subtraction of signed 32-bit values producing two signed 32-bit results.
long long _dsub2 (long long src1, long long src2); DSUB2: Four-way SIMD subtraction of signed 16-bit values producing four signed 16-bit results. (Two-way _sub2)
__float2_t _dsubsp (__float2_t src1, __float2_t src2); DSUBSP: Two-way SIMD subtraction of 32-bit single-precision numbers.
long long _dxpnd2 (unsigned src); DXPND2: Expand four lower bits to four 16-bit fields.
long long _dxpnd4 (unsigned src); DXPND4: Expand eight lower bits to eight 8-bit fields.
__float2_t _fdmvd_f2 (float src1, float src2); DMVD: Places src1 in the low register of the __float2_t and src2 in the high register of the __float2_t. Takes four cycles. See also _dmv(), _dmvd(), and _itoll(). This intrinsic is defined as a macro; you must include c6x.h to use it.
int _land (int src1, int src2); LAND: Logical AND of src1 and src2.
int _landn (int src1, int src2); LANDN: Logical AND of src1 and NOT of src2; i.e. src1 AND ~src2.
int _lor (int src1, int src2); LOR: Logical OR of src1 and src2.
void _mfence(); MFENCE: Stall CPU while memory system is busy.
long long _mpyu2 (unsigned src1, unsigned src2); MPYU2: Two-way SIMD multiply of unsigned 16-bit values producing two unsigned 32-bit results.
__x128_t _qmpy32 (__x128_t src1, __x128_t src2); QMPY32: Four-way SIMD multiply of signed 32-bit values producing four 32-bit results. (Four-way _mpy32)
__x128_t _qmpysp (__x128_t src1, __x128_t src2); QMPYSP: Four-way SIMD 32-bit single-precision multiply producing four 32-bit single-precision results.
__x128_t _qsmpy32r1 (__x128_t src1, __x128_t src2); QSMPY32R1: Four-way SIMD fractional 32-bit by 32-bit multiply where each result value is shifted right by 31 bits and rounded. This normalizes the result to lie between -1 and 1 in a Q31 fractional number system.
unsigned _shl2 (unsigned src1, unsigned src2); SHL2: Shift-left of two signed 16-bit values by a single value in the src2 argument.
long long _unpkbu4 (unsigned src); UNPKBU4: Unpack four unsigned 8-bit values into four unsigned 16-bit values. (See also _unpklu4 and _unpkhu4)
long long _unpkh2 (unsigned src); UNPKH2: Unpack two signed 16-bit values to two signed 32-bit values.
long long _unpkhu2 (unsigned src); UNPKHU2: Unpack two unsigned 16-bit values to two unsigned 32-bit values.
long long _xorll_c (scst5 immediate src1, long long src2); XOR: XOR src1 with the upper and lower 32-bit portions of src2 (SIMD XOR by constant).
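
The packed-lane behavior described for several of these intrinsics can be easier to follow with a host-side model. The following is a minimal sketch in portable C, not the compiler's implementation: the ref_ function names are hypothetical, and the lane ordering (lane 0 in the least significant bits) and the all-ones fill used for _dxpnd4 are assumptions based on the descriptions above. The code does not call any intrinsic and does not require c6x.h, so it can be built with any host C compiler to check expectations before running on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference models. The ref_ names are not part of
 * the TI intrinsics API; lane 0 is assumed to be the least significant lane. */

/* One lane of a saturated signed 32-bit addition (models a single _sadd). */
static int32_t ref_sadd(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so the sum cannot overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Two-way saturated addition on two 32-bit lanes packed in 64 bits,
 * following the description given for _dsadd. */
static uint64_t ref_dsadd(uint64_t src1, uint64_t src2)
{
    int32_t lo = ref_sadd((int32_t)(uint32_t)src1, (int32_t)(uint32_t)src2);
    int32_t hi = ref_sadd((int32_t)(uint32_t)(src1 >> 32),
                          (int32_t)(uint32_t)(src2 >> 32));
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* Expand the eight low bits of src into eight 8-bit fields, as described
 * for _dxpnd4. Assumption: a set bit yields 0xFF, a clear bit yields 0x00. */
static uint64_t ref_dxpnd4(uint32_t src)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        if ((src >> i) & 1u)
            out |= (uint64_t)0xFF << (8 * i);
    }
    return out;
}

/* Zero-extend four unsigned 8-bit values into four unsigned 16-bit fields,
 * as described for _unpkbu4. */
static uint64_t ref_unpkbu4(uint32_t src)
{
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t byte = (src >> (8 * i)) & 0xFFu;
        out |= byte << (16 * i);
    }
    return out;
}

/* Small self-check covering saturation and lane placement. */
static void ref_self_check(void)
{
    /* High lane saturates at INT32_MAX; low lane adds normally. */
    assert(ref_dsadd(0x7FFFFFFF00000001ULL, 0x0000000100000002ULL)
           == 0x7FFFFFFF00000003ULL);
    assert(ref_dxpnd4(0xA5u) == 0xFF00FF0000FF00FFULL);
    assert(ref_unpkbu4(0x80FF0112u) == 0x008000FF00010012ULL);
}
```

Calling ref_self_check() from a host test program exercises all three models. On the target, the corresponding intrinsic results can be compared against these models to confirm that the lane ordering assumed here matches the device.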