SPRUJ17 User guide

SPRUJ17H March 2022 – October 2024 AM2631, AM2631-Q1, AM2632, AM2632-Q1, AM2634, AM2634-Q1

7.3.4.4.1.1 PKA Introduction and Features

The Public Key Accelerator (PKA) module provides a high-performance public key engine that accelerates the large-vector math required for public key computations. The PKA also includes hardware acceleration for Elliptic Curve Cryptography (ECC), such as binary field ECC point addition, inversion, and multiplication, and prime field ECC point addition, inversion, and multiplication. The GF(p) prime field ECC engine supports all NIST (FIPS 186-3) recommended prime curves up to a 521-bit key length. The PKA module provides the following basic operations:

Basic Public Key crypto operations that use the 32-bit PKCP engine:
- Large vector addition, subtraction, and combined addition/subtraction
- Large vector compare and copy
- Large vector bit shift right or left
- Large vector multiplication, modulo, and division
Dual LNME engine for Montgomery multiplication and exponentiation:
- Y = X * Y * R^-1 mod N
- Y[1] = X * Y[0] * R^-1 mod N
- B = X * Y * R^-1 mod N
- Y = X^B * R^-1 mod N
Binary field GF2m engine to accelerate ECC binary field GF(2^m) operations such as addition, multiplication, and modular inversion
ECC prime field GF(p) operations such as point addition, multiplication, and doubling over all NIST-recommended prime curves
Complex operations under control of an embedded Sequencer microcontroller using locally stored firmware:
- Large vector unsigned value modular exponentiation.
- Large vector unsigned value modular exponentiation using the Chinese Remainder Theorem (CRT) method with a pre-calculated Q inverse vector.
- Modular inversion: given A and M, calculate B such that ((A * B) MOD M) = 1.
- Prime field ECC Point Addition/Doubling on elliptic curve y² = x³ + ax + b (mod p), where 'p' is a prime number and 'a' is an input value to the operation ('b' is not used). Adding two identical points automatically performs point doubling. Both input and output points are in projective format: a, b, p, (X1, Y1, Z1), (X2, Y2, Z2) → (X3, Y3, Z3).
- Prime field ECC Point Multiplication by scalar 'k' on elliptic curve y² = x³ + ax + b (mod p), where 'p' is a prime number and 'a' and 'b' are input values to the operation. Both input and output points are in projective format, and the y-coordinate is always provided as an output parameter: a, b, p, k, (X1, Y1, Z1) → (X2, Y2, Z2). Input Z1 is restricted to the value '1'.
- Binary field ECC Point Addition with automatic switching to Doubling. Both input and output points are in projective format: a, b, p, (X1, Y1, Z1), (X2, Y2, Z2) → (X3, Y3, Z3).
- Binary field ECC Point Multiplication. Both input and output points are in projective format: a, b, p, k, (X1, Y1, Z1) → (X2, Y2, Z2). Input Z1 is restricted to the value '1'.
- Binary field Modular Inversion: A, P → A^-1 mod P, where P is the prime polynomial for the binary field.
- Exponent recoding techniques for modular exponentiation operations by means of a pre-calculated odd powers table.
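The LNME equations above compute Montgomery products, X * Y * R^-1 mod N, which let modular reduction be done with multiplications and a shift instead of a division. The following C sketch is a single-word software model of that arithmetic, assuming an odd modulus N < 2^31 and R = 2^32; the names neg_inv32, mont_mul, and mont_exp are hypothetical and are not part of any TI driver API. It shows conventional Montgomery reduction plus a left-to-right square-and-multiply exponentiation that converts into and out of Montgomery form. The multi-word vector layout in PKA RAM and the exact sequencing used by the Sequencer firmware are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy single-word model (hypothetical names): N odd, N < 2^31, R = 2^32.
   The real LNME works on multi-word vectors in PKA RAM. */

/* Return -N^-1 mod 2^32 by Newton iteration; N must be odd. */
static uint32_t neg_inv32(uint32_t n)
{
    assert(n & 1u);
    uint32_t x = n;                 /* n*n == 1 mod 8, so x is correct to 3 bits */
    for (int i = 0; i < 4; i++)
        x *= 2u - n * x;            /* correct bits double each step: 3->6->12->24->48 */
    return 0u - x;                  /* negate modulo 2^32 */
}

/* Montgomery product: X * Y * R^-1 mod N, for X, Y < N. */
static uint32_t mont_mul(uint32_t x, uint32_t y, uint32_t n, uint32_t ninv)
{
    uint64_t t = (uint64_t)x * y;                /* t < N^2 < 2^62 */
    uint32_t m = (uint32_t)t * ninv;             /* m = t * (-N^-1) mod R */
    uint64_t u = (t + (uint64_t)m * n) >> 32;    /* t + m*N is divisible by R; u < 2N */
    return (uint32_t)(u >= n ? u - n : u);       /* one conditional subtraction */
}

/* X^E mod N: map X into Montgomery form, square-and-multiply, map back. */
static uint32_t mont_exp(uint32_t x, uint32_t e, uint32_t n)
{
    assert((n & 1u) && n < 0x80000000u);
    uint32_t ninv = neg_inv32(n);
    uint64_t r_mod_n = ((uint64_t)1 << 32) % n;           /* R mod N: Montgomery form of 1 */
    uint32_t r2 = (uint32_t)((r_mod_n * r_mod_n) % n);    /* R^2 mod N */
    uint32_t xm = mont_mul(x % n, r2, n, ninv);           /* X * R mod N */
    uint32_t acc = (uint32_t)r_mod_n;
    for (int bit = 31; bit >= 0; bit--) {
        acc = mont_mul(acc, acc, n, ninv);                /* square */
        if ((e >> bit) & 1u)
            acc = mont_mul(acc, xm, n, ninv);             /* multiply */
    }
    return mont_mul(acc, 1u, n, ninv);                    /* strip the factor R */
}
```

For example, mont_exp(4, 13, 497) evaluates 4^13 mod 497, which is 445. Keeping every intermediate value in Montgomery form means each step costs one Montgomery product, and the conversions happen only once at each end.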

The PKA module addresses the following use cases:

RSA: The RSA algorithm is used for public key encryption and decryption as well as public key signature generation and verification.
ECDH key exchange: used to produce a shared secret from which session keys for AES, SHA, etc. are derived.
ECDSA: used to sign a message with a private key and to authenticate the message using the matching public key.
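The RSA use case combines several of the Sequencer operations listed earlier: the private exponent d is the modular inverse of e, and the private-key operation can use CRT exponentiation with the pre-calculated Q inverse (qInv = q^-1 mod p). The C sketch below is a software model using the well-known textbook parameters p = 61, q = 53, e = 17, which are far too small to be secure. The names mod_inv, mod_exp, and rsa_crt are hypothetical, and the CRT recombination shown is the PKCS #1 style m = m2 + q * (qInv * (m1 - m2) mod p), not necessarily the exact form used by the PKA firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Modular inversion (extended Euclid): B with (A * B) mod M == 1,
   or 0 if gcd(A, M) != 1. Toy range only. */
static uint64_t mod_inv(uint64_t a, uint64_t m)
{
    int64_t old_r = (int64_t)(a % m), r = (int64_t)m;
    int64_t old_s = 1, s = 0;            /* invariant: old_r == old_s * A (mod M) */
    while (r != 0) {
        int64_t q = old_r / r, tmp;
        tmp = old_r - q * r;  old_r = r;  r = tmp;
        tmp = old_s - q * s;  old_s = s;  s = tmp;
    }
    if (old_r != 1)
        return 0;                        /* A has no inverse modulo M */
    old_s %= (int64_t)m;
    return (uint64_t)(old_s < 0 ? old_s + (int64_t)m : old_s);
}

/* Square-and-multiply modular exponentiation; M < 2^32 keeps products in 64 bits. */
static uint64_t mod_exp(uint64_t x, uint64_t e, uint64_t m)
{
    assert(m != 0 && m <= UINT32_MAX);
    uint64_t result = 1 % m;
    x %= m;
    for (; e != 0; e >>= 1) {
        if (e & 1u)
            result = result * x % m;
        x = x * x % m;
    }
    return result;
}

/* RSA private-key operation with CRT: dp = d mod (p-1), dq = d mod (q-1),
   qinv = q^-1 mod p (the pre-calculated Q inverse). */
static uint64_t rsa_crt(uint64_t c, uint64_t p, uint64_t q,
                        uint64_t dp, uint64_t dq, uint64_t qinv)
{
    uint64_t m1 = mod_exp(c, dp, p);                      /* c^dp mod p */
    uint64_t m2 = mod_exp(c, dq, q);                      /* c^dq mod q */
    uint64_t h = qinv * ((m1 + p - m2 % p) % p) % p;      /* qinv * (m1 - m2) mod p */
    return m2 + h * q;                                    /* 0 <= result < p*q */
}
```

With these values, d = 17^-1 mod 3120 = 2753 and qInv = 38; encrypting m = 65 gives c = 2790, and both c^d mod 3233 and rsa_crt recover 65. The CRT path works with exponents and moduli of about half the size, which is why it is preferred for large keys.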

The PKA module supports modulus sizes up to 4096 bits, and all key lengths recommended by NIST for ECC over prime fields (192, 224, 256, 384, and 521 bits).
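ECDH over a prime-field curve reduces to the Point Multiplication operation: each party multiplies the other party's public point by its own private scalar, and both obtain the same shared point. The C sketch below models this on the textbook curve y² = x³ + 2x + 2 (mod 17) with generator (5, 1), whose group has prime order 19. It is an assumption-laden illustration: it uses affine coordinates and plain double-and-add for readability, whereas the PKA operates on projective (X, Y, Z) points; it is not constant-time; and the names ec_add, ec_mul, and the CURVE_* macros are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy curve y^2 = x^3 + A*x + B (mod P): insecure, illustration only. */
#define CURVE_P 17
#define CURVE_A 2
#define CURVE_B 2

typedef struct { int64_t x, y; bool inf; } point_t;   /* inf: point at infinity */

static int64_t mod_p(int64_t v) { v %= CURVE_P; return v < 0 ? v + CURVE_P : v; }

/* v^-1 mod P computed as v^(P-2) (Fermat); v must be nonzero mod P. */
static int64_t inv_p(int64_t v)
{
    int64_t r = 1, b = mod_p(v);
    assert(b != 0);
    for (int64_t e = CURVE_P - 2; e > 0; e >>= 1) {
        if (e & 1) r = r * b % CURVE_P;
        b = b * b % CURVE_P;
    }
    return r;
}

static bool on_curve(point_t p)
{
    return p.inf || mod_p(p.y * p.y - (p.x * p.x * p.x + CURVE_A * p.x + CURVE_B)) == 0;
}

/* Affine point addition; adding a point to itself switches to doubling.
   As in the PKA Point Addition operation, coefficient b is not needed. */
static point_t ec_add(point_t p1, point_t p2)
{
    if (p1.inf) return p2;
    if (p2.inf) return p1;
    if (p1.x == p2.x && mod_p(p1.y + p2.y) == 0)
        return (point_t){0, 0, true};                           /* P + (-P) = O */
    int64_t lambda = (p1.x == p2.x)
        ? mod_p((3 * p1.x * p1.x + CURVE_A) * inv_p(2 * p1.y))  /* tangent slope */
        : mod_p((p2.y - p1.y) * inv_p(p2.x - p1.x));            /* chord slope */
    int64_t x3 = mod_p(lambda * lambda - p1.x - p2.x);
    int64_t y3 = mod_p(lambda * (p1.x - x3) - p1.y);
    return (point_t){x3, y3, false};
}

/* k * G by left-to-right double-and-add (not constant time). */
static point_t ec_mul(uint32_t k, point_t g)
{
    point_t r = {0, 0, true};
    for (int bit = 31; bit >= 0; bit--) {
        r = ec_add(r, r);                    /* double */
        if ((k >> bit) & 1u)
            r = ec_add(r, g);                /* add */
    }
    return r;
}
```

With private scalars 3 and 7, both parties compute the shared point 21·G = 2·G = (6, 3), because G has order 19.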

Figure 7-104 is a simplified top-level block diagram of the PKA core module.

Figure 7-104 PKA Block Diagram

The PKA core module consists of the following components:

A PKA Engine containing:
- A dual LNME module (a Montgomery multiplication and exponentiation unit) based on a scalable systolic array of Processing Elements (PEs). Each LNME unit requires access to the PKA RAM and a dedicated LNME FIFO (implemented as an embedded array of registers).
- A 32-bit Public Key Co-Processor (PKCP) that can perform a suite of big number (vector) operations typically encountered in public key cryptography applications. Both arguments and results are stored in PKA RAM, a memory block shared between the PKA Engine and its Host.
- A GF2m Engine for binary field ECC acceleration, supporting addition, multiplication, and support functions for inverse operations in the GF(2^m) field.
- A Sequencer module, controlling modular exponentiation, Elliptic Curve Cryptography (ECC) and modular inversion operations on big numbers in PKA RAM. One of its main tasks is to hide the fact that most of these operations are actually done with numbers in Montgomery form. This module uses a Program RAM as code store.
- A peripheral configuration/MMR interface for module control, firmware loading, and access to the PKA RAM, the Program RAM, and the internal registers.
Program RAM (9 KB) for storage of the Sequencer firmware.
Two two-port PKA RAMs with a total size of 4 KB.
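The GF2m Engine works in binary fields GF(2^m), where addition is carry-free XOR and multiplication is polynomial multiplication reduced modulo the field's irreducible (prime) polynomial; inversion can be built from multiplication. The C sketch below models these three operations in the small field GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), chosen only because its values are easy to check against FIPS 197; NIST binary curves use much larger m (163 to 571) with multi-word operands. The inversion shown uses exponentiation to 2^m - 2, which is a swapped-in textbook technique, not a description of the engine's inverse support functions, and the names gf_add, gf_mul, and gf_inv are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy GF(2^8): irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
   Each element is a bit vector of polynomial coefficients over GF(2). */
#define GF_M    8
#define GF_POLY 0x11Bu

/* Addition (and subtraction) in GF(2^m) is bitwise XOR: no carries. */
static uint32_t gf_add(uint32_t a, uint32_t b) { return a ^ b; }

/* Multiplication: carry-less shift-and-add, reduced modulo GF_POLY. */
static uint32_t gf_mul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    while (b) {
        if (b & 1u)
            r ^= a;                          /* add the current multiple of a */
        b >>= 1;
        a <<= 1;                             /* a = a * x */
        if (a & (1u << GF_M))
            a ^= GF_POLY;                    /* cancel the degree-m term */
    }
    return r;
}

/* Inversion: a^(2^m - 2) == a^-1 for nonzero a, since a^(2^m - 1) == 1. */
static uint32_t gf_inv(uint32_t a)
{
    assert(a != 0);
    uint32_t r = 1;
    for (uint32_t e = (1u << GF_M) - 2; e != 0; e >>= 1) {
        if (e & 1u)
            r = gf_mul(r, a);
        a = gf_mul(a, a);                    /* square */
    }
    return r;
}
```

With this polynomial, {57} + {83} = {D4} and {57} · {83} = {C1}, matching the worked examples in FIPS 197, and the inverse of {53} is {CA}.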

The PKA core module runs on the PKA_IN_CLK clock, which is asynchronous to the X1_CLK and X2_CLK clocks. Internally, the PKA has one or more gated clocks for its sub-modules to support a smart idle mode, in which the module itself decides whether each gated clock needs to run and activates or deactivates the corresponding clock enable. The gated clocks are controlled dynamically by the PKA module. All clock enables are provided in the PKA_CLK_CTRL register.