SPRUJ53B April 2024 – September 2024 TMS320F28P550SJ , TMS320F28P559SJ-Q1
Figure 31-1 shows the AES block diagram. A single-core or dual-interface architecture is used.
The AES module is an efficient implementation of the Rijndael cipher (the AES algorithm) and a 128-bit polynomial multiplication (referred to here as GHASH, according to the AES-GCM specification). As used by AES, Rijndael is a block cipher in which each data block is 128 bits. The polynomial multiplication multiplies two 128-bit vectors modulo the smallest 128-bit irreducible polynomial, represented by the following 128-bit string: 0^120 || 10000111 (120 zero bits followed by 10000111, that is, x^128 + x^7 + x^2 + x + 1). The two implementations are combined into the AES wide-bus engine.
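To make the polynomial multiplication concrete, the following is a minimal host-side reference sketch of a 128-bit GHASH-style multiply, following the bit-serial method in NIST SP 800-38D. It is not the hardware implementation and not device driver code; the function name gf128_mul is hypothetical. In the GCM bit ordering used here, the reduction string 10000111 appears bit-reversed as the byte 0xE1 at the start of the block.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out = x * y in GF(2^128), GCM bit ordering
 * (bit 0 is the most significant bit of byte 0). */
void gf128_mul(const uint8_t x[16], const uint8_t y[16], uint8_t out[16])
{
    uint8_t z[16] = {0};   /* accumulator */
    uint8_t v[16];         /* running multiple of y */
    memcpy(v, y, 16);

    for (int i = 0; i < 128; i++) {
        /* If bit i of x is set, add (XOR) the current multiple of y. */
        if ((x[i / 8] >> (7 - (i % 8))) & 1u) {
            for (int j = 0; j < 16; j++) {
                z[j] ^= v[j];
            }
        }

        /* Multiply v by the field element "x": shift the 128-bit string
         * right by one bit, then reduce if a bit fell off the end. */
        int carry = v[15] & 1u;
        for (int j = 15; j > 0; j--) {
            v[j] = (uint8_t)((v[j] >> 1) | ((v[j - 1] & 1u) << 7));
        }
        v[0] >>= 1;
        if (carry) {
            v[0] ^= 0xE1u; /* x^128 = x^7 + x^2 + x + 1 */
        }
    }
    memcpy(out, z, 16);
}
```

A model like this can be useful for checking results read back from the module against a software calculation, assuming the byte ordering of the data registers is mapped accordingly.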
Depending on the availability of context and data, the AES wide-bus engine is automatically triggered to process the data. The AES wide-bus engine is directly connected to the context and data registers, so that the AES engine can immediately start processing when all data is available. The AES wide-bus engine also interfaces to the I/O control FSM/µDMA request interface.
AES comprises the following major functional blocks:
The AES wide-bus engine, which is the major top-level component, comprises the following functional blocks:
AES encryption requires a specific number of rounds, depending on the key length. The supported key lengths are 128, 192, and 256 bits, which require 10, 12, and 14 rounds, respectively. Because {number of clock cycles} = 2 + 3 × {number of rounds}, these key lengths correspond to 32, 38, and 44 clock cycles, respectively.
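The relationship between key length, rounds, and clock cycles can be sketched as a small helper. The function names aes_rounds and aes_block_cycles are hypothetical and are not part of any TI driver library; an unsupported key length returns 0.

```c
#include <stdint.h>

/* Hypothetical helper: AES round count for a supported key length,
 * or 0 if the key length is not 128, 192, or 256 bits. */
uint32_t aes_rounds(uint32_t key_bits)
{
    switch (key_bits) {
    case 128u: return 10u;
    case 192u: return 12u;
    case 256u: return 14u;
    default:   return 0u;
    }
}

/* Hypothetical helper: clock cycles per 128-bit block,
 * using cycles = 2 + 3 * rounds from the text above. */
uint32_t aes_block_cycles(uint32_t key_bits)
{
    uint32_t rounds = aes_rounds(key_bits);
    return (rounds == 0u) ? 0u : (2u + 3u * rounds);
}
```

For example, aes_block_cycles(192) evaluates 2 + 3 × 12 and returns 38.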
The larger key lengths provide greater encryption strength at the expense of additional rounds, and therefore reduced throughput. The throughput of the polynomial multiplication is matched to the overall cryptographic performance. The AES module contains one ECB core and a dedicated 32-cycle polynomial multiplication module for performing GHASH operations. Polynomial multiplication operates in parallel with the AES core when data is available for both modules.
Depending on the key size (128, 192, or 256 bits), the AES core requires 32, 38, or 44 clock cycles to process one 128-bit data block. While one data block is being processed, the next block can be preloaded immediately. Once a block is preloaded, the previous block must finish before additional data can be loaded. Therefore, when the pipeline is full, sequential data blocks can be passed every 32, 38, or 44 clock cycles.
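As a rough illustration of what the per-block cycle counts imply, the sketch below estimates the upper bound on sustained throughput when the pipeline is kept full. It assumes one 128-bit block completes every 32, 38, or 44 cycles and ignores any bus, CPU, or DMA overhead. The function names and the clk_hz parameter (the module clock frequency in Hz) are hypothetical; use the clock frequency that applies to your configuration.

```c
#include <stdint.h>

/* Hypothetical helper: cycles per 128-bit block (2 + 3 * rounds),
 * or 0 for an unsupported key size. */
uint32_t core_cycles_per_block(uint32_t key_bits)
{
    switch (key_bits) {
    case 128u: return 2u + 3u * 10u;  /* 32 cycles */
    case 192u: return 2u + 3u * 12u;  /* 38 cycles */
    case 256u: return 2u + 3u * 14u;  /* 44 cycles */
    default:   return 0u;
    }
}

/* Hypothetical helper: upper-bound throughput in bits per second with
 * the pipeline full. clk_hz is an assumed module clock frequency.
 * The result is rounded down. */
uint64_t aes_peak_bits_per_second(uint32_t key_bits, uint64_t clk_hz)
{
    uint32_t cycles = core_cycles_per_block(key_bits);
    if (cycles == 0u) {
        return 0u;
    }
    return (128u * clk_hz) / cycles;
}
```

With an assumed 100-MHz clock, a 128-bit key gives 128 × 100,000,000 / 32 = 400,000,000 bits per second as the theoretical ceiling; real transfers are expected to be lower once data movement is included.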