SPRADE1 March 2024 AM2434
The macro in Figure 2-3 shows one of the ways in which serial data can be sent. The delay compensation allows for adjusting different bit rates. The transmit time for a single bit with no delay compensation should be 7 PRU clock cycles. If the PRU core is running at 333MHz, the cycle time is 21ns, or 47.6MHz.
One point to consider is that after the frame data exceeds the maximum length that the PRU core general-purpose registers can hold, additional latency in extracting data from external memory or PRU inside shared memory can directly affect the communication bandwidth. The bandwidth has to be reduced in the case of the higher throughout. In contrast to the pipelined CPU, the PRU_ICSSG integrates Broadside RAM (BS RAM) that can be accessed through instructions Xin and XOUT (10.6 GB/s). This data processing accelerator allows the PRU to read or write up to 32 bytes of data from the BS RAM in a single PRU clock cycle. This is useful in situations with large data volumes, or high communication bandwidth. Data can be temporarily placed within the BS RAM, waiting for the frame data to be transferred, before moving the data out to external memory or shared memory inside the PRU. The steps in Figure 2-4 are performed by the PRU firmware to write to the BS RAM.
It should be noted that Auto Index enable causes each PRU or RTU_PRU core’s write or read to the BS RAM causes the RAM Address to increment by 1. Each increment of the RAM address is the equivalent of 32 Bytes. The next read or write address will always be 32 Bytes later, regardless of the read or write data size. Based on the above discussion, Figure 2-5 and Figure 2-6 show the system block diagram for the PRU GPIO to implement serial communication based on direct mode and the software workflow.
The application core initializes the PRU core before the PRU can start normally, for example, PRU initialize the GPIO interface and Industrial Ethernet Peripheral (IEP), which is hardware work required for Industrial Ethernet functions. The IEP module features an industrial Ethernet timer with 16 compare events, industrial Ethernet sync generator and latch capture, industrial Ethernet watchdog timer, and a digital I/O port (DIGIO). In this case, the data in Tightly Coupled Memory (TCM) can be periodically query through event triggered by the IEP. IEP timer trigger period is set to 10kHz (100μs) to poll the valid data in TCM in the task. It is important to protect context with the Xin, Xout instructions in the task to avoid overlap with register, then use the XFR2VBUS DMA widget to load the data at the address specified by tightly coupled memory (TCM). The XFR2VBUS is a simple hardware accelerator which is used to get the lowest read round trip latency from Memory Subsystem Multi-core (MSMC) and to decouple the latency seen by the PRU. When application core stores the frame data into the TCM, PRU will uses the header data to determine if a data transmission needs to be performed, and pre-stores the data into BSRAM before performing the data transmission to eliminate the effects of memory access latency.
Another method to implement a serial port is GPIO shift out and in mode. In shift out mode, data is shifted out of PRU_DATAOUT pin on every rising edge of PRU_CLK. The shift rate is controlled by the effective divisor of two cascaded dividers applied to the PRU core clock. To achieve a 100Mbit data rate with serial communication, a 100 MHz clock has to be used with a 250 MHz core clock divided by three. To avoid the effect of the data loading process on serial communication, the PRU shift data mode provides two shadow registers (GPO_SH0 and GPO_SH1) that can be used to support ping-pong buffers. The shadow register can be programmed independently via PRUx_R30[29:30], when PRUx_R30[29:30] is set, the data in PRUxR30[0:15]/ [15:0] is loaded into GPO_SH0 and GPO_SH1. The PRU shift-out mode can be set to Free Running mode and Fixed Clock Count Mode. For Free Running Clock Mode, the shift operation will continue until PRUx_ENABLE_SHIFT(PRUxR30[31]) is cleared. When PRUx_ENABLE_SHIFT is cleared, the shift operation will finish shifting out the current shadow register, stop, and then reset. For Fixed Clock Count mode, the number of data bits to be shifted out is defined by configuration register. Figure 2-7 shows the system architecture of the shift out mode.
In 28-bit shift-in mode, general-purpose input pin PRU_DATAIN is sampled and shifted into a 28-bit shift register on an internal clock pulse. The register fills in least-significant bit (LSB) order (from bit 0 to 27) and then overflows into a bit bucket. The 28-bit register is mapped to PRU_R31[0:27], after the start bit (1/0 ) is detected by the shift in, Cnt_16 is set for every 16 shift clocks and serial data mapped to R31 registers can be copied to other system registers. Similarly, to shift-out mode, a 100MHz sampling clock with 250-MHz core clock is selected. Figure 2-8 shows the block diagram of the PRU GPI shift-in mode. In this application, the transmission is run on Free-Running mode. Figure 2-9 shows the programming work flow.
It should be noted that SH0 and SH1 have a depth of 16 bits, and with divide factor of 3, the data load and process time must less than 48 PRU cycles. In shift in mode, the data input node does not know the total byte length of this communication, so it is possible to include the total data length in the start frame to define the number of cycles for the shift in mode.
For more PRUs for serial communication, see the Intra Drive Communication Using 8b-10b Line Code with Programmable Real Time Unit, and FSI Bandwidth-Optimization for Multi-axis Servo Control