SPRUJ28E November 2021 – September 2024 AM68, AM68A, TDA4AL-Q1, TDA4VE-Q1, TDA4VL-Q1
The proxy provides a mechanism for software to access the RA coherently when the processor does not support large bursts. The RA requires each operation to arrive as a single burst: it maintains atomicity and coherence by relying on the atomicity of bursts on the bus interconnect fabric, which delivers one complete burst to the RA before the next. Because processors are usually limited in the size of burst they can issue natively (for example, 32 to 128 bits), they cannot directly access the data access region of the RA for larger element sizes. The proxy closes this gap by providing temporary space in which software can assemble a larger burst of data before it is sent to the RA. The proxy accepts the smaller processor accesses and forwards the data to the RA only when the entire burst is complete, as directed by software.

Each proxy instance can support multiple software threads (on the same or different processors) operating on bursts at the same time. The threads do not interfere with each other as long as each software thread operates only on its own proxy thread. Each proxy thread also supports access to any RA queue through different offsets, similar to how the RA itself provides access to its queues through different offsets.
To write data to the RA, software writes the data in native-sized writes to the proxy thread it has been assigned. The proxy accepts these writes into a buffer reserved for that thread. Only when software writes to the completion byte offset (the last byte of the burst) does the proxy take the entire burst of written data, including the final write data, and send it to the RA as a single write burst. After the completion write, software can begin building a new data burst.

To read data from the RA, software reads the data in native-sized reads from the proxy thread. On the first read, the proxy reads the entire burst from the RA so that it atomically takes the next element off the queue. The entire burst is stored in the proxy thread buffer, and the requested portion is returned to software. Software can then read any location within the burst, and the proxy returns it from the buffer. When software has finished reading the data, it must read the completion byte offset so that the proxy knows the read burst is complete and that the next read must fetch a new burst from the RA queue.

A single proxy thread can be used for only one data burst at a time. Once a write has started, it should complete before another write or a read is started; similarly, once a read has started, it should complete before another read or a write is started. If software needs to read and write at the same time, it should use two proxy threads. Likewise, multiple software threads that need to access the RA at the same time should use separate proxy threads.
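The write and read sequences described above can be illustrated with a small software model. This is only a sketch for reasoning about the protocol, not driver code and not a description of the hardware implementation. All names (`ra_queue_t`, `proxy_thread_t`, `proxy_write32`, `proxy_read32`, `proxy_demo`), the 16-byte burst size, and the queue depth are hypothetical. The model assumes 32-bit native accesses and treats any access that covers the last byte of the burst as the completion access. The assertions that reject mixing a read and a write on one thread are a modeling choice; how the hardware responds to such misuse is not covered here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BURST_BYTES 16u  /* hypothetical element (burst) size */
#define QUEUE_DEPTH 4u   /* hypothetical queue depth for the mock RA */

/* Mock RA queue: it only ever accepts or returns whole bursts. */
typedef struct {
    uint8_t  elem[QUEUE_DEPTH][BURST_BYTES];
    unsigned head, tail, count;
    unsigned bursts_in, bursts_out;  /* bursts seen by the RA */
} ra_queue_t;

typedef enum { PROXY_IDLE, PROXY_WRITING, PROXY_READING } proxy_state_t;

/* One proxy thread: a staging buffer plus the burst in progress. */
typedef struct {
    uint8_t       buf[BURST_BYTES];
    proxy_state_t state;
    ra_queue_t   *ra;
} proxy_thread_t;

static void ra_push(ra_queue_t *q, const uint8_t *burst)
{
    assert(q->count < QUEUE_DEPTH);
    memcpy(q->elem[q->tail], burst, BURST_BYTES);
    q->tail = (q->tail + 1u) % QUEUE_DEPTH;
    q->count++;
    q->bursts_in++;
}

static void ra_pop(ra_queue_t *q, uint8_t *burst)
{
    assert(q->count > 0u);
    memcpy(burst, q->elem[q->head], BURST_BYTES);
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    q->bursts_out++;
}

/* Native-sized write: staged locally; the write that covers the
 * last byte forwards the whole buffer to the RA as one burst. */
static void proxy_write32(proxy_thread_t *t, unsigned off, uint32_t val)
{
    assert(off % 4u == 0u && off + 4u <= BURST_BYTES);
    assert(t->state != PROXY_READING);  /* one burst at a time */
    t->state = PROXY_WRITING;
    memcpy(&t->buf[off], &val, sizeof val);
    if (off + 4u == BURST_BYTES) {      /* completion write */
        ra_push(t->ra, t->buf);
        t->state = PROXY_IDLE;
    }
}

/* Native-sized read: the first read of a burst pops one whole element
 * from the RA; later reads are served from the buffer; the read that
 * covers the last byte ends the burst. */
static uint32_t proxy_read32(proxy_thread_t *t, unsigned off)
{
    uint32_t val;
    assert(off % 4u == 0u && off + 4u <= BURST_BYTES);
    assert(t->state != PROXY_WRITING);  /* one burst at a time */
    if (t->state == PROXY_IDLE) {       /* first read of a new burst */
        ra_pop(t->ra, t->buf);
        t->state = PROXY_READING;
    }
    memcpy(&val, &t->buf[off], sizeof val);
    if (off + 4u == BURST_BYTES)        /* completion read */
        t->state = PROXY_IDLE;
    return val;
}

/* Usage: one thread writes an element, a second thread reads it back.
 * Returns 0 if the mock RA saw exactly one burst in each direction. */
int proxy_demo(void)
{
    ra_queue_t ra = {0};
    proxy_thread_t wr = { .ra = &ra }, rd = { .ra = &ra };

    for (unsigned off = 0u; off < BURST_BYTES; off += 4u)
        proxy_write32(&wr, off, 0xA0u + off);

    uint32_t first = proxy_read32(&rd, 0u);               /* fetches burst */
    uint32_t last  = proxy_read32(&rd, BURST_BYTES - 4u); /* completes it */

    return (ra.bursts_in == 1u && ra.bursts_out == 1u &&
            first == 0xA0u && last == 0xACu) ? 0 : -1;
}
```

In the model, as in the description above, the RA never observes a partial element: four 32-bit writes produce one burst, and the reader can touch the buffered data in any order before the completion read releases the thread for the next burst. Using separate `wr` and `rd` threads mirrors the recommendation to use two proxy threads when software reads and writes at the same time.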