The following is the sequence of events in the FFI flow, from failure to
recovery:
- CPU crash: One or more of the masters in the ASIL-B domain (C7x/ARM/DRU)
crashes. The crash manifests as the master being unable to respond to
commands. For example, CPU1 crashes and stops responding to commands.
- Crash detection: The timeout gasket connected to the malfunctioning CPU1
generates the timeout interrupt gskt_cpu1_if_timed_out_intr. For the
interrupt to be generated, gskt_cpu1_if_enable_timeout_detect must be driven
high and gskt_cpu1_if_timeout_val must be driven with a valid timeout value.
These MSMC-level tie-offs are driven by SoC-level control registers.
gskt_cpu1_if_timed_out_intr is sticky and needs to be cleared during
recovery.
- FFI req generation: When the SoC interrupt controller receives the timeout
interrupt, gskt_cpu1_if_timed_out_intr, it will trigger the corresponding
FFI req, ffi_cpu1_if_ffi_req.
- MSMC FFI mode: MSMC receives ffi_cpu1_if_ffi_req and enters FFI mode, where
it kills the commands from and responses to CPU1 at the interface. It
internally unwinds the snoop responses and credit returns that CPU1 is
otherwise responsible for. During this time, ASIL-D to ASIL-D traffic is
uninterrupted. For example, SoC accesses to DDR or MSMC SRAM proceed. Any
access to CPU1 (SDMA) from other masters does not receive a response during
this time. Since software is aware that CPU1 is in FFI mode, it stops sending
new commands to CPU1. However, an ASIL-B master could have sent a command to
CPU1 just before CPU1 entered FFI mode. This is why all ASIL-B masters are
forced into FFI mode in a typical use case. If an ASIL-D master sends a
command to CPU1 just before CPU1 enters FFI mode, then that ASIL-D master’s
timeout gaskets trigger unwinding of that ASIL-D master.
- Power-down and Reset: The FFI req is followed by the usual power-down and
reset sequence. That is, after ffi_cpu1_if_ffi_req is asserted,
cpu1_pwr_scr_disable_req is issued and cpu1_pwr_scr_disable_ack is awaited.
Once the ack is received, CPU1 can be safely reset.
- Recovery: While CPU1 is in reset, ffi_cpu1_if_ffi_req is de-asserted,
followed by reset de-assertion and cpu1_pwr_scr_disable_req de-assertion.
CPU1 can then resume normal functional operation.
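
The ordering of the steps above can be sketched as a small behavioral model.
This is only an illustration of the sequence, not driver code or a register
map. The signal names are taken from the text, but the Cpu1FfiModel class,
the recover_cpu1 function, and the cpu1_reset name are hypothetical. The
model also assumes that any non-zero timeout value is "valid", that the ack
arrives immediately, and that the sticky interrupt is cleared just before
ffi_cpu1_if_ffi_req is de-asserted. The text only says the interrupt is
cleared during recovery, so the exact point is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu1FfiModel:
    """Toy model of the CPU1 signals named above (not a register map)."""
    gskt_cpu1_if_enable_timeout_detect: int = 0
    gskt_cpu1_if_timeout_val: int = 0
    gskt_cpu1_if_timed_out_intr: int = 0  # sticky once set
    ffi_cpu1_if_ffi_req: int = 0
    cpu1_pwr_scr_disable_req: int = 0
    cpu1_pwr_scr_disable_ack: int = 0
    cpu1_reset: int = 0  # hypothetical name for the CPU1 reset control
    log: list = field(default_factory=list)

    def crash(self):
        """CPU1 stops responding; the gasket times out only if it is armed."""
        # Assumption: a "valid" timeout value is modeled as any non-zero value.
        armed = (self.gskt_cpu1_if_enable_timeout_detect == 1
                 and self.gskt_cpu1_if_timeout_val > 0)
        if armed:
            self.gskt_cpu1_if_timed_out_intr = 1
            self.log.append("timed_out_intr")


def recover_cpu1(m):
    """Walk the FFI sequence in order; return False if no timeout is pending."""
    if not m.gskt_cpu1_if_timed_out_intr:
        return False

    # FFI req generation: the timeout interrupt triggers the FFI request.
    m.ffi_cpu1_if_ffi_req = 1
    m.log.append("ffi_req=1")

    # Power-down and reset: request, wait for the ack, then reset CPU1.
    m.cpu1_pwr_scr_disable_req = 1
    m.log.append("disable_req=1")
    m.cpu1_pwr_scr_disable_ack = 1  # the model acks at once; hardware would be polled
    m.log.append("disable_ack=1")
    m.cpu1_reset = 1
    m.log.append("reset=1")

    # Recovery: clear the sticky interrupt (placement assumed), then release
    # the FFI request, the reset, and the power-down request, in that order.
    m.gskt_cpu1_if_timed_out_intr = 0
    m.log.append("intr_cleared")
    m.ffi_cpu1_if_ffi_req = 0
    m.log.append("ffi_req=0")
    m.cpu1_reset = 0
    m.log.append("reset=0")
    m.cpu1_pwr_scr_disable_req = 0
    m.cpu1_pwr_scr_disable_ack = 0
    m.log.append("disable_req=0")
    return True
```

In this model, a crash with timeout detection disabled never raises
gskt_cpu1_if_timed_out_intr, so recover_cpu1 returns False and no FFI request
is generated. This mirrors the requirement that the enable and timeout value
tie-offs be driven before a crash can be detected.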