SSZT722 | May 2018 | AWR1243
The robustness and reliability of a self-driving car’s computer vision system have received a lot of news coverage. As a vision software engineer at TI helping customers implement advanced driver assistance systems (ADAS) on our TDAx platform, I know how hard it is to design a robust vision system that performs well in any environmental condition.
When you think about it, engineers designing these systems are essentially trying to mimic the visual system of human beings. As Leonardo da Vinci said, “Human subtlety will never devise an invention more beautiful, more simple or more direct than does nature, because in her inventions nothing is lacking and nothing is superfluous.”
Indeed, I was recently painfully reminded of this fact. On March 2, 2018, I caught an eye infection that was eventually diagnosed as a severe adenovirus infection. After a month, my vision is almost back to normal. Throughout my health ordeal, I learned a few things about the human visual system that are applicable to our modern-day challenge of making self-driving cars.
From these observations, I can draw these conclusions about autonomous driving: for each position around the car where vision is used for object detection, there should be multiple cameras (at least two) pointing along the same line of sight. This setup should be in place even when the vision algorithms only need single-camera data.
Sensor multiplicity allows the system to detect a failure of the primary camera by comparing its images with those from the auxiliary cameras. The primary camera feeds its data to the vision algorithm. If the system detects a failure of the primary camera, it should be able to reroute one of the auxiliary cameras’ data to the vision algorithm.
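To make the idea concrete, here is a minimal sketch of that failover logic. The CameraSource structure, the frame-difference check, and the threshold are illustrative assumptions on my part, not a TDAx or TI API; a real system would compare rectified, parallax-compensated views or image statistics rather than raw pixels.

```cpp
// Sketch of primary/auxiliary camera failover (illustrative only; not a TDAx API).
#include <cmath>
#include <cstdint>
#include <vector>

// A frame is modeled here as a flat grayscale buffer for simplicity.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<uint8_t> pixels;
};

// Hypothetical camera wrapper; a real system would sit on top of the capture driver.
struct CameraSource {
    bool healthy = true;
    Frame latest;
};

// Mean absolute difference between two frames of equal size. A persistently
// large difference (or an all-black frame) suggests the primary sensor has
// failed or is occluded.
double meanAbsDiff(const Frame& a, const Frame& b) {
    double sum = 0.0;
    for (size_t i = 0; i < a.pixels.size(); ++i)
        sum += std::abs(static_cast<int>(a.pixels[i]) - static_cast<int>(b.pixels[i]));
    return sum / static_cast<double>(a.pixels.size());
}

// Select the frame to feed the vision algorithm: use the primary camera unless
// it disagrees strongly with the auxiliary view, in which case reroute the
// auxiliary camera's data.
const Frame& selectFrame(CameraSource& primary, const CameraSource& auxiliary,
                         double failThreshold = 60.0) {
    if (meanAbsDiff(primary.latest, auxiliary.latest) > failThreshold)
        primary.healthy = false;  // flag the fault for diagnostics
    return primary.healthy ? primary.latest : auxiliary.latest;
}
```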
With multiple cameras available, the vision algorithm should also take advantage of stereo vision. Collecting depth data at a lower resolution and lower frame rate will conserve processing power. Even when the processing is mono-camera by nature, depth information can speed up object classification by reducing the number of scales that need processing, based on the minimum and maximum distances of objects in the scene.
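The sketch below shows one way the depth bounds could prune an image pyramid for a sliding-window detector. The pinhole-camera numbers (focal length in pixels, object height, detector window size, pyramid step) are illustrative assumptions, not values from a specific TI device or library.

```cpp
// Sketch: use stereo depth bounds to limit the image-pyramid scales a
// sliding-window detector has to process.
#include <cstdio>
#include <vector>

// Apparent height in pixels of an object of physical height H (meters)
// at distance Z (meters), for a focal length expressed in pixels.
double apparentHeightPx(double focalPx, double objectHeightM, double distanceM) {
    return focalPx * objectHeightM / distanceM;
}

// Pyramid scale factors (geometric steps) covering only the distances actually
// present in the scene, instead of the full pyramid.
std::vector<double> pyramidScales(double focalPx, double objectHeightM,
                                  double zMinM, double zMaxM,
                                  double windowPx, double step = 1.2) {
    // Nearest objects appear largest -> smallest scale (strongest downscaling).
    double sMin = windowPx / apparentHeightPx(focalPx, objectHeightM, zMinM);
    double sMax = windowPx / apparentHeightPx(focalPx, objectHeightM, zMaxM);
    std::vector<double> scales;
    for (double s = sMin; s <= sMax * 1.0001; s *= step)
        scales.push_back(s);
    return scales;
}

int main() {
    // Example: 1200-px focal length, 1.7 m pedestrian, 128-px detector window,
    // and stereo depth says everything in view lies between 8 m and 40 m.
    for (double s : pyramidScales(1200.0, 1.7, 8.0, 40.0, 128.0))
        std::printf("scale %.3f\n", s);
    return 0;
}
```

With the example numbers, only about nine pyramid scales need to be searched instead of the full range, which is where the classification speedup comes from.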
TI has planned for such requirements by equipping its TDAx line of automotive processors with the necessary technology to handle at least eight camera inputs, and to perform state-of-the-art stereo-vision processing through the Vision AccelerationPac.
If the system detects low-light conditions, the vision algorithm should switch to a low-light mode. This mode could be implemented as a deep learning network trained using low-light images only. Low-light mode should also rely on data from an offline map or an offline world view. A low-light vision algorithm can provide cues to find the correct location on a map and reconstruct a scene from an offline world view, which should be enough for navigation in static environments. In dynamic environments, however, with moving or new objects that were not previously recorded, fusion with other sensors (LIDAR, radar, thermal cameras, etc.) will be necessary to ensure optimum performance.
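A minimal sketch of the mode switch might look like the following. The NetworkHandle type, the model names and the luminance threshold are placeholders I am assuming for illustration; they are not a TIDL or TDAx API.

```cpp
// Sketch of low-light mode selection: measure scene brightness and pick a
// network trained for that condition (placeholders only, not a TIDL API).
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

struct NetworkHandle {
    std::string modelPath;  // e.g., a daytime model vs. a low-light-only model
};

// Mean luminance of a grayscale frame, in the range 0..255.
double meanLuminance(const std::vector<uint8_t>& pixels) {
    if (pixels.empty()) return 0.0;
    return std::accumulate(pixels.begin(), pixels.end(), 0.0) /
           static_cast<double>(pixels.size());
}

// Switch to the network trained on low-light images when the scene falls below
// a brightness threshold; otherwise keep the daytime network.
const NetworkHandle& selectNetwork(const std::vector<uint8_t>& frame,
                                   const NetworkHandle& dayNet,
                                   const NetworkHandle& lowLightNet,
                                   double threshold = 40.0) {
    return meanLuminance(frame) < threshold ? lowLightNet : dayNet;
}
```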
TI’s TDA2P and TDA3x processors have a hardware image signal processor supporting wide-dynamic-range sensors for low-light image processing. The TI Deep Learning (TIDL) library, implemented on the Vision AccelerationPac, can take complex deep learning networks designed with the Caffe or TensorFlow frameworks and execute them in real time within a 2.5-W power envelope. Semantic segmentation and single-shot detector are among the networks successfully demonstrated on TDA2x processors.
To complement our vision technology, TI has been ramping efforts to develop radar technology tailored for ADAS and the autonomous driving market, including single-chip millimeter-wave radar sensors such as the AWR1243.
If you are involved in developing self-driving technology, I hope my experience will inspire you to make your computer-vision systems more robust.