SPRACX2 April 2021 TDA4VM, TDA4VM-Q1
In Computer Vision, the position of an object relative to a vehicle can be determined from images taken by two cameras that are mounted at known, distinct locations and look at the same object. Key points on the object are extracted from both images and matched, and then a process known as Triangulation is used to compute the 3D locations of the points that make up the object. Determining the position of a point in space using two cameras is known in the Computer Vision community as Stereo Vision, or Stereo Depth Estimation, and the set of points generated from all the correspondences between the two images is referred to as a point-cloud. Although Stereo Vision is widely used by the automotive and robotics communities, it comes at a high system cost in terms of both dollars and image processing requirements, because it requires two high-precision cameras capturing images at a relatively high frequency.
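As a minimal sketch of the triangulation step, consider the simplest stereo case: two identical, rectified pinhole cameras separated by a horizontal baseline, so that a matched key point differs only in its horizontal pixel coordinate. The function name, parameter names, and the rectified-pair assumption are illustrative and are not taken from this document.

```python
def stereo_point_from_disparity(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Triangulate one matched key point from a rectified stereo pair.

    Assumptions (illustrative): identical pinhole cameras, rectified images,
    focal length in pixels, principal point (cx, cy), baseline in meters.
    Returns (x, y, z) in the left camera frame, in meters.
    """
    disparity = u_left - u_right  # horizontal shift of the point between images
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px        # back-project pixel to metric x
    y = (v - cy) * z / focal_px             # back-project pixel to metric y
    return (x, y, z)


point = stereo_point_from_disparity(
    u_left=700.0, u_right=650.0, v=360.0,
    focal_px=1000.0, baseline_m=0.5, cx=640.0, cy=360.0,
)
print(point)
```

With these example values the disparity is 50 pixels, which places the point about 10 m in front of the camera and 0.6 m to the side. Repeating this for every matched key point yields the point-cloud.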
In contrast, Structure From Motion, or SFM, is an algorithm that can generate a point-cloud from a single camera in motion. As the name implies, SFM uses one camera that, because of its motion, occupies two distinct locations at two consecutive time instances. This is effectively the same as placing two cameras at distinct locations, provided that the objects in the frame have not moved between the two time instances and that the relative motion of the camera is known. Thus, the same theory used in Stereo Vision can be applied to generate a point-cloud from just one camera.
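To illustrate how a single moving camera can stand in for a stereo pair, the sketch below triangulates one static point from two frames using the midpoint method: each pixel is back-projected to a ray, and the 3D point is taken as the midpoint of the shortest segment between the two rays. This is one common textbook technique and is not necessarily the method used on TDA4VM. The helper names, the pinhole model, and the simplification that the camera translates without rotating between frames are all assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_to_ray(u, v, focal_px, cx, cy):
    """Back-project a pixel to a ray direction in the camera frame (pinhole model)."""
    return ((u - cx) / focal_px, (v - cy) / focal_px, 1.0)


def triangulate_midpoint(o1, d1, o2, d2):
    """Return the midpoint of the shortest segment between two rays.

    o1, o2: camera centers at the two time instances (known relative motion).
    d1, d2: ray directions expressed in the same frame as the centers.
    """
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Example: the camera moves 2 m forward between frames (assumed known).
f, cx, cy = 1000.0, 640.0, 360.0
ray_t0 = pixel_to_ray(740.0, 410.0, f, cx, cy)   # key point in the first frame
ray_t1 = pixel_to_ray(765.0, 422.5, f, cx, cy)   # same key point in the second frame
point = triangulate_midpoint((0.0, 0.0, 0.0), ray_t0, (0.0, 0.0, 2.0), ray_t1)
print(point)
```

In this example the two rays intersect at roughly (1.0, 0.5, 10.0) in the first camera frame. If the object had moved between the two frames, or the assumed camera motion were wrong, the rays would no longer meet near the true point, which is why both conditions matter for SFM.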
SFM algorithms come in two primary flavors: traditional Computer Vision based and Deep Learning based. Although both flavors can be executed on TDA4VM, this document focuses on the former, an algorithm based on traditional Computer Vision techniques. The point-cloud generated by the SFM algorithm must then be used to generate a map of the surroundings; in the application described here, a 2D occupancy grid (OG) mapping approach is used for the mapping task.
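The idea of turning a point-cloud into a 2D OG map can be sketched as follows: the ground plane around the vehicle is divided into square cells, each point within a height band is binned into a cell, and a cell is marked occupied once it collects enough points. This is a simplified illustration only, not the mapping implementation described in this document. The coordinate convention (x and y on the ground plane, z up, vehicle at the grid center), the function name, and all parameters are assumptions.

```python
import math


def build_occupancy_grid(points, cell_size_m, half_extent_m,
                         min_height_m, max_height_m, min_hits):
    """Build a square 2D occupancy grid centered on the vehicle.

    points: iterable of (x, y, z) in meters, z pointing up (assumed convention).
    Returns a list of rows; each cell is 1 (occupied) or 0 (free or unknown).
    """
    n = int(round(2.0 * half_extent_m / cell_size_m))
    counts = [[0] * n for _ in range(n)]
    for x, y, z in points:
        # Ignore points on the ground or far above the vehicle.
        if not (min_height_m <= z <= max_height_m):
            continue
        col = int(math.floor((x + half_extent_m) / cell_size_m))
        row = int(math.floor((y + half_extent_m) / cell_size_m))
        if 0 <= row < n and 0 <= col < n:
            counts[row][col] += 1
    # Require several hits per cell so isolated noisy points do not mark a cell.
    return [[1 if c >= min_hits else 0 for c in row] for row in counts]


cloud = [
    (0.5, 0.5, 0.5), (0.6, 0.4, 0.7),  # two points on the same obstacle
    (-1.5, -1.5, 0.5),                 # a single, possibly spurious point
    (0.5, 0.5, 5.0),                   # too high, filtered out
    (10.0, 0.0, 0.5),                  # outside the mapped area
]
grid = build_occupancy_grid(cloud, cell_size_m=1.0, half_extent_m=2.0,
                            min_height_m=0.2, max_height_m=2.0, min_hits=2)
for row in grid:
    print(row)
```

The hit threshold is one simple way to suppress outliers in the point-cloud; a production system would typically also accumulate evidence over many frames as the vehicle moves.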