SPRADE2 White paper

SPRADE2 October 2023 AM69A

2.2 Graph SLAM

Figure 2-1 Graph With Poses of the Robot and Observations

Graph SLAM treats the SLAM problem as a non-linear optimization over a graph, where nodes represent the poses of the robot and its observations at different times, and edges represent constraints between the poses. Figure 2-1 shows a toy example of how the graph is built from the poses and observations as the robot moves. x_i, x_j, and x_k are the poses of the robot, and P₀ and P₁ are the landmarks, objects, or points observed by the camera. Here, P₀ is assumed to be observed from x_i and x_j, and P₁ from x_j and x_k. p_nt, where n = 0, 1 and t = i, j, k, is defined as the position of P_n when observed from x_t. The goal of graph SLAM is to find the transformations between the poses of the robot, that is, z’_ij and z’_jk, so that the same observations from different poses overlap maximally. In other words, the poses x_t, where t = i, j, k, are determined so that (p_0i – p_0j)² + (p_1j – p_1k)² is minimized.
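
As a hedged illustration of this objective, the following Python sketch evaluates the cost for the toy graph. It assumes 2D poses of the form (x, y, heading) and observations expressed in the robot frame. The names to_world and graph_cost and all numeric values are hypothetical and are not taken from this document.

```python
import math

def to_world(pose, point_local):
    """Transform a point from the robot frame of `pose` into the world frame."""
    x, y, theta = pose
    px, py = point_local
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * px - s * py, y + s * px + c * py)

def graph_cost(poses, observations):
    """Sum of squared distances between the world positions of one landmark
    as seen from two different poses (the quantity graph SLAM minimizes)."""
    cost = 0.0
    for (pose_a, local_a), (pose_b, local_b) in observations:
        ax, ay = to_world(poses[pose_a], local_a)
        bx, by = to_world(poses[pose_b], local_b)
        cost += (ax - bx) ** 2 + (ay - by) ** 2
    return cost

# Hypothetical data: P0 is seen from x_i and x_j, P1 from x_j and x_k.
observations = [
    (("i", (0.5, 1.0)), ("j", (-0.5, 1.0))),   # p_0i and p_0j
    (("j", (0.5, 1.0)), ("k", (-0.5, 1.0))),   # p_1j and p_1k
]
true_poses = {"i": (0.0, 0.0, 0.0), "j": (1.0, 0.0, 0.0), "k": (2.0, 0.0, 0.0)}
drifted_poses = {"i": (0.0, 0.0, 0.0), "j": (1.0, 0.0, 0.0), "k": (2.3, 0.0, 0.0)}

cost_true = graph_cost(true_poses, observations)
cost_drifted = graph_cost(drifted_poses, observations)
```

With the true poses the two views of each landmark coincide and the cost is zero. With a drifted x_k the cost becomes positive, and this is the quantity the optimization reduces by adjusting the poses.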

Figure 2-2 Graph SLAM Process

Figure 2-2 shows the graph SLAM process, which comprises two main components, the front end and the back end. The front end processes input sensor data to estimate the poses of the mobile robot and the distinctive features around the robot. The front-end processing consists of the following steps:

Data acquisition: Data is captured continuously from sensors such as a camera and LiDAR, and the captured sensor data is pre-processed in this step. For example, image signal processing (ISP), lens distortion correction, stereo image rectification, and motion compensation of the LiDAR point cloud are performed during data acquisition. The data captured within a fixed time interval forms a frame.
Feature extraction: Distinct and descriptive features are extracted from each frame. These features can be key points in images (visual SLAM) or geometric features such as edges, planes and scanned 2D or 3D points themselves (LiDAR SLAM).
Feature association: The mobile robot associates the extracted features across different frames to determine which features correspond to the same physical feature in the environment.
Pose estimation: Based on the associated features, the mobile robot estimates the change in its position and orientation between two frames and, from that change, its new pose. Once the pose is estimated, the map can be updated using the associated features.
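
The pose estimation step can be sketched as follows, assuming the simplest case of 2D point features that have already been associated one-to-one between two frames. The function name estimate_rigid_2d and the sample points are hypothetical. The closed-form least-squares alignment used here is one common technique and is not necessarily the method used in the implementation this document describes.

```python
import math

def estimate_rigid_2d(prev_pts, curr_pts):
    """Least-squares rotation angle and translation that map prev_pts onto
    curr_pts, where prev_pts[n] and curr_pts[n] are the same feature."""
    n = len(prev_pts)
    ax = sum(p[0] for p in prev_pts) / n
    ay = sum(p[1] for p in prev_pts) / n
    bx = sum(q[0] for q in curr_pts) / n
    by = sum(q[1] for q in curr_pts) / n
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(prev_pts, curr_pts):
        ux, uy = px - ax, py - ay          # centered previous point
        vx, vy = qx - bx, qy - by          # centered current point
        s_dot += ux * vx + uy * vy
        s_cross += ux * vy - uy * vx
    theta = math.atan2(s_cross, s_dot)     # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = bx - (c * ax - s * ay)            # translation aligns the centroids
    ty = by - (s * ax + c * ay)
    return theta, tx, ty

# Hypothetical associated features: the current frame is the previous frame
# rotated by 0.3 rad and shifted by (0.5, -0.2).
prev_pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (2.0, 2.0)]
c0, s0 = math.cos(0.3), math.sin(0.3)
curr_pts = [(c0 * x - s0 * y + 0.5, s0 * x + c0 * y - 0.2) for x, y in prev_pts]

theta, tx, ty = estimate_rigid_2d(prev_pts, curr_pts)
```

The result is the transform that maps feature coordinates from the previous frame into the current frame. For static features observed in the robot frame, the motion of the robot itself is the inverse of this transform.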

The poses of the mobile robot estimated in the front end are not error free, and the errors accumulate as the mobile robot moves through the environment, which can result in large drift. The back end is responsible for refining the estimated poses and updating the map. This refinement consists of the following steps:

Loop closure detection: This is the process that determines if the current location has been visited before. Identifying the visited locations is critical in reducing drift error with graph optimization.
Graph optimization: A graph is maintained in graph SLAM as shown in Figure 2-1. The goal is to update the previously estimated poses so that the observations (for example, features) from one pose maximally overlap with the same observations from other poses. Non-linear optimization techniques such as Newton, Gauss-Newton, and Levenberg-Marquardt (LM) are used for this purpose.
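
The two back-end steps above can be sketched together on a deliberately simplified case. The sketch assumes 1D poses along a corridor, unit weights on every edge, and a loop closure that has already been detected and is supplied as an extra edge. All names and values are hypothetical. Because the 1D residuals are linear in the poses, a single Gauss-Newton step reaches the minimum here. Real pose graphs with rotations need several iterations and often LM damping.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gauss_newton_step(poses, edges):
    """One Gauss-Newton update of 1D poses; poses[0] is held fixed.
    Each edge (i, j, z) has the residual r = (poses[j] - poses[i]) - z."""
    n = len(poses) - 1                       # free variables are poses[1:]
    H = [[0.0] * n for _ in range(n)]        # J^T J
    g = [0.0] * n                            # J^T r
    for i, j, z in edges:
        r = (poses[j] - poses[i]) - z
        jac = {i: -1.0, j: 1.0}              # derivative of r per pose
        for a, ja in jac.items():
            if a == 0:
                continue
            g[a - 1] += ja * r
            for b, jb in jac.items():
                if b != 0:
                    H[a - 1][b - 1] += ja * jb
    dx = solve_linear(H, [-v for v in g])
    return [poses[0]] + [p + d for p, d in zip(poses[1:], dx)]

# Hypothetical run: the robot drives out and back to its start, but each
# odometry edge overestimates forward motion and underestimates the return.
odometry = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, -0.9), (3, 4, -0.9)]
loop_closure = [(0, 4, 0.0)]                 # pose 4 revisits pose 0

initial = [0.0]
for _, _, z in odometry:
    initial.append(initial[-1] + z)          # dead reckoning, ends near 0.4

optimized = gauss_newton_step(initial, odometry + loop_closure)
```

In this sketch the loop closure edge spreads the accumulated drift of about 0.4 evenly over all five edges, so the last pose moves from about 0.4 to about 0.08.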

In general, the created map is a set of extracted features with their positions and descriptors in visual SLAM, and a set of geometric features or the points themselves in LiDAR SLAM. In many use cases, the map is also saved as an occupancy grid map after post-processing, which makes obstacle detection and path planning easier.
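
As a minimal sketch of that post-processing step, the following Python function bins 2D map points into a binary occupancy grid. The function name, the grid layout, and the sample points are hypothetical. Practical occupancy grids usually also mark free space along each sensor ray and store an occupancy probability per cell, which this sketch omits.

```python
def points_to_occupancy_grid(points, resolution, width, height, origin=(0.0, 0.0)):
    """Mark every grid cell that contains at least one map point as occupied.
    `resolution` is the cell size in the same unit as the points."""
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = int((x - origin[0]) // resolution)
        row = int((y - origin[1]) // resolution)
        if 0 <= row < height and 0 <= col < width:   # ignore points off the grid
            grid[row][col] = 1
    return grid

# Hypothetical map points in meters, 0.5 m cells, 4 x 4 grid.
map_points = [(0.2, 0.2), (1.2, 0.7), (5.0, 5.0)]
grid = points_to_occupancy_grid(map_points, resolution=0.5, width=4, height=4)
```

A path planner can then test a cell with a single lookup such as grid[row][col] instead of searching the feature or point list.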