Information Gain Is Not All You Need

Ludvig Ericson, José Pedro, Patric Jensfelt

These authors contributed equally to this work.

Autonomous exploration in mobile robotics is driven by two competing objectives: coverage, to exhaustively observe the environment, and path length, to do so along the shortest possible path. Though it is difficult to evaluate the best course of action without knowing the unknown, the unknown can often be understood through models, maps, or common sense. However, previous work has shown that improving estimates of information gain through such prior knowledge leads to greedy behavior and, ultimately, backtracking, which degrades coverage performance. In fact, any information gain maximization exhibits this behavior, even without prior knowledge. The information gained at task completion is constant and therefore cannot be maximized; it is thus an unsuitable optimization objective. Instead, information gain serves as a decision criterion for determining which candidate states should still be considered for exploration. The task then becomes to reach completion with the shortest total path. Since determining the shortest path is typically intractable, a heuristic or estimate is needed to identify candidate states that minimize the total path length. To address this, we propose a heuristic that reduces backtracking by preferring candidate states that are close to the robot but far away from other candidate states. We evaluate the proposed heuristic in simulation against an information gain-based approach and frontier exploration, and show that our method significantly decreases total path length, both with and without prior knowledge of the environment.
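
As a rough sketch of this selection rule (the names, the scoring form, and the straight-line distance below are illustrative assumptions, not the paper's exact formulation), each candidate can be scored by how much farther it is from the other candidates than it is from the robot, and the highest-scoring candidate is explored next:

```python
import numpy as np

def select_candidate(robot_pos, candidates, dist):
    """Sketch of a 'distance advantage'-style selection rule: prefer
    candidates that are close to the robot but far from the other
    candidates. `dist` is any travel-distance function, e.g. Euclidean
    or a path-planner distance."""
    best_idx, best_score = None, -np.inf
    for i, c in enumerate(candidates):
        d_robot = dist(robot_pos, c)
        # Average distance from the *other* candidates to this one.
        others = [dist(o, c) for j, o in enumerate(candidates) if j != i]
        d_others = np.mean(others) if others else 0.0
        score = d_others - d_robot  # high: close to robot, far from the rest
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# Example with straight-line distance:
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
frontiers = np.array([[1.0, 0.0], [5.0, 5.0], [0.5, -0.5]])
print(select_candidate(np.array([0.0, 0.0]), frontiers, euclid))  # -> 2
```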

Figure 1: Information Gain and Travel Distance.

Distance at completion d_T for a selection of gain affinities λ, where a higher λ means a stronger preference for gain and less concern for the length of the path needed to acquire it. Naive gain refers to the assumption that unknown space is occlusion-free, i.e., yields maximal gain; for true gain, the real would-be sensor scan is used to compute the gain. Tellingly, negative affinities, i.e., minimizing gain, result in a lower d_T than maximization, and no choice is substantially better than nearest frontier, i.e., λ = 0.
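
One simple way to realize such a gain affinity, shown here only as an illustrative assumption and not necessarily the utility used in the paper, is a linear trade-off in which each candidate is scored by λ times its estimated gain minus the length of the path to reach it:

```python
def utility(estimated_gain, path_length, lam):
    """Hypothetical candidate score with gain affinity lam (an assumption,
    not the paper's exact utility). lam > 0 prefers high-gain candidates,
    lam = 0 reduces to nearest-frontier, and lam < 0 avoids gain."""
    return lam * estimated_gain - path_length
```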

Figure 2: Distance Advantage.

Illustration of distance advantage at the beginning of exploration. The robot (star) preferentially explores frontiers (solid coloring) with higher distance advantage. It is heading towards a closed-off room because it is nearer to that region than it would be from most other places. By contrast, its distance to the corridor is greater than it would be from elsewhere, repelling it from that region.

Figure 3: Environments.

Data is collected in three diverse environments: a large office from a real-world floor plan with both small cubicles and large lecture halls, a non-rectilinear cave environment with many small pockets, and a labyrinth-like maze with both shallow and deep dead-ends. Pink circles indicate starting locations; the light blue region depicts a sensor scan from the point of view of an example starting location, indicated by the brown star polygon. The zoomed region is the same size as the planner's local window.

Table 1: Results.

Distance at completion when there is a mismatch between the predictions and the environment due to clutter. The environment clutter and prediction clutter are independently sampled. Data collected across 10 runs from different starting locations for each method/environment/prediction tuple.

Figure 4: Prediction Range.

The effect of prediction range c_p on completion distance d_T in the office environment. Data collected across 10 runs from different starting locations, for each method and prediction range. Error bars represent one standard deviation.

Figure 5: Coverage and Frontier Size.

Comparison of coverage c(d) and total frontier size f(d) as functions of distance traveled d. The shaded areas indicate an 80% confidence interval; the solid lines indicate the means. Data collected across 10 runs for each method from different starting locations in the office environment.