2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China
Environment mapping has been one of the most active areas of mobile robotics research for decades. Mobile robots use sensors to perceive their surroundings and build environment maps, which provide essential information for the navigation, localization, and path planning of mobile robots.
Many scholars have been dedicated to environment mapping research for mobile robots to promote their applications in various fields. The primary models in this field include metric models [1], topological models [2], and hybrid models [3], which are used in different environments. Metric models, which include occupancy grids and feature maps, are often applied to small environments, e.g., a room or a small building, and metrical methods are effective in building accurate geometrical models of small-scale environments. A topological map models the world as a collection of nodes, which can be paths or places. Thus, topological maps can describe a large-scale environment easily. However, topological representations lack details such as the structures associated with the nodes and the relationships described as links. Hybrid models combine the strengths of metrical and topological approaches and are suitable for modeling large-scale spaces. These models describe the environment with 2D maps, which are widely adopted in mobile robots because of their relatively small computation requirements [4]-[6].
Although 2D maps have the advantages of simplicity and small computation, they provide limited information, particularly height information; thus, they are unsuitable for complex environments. For example, obstacles above the ground (which mobile robots can pass through) are difficult to describe with 2D maps (e.g., four-legged tables in a room, hole-type passages, and external components of tunnels). Many researchers have therefore expanded these 2D representations to 3D maps [7]-[10], which can be used to safely navigate a complex environment and perform other useful tasks (e.g., local localization).
Point cloud models and voxel-based grid models are the major existing 3D models, e.g., 3D laser mapping [11] and 3D RGB-D camera mapping [12]. However, point cloud models require huge amounts of data to be processed, which greatly reduces the efficiency of the system; moreover, point cloud data are easily affected by noise, which degrades modeling accuracy. Grid maps are widely used in environment modeling, but their runtime and accuracy depend strongly on the degree of discretization. Environment mapping for mobile robots thus poses great challenges, especially when robots work outdoors, where factors such as environmental uncertainty and illumination variation may degrade modeling accuracy. This paper uses saliency-based modeling to improve these aspects in the construction of 3D environment maps.
Visual attention is the ability of the human vision system to rapidly select the most relevant information in the visual field [13]. In computer vision, the advances obtained by simulating visual attention have accelerated the development of computational models. In some applications of mobile robots, such as navigation, simultaneous localization and mapping (SLAM), and local planning, the robots need to model only nearby objects at every moment (we consider these objects relevant) rather than all the objects in the environment. Thus, visual attention models are suitable for mobile robots in modeling the environment. One of the most popular visual saliency models was proposed by Itti et al. [13], namely, the data-driven attention model, which computes multi-scale feature contrasts of input images (such as intensity, color, and orientation features) by using a difference-of-Gaussians filter and linearly combines the feature conspicuity maps to produce a master saliency map. After nearly 20 years of development, researchers have proposed many computational visual attention models [14], including Bayesian surprise models, task-driven models, etc. These models have been applied in many fields of robotics [15].
Motivated by the advantages of visual attention models, we intend to apply them to robot environment modeling. In this paper, we first present a visual saliency model that extracts visual features from the scene and combines these features with the mean shift algorithm by using motion contrasts to detect the saliency of objects. We then adopt a stereo vision method and a Bayesian estimation algorithm to construct and update the 3D grid model of the environment. The evaluated system and the block diagram of the proposed method are shown in Figs. 1 and 2. The dashed box in Fig. 2 shows the creation of the saliency model.
The remainder of this paper is organized as follows. Section 2 provides an overview of related works. Section 3 describes the proposed visual saliency model. Section 4 explains the construction of the 3D environment. Section 5 provides experiments to verify the effectiveness of the proposed method. Finally, Section 6 concludes the paper.
2 Related Work
Attention-driven visual saliency allows humans to focus on conspicuous objects. By utilizing this visual attention mechanism, mobile robots can be made to adapt to complex environments. Currently, visual saliency models are used in many fields of mobile robotics, such as visual simultaneous localization and mapping (VSLAM), environment modeling, conspicuous object detection, and local localization [15].
Extracting useful landmarks is one of the main tasks of VSLAM. Frintrop et al. [16] presented a method of selecting landmarks on the basis of attention-driven mechanisms that favor salient regions. The feature vectors of salient regions are considered landmarks, and their results have shown that their method facilitates easier detection, matching, and tracking of landmarks compared with standard detectors such as Harris-Laplacians and scale-invariant feature transform (SIFT) key points. Newman et al. [17] used the entropy of the saliency field as landmarks and applied it to the loop-closing task. However, the features extracted with the saliency model are distinct and sparse and are unsuitable for environment mapping.
Ouerhani et al. [18] adopted a multi-scale saliency-based model of visual attention to automatically acquire robust visual landmarks and constructed a topological map of the navigation environment. Einhorn et al. [19] applied attention-driven saliency models to 3D environment modeling and mainly focused on selecting features in image areas where the obstacle situation is unclear and a detailed scene reconstruction is necessary. Roberts et al. [20] presented an approach to tree mapping in unstructured outdoor environments for vision-based autonomous aerial robots. The visual attention models they used target the saliency of nearby trees and ignore innocuous objects that are far away from the robots. This approach is related to our work; however, their models describe saliency with motion parallax, and the camera rotation has to be estimated.
Compared with 2D maps, 3D maps provide detailed information and are suitable for mobile robot applications such as local localization and navigation. Three-dimensional grid maps are one of the most used metric approaches because of their simplicity and suitability for all types of sensors. Souza et al. [7] used a grid occupancy probability method to construct the 3D environment of mobile robots. They used stereo vision to interpret the depth of image pixels and deduced a sensorial uncertainty model related to the disparity to correct the 3D maps; however, the cost of computation was expensive. By using the data acquired by laser range finders, Hähnel et al. [8] combined robot pose estimation with an algorithm that approximates environments with flat surfaces to reconstruct the 3D environment. Given the large amount of data, and to accelerate the computation, they used the split-and-merge technique to extract lines out of the individual 2D range scans; however, their method requires expensive laser sensors. Pirker et al. [9] presented a fast and accurate 3D environment modeling method. They used depth image pyramids to accelerate the computation and a weighted interpolation scheme between neighboring pyramid layers to boost the model accuracy. The algorithm is real-time but is implemented on the GPU. Kim et al. [10] proposed a framework for building continuous occupancy maps. They used a coarse-to-fine clustering method and applied Gaussian processes to each local cluster to reduce the high computational complexity. The experimental results with real 3D data in large-scale environments show the feasibility of this method.
Most saliency models compute primitive feature contrasts such as intensity, orientation, color, and others (e.g., edge gradient) to generate saliency, and these models are widely used by robots in landmark selection and object tracking or location. However, these models are unsuitable for environment mapping. Our system uses motion contrasts to construct the saliency model, which is conducive to mapping close and conspicuous objects in robot applications such as navigation and VSLAM. In 3D mapping, we approximate the saliency value as a priori information on the occupancy of the cell. This scheme avoids the need to train on previously created grid occupancy maps. Many objects exist in the visual scene of robots, and the selective mapping and prior-information approximation of 3D cells significantly improve the efficiency of system mapping and reduce interferences such as illumination changes.
3 Visual Saliency Modeling
Cameras capture more detailed information at every moment than range finders, such as laser scanners. However, not all objects in the environment affect mobile robots. For example, in applications such as navigation, only some nearby obstacles hinder the motion of mobile robots, and objects far from the robots often have minimal effect. If we model only the objects that considerably affect the mobile robot at every moment, the entire system is simplified. Visual saliency modeling (VSM) makes the robots focus on conspicuous objects. If the conspicuous objects are defined as nearby objects, environment modeling can be constructed easily.
In the visual scene of mobile robots, only some objects in the environment affect the robot at a given moment, e.g., objects that hinder the motion of the robot. Thus, the saliency model needs to highlight these conspicuous obstacles in the robot environment. Here, we use the distance between an obstacle and the mobile robot as the main parameter of saliency. The entire structure of the saliency model is shown in the dashed box of Fig. 2.
In the navigation of mobile robots, the scene captured by the robot's visual system is always dynamic. Many previous works have focused on detecting the feature contrasts that trigger human vision nerves; this type of detection is usually referred to as the "stimuli-driven" mechanism. To detect nearby objects, we use a distance-potential function
$ \begin{equation} \label{eq1} \phi =\frac{1}{2}k_p \left( \frac{1}{r}-\frac{1}{r_0 } \right)^2. \end{equation} $  (1)
Equation (1) shows that a smaller distance $r$ produces a larger potential $\phi$ when $r<r_0$, where $k_p$ is a positive coefficient and $r_0$ is the action distance of the potential. Differentiating (1) with respect to time yields the motion contrast
$ \begin{equation} \label{eq2} s=\frac{d\phi }{dt}=-k_p \left( \frac{1}{r}-\frac{1}{r_0 } \right)\frac{1}{r^2}\frac{dr}{dt}. \end{equation} $  (2)
Assume that the center coordinate of the mobile robot is $(x, y, z)$ and that a point $p_i$ on an obstacle is located at $(x_i, y_i, z_i)$. The distance between them is
$ r=\sqrt {(x_i-x)^2+(y_i-y)^2+(z_i-z)^2} . $
By substituting $r$ into (2) and noting that the robot moves in the ground plane with velocity $(v_x, v_y)$, we obtain
$ \begin{equation} \label{eq3} s=k_p \left( \frac{1}{r}-\frac{1}{r_0 } \right)\frac{1}{r^3}\left[ \left( x_i-x \right)v_x +\left( y_i-y \right)v_y \right]. \end{equation} $  (3)
We model the motion contrast with (2) and consider the influence of the position of $p_i$ relative to the moving direction of the robot. The saliency of point $p_i$ is then defined as
$ \begin{equation} \label{eq4} {\rm sal} (p_i )=k_p k_d \left( \frac{1}{r}-\frac{1}{r_0 } \right)\frac{1}{r^3}\left[ \left( x_i-x \right)v_x +\left( y_i-y \right)v_y \right] \end{equation} $  (4)
where $k_d$ is a direction coefficient determined by the deviation angle $\Delta \varphi _i$ between the moving direction of the robot and the direction from the robot to $p_i$:
$ \begin{equation} \label{eq5} \Delta \varphi _i =\tan ^{-1}\left( \frac{v_y }{v_x } \right)-\tan ^{-1}\left( \frac{y_i -y}{x_i -x} \right). \end{equation} $  (5)
Equation (5) shows that point $p_i$ deviates from the moving direction of the robot by the angle $\Delta \varphi _i$, and the direction coefficient $k_d$ is given by
$ \begin{equation} \label{eq6} k_d =\begin{cases} 1-\dfrac{\Delta \varphi _i }{\pi }, & \Delta \varphi _i >0 \\ 1, & \Delta \varphi _i =0 \\ 1+\dfrac{\Delta \varphi _i }{\pi }, & \Delta \varphi _i <0. \end{cases} \end{equation} $  (6)
The motion saliency of points on obstacles can be calculated by using (4), and the result shows higher saliency when the mobile robot moves closer to the front obstacles.
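The saliency computation of (4)-(6) can be sketched numerically. The following Python snippet is an illustrative implementation restricted to the ground plane; the gain $k_p$ and action distance $r_0$ are arbitrary example values, as the paper does not fix them:

```python
import math

def saliency(px, py, rx, ry, vx, vy, kp=1.0, r0=3.0):
    # Saliency of an obstacle point (px, py) for a robot at (rx, ry) moving
    # with velocity (vx, vy) in the ground plane, following Eqs. (4)-(6).
    # kp (gain) and r0 (action distance) are illustrative values.
    r = math.hypot(px - rx, py - ry)
    # Eq. (5): deviation between the heading and the robot-to-point direction.
    dphi = math.atan2(vy, vx) - math.atan2(py - ry, px - rx)
    # Eq. (6): direction coefficient.
    if dphi > 0:
        kd = 1.0 - dphi / math.pi
    elif dphi < 0:
        kd = 1.0 + dphi / math.pi
    else:
        kd = 1.0
    # Eq. (4): distance-potential gradient as motion contrast.
    return kp * kd * (1.0 / r - 1.0 / r0) / r**3 * ((px - rx) * vx + (py - ry) * vy)

# A point directly ahead at 2.0 m is more salient than one at 2.5 m.
near = saliency(2.0, 0.0, 0.0, 0.0, 1.0, 0.0)
far = saliency(2.5, 0.0, 0.0, 0.0, 1.0, 0.0)
```

As expected from (4), points that are closer and better aligned with the heading receive larger saliency values.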
In (4) and (6), the positions of points on objects can be acquired from the point cloud data obtained with the stereo system. To reduce computational complexity, we use visual features such as SURF [21] or SIFT [22] to calculate the saliency of points. These features are widely used in image registration, matching, and object recognition because of their exceptional invariance to translation, rotation, scaling, lighting, and noise.
Our target is to acquire the saliency of an object. Here, we use the conspicuous features to segment the objects. We first adopt the mean shift algorithm [23] to segment the objects and then use the conspicuous features to merge the segmented regions to generate the conspicuous objects. The mean shift algorithm is a robust feature-space analysis approach that can be applied to discontinuity-preserving smoothing and image segmentation problems. The algorithm uses an adaptive gradient ascent method to decompose an image into homogeneous tiles, and the mean shift procedure is not computationally expensive. The conspicuous objects acquired with this method are applied to environment modeling.
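As a rough illustration of the mode-seeking idea behind mean shift (the full image-segmentation algorithm of [23] is considerably more involved), the following sketch moves each query point to the mean of the data points within a flat kernel; the bandwidth and sample points are arbitrary:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=50):
    # Flat-kernel mean shift mode seeking: each query point repeatedly moves
    # to the mean of the data points lying within `bandwidth` of it, so
    # points drawn from the same cluster converge to a common mode.
    shifted = points.astype(float)
    for _ in range(n_iter):
        for i, p in enumerate(shifted):
            neighbors = points[np.linalg.norm(points - p, axis=1) <= bandwidth]
            shifted[i] = neighbors.mean(axis=0)
    return shifted

# Two well-separated groups of samples collapse onto two distinct modes.
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]])
modes = mean_shift(pts, bandwidth=1.0)
```

In the segmentation setting, pixels (or feature vectors) that converge to the same mode are assigned to the same region.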
4 Environment Modeling
We use volumetric models [24] (3D occupancy grid modeling) to construct the environment of mobile robots. Volumetric models adopt the occupancy of each independent cell to describe the position distribution of obstacles in the environment. These models have been extensively used in robotics because of their simplicity and suitability for various types of sensors. To reduce the mapping uncertainty introduced by sensor noise, we use the probability of occupancy to model the environment of mobile robots.
We suppose that the robotic environment workspace $W$ is discretized into equally sized voxels $v_i$. The occupancy probability of voxel $v_i$ at time $t$ is updated with Bayes' rule:
$ \begin{equation} \label{eq7} p (v_i^t \vert m_i^t )=\frac{1}{\alpha }p(m_i^t \vert v_i^t )p(v_i^t \vert m_i^{t-1} ) \end{equation} $  (7)
where $\alpha$ is a normalization constant, $m_i^t$ is the measurement related to voxel $v_i$ at time $t$, $p(m_i^t \vert v_i^t )$ is the sensor model, and $p(v_i^t \vert m_i^{t-1} )$ is the prior occupancy of the cell.
Our mobile robot is equipped with a binocular stereo visual system, and the 3D positions of points in the map coordinate frame can be calculated by using the robot pose and point disparity [26]. The error of position increases with distance because of the uncertainty of sensors. Reference [27] indicated that the error was proportional to the square of the distance:
$ \begin{equation} \label{eq8} \Delta z=\frac{z^2}{Bf}\Delta d \end{equation} $  (8) 
where $B$ is the baseline of the stereo pair, $f$ is the focal length, and $\Delta d$ is the disparity error. Accordingly, we define the sensor model of a cell as a Gaussian:
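Equation (8) can be illustrated with assumed calibration values (a 0.11 m baseline and a focal length of about 1000 pixels, roughly on the order of the calibration reported in Section 5; these numbers are for illustration only):

```python
def depth_error(z, baseline_m, focal_px, disparity_err_px=1.0):
    # Eq. (8): range error grows with the square of the measured depth z,
    # for a stereo rig with the given baseline (m) and focal length (px).
    return z**2 / (baseline_m * focal_px) * disparity_err_px

# Assumed rig: 0.11 m baseline, ~1000 px focal length (illustrative only).
err_1m = depth_error(1.0, 0.11, 1000.0)
err_3m = depth_error(3.0, 0.11, 1000.0)  # nine times the error at 1 m
```

The quadratic growth of the error with depth is what motivates treating cells beyond a few meters as unknown in (10).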
$ \begin{equation} \label{eq9} p (m_i^t \vert v_i^t )=\frac{k_m }{\Delta r\sqrt {2\pi } }\exp \left[ -\frac{1}{2}\left( \frac{r_i-\bar {r}}{\Delta r} \right)^2 \right] \end{equation} $  (9)
where $k_m$ is a normalization coefficient, $r_i$ is the range from the sensor to cell $v_i$, $\bar{r}$ is the measured range, and $\Delta r$ is the range uncertainty derived from (8).
A priori information about the occupancy of the cell can be obtained by training on grid occupancy maps created previously [25]; however, this procedure requires a large computation. Our proposed saliency model highlights nearby conspicuous objects in the robot environment rather than all the objects. If we approximate the prior information with the feature saliency, the cost of computation is reduced. In (7), the prior $p(v_i^t \vert m_i^{t-1} )$ is approximated as
$ \begin{equation} \label{eq10} p (v_i^t \vert m_i^{t-1} )=\begin{cases} \zeta _i , & z\le 3 \\ 0.5, & z>3 \end{cases} \end{equation} $  (10)
where $\zeta _i$ is the saliency value of the feature point associated with the cell and $z$ is the measured depth in meters.
We define the occupancy probability of a grid as 0.5 in (10) because the uncertainty of the sensors increases quickly with distance when the measured depth exceeds 3 m, i.e., the state is unknown. Within this range, we approximate the probability by using the saliency value of the feature points.
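A single-cell update combining the sensor model (9), the saliency-based prior (10), and Bayes' rule (7) can be sketched as follows; the free-cell likelihood used to expand the normalizer $\alpha$ over the two states is an assumption of this sketch, not part of the paper's formulation:

```python
import math

def sensor_model(r_cell, r_meas, dr, km=1.0):
    # Gaussian sensor model of Eq. (9): likelihood that the cell at range
    # r_cell produced the range measurement r_meas with uncertainty dr.
    return km / (dr * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((r_cell - r_meas) / dr) ** 2)

def update_cell(prior, likelihood_occ, likelihood_free):
    # Bayes update of Eq. (7), with the normalizer alpha expanded over the
    # occupied/free states. likelihood_free is an assumed constant here.
    num = likelihood_occ * prior
    return num / (num + likelihood_free * (1.0 - prior))

# A cell close to the measured range, with a saliency-based prior of 0.7
# from Eq. (10) (depth below the 3 m cutoff), becomes more likely occupied.
p = update_cell(0.7, sensor_model(2.0, 2.05, 0.1), 0.5)
```

With an uninformative prior of 0.5 and equal likelihoods for both states, the update leaves the probability unchanged, as expected.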
4.3 Obstacles Occlusion
Three-dimensional grid modeling uses a space occupancy grid to describe the environment. In this section, we discuss only the static environment. We define obstacles as static objects above the ground, such as indoor walls, tables, outdoor trees, and pillars.
Our mobile robot is equipped with stereo vision. However, because of the restricted installation height of the cameras and because the mobile robot always observes objects from one side, occlusion occurs, and identifying the grid occupancy state of the entire object in maps is difficult. We use a projection method to calculate the occupancy range of objects in the maps, as shown in Fig. 4.
Fig. 4 is a side view of the object projection. The shadowed region is an object that is divided into grid cells, and the others are the grid cells after projection. Suppose that the occupancy width of the object in the map is $w$; it is computed as
$ \begin{equation} \label{eq11} w=\begin{cases} \dfrac{h_o D}{h_c -h_o }, & h_c >h_o \\ \infty, & h_c \le h_o \end{cases} \end{equation} $  (11)
where $h_o$ is the height of the object, $h_c$ is the installation height of the camera, and $D$ is the horizontal distance between the camera and the object.
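A minimal sketch of the projection rule (11), with illustrative object height, camera height, and distance:

```python
def occupancy_width(h_o, h_c, D):
    # Eq. (11): width of the occluded strip behind an object of height h_o,
    # as seen by a camera at height h_c from horizontal distance D. When the
    # object is at least as tall as the camera, the occlusion is unbounded.
    if h_c > h_o:
        return h_o * D / (h_c - h_o)
    return float("inf")

w_low = occupancy_width(0.5, 1.0, 2.0)    # a 0.5 m object occludes 2.0 m
w_tall = occupancy_width(1.2, 1.0, 2.0)   # taller than the camera: unbounded
```

This shows why tall obstacles hide everything behind them, while short ones only mask a finite strip of grid cells.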
In mapping, the mobile robot must update the grid occupancy in time by using its visual system. The robot perceives the environment, obtains data, and calculates the occupancy probability of the grids. When a grid is observed repeatedly, its occupancy probability is updated as
$ \begin{equation} \label{eq12} p (v_i^t \vert m_i^t )=\min \left[ p (v_i^t \vert m_i^t ), p (v_i^{t-1} \vert m_i^{t-1} ) \right] \end{equation} $  (12)
where $p (v_i^{t-1} \vert m_i^{t-1} )$ is the occupancy probability obtained at the previous time step.
To decrease mismatched features and remove useless features, we add the following constraints to extract the matching features according to [8]:
1) All the points in
2) All the matching points that do not meet the condition are removed as outliers.
In the grid maps, we use the distance-potential gradient based on feature keypoints to calculate the saliency value of keypoints on objects near the robot and use these keypoints to describe the grid occupancy of the obstacles. However, when the number of such points is insufficient to describe the entire occupancy state of an obstacle, the obstacle has to be segmented. We first use the mean shift algorithm [23] to segment the scenes scanned by the cameras, then merge the segmented regions with these feature keypoints according to feature similarity to segment the objects, and finally apply the stereo vision algorithm [26] to determine the positions of these objects.
5 Experiment Results and Analysis
To verify the ability of the proposed method to construct the environment model, we perform a series of experiments covering VSM and indoor and outdoor environment mapping, and we carry out performance evaluation tests under different conditions. The whole 3D map is constructed as the robot moves forward, and we use keypoint matching between consecutive frames to avoid remapping previously visited areas.
In our experiments, the mobile robot we used is shown in Fig. 1, and the ground is a plane. The mobile robot is equipped with a stereo camera pair mounted at a fixed height. The calibrated intrinsic matrices of the left and right cameras are
$ A_{\rm L} =\left[ {\begin{array}{ccc} 1015.6 & 0 & 357.7 \\ 0 & 1073.5 & 264.3 \\ 0 & 0 & 1 \\ \end{array} } \right] $
$ A_{\rm R} =\left[ {\begin{array}{ccc} 1037.0 & 0 & 339.0 \\ 0 & 1099.2 & 289.2 \\ 0 & 0 & 1 \\ \end{array} } \right]. $
The rotation matrix $R_0$ and translation vector $T_0$ between the two cameras are
$ R_0 =\left[ {\begin{array}{ccc} 1 & -0.0059 & 0.0060 \\ 0.0062 & 0.9987 & -0.0503 \\ -0.0057 & 0.0503 & 0.9987 \\ \end{array} } \right] $
$ T_0 =\left[ {\begin{array}{ccc} 111.191 & 1.395 & 6.279 \\ \end{array} } \right] $
where the translation is given in millimeters.
In the saliency modeling experiments, we use the visual features to construct the saliency maps. However, the number of features extracted from some objects may not be enough to show the saliency of these objects. Hence, we first adopt the mean shift algorithm [23] to segment the objects and then merge the segmented regions with these features to acquire the saliency of the objects. If the surface textures of a large object differ significantly and are far apart from each other, the saliency may differ, as shown in Fig. 6.
Fig. 6(a) is a visual saliency map constructed with the conspicuous SURF features, obtained by using (4) and (6) combined with the mean shift algorithm. In Fig. 6, the SURF features on the ground have been removed. There are eight conspicuous regions in this figure; Region 1 has the highest saliency and Region 8 the lowest. In Region 1, the features are nearest to the mobile robot and have the smallest deviation relative to the moving direction of the robot. In Region 2, the features have a small deviation but are farther from the robot than those in Region 1; hence, the saliency in Region 2 is lower. Regions 7 and 8 have a larger deviation and are far away from the robot; thus, their saliency is the lowest. The right part of Fig. 6(b) shows the conspicuous SURF feature map, in which the saliencies of the features differ.
We first performed the 3D grid modeling experiments in an indoor environment approximately 5.0 m across.
Fig. 9 is an outdoor 3D grid map created with conspicuous SURF features; the scene is approximately 30.5 m across.
In our method, we use visual features to construct the 3D environment model, and the feature type is one of the most influential choices for accuracy and runtime performance. We use SIFT, SURF, and Harris-Laplace to evaluate our method in indoor and outdoor environments. Twenty places (see the circles in Figs. 8 and 9) are carefully selected to evaluate the performance. The plots in Fig. 10 show the processing time comparison per frame in indoor and outdoor environments, whereas Fig. 11 shows the mapping accuracy in these places. Evaluating all the mapping errors is very difficult, so we select 10 conspicuous feature points in specified locations (as shown in Figs. 8 and 9) and compute their mapping error. The comparison results clearly show that outdoor modeling requires more time and provides lower mapping accuracy than indoor modeling: more salient features are extracted outdoors, and some features easily become unstable because of the effect of light, particularly for Harris-Laplace. In contrast, SIFT provides the highest accuracy (the median is approximately 0.07 m) but also the highest processing time, and Harris-Laplace is the opposite. SURF offers a trade-off for a robot with limited computational resources or applications that require real-time performance.
Illumination changes may make the visual features unstable, and the shadows cast by objects may change. These factors cause false objects to appear in the map and reduce mapping accuracy. In our experiments, we evaluated the robustness of the proposed method under the effects of light in an indoor environment. We control four lamps mounted on the ceiling to simulate illumination changes. In mapping, we use visual odometry (VO) and SLAM to evaluate mapping accuracy, and the results are shown in Fig. 12. The results show that mapping accuracy deteriorates under illumination changes, particularly the mapping error obtained using point cloud subsampling (PCS) (see the dashed lines in Fig. 12). The solid lines are the results of our method and show that the error is affected relatively little by the lighting, demonstrating good robustness. This may be explained by the fact that VSM gives priority to the closer objects when the robot is mapping the environment.
For the runtime, we compared our approach with some of the discussed approaches, such as occupancy grid mapping (OGM) and occupancy mapping with pyramid (OMP) [9]. The experiments are performed in the indoor corridor shown in Fig. 6. The grid cell size is 0.1 m.
6 Conclusion
In this paper, we presented a method of 3D grid modeling based on visual attention. VSM makes the front objects (those closer to the mobile robot) exhibit larger conspicuousness, which helps mobile robots model closer objects with higher priority, thus avoiding the modeling of all the objects at every moment and reducing the computing time for updating the environment map. In the constructed VSM, we define the distance-potential gradient as the motion contrast and combine the visual features with the mean shift segmentation algorithm to determine the saliency of objects. To create the 3D environment maps, we use a stereo vision method to calculate the positions of conspicuous visual features and objects, and we combine Bayes' theorem with the sensor model, the grid prior model, and the projection method to update the occupancy probability of the grids. Finally, a series of experiments including saliency modeling, indoor and outdoor grid modeling, and performance tests evaluates the presented method.
1
T. K. Lee, S. H. Baek, Y. H. Choi, and S. Y. Oh, "Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation," Robot. Auton. Syst., vol. 59, no. 10, pp. 801-812, Oct. 2011.

2
H. T. Cheng, H. P. Chen, and Y. Liu, "Topological indoor localization and navigation for autonomous mobile robot," IEEE Trans. Automat. Sci. Eng., vol. 12, no. 2, pp. 729-738, Apr. 2015.

3
I. J. Cox and J. J. Leonard, "Modeling a dynamic environment using a Bayesian multiple hypothesis approach," Artif. Intell., vol. 66, no. 2, pp. 311-344, Apr. 1994.

4
B. H. Guo and Z. H. Li, "Dynamic environment modeling of mobile robots based on visual saliency," Control Theory Appl., vol. 30, no. 7, pp. 821-827, Jul. 2013.

5
R. Sim and J. J. Little, "Autonomous vision-based exploration and mapping using hybrid maps and Rao-Blackwellised particle filters," in Proc. 2006 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Beijing, China, 2006, pp. 2082-2089.

6
Y. N. Wang, Y. M. Yang, X. F. Yuan, Y. Zuo, Y. L. Zhou, F. Yin, and L. Tan, "Autonomous mobile robot navigation system designed in dynamic environment based on transferable belief model," Measurement, vol. 44, no. 8, pp. 1389-1405, Oct. 2011.

7
A. A. S. Souza, R. Maia, and L. M. G. Gonçalves, "3D probabilistic occupancy grid to robotic mapping with stereo vision," in Current Advancements in Stereo Vision, A. Bhatti, Ed. Croatia: INTECH, 2012, pp. 181-198.

8
D. Hähnel, W. Burgard, and S. Thrun, "Learning compact 3D models of indoor and outdoor environments with a mobile robot," Robot. Auton. Syst., vol. 44, no. 1, pp. 15-27, Jul. 2003.

9
K. Pirker, M. Rüther, H. Bischof, and G. Schweighofer, "Fast and accurate environment modeling using three-dimensional occupancy grids," in Proc. 2011 IEEE Int. Conf. Computer Vision Workshops, Barcelona, Spain, 2011, pp. 1134-1140.

10
S. Kim and J. Kim, "Occupancy mapping and surface reconstruction using local Gaussian processes with Kinect sensors," IEEE Trans. Cybern., vol. 43, no. 5, pp. 1335-1346, Oct. 2013.

11
Y. Zhuang, N. Jiang, H. S. Hu, and F. Yan, "3D-laser-based scene measurement and place recognition for mobile robots in dynamic indoor environments," IEEE Trans. Instrum. Meas., vol. 62, no. 2, pp. 438-450, Feb. 2013.

12
F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, "3-D mapping with an RGB-D camera," IEEE Trans. Robot., vol. 30, no. 1, pp. 177-187, Feb. 2014.

13
L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254-1259, Nov. 1998.

14
A. Kimura, R. Yonetani, and T. Hirayama, "Computational models of human visual attention and their implementations: A survey," IEICE Trans. Inf. Syst., vol. E96-D, no. 3, pp. 562-578, Mar. 2013.

15
S. Frintrop, E. Rome, and H. I. Christensen, "Computational visual attention systems and their cognitive foundations: A survey," ACM Trans. Appl. Percept., vol. 7, no. 1, Article 6, Jan. 2010.

16
S. Frintrop and P. Jensfelt, "Attentional landmarks and active gaze control for visual SLAM," IEEE Trans. Robot., vol. 24, no. 5, pp. 1054-1065, Oct. 2008.

17
P. Newman and K. Ho, "SLAM-loop closing with visually salient features," in Proc. 2005 IEEE Int. Conf. Robotics and Automation, Barcelona, Spain, 2005, pp. 635-642.

18
N. Ouerhani, A. Bur, and H. Hügli, "Visual attention-based robot self-localization," in Proc. 2005 European Conf. Mobile Robotics, Ancona, Italy, 2005, pp. 8-13.

19
E. Einhorn, C. Schröter, and H. M. Gross, "Attention-driven monocular scene reconstruction for obstacle detection, robot navigation and map building," Robot. Auton. Syst., vol. 59, no. 5, pp. 296-309, May 2011.

20
R. Roberts, D. N. Ta, J. Straub, K. Ok, and F. Dellaert, "Saliency detection and model-based tracking: A two part vision system for small robot navigation in forested environment," in Proc. SPIE 8387, Unmanned Systems Technology XIV, Baltimore, MD, USA, vol. 8387, Article ID 83870S.

21
H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proc. 9th European Conf. Computer Vision, Graz, Austria, 2006, pp. 404-417.

22
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91-110, Nov. 2004.

23
D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603-619, May 2002.

24
R. Rocha, J. Dias, and A. Carvalho, "Cooperative multi-robot systems: A study of vision-based 3-D mapping using information theory," Robot. Auton. Syst., vol. 53, no. 3-4, pp. 282-311, Dec. 2005.

25
S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA, USA: MIT Press, 2005.

26
A. Murarka, "Building safety maps using vision for safe local mobile robot navigation," Ph.D. dissertation, Dept. Comput. Sci., Univ. Texas at Austin, Austin, TX, USA, 2009.

27
S. Hrabar, "An evaluation of stereo and laser-based range sensing for rotorcraft unmanned aerial vehicle obstacle avoidance," J. Field Robot., vol. 29, no. 2, pp. 215-239, Mar.-Apr. 2012.