Autonomous Vehicles

Overview

Real-time sensor data processing for autonomous vehicles using Quasar

Monocular visual odometry

Current consumer vehicle navigation relies on Global Navigation Satellite Systems (GNSS) such as GPS, GLONASS and Galileo. However, as new cars are increasingly being equipped with cameras as standard equipment for a variety of tasks (e.g. lane departure warning, collision avoidance, traffic sign recognition), this opens up possibilities for navigation based on visual cues. Our visual odometry method robustly estimates the vehicle's trajectory using a simple monocular camera. The method is easy to calibrate, can accommodate a wide variety of camera mounting positions and yields results competitive with traditional pose estimation methods.

Our visual odometry method applied on a sequence from the KITTI dataset (click on the image to play a video sequence)

Furthermore, we have proposed a novel algorithm to relate the estimated trajectory to an offline map, effectively eliminating the error accumulation (drift) problem and thereby providing a complete, fully vehicle-contained fair-weather navigation solution that requires no external communication whatsoever. The high detail in which the trajectory is estimated also allows the detection of smaller maneuvers (e.g. lane changes), which is important in the context of intelligent vehicles / ADAS (advanced driver assistance systems).

Mapped monocular visual odometry (click on the image to play a video sequence)

The method also has possible applications in indoor navigation and cyclist odometry.

Contact: dr. ing. David Van Hamme

References:

  • Van Hamme, D., Veelaert, P. & Philips, W. (2012). Communicationless navigation through robust visual odometry. In: 15th International IEEE Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, AK, USA, 2012-09-16. IEEE, pp. 1555-1560.

  • Van Hamme, D., Veelaert, P. & Philips, W. (2011). Robust monocular visual odometry by uncertainty voting. In: 2011 IEEE Intelligent Vehicles Symposium (IV 2011), Baden-Baden, Germany, 2011-06-05. IEEE, pp. 643-647.

  • Van Hamme, D., Veelaert, P. & Philips, W. (2011). Robust visual odometry using uncertainty models. In: 13th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS) / ACM/IEEE International Conference on Distributed Smart Cameras, Ghent, Belgium, 2011-08-22.

  • Van Hamme, D., Goeman, W., Veelaert, P. & Philips, W. (2015). Robust monocular visual odometry for road vehicles using uncertain perspective projection. EURASIP Journal on Image and Video Processing, pp. 1-18.


LIDAR-based odometry

Facing real-world driving conditions, autonomous vehicle systems often fail to accurately find their position in space. Computer vision pose estimation techniques rely on camera technology, which is highly unreliable in low light and/or bad weather. Using point cloud data scanned by on-board lidar sensors, we have developed an ego-localization system for accurate short-term odometry. We use projective geometry to reduce the point cloud to a 2D map and then apply robust image registration techniques to estimate the ego-motion. The nature of the sensor makes our system invariant to ambient light levels and atmospheric conditions. Experiments on real-world driving data show improvements in short-term localization accuracy over standard visual odometry approaches, even in perfect light conditions.

Estimated trajectory vs. ground truth (KITTI dataset)
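As a rough illustration of the projection-and-registration idea described above, the sketch below rasterizes a lidar point cloud into a 2D height map and estimates the frame-to-frame translation with phase correlation. The grid resolution, extent and the use of plain phase correlation are illustrative assumptions; the actual system uses a more robust registration scheme (see the reference below) and also recovers rotation.

```python
import numpy as np

def project_to_height_map(points, resolution=0.2, extent=40.0):
    """Project an N x 3 lidar point cloud onto a 2D grid on the ground plane.

    Each cell stores the maximum point height, which is enough to expose
    structure (walls, poles, parked cars) for registration.
    """
    size = int(2 * extent / resolution)
    grid = np.zeros((size, size), dtype=np.float32)
    # Keep points inside the square region of interest around the vehicle.
    mask = (np.abs(points[:, 0]) < extent) & (np.abs(points[:, 1]) < extent)
    xy = ((points[mask, :2] + extent) / resolution).astype(int)
    np.maximum.at(grid, (xy[:, 0], xy[:, 1]), points[mask, 2])
    return grid

def phase_correlation_shift(map_prev, map_curr):
    """Estimate the 2D translation between two consecutive maps via phase correlation."""
    F1, F2 = np.fft.fft2(map_prev), np.fft.fft2(map_curr)
    cross_power = F1 * np.conj(F2)
    cross_power /= np.abs(cross_power) + 1e-9
    corr = np.abs(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices above half the map size to negative shifts.
    shift = [p if p < s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return np.array(shift)  # shift in grid cells; multiply by the resolution for metres
```

In practice the rotational component of the ego-motion would be handled as well, for instance by resampling the maps to polar coordinates before estimating the shift.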

Contact: ir. Martin Dimitrievski


References:

  • Dimitrievski, M., Van Hamme, D., Veelaert, P. & Philips, W. (2016). Robust matching of occupancy maps for odometry in autonomous vehicles. In: Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 626-633.

Automotive occupancy mapping

Our autonomous vehicle system perceives a snapshot of the local surroundings by applying the occupancy map model to the point cloud data. Occupancy maps are an elegant solution to the problem of data fusion, especially when there is a multitude of heterogeneous sensors on board. Point cloud height information is projected onto the local ground plane and a probabilistic model computes the probability of occupancy for each cell around the vehicle. Using our previously demonstrated odometry sub-system, we can incrementally build this occupancy map, resulting in a very detailed picture of the local surroundings.

Figure 2: Example trajectory and the corresponding environment geometry (click on the image to play a video)
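The following minimal sketch shows the standard log-odds formulation commonly used for such probabilistic occupancy maps; the increment and clamping values are illustrative assumptions, not the parameters of our system.

```python
import numpy as np

L_OCC, L_FREE = 0.85, -0.4     # log-odds increments (illustrative values)
L_MIN, L_MAX = -4.0, 4.0       # clamping keeps the map responsive to change

def update_occupancy(log_odds, observed_occupied, observed_free):
    """Fuse one frame of evidence into the occupancy grid.

    log_odds:           H x W array of accumulated log-odds of occupancy.
    observed_occupied:  boolean H x W mask of cells hit by lidar returns above the ground.
    observed_free:      boolean H x W mask of cells traversed by rays (e.g. from ray casting).
    """
    log_odds[observed_occupied] += L_OCC
    log_odds[observed_free] += L_FREE
    np.clip(log_odds, L_MIN, L_MAX, out=log_odds)
    return log_odds

def occupancy_probability(log_odds):
    """Convert accumulated log-odds back to the probability of occupancy per cell."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))
```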

Contact: ir. Martin Dimitrievski


Object detection and tracking

Point cloud based object detection is performed by first removing points that lie at or near the local ground plane. The remaining points are segmented into semantically meaningful objects using projective geometry. A height map is processed using various image morphology techniques to extract the local variations that define object boundaries. Finally, the processed height map contains connected components, i.e. blobs, which coincide with the contours of the 3D objects. Points in the original point cloud are thereby assigned to their respective object contours.
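A simplified sketch of this detection pipeline (ground removal, height-map rasterization, morphological cleaning and connected-component labelling) is given below. The cell size, extent and ground tolerance are illustrative assumptions, and SciPy's morphology routines stand in for the morphology techniques mentioned above.

```python
import numpy as np
from scipy import ndimage

def segment_objects(points, cell=0.2, extent=40.0, ground_tol=0.3):
    """Segment above-ground lidar points into blobs via a 2D height map.

    points: N x 3 array with z as the height above the (already estimated) ground plane.
    Returns the label image, the blob label of each retained above-ground point,
    and the number of blobs.
    """
    # 1. Drop points at or near the ground plane.
    above = points[points[:, 2] > ground_tol]

    # 2. Rasterise the remaining points into a height map.
    size = int(2 * extent / cell)
    height = np.zeros((size, size), dtype=np.float32)
    idx = ((above[:, :2] + extent) / cell).astype(int)
    keep = (idx >= 0).all(axis=1) & (idx < size).all(axis=1)
    idx = idx[keep]
    np.maximum.at(height, (idx[:, 0], idx[:, 1]), above[keep, 2])

    # 3. Morphological opening suppresses isolated cells and measurement noise.
    occupied = ndimage.binary_opening(height > 0, structure=np.ones((3, 3)))

    # 4. Connected components give one blob per object candidate.
    labels, num = ndimage.label(occupied)
    point_labels = labels[idx[:, 0], idx[:, 1]]
    return labels, point_labels, num
```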

Tracking of objects is performed in the connected components (blob) image on a frame-by-frame basis. This blob image is first corrected for ego-motion using our lidar odometry algorithm, and matching blobs are then tracked over time using a maximum-overlap criterion.
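A minimal sketch of the maximum-overlap association between two ego-motion-compensated blob images could look as follows; the minimum overlap threshold is an illustrative assumption.

```python
import numpy as np

def match_blobs(labels_prev, labels_curr, min_overlap=10):
    """Associate blobs between two ego-motion-compensated label images.

    For every blob in the current frame, pick the previous-frame blob with which
    it shares the largest number of cells (maximum-overlap criterion).
    """
    matches = {}
    for cur_id in np.unique(labels_curr):
        if cur_id == 0:                      # 0 is background
            continue
        overlap_ids = labels_prev[labels_curr == cur_id]
        overlap_ids = overlap_ids[overlap_ids != 0]
        if overlap_ids.size >= min_overlap:
            matches[cur_id] = np.bincount(overlap_ids).argmax()
    return matches                           # {current blob id: previous blob id}
```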

The following video demonstrates the results of our odometry, detection and tracking:

Top: segmentation of the LiDAR point cloud with the ground plane colored gray. Bottom: corresponding segments projected on the RGB image. (click on the image to play a video)

Using the ground truth data in the KITTI dataset, we trained a purely point cloud based car classifier. The model was built on a combination of Fast Point Feature Histograms (FPFH) and statistical moments of the 3D points, fed into a non-linear SVM classifier. A multi-target Kalman filter tracking technique is then applied to the detected cars to accurately track them and estimate their velocities.


Point cloud ground truth vs. detected cars from our classifier. (click on the right image to play a video)
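As a hedged sketch of such a classifier, the snippet below combines an averaged FPFH descriptor with simple statistical moments and trains an RBF-kernel SVM. Open3D and scikit-learn are used here purely for illustration; the descriptor details, search radii and SVM parameters are assumptions and not the exact configuration used in our experiments, and the Kalman filter tracking stage is omitted.

```python
import numpy as np
import open3d as o3d
from sklearn.svm import SVC

def describe_segment(points):
    """Build a fixed-length descriptor for one segmented object (N x 3 points)."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=50))
    fpfh_hist = np.asarray(fpfh.data).mean(axis=1)            # averaged 33-bin FPFH
    centroid = points.mean(axis=0)
    moments = np.concatenate([centroid,                        # simple statistical moments
                              points.std(axis=0),
                              np.abs(points - centroid).mean(axis=0)])
    return np.concatenate([fpfh_hist, moments])

def train_car_classifier(segments, labels):
    """Train a non-linear (RBF) SVM: segments is a list of point arrays, labels 0/1 (car)."""
    features = np.stack([describe_segment(p) for p in segments])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    return clf.fit(features, labels)
```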

Contact: ir. Martin Dimitrievski


Obstacle detection based on 3D imaging

This research has led to an obstacle detection system that can assist visually impaired people. The system relies on the output of a 3D sensor (e.g. Kinect) and uses 3D imaging and computer vision to detect obstacles. The obstacle detection process itself involves two main tasks: the first deals with the reconstruction of the scene in front of the pedestrian, while the second performs a segmentation of the scene based on this reconstruction.

Reconstruction

During the reconstruction phase, depth and color information are combined to create a colored point cloud. Once the point cloud is created, we transform it in order to align the floor plane with the xz-plane of our reference system. This transformation can be carried out because the Kinect is equipped with an accelerometer, which provides us with the direction of gravity. The importance of the transformation is threefold. First, it is needed to keep the consecutive point clouds expressed with respect to the same reference system. Second, the detection of the floor plane is simplified and more robust, since we do not have to assume that the floor plane is the dominant plane in the scene. Finally, this transformation enables us to make use of the absolute position of objects in the scene, in particular the absolute height of the obstacles. The figure below illustrates the idea of the performed reconstruction and transformation.

Illustration of the coordinate system transformation: the xz-plane is aligned with the floor plane and the origin is centered at the feet of the pedestrian.
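A minimal sketch of this gravity-based alignment, assuming the accelerometer supplies a gravity direction in the sensor frame and that gravity should end up along the negative y-axis, is given below (Rodrigues' rotation formula).

```python
import numpy as np

def rotation_to_align_gravity(gravity):
    """Rotation matrix that maps the measured gravity direction onto -y.

    After applying it, the floor is (approximately) parallel to the xz-plane,
    regardless of how the sensor is tilted.
    """
    g = np.asarray(gravity, dtype=float)
    g /= np.linalg.norm(g)
    target = np.array([0.0, -1.0, 0.0])        # gravity should point down the y-axis
    v = np.cross(g, target)
    c = float(np.dot(g, target))
    if np.isclose(c, -1.0):                    # 180 degree case: rotate around the x-axis
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues' formula

# Usage: points_aligned = (rotation_to_align_gravity(accel_gravity) @ points.T).T
```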

Segmentation

The segmentation process consists of two subtasks. The first segments all the dominant planes in the scene. The most important plane is the floor plane, since most obstacles are lying on it. Once we have identified the floor plane, we can classify objects as obstacles as soon as they are located right in front of the pedestrian. The second task thus consists of the segmentation of the remaining objects. The easiest way to do this is to cluster the remaining points based on their distance relative to each other. Below you can find a video showing how the system performs. The scene was recorded in one of the corridors of our research group using a Kinect sensor.

Our obstacle detection method (click on the image to play a video sequence)
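The sketch below illustrates the two subtasks with off-the-shelf Open3D routines: RANSAC plane segmentation for the floor and DBSCAN as a stand-in for distance-based (Euclidean) clustering. The thresholds are illustrative assumptions, not the parameters of our system.

```python
import numpy as np
import open3d as o3d

def segment_obstacles(pcd, floor_dist=0.03, cluster_eps=0.1, min_points=50):
    """Split an aligned point cloud into a floor plane and obstacle clusters.

    pcd is assumed to be already transformed so that the floor is roughly
    parallel to the xz-plane; the parameters are illustrative, in metres.
    """
    # 1. Dominant plane via RANSAC; in the aligned cloud this is typically the floor.
    plane_model, inlier_idx = pcd.segment_plane(distance_threshold=floor_dist,
                                                ransac_n=3, num_iterations=500)
    floor = pcd.select_by_index(inlier_idx)
    rest = pcd.select_by_index(inlier_idx, invert=True)

    # 2. Cluster the remaining points by mutual distance (DBSCAN approximates
    #    Euclidean clustering here).
    labels = np.asarray(rest.cluster_dbscan(eps=cluster_eps, min_points=min_points))
    n_clusters = int(labels.max()) + 1 if labels.size else 0
    obstacles = [rest.select_by_index(np.where(labels == k)[0].tolist())
                 for k in range(n_clusters)]
    return plane_model, floor, obstacles
```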

Step detection

In some cases obstacles do not have to be avoided. When the obstacle takes the form of a staircase, the pedestrian can simply go upstairs. For this reason, this work also covers the detection of steps. This latter process is based on a piecewise planar model. After the different planes of the scene are segmented, we check whether there are any planes parallel to the floor plane that are located at a certain height. The detection of steps in the scene is based solely on depth information. As a result, the algorithm can operate under different lighting conditions. Below you can find an example segmentation of the different steps in the scene.

Illustration of step detection. (click on the images to play videos)
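A minimal geometric check in the spirit of this step detection, assuming gravity-aligned plane models of the form ax + by + cz + d = 0 (with the floor through the origin) and illustrative riser-height thresholds, could look as follows.

```python
import numpy as np

def looks_like_step(plane_model, floor_normal=np.array([0.0, 1.0, 0.0]),
                    min_height=0.10, max_height=0.25, angle_tol_deg=10.0):
    """Decide whether a segmented plane could be the tread of a step.

    plane_model is (a, b, c, d) in the gravity-aligned frame where the floor
    coincides with the xz-plane. Thresholds are illustrative riser heights.
    """
    a, b, c, d = plane_model
    n = np.array([a, b, c], dtype=float)
    n_len = np.linalg.norm(n)
    normal = n / n_len
    # Parallel to the floor: normals (anti)parallel within the tolerance.
    if abs(float(np.dot(normal, floor_normal))) < np.cos(np.radians(angle_tol_deg)):
        return False
    # Height of the plane above the floor (assumed to pass through the origin).
    height = abs(d) / n_len
    return min_height <= height <= max_height
```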

Door detection

The same reasoning applies in cases where the obstacle is a door. Instead of avoiding this obstacle, the door can be opened. The detection of doors also relies on the plane segmentation process. This time we check whether there is a plane that is perpendicular to the floor plane and has a certain width. Once such a plane is found, we check whether a door handle is present at a certain position. The plane segmentation process is extended to take color information into account, since a door is often part of the same plane as the wall. Below you can find an example segmentation of a door.

Illustration of door detection. (click on the images to play videos)
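Analogously, a hedged sketch of the geometric part of the door test (perpendicularity to the floor and a plausible width) is shown below; the handle check and the color-based refinement described above are omitted, and all thresholds are illustrative.

```python
import numpy as np

def looks_like_door(plane_model, plane_points, min_width=0.7, max_width=1.2,
                    angle_tol_deg=10.0):
    """Decide whether a vertical plane segment has door-like dimensions.

    plane_points are the inliers of the segmented plane (M x 3) in the
    gravity-aligned frame; the width thresholds are illustrative.
    """
    normal = np.array(plane_model[:3], dtype=float)
    normal /= np.linalg.norm(normal)
    # Perpendicular to the floor: the normal lies (almost) in the xz-plane.
    if abs(normal[1]) > np.sin(np.radians(angle_tol_deg)):
        return False
    # Width: extent of the inliers along the horizontal direction within the plane.
    horizontal = np.cross(normal, [0.0, 1.0, 0.0])
    horizontal /= np.linalg.norm(horizontal)
    width = np.ptp(plane_points @ horizontal)
    return min_width <= width <= max_width
```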

Contact: ir. Michiel Vlaminck


References:

  • Vlaminck, M., Jovanov, L., Van Hese, P., Goossens, B., Philips, W. & Pizurica, A. (2013). Obstacle detection for pedestrians with a visual impairment based on 3D imaging. In: 2013 International Conference on 3D Imaging (IC3D), pp. 1-7, 3-5 Dec. 2013.

Liborg - lidar-based mapping

Today, the 3D reconstruction of, for example, tunnels or industrial buildings is a time-consuming and expensive process. To simplify this process we developed Liborg, a lidar-based mobile mapping platform. The system is independent of any external positioning system such as GPS and conducts localization (inherent to mapping) solely based on the output of a lidar scanner. Thanks to our efficient 3D mapping algorithms, it is possible to build detailed 3D models of various environments on the fly. Furthermore, we developed a system to transmit the reconstructed 3D model to a remote computer or server. The latter enables a whole range of additional applications for Liborg, such as live monitoring of the acquired 3D model, for instance to perform inspections or assess damage in areas that are difficult to reach.

Recently, we started the development of a truly autonomous robot, such that the robot can navigate based on its own acquired 3D model. This allows the robot to perform the entire 3D mapping of a scene without any manual intervention. We also integrated a regular camera to be able to combine the 3D geometric information of the scene with visual data. To that end, a synchronisation module was developed along with calibration algorithms to relate the data of both sensors. A picture of our autonomous robot can be found below. Note that it is still possible to control it remotely or to use our 3D mapping software in combination with another kind of sensor set-up, including a hand-held lidar scanner.

Liborg mapping platform

The main technological benefits of Liborg are: 1) fast, accurate and highly robust point cloud registration (stitching), 2) unrivalled sub-centimeter accuracy and 3) a compact and efficient representation of the 3D model. The overall advantages of Liborg can be summarized as follows. First, it is no longer necessary to bring in an external company with specialized equipment. Second, there are no scanners that need to be put into place manually. Third, one can monitor, at all times, which areas have already been assessed or mapped. All this saves both time and money. Below, we demonstrate a few use cases.
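As an illustration of the kind of scan-to-model registration involved, the sketch below uses Open3D's point-to-plane ICP. This is a generic stand-in, not Liborg's actual registration pipeline, and the voxel size and correspondence distance are assumptions.

```python
import numpy as np
import open3d as o3d

def register_scan(source, target, voxel=0.05, max_dist=0.2, init=np.eye(4)):
    """Register a new lidar scan against the growing model with point-to-plane ICP.

    The voxel size and correspondence distance are illustrative; a production
    pipeline would be considerably more elaborate than this single ICP call.
    """
    src = source.voxel_down_sample(voxel)
    tgt = target.voxel_down_sample(voxel)
    for pcd in (src, tgt):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # 4x4 pose of the new scan in the model frame
```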

Street mapping in Hasselt

The goal of this use case was to generate a highly precise 3D model of a residential area in the vicinity of Hasselt. To that end, we mounted a Velodyne HDL-32E lidar scanner on the mapping van of the company Grontmij. In order to acquire the facades and rooftops of the houses in detail, the scanner was tilted, making an angle of approximately 44 degrees with the ground plane.


The resulting point cloud of our mobile mapping system on the entire lidar sequence recorded in the Belgian city of Hasselt. (click on the image to play a video)

A plot of the ground truth trajectory (red) and the estimated trajectory (blue) on top of the corresponding Google Earth image. Left: odometry result without loop closure; right: odometry result with loop closure.

UFO building Ghent University

A set of sequences was captured at a campus of Ghent University, near the UFO and Technicum buildings. Below you can find an example sequence and a live 3D reconstruction demo.


(click on the image to play a video)

(click on the image to play a video)

DOW Chemical plant

A third, challenging, use case was recorded at a chemical site of the Dow Company in Terneuzen. This environment is part of a disused area that was planned to be demolished. It contains a lot of pipelines that were formerly used to carry liquids or gases, as can be seen in the video below. Although this environment is outdoors, the GPS signal is far too unreliable due to the abundance of pipelines. Below you can again find an example sequence and a live 3D reconstruction demo.

(click on the images to play the videos)

References:

  • Vlaminck, M.; Luong, H.; Goeman, W.; Philips, W. "3D scene reconstruction using omnidirectional vision and LiDAR: a hybrid approach.", SENSORS. MDPI; 2016;16(11).
  • Vlaminck, M.; Luong, H.; Philips, W., "Liborg: a lidar-based robot for efficient 3D mapping", Applications of Digital Image Processing XL, Vol. 10396, SPIE, 2017.
  • Vlaminck, M.; Luong, H; Philips, W., "Have I Seen This Place Before? A Fast and Robust Loop Detection and Correction Method for 3D Lidar SLAM.", SENSORS. MDPI; 2019;19(1).

Contact: ir. Michiel Vlaminck


Low level point cloud processing

Normal estimation

Recognition of objects in point cloud data relies on cues of the local 3D geometry, and most algorithms use local plane normal information in one form or another. Therefore, fast computation of local plane normals is of paramount importance. Part of our research resulted in the development of an algorithm for fast computation of plane normals in a single-lidar configuration. Our approach transforms the point cloud into a spherical depth image, where it exploits fast image processing paradigms. Within this depth image we employ a robust, block-processing based plane fitting algorithm to estimate the local normals. This approach completely skips the expensive nearest neighbor search operation and enables our local plane estimator to run in real time even on dense point clouds. The algorithm development, from design to implementation, was done entirely in the Quasar programming language.

Color coded point cloud with the orientation of the local plane normals relative to the vehicle (click on the image to play a video sequence)
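A simplified sketch of the idea, projecting the cloud to a spherical range image and computing normals from neighbouring pixels instead of a 3D nearest-neighbour search, is given below in Python/NumPy for illustration (the actual implementation is in Quasar). The image size and vertical field of view are assumptions, and the cross-product normals stand in for the robust block-based plane fitting described above.

```python
import numpy as np

def spherical_depth_image(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project a lidar point cloud onto a spherical (range) image.

    The vertical field of view values are illustrative (roughly an HDL-64 scanner).
    """
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * (h - 1)).astype(int)
    v = ((0.5 * (1.0 + yaw / np.pi)) * (w - 1)).astype(int)
    valid = (u >= 0) & (u < h)
    depth = np.full((h, w), np.nan, dtype=np.float32)
    xyz = np.full((h, w, 3), np.nan, dtype=np.float32)
    depth[u[valid], v[valid]] = r[valid]
    xyz[u[valid], v[valid]] = points[valid]
    return depth, xyz

def image_normals(xyz):
    """Local plane normals from neighbouring pixels of the spherical image.

    Replaces the 3D nearest-neighbour search by constant-time image lookups.
    """
    dx = xyz[:, 2:, :] - xyz[:, :-2, :]        # horizontal neighbours
    dy = xyz[2:, :, :] - xyz[:-2, :, :]        # vertical neighbours
    n = np.cross(dx[1:-1], dy[:, 1:-1])
    n /= np.maximum(np.linalg.norm(n, axis=2, keepdims=True), 1e-6)
    return n                                    # (h-2) x (w-2) x 3 normal map
```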

Up-sampling

Our contribution is an efficient and semantically accurate algorithm for the generation of dense depth maps from sparse LIDAR point cloud data. Using a reformulation of the bilateral filter, we fill in missing measurements, preserve object edges and suppress measurement noise. Our dense depth maps can be used in various computer vision systems as a data modality, or they can be back-projected to obtain a denser point cloud. Experimental results on KITTI, sequences 19 and 16:

Top: RGB camera frame, Bottom: reconstructed dense depth map (click on the image to play a video sequence)

Top: RGB camera frame, Bottom: reconstructed dense depth map (click on the image to play a video sequence)
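The sketch below gives a naive, unoptimized version of bilateral depth filling guided by a grayscale camera image; the window size and the spatial/range sigmas are illustrative assumptions, and the actual algorithm uses a reformulated bilateral filter rather than this brute-force loop.

```python
import numpy as np

def upsample_depth(sparse_depth, guide_gray, radius=4, sigma_s=2.0, sigma_r=0.1):
    """Fill a sparse depth map using bilateral weights guided by a camera image.

    sparse_depth: H x W map with 0 where no lidar measurement was projected.
    guide_gray:   H x W grayscale camera image aligned with the depth map,
                  used to preserve object edges.
    """
    H, W = sparse_depth.shape
    dense = np.zeros_like(sparse_depth, dtype=np.float32)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial_w = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    pad = radius
    d = np.pad(sparse_depth, pad)                      # zero padding = no measurement
    g = np.pad(guide_gray, pad, mode="edge")
    for i in range(H):
        for j in range(W):
            win_d = d[i:i + 2 * pad + 1, j:j + 2 * pad + 1]
            win_g = g[i:i + 2 * pad + 1, j:j + 2 * pad + 1]
            # Range weight: penalise neighbours whose intensity differs (edge-aware).
            w = spatial_w * np.exp(-((win_g - guide_gray[i, j])**2) / (2 * sigma_r**2))
            w = w * (win_d > 0)                        # only average actual measurements
            s = w.sum()
            dense[i, j] = (w * win_d).sum() / s if s > 1e-6 else 0.0
    return dense
```

In practice this double loop would be vectorized or run on the GPU (e.g. in Quasar), but the weighting logic is the same.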

Contact: ir. Martin Dimitrievski


Multimodal sensor fusion

Reliable visibility of traffic participants in all weather conditions is crucial for driving safety. In the case of human drivers, an advanced driver assistance system (ADAS) should provide the driver with an augmented representation of the vehicle surroundings, including imagery that improves visibility in adverse atmospheric conditions. For autonomous vehicles, the use of a multimodal representation of the traffic scene is even more important, since computer vision algorithms require more robust input in order to approach the performance of the human visual system. The vision system of an autonomous vehicle needs to create a complete representation of the environment, including all traffic participants, the current traffic conditions and the road signalization. Moreover, it should create a logical interpretation of the situation and the interactions between the objects in its surroundings. To achieve these goals, a core requirement for any machine vision system is reliable and robust object recognition in various atmospheric conditions. Therefore, we employ a multi-sensor setup that consists of cameras for the acquisition of different imaging modalities, and design a system for global tracking of pedestrians based on the fusion of separate tracking outputs.


Figure 1: Multimodal camera fusion for ADAS and autonomous vehicles

To develop the principles of the proposed method, our sensor subset consists of the following passive cameras: a front visible-light RGB camera (FVL), a front long-wavelength infrared camera (LWIR/thermal), and left (LVL) and right (RVL) side visible-light cameras. The result of the proposed sensor fusion is shown in Figure 1, where the images from the different cameras are shown overlapped. Moreover, the proposed method performs fusion at the decision level, which significantly improves the accuracy of the pedestrian tracking.

Contact: dr. ir. Ljubomir Jovanov, MSc Ivana Shopovska


References:

  • Shopovska, I., Jovanov, L., Veelaert, P., Philips, W., Birem, M. & Lehaen, K. (2017). A hybrid fusion based frontal-lateral collaborative pedestrian detection and tracking. In: IEEE 20th International Conference on Intelligent Transportation Systems (ITSC 2017), Yokohama, Japan, October 16-19, 2017.

High dynamic range video capture

Video sequences of traffic captured from moving vehicles are highly variable. As the vehicle moves, shadows, direct sunlight, headlights of approaching cars, etc. create extreme values in the image, which degrade the performance of the computer vision algorithms employed to drive the vehicle or assist the driver. In order to improve the quality of the captured video stream and increase the visibility of the scene, we employ high dynamic range imaging. To capture objects hidden in shadows and in saturated regions, we employ one longer and one shorter exposure time along with the standard one. To reduce the artefacts caused by global camera motion and the motion of objects in the scene, we employ non-rigid motion estimation. The resulting images are much more informative than the images captured using the base exposure time.


Figure 1: Improved visibility inside shadowed regions using high dynamic range imaging
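A minimal sketch of the exposure merging step, assuming the short, base and long exposures have already been motion-compensated and normalised to [0, 1], is shown below; the well-exposedness weighting is a generic choice for illustration, not the exact fusion used in our method.

```python
import numpy as np

def merge_exposures(frames, exposure_times, sigma=0.2):
    """Merge motion-compensated short / base / long exposures into one HDR frame.

    frames:         list of H x W (x C) images normalised to [0, 1] and aligned
                    by the motion estimation step described above.
    exposure_times: relative exposure time of each frame.
    Pixels near saturation or near black receive low weights, so each region
    of the scene is dominated by the exposure that captured it best.
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)
    weight_sum = np.zeros_like(acc)
    for img, t in zip(frames, exposure_times):
        w = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))   # well-exposedness weight
        acc += w * img.astype(np.float64) / t                 # scale back to scene radiance
        weight_sum += w
    return acc / np.maximum(weight_sum, 1e-6)
```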

Contact: dr. ir. Ljubomir Jovanov, MSc Ivana Shopovska


References:

  • Shopovska, I., Jovanov, L., Goossens, B. & Philips, W. (2016). HDR video synthesis for vision systems in dynamic scenes. In: Proceedings of SPIE Vol. 9971 (Applications of Digital Image Processing XXXIX), 99710C.
