Autonomous Vehicles

Overview

Real-time sensor data processing for autonomous vehicles using Quasar

Monocular visual odometry

Current consumer vehicle navigation relies on Global Navigation Satellite Systems (GNSS) such as GPS, GLONASS and Galileo. However, new cars are increasingly being equipped with cameras as standard equipment for a variety of tasks (e.g. lane departure warning, collision avoidance, traffic sign recognition), which opens up possibilities for navigation based on visual cues. Our visual odometry method robustly estimates the vehicle’s trajectory using a simple monocular camera. The method is easy to calibrate, can accommodate a wide variety of camera mounting positions and yields results competitive with traditional pose estimation methods.

Our visual odometry method applied to a sequence from the KITTI dataset (click on the image to play a video sequence)

Furthermore, we have proposed a novel algorithm to relate the estimated trajectory to an offline map, effectively eliminating the error accumulation (drift) problem and thereby providing a complete fair-weather, vehicle-contained navigation solution that requires no external communication whatsoever. The high level of detail in which the trajectory is estimated also allows the detection of smaller maneuvers (e.g. lane changes), which is important in the context of intelligent vehicles and advanced driver assistance systems (ADAS).

Mapped monocular visual odometry (click on the image to play a video sequence)

The method also has possible applications in indoor navigation and cyclist odometry.

Contact: dr. ing. David Van Hamme

References:

  • Van Hamme, D., Veelaert, P. & Philips, W. (2012). Communicationless navigation through robust visual odometry. In: 15th International IEEE conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, AK, USA, 2012-09-16. IEEE. 1555-1560.

  • Van Hamme, D., Veelaert, P. & Philips, W. (2011). Robust monocular visual odometry by uncertainty voting. In: 2011 IEEE Intelligent Vehicles Symposium (IV 2011), Baden-Baden, Germany, 2011-06-05. IEEE. 643-647.

  • Van Hamme, D., Veelaert, P. & Philips, W. (2011). Robust visual odometry using uncertainty models. In: 13th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS)/ACM/IEEE International Conference on Distributed Smart Cameras, Ghent, Belgium, 2011-08-22.

  • Van Hamme, D., Goeman, W., Veelaert, P. & Philips, W. (2015). Robust monocular visual odometry for road vehicles using uncertain perspective projection. EURASIP Journal on Image and Video Processing, 1–18.

Back to the top

LIDAR based odometry

Under real-world driving conditions, autonomous vehicle systems often fail to accurately determine their position in space. Computer vision pose estimation techniques rely on camera technology, which is highly unreliable in low light and/or bad weather. Using point cloud data scanned by on-board lidar sensors, we have developed an ego-localization system for accurate short-term odometry. We use projective geometry to reduce the point cloud to a 2D map and then apply robust image registration techniques to estimate the ego-motion. The nature of the sensor makes our system invariant to ambient light levels and atmospheric conditions. Experiments on real-world driving data show improvements in short-term localization accuracy over standard visual odometry approaches, even under perfect lighting conditions.
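As an illustration of this pipeline, the sketch below projects a scan onto a binary occupancy grid and estimates the inter-frame translation with phase correlation. This is a generic stand-in for the robust occupancy map matching described in the reference below, not our actual implementation; the cell size, extent, height cut-off and the `load_scan` loader are assumptions.

```python
import numpy as np

def to_occupancy_grid(points, cell=0.2, extent=40.0):
    """Project 3D lidar points (N x 3, vehicle frame, z up) onto a 2D grid on the ground plane."""
    n = int(2 * extent / cell)
    grid = np.zeros((n, n), dtype=np.float32)
    # keep points above the ground and within the extent (thresholds illustrative)
    mask = (np.abs(points[:, 0]) < extent) & (np.abs(points[:, 1]) < extent) & (points[:, 2] > 0.3)
    ix = ((points[mask, 0] + extent) / cell).astype(int)
    iy = ((points[mask, 1] + extent) / cell).astype(int)
    grid[ix, iy] = 1.0
    return grid

def phase_correlation_shift(prev_grid, cur_grid):
    """Estimate the 2D translation (in cells) between two occupancy grids via phase correlation."""
    F1, F2 = np.fft.fft2(prev_grid), np.fft.fft2(cur_grid)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-9
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap peak indices to signed shifts
    shift = [p if p < s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return np.array(shift)

# ego-motion between two consecutive scans (translation only in this sketch):
# prev_pts, cur_pts = load_scan(k - 1), load_scan(k)   # hypothetical loader
# shift_cells = phase_correlation_shift(to_occupancy_grid(prev_pts), to_occupancy_grid(cur_pts))
# translation_m = shift_cells * 0.2
```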

Estimated trajectory vs. ground truth (KITTI dataset)

Contact: ir. Martin Dimitrievski

Back to the top

References:

  • Dimitrievski, M., Van Hamme, D., Veelaert, P. & Philips, W. (2016). Robust matching of occupancy maps for odometry in autonomous vehicles. In: Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 626-633.

Automotive occupancy mapping

Our autonomous vehicle system perceives a snapshot of the local surroundings by applying the occupancy map model to the point cloud data. Occupancy maps are an elegant solution to the problem of data fusion, especially when there is a multitude of heterogeneous sensors on board. Point cloud height information is projected onto the local ground plane and a probabilistic model computes the probability of occupancy for each location around the vehicle. Using our previously demonstrated odometry sub-system, we can incrementally build this occupancy map, resulting in a very detailed picture of the local surroundings.
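For illustration, a common way to maintain such a probabilistic map is a per-cell log-odds update, sketched below with assumed grid size, resolution and evidence weights; it is not the exact probabilistic model used in our system.

```python
import numpy as np

class OccupancyGrid:
    """Minimal log-odds occupancy grid around the vehicle (parameters illustrative)."""
    def __init__(self, size=400, cell=0.2, l_occ=0.85, l_free=-0.4):
        self.cell = cell
        self.log_odds = np.zeros((size, size), dtype=np.float32)
        self.l_occ, self.l_free = l_occ, l_free
        self.origin = size // 2          # vehicle sits at the grid centre

    def update(self, points_xy, occupied):
        """Add evidence for a set of ground-plane points (M x 2, metres, vehicle frame)."""
        idx = (points_xy / self.cell + self.origin).astype(int)
        inside = np.all((idx >= 0) & (idx < self.log_odds.shape[0]), axis=1)
        delta = self.l_occ if occupied else self.l_free
        self.log_odds[idx[inside, 0], idx[inside, 1]] += delta

    def probability(self):
        """Convert log-odds back to occupancy probabilities in [0, 1]."""
        return 1.0 / (1.0 + np.exp(-self.log_odds))
```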

Figure 2: Example trajectory and the corresponding environment geometry (click on the image to play a video)

Contact: ir. Martin Dimitrievski

Back to the top

Object detection and tracking

Point-cloud-based object detection is performed by first removing points that lie at or near the local ground plane. The remaining points are segmented into semantically meaningful objects using projective geometry. A height map is processed using various image morphology techniques to extract the local variations that define object boundaries. Finally, the processed height map contains connected components, i.e. blobs, which coincide with the contours of the 3D objects. Points in the original point cloud are thus assigned to their respective object contours.
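A minimal sketch of this detection chain, assuming a gravity-aligned point cloud and illustrative grid and threshold parameters (the actual morphology and variance analysis we use is more elaborate):

```python
import numpy as np
from scipy import ndimage

def segment_objects(points, cell=0.2, extent=40.0, ground_tol=0.3):
    """Illustrative segmentation: drop near-ground points, build a height map,
    clean it with morphology and label the remaining blobs."""
    # 1. remove points at or near the (assumed horizontal) ground plane
    obj_pts = points[points[:, 2] > ground_tol]

    # 2. project the remaining points onto a 2D height map
    n = int(2 * extent / cell)
    height = np.zeros((n, n), dtype=np.float32)
    mask = (np.abs(obj_pts[:, 0]) < extent) & (np.abs(obj_pts[:, 1]) < extent)
    ix = ((obj_pts[mask, 0] + extent) / cell).astype(int)
    iy = ((obj_pts[mask, 1] + extent) / cell).astype(int)
    np.maximum.at(height, (ix, iy), obj_pts[mask, 2])

    # 3. morphological opening suppresses isolated cells, then label connected blobs
    binary = ndimage.binary_opening(height > 0, structure=np.ones((3, 3)))
    labels, num_blobs = ndimage.label(binary)

    # 4. assign every original point to the blob of its grid cell (0 = background)
    point_labels = np.zeros(len(obj_pts), dtype=int)
    point_labels[mask] = labels[ix, iy]
    return labels, point_labels, num_blobs
```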

Tracking of objects is performed on the connected-components (blob) image on a frame-by-frame basis. The blob image is first corrected for ego-motion using our LiDAR odometry algorithm, after which matching blobs are tracked over time using a maximum-overlap criterion.
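The maximum-overlap matching step could look roughly as follows (illustrative only; it assumes the previous label image has already been warped with the estimated ego-motion):

```python
import numpy as np

def match_blobs(prev_labels, cur_labels):
    """Match blobs between two label images by maximum cell overlap.
    prev_labels is assumed to be already ego-motion compensated."""
    matches = {}
    for blob_id in np.unique(cur_labels):
        if blob_id == 0:                     # background
            continue
        overlap = prev_labels[cur_labels == blob_id]
        overlap = overlap[overlap > 0]
        if overlap.size:
            # previous blob that shares the most cells with this one
            matches[blob_id] = int(np.bincount(overlap).argmax())
    return matches                           # {current blob id: previous blob id}
```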

The following video demonstrates the results of our odometry, detection and tracking:

Top: Segmentation of the LiDAR point cloud with ground plane colored gray, bottom: corresponding segments projected on the RGB image. (click on the image to play a video)

Using the ground truth data in the KITTI dataset, we trained a purely point-cloud-based car classifier. The model was built on a combination of Fast Point Feature Histograms (FPFH), statistical moments of the 3D points and a non-linear SVM classifier. A multiple-target Kalman filter tracking technique is then applied to the detected cars to accurately track them and predict their velocities.
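The multiple-target tracking builds on standard Kalman filtering; below is a minimal constant-velocity filter for a single car with assumed noise levels, shown only to make the principle concrete (it is not the exact filter from our system).

```python
import numpy as np

class ConstantVelocityKF:
    """2D constant-velocity Kalman filter for one tracked car.
    State: [x, y, vx, vy] in the ground plane; noise levels are assumed."""
    def __init__(self, x0, y0, dt=0.1):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.05            # process noise (assumed)
        self.R = np.eye(2) * 0.25            # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                    # predicted position

    def update(self, z):
        """z: measured (x, y) centre of the detected car."""
        y = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def velocity(self):
        return self.x[2:]                    # estimated (vx, vy)
```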


Point cloud ground truth vs. detected cars from our classifier (click on the right image to play a video)

Contact: ir. Martin Dimitrievski

Back to the top

Obstacle detection based on 3D imaging

This research has led to an obstacle detection system that can assist visually impaired people. The system relies on the output of a 3D sensor (e.g. Kinect) and uses 3D imaging and computer vision to detect obstacles. The obstacle detection process itself involves two main tasks: the first deals with the reconstruction of the scene in front of the pedestrian, while the second performs a segmentation of the scene based on this reconstruction.

Reconstruction

During the reconstruction phase, depth and color information are combined to create a colored point cloud. Once the point cloud is created, we transform it in order to align the floor plane with the xz-plane of our reference system. This transformation can be carried out because the Kinect is equipped with an accelerometer, which provides us with the direction of gravity. The importance of the transformation is threefold. First, it is needed to keep the consecutive point clouds expressed with respect to the same reference system. Secondly, the detection of the floor plane is simplified and more robust, since we do not have to assume that the floor plane is the dominant plane in the scene. Finally, this transformation enables us to make use of the absolute position of objects in the scene, in particular the absolute height of the obstacles. The figure below illustrates the reconstruction and transformation.
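In essence, the alignment is the rotation that maps the measured gravity direction onto the downward axis of the reference system. A minimal sketch using Rodrigues' formula (the variable names and the convention that gravity should map to (0, -1, 0) are illustrative):

```python
import numpy as np

def rotation_from_gravity(gravity):
    """Rotation matrix mapping the accelerometer's gravity direction onto (0, -1, 0),
    so that the floor becomes parallel to the xz-plane (Rodrigues' formula)."""
    g = np.asarray(gravity, dtype=float)
    g = g / np.linalg.norm(g)
    target = np.array([0.0, -1.0, 0.0])
    v = np.cross(g, target)
    c = np.dot(g, target)
    if np.isclose(c, -1.0):                  # gravity exactly opposite: 180 degree turn about x
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

# cloud_aligned = (rotation_from_gravity(accel_reading) @ cloud.T).T   # cloud: N x 3 points
```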

Illustration of the coordinate system transformation. During the transformation, the xz-plane is aligned with the floor plane and the origin is centered at the feet of the pedestrian.

Segmentation

The segmentation process consists of two subtasks. The first task segments all dominant planes in the scene. The most important plane is obviously the floor plane, since most obstacles rest on it. Once we have identified the floor plane, we can classify objects as obstacles as soon as they are located directly in front of the pedestrian. The second task thus consists of segmenting the remaining objects. The easiest way to do this is to cluster the remaining points based on their relative distances. Below you can find a video showing how the system performs. The scene was recorded in one of the corridors of our research group using a Kinect sensor.

Our obstacle detection method (click on the image to play a video sequence)
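A compact sketch of the two segmentation subtasks, using a basic RANSAC plane fit and distance-based region growing; thresholds, iteration counts and cluster sizes are assumptions, not the tuned values of our system:

```python
import numpy as np
from scipy.spatial import cKDTree

def ransac_plane(points, iters=200, tol=0.03):
    """Fit the dominant plane (e.g. the floor) with a simple RANSAC loop."""
    best_inliers = np.zeros(len(points), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue
        n = n / norm
        inliers = np.abs((points - p[0]) @ n) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

def euclidean_clusters(points, radius=0.1, min_size=50):
    """Group the remaining points into obstacles by distance-based region growing."""
    tree = cKDTree(points)
    labels = -np.ones(len(points), dtype=int)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        queue = [seed]
        labels[seed] = current
        while queue:
            idx = queue.pop()
            for nb in tree.query_ball_point(points[idx], radius):
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    sizes = np.bincount(labels)              # discard tiny clusters
    labels[sizes[labels] < min_size] = -1
    return labels

# floor = ransac_plane(cloud)                        # cloud: N x 3, gravity-aligned
# obstacle_labels = euclidean_clusters(cloud[~floor])
```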

Step detection

In some cases obstacles do not have to be avoided. When the obstacle takes the form of a staircase, the pedestrian can simply go upstairs. For this reason, this work also covers the detection of steps. This process is based on a piecewise planar model. After the different planes of the scene are segmented, we check whether there are any planes parallel to the floor plane that are located at a certain height. The detection of steps is based solely on depth information. As a result, the algorithm can operate under varying lighting conditions. Below you can find an example segmentation of the different steps in the scene.

Illustration of step detection. (click on the images to play videos)
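Conceptually, the step test reduces to checking, for each segmented plane, whether it is roughly parallel to the floor and lies at a plausible riser height; the sketch below uses assumed angle and height thresholds:

```python
import numpy as np

def find_step_planes(planes, max_tilt_deg=10.0, min_h=0.10, max_h=0.25):
    """Keep segmented planes that are roughly parallel to the floor and sit at a
    plausible step height above it. Each plane is (unit normal, mean height in metres).
    Thresholds are illustrative."""
    up = np.array([0.0, 1.0, 0.0])           # floor normal after gravity alignment
    steps = []
    for normal, height in planes:
        tilt = np.degrees(np.arccos(np.clip(abs(np.dot(normal, up)), -1.0, 1.0)))
        if tilt < max_tilt_deg and min_h < height < max_h:
            steps.append((normal, height))
    return steps
```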

Door detection

The same reasoning applies when the obstacle is a door. Instead of avoiding this obstacle, the door can be opened. The detection of doors also relies on the plane segmentation process. This time we check whether there is a plane that is perpendicular to the floor plane and has a certain width. Once such a plane is found, we check whether a door handle is present at a certain position. The plane segmentation process is extended to take color information into account, since a door is often part of the same plane as the wall. Below you can find an example segmentation of a door.

Illustration of door detection. (click on the images to play videos)
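The corresponding geometric test for doors can be sketched in the same spirit (verticality and width thresholds are assumptions; the door handle and colour checks are omitted):

```python
import numpy as np

def looks_like_door(normal, width, floor_normal=(0.0, 1.0, 0.0),
                    max_dev_deg=10.0, min_w=0.7, max_w=1.1):
    """Illustrative geometric test: a vertical plane (perpendicular to the floor)
    whose horizontal extent matches a typical door width (thresholds assumed)."""
    cos_angle = abs(np.dot(np.asarray(normal), np.asarray(floor_normal)))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    is_vertical = angle > 90.0 - max_dev_deg
    return is_vertical and min_w <= width <= max_w
```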

Contact: ir. Michiel Vlaminck

Back to the top

References:

Vlaminck, M., Jovanov, L., Van Hese, P., Goossens, B., Philips, W. & Pizurica, A. (2013). Obstacle detection for pedestrians with a visual impairment based on 3D imaging. In: 2013 International Conference on 3D Imaging (IC3D), pp. 1-7, 3-5 Dec. 2013.

3D scene mapping

Goal

The development of a robust and generic system to obtain accurate 3D reconstructions of large-scale environments, without any assumptions about the type of scene or the sensor set-up.

Acquisition platform

Our acquisition platform consists of a Velodyne High Definition Lidar (HDL-32e) scanner combined with a Ladybug panoramic camera system. Because of this combination, we gave it the name ‘Vellady’ platform. As can be seen in the figures below, the Ladybug camera is mounted perpendicular to the ground plane, whereas the Velodyne is tilted on its head, making an angle of approximately 66° with the ground plane. The Velodyne lidar scanner is equipped with 32 collinearly mounted lasers covering a vertical FOV of 41.3°, resulting in a vertical resolution of 1.29°. The head spins continuously at approximately 10 Hz, resulting in a horizontal FOV of 360°. The Ladybug, on the other hand, is a fixed system in the form of a pentagonal prism consisting of five vertically oriented sides, each incorporating a camera. A sixth camera is mounted on top, pointing upwards.


Two pictures of our mobile acquisition platform Vellady mounted on a kitchen cart (left) and in close-up (right). The platform consists of a Ladybug panoramic camera (mounted at the top) and a Velodyne HDL32-e lidar scanner (mounted at the bottom).

Approach

A schematic overview of our approach is depicted in the picture below. As can be seen, it is implemented as a sequential, i.e. incremental, process in which each newly arriving point cloud is first processed and subsequently added to the current world model. The processing of each point cloud is conducted in five steps, with an optional sixth. First, we project the generated point cloud onto a 2D grid (step 1). Subsequently we conduct a surface analysis (step 2), after which we perform pairwise alignment of two consecutive point clouds (step 3), which serves as an initial guess for the current pose. Next, we register the aligned point cloud with a global 3D map (step 4) and fuse the new points with this 3D map (step 5). This fusion consists of re-sampling the point cloud by means of a surface reconstruction technique. Optionally - when a loop has been detected - we apply loop closure to preserve global consistency (step 6). This is done by means of pose graph optimization, which propagates the estimated error back through the pose graph. In the following sections we further clarify each of these steps.
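The skeleton below shows how these steps could be chained in code; the individual steps are passed in as placeholder callables, so this is only an outline of the control flow, not our implementation:

```python
import numpy as np

def incremental_mapping(clouds, project, analyse, pairwise_align, register, fuse,
                        detect_loop=None, close_loop=None):
    """Outline of the incremental pipeline described above.
    All processing steps are supplied as (placeholder) callables."""
    poses = [np.eye(4)]                      # vehicle pose per scan (4x4 transforms)
    world = None                             # global 3D map (fuse() must accept None)
    prev = None
    for cloud in clouds:
        grid = project(cloud)                # step 1: 2D grid projection
        surf = analyse(grid)                 # step 2: surface analysis
        if prev is not None:
            delta = pairwise_align(prev, surf)              # step 3: initial guess
            pose = register(world, cloud, poses[-1] @ delta)  # step 4: refine vs. map
            poses.append(pose)
            world = fuse(world, cloud, pose)                # step 5: resample / fuse
            if detect_loop and detect_loop(poses):
                poses = close_loop(poses)                   # step 6: pose graph optimisation
        else:
            world = fuse(world, cloud, poses[-1])
        prev = surf
    return world, poses

# world, poses = incremental_mapping(scans, project, analyse, pairwise_align,
#                                    register, fuse)        # all callables hypothetical
```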

System overview

Use cases

Street mapping in Hasselt

We recorded a data sequence in the streets of the Belgian city of Hasselt. The exact longitude and latitude of the starting point are 5.30935° and 50.9416° respectively. An image of this starting point is depicted in the figure below. The mobile mapping van was driving at a speed of approximately 15 km/h. Besides the data captured with the Velodyne scanner, we also recorded accurate positioning information using the POS LV 420 positioning sensor developed by Applanix. This INS system incorporates an inertial measurement unit (IMU), a distance measurement indicator (DMI) and Trimble’s BD960 GNSS receiver. Since each point of the Velodyne data is timestamped, we can derive the exact geolocation of each point in space. This information has been used as ground truth.


The resulting point cloud of our mobile mapping system on the entire lidar sequence recorded in the Belgian city of Hasselt. (click on the image to play a video)

A plot of the ground truth trajectory (red) and the estimated trajectory (blue) on top of the corresponding Google Earth image. Left: before loop closure; right: after loop closure.

UFO building Ghent University

A set of sequences was captured at a campus of Ghent University, near the UFO and Technicum buildings. The speed of the platform approximated walking speed, i.e. 4 km/h. Below you can find an example sequence and a live 3D reconstruction demo.


(click on the images to play the videos)

DOW Chemical plant

Other sequences were captured at a chemical site of the Dow Company in Terneuzen. This environment is part of a disused area that was scheduled to be demolished. It contains a large number of pipelines that were formerly used to carry liquids or gases, as can be seen in the video below. Although this environment is outdoors, the GPS signal is far too unreliable due to the abundance of pipelines. The acquired data consists of video sequences recorded with our Vellady platform. Below you can again find an example sequence and a live 3D reconstruction demo.

(click on the images to play the videos)

Contact: ir. Michiel Vlaminck

Back to the top

Low level point cloud processing

Normal estimation

Recognition of objects in point cloud data relies on cues of the local 3D geometry, and most algorithms use local plane normal information in one form or another. Fast computation of local plane normals is therefore of paramount importance. Part of our research resulted in the development of an algorithm for fast computation of plane normals in a single-lidar configuration. Our approach transforms the point cloud into a spherical depth image, where it exploits fast image processing paradigms. Within this depth image we employ a robust, block-processing-based plane fitting algorithm to estimate the local normals. This approach completely skips the expensive nearest neighbor search operation and enables our local plane estimator to run in real time even on dense point clouds. The algorithm development, from design to implementation, was done entirely in the Quasar programming language.
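The sketch below illustrates the underlying idea: build a spherical depth image from the scan and estimate normals directly from image-domain neighbours (here with simple cross products rather than our block-based plane fitting); the projection resolutions and field of view are illustrative values for an HDL-32e-class scanner.

```python
import numpy as np

def spherical_depth_image(points, h_res=0.35, v_res=1.29, v_fov=(-30.67, 10.67)):
    """Project lidar points (N x 3) onto a spherical depth image (angles in degrees)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(y, x))                        # azimuth
    el = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))      # elevation
    cols = ((az + 180.0) / h_res).astype(int) % int(360.0 / h_res)
    rows = np.clip(((el - v_fov[0]) / v_res).astype(int), 0,
                   int((v_fov[1] - v_fov[0]) / v_res))
    depth = np.full((rows.max() + 1, cols.max() + 1), np.inf, dtype=np.float32)
    np.minimum.at(depth, (rows, cols), r)                    # keep the closest return per pixel
    return depth

def image_domain_normals(xyz_image):
    """Estimate normals from an organised point image (H x W x 3) with cross products
    of image-domain neighbours, avoiding any nearest-neighbour search.
    np.roll wraps at the borders, which is fine for azimuth and approximate at the
    top/bottom rows."""
    dx = np.roll(xyz_image, -1, axis=1) - xyz_image
    dy = np.roll(xyz_image, -1, axis=0) - xyz_image
    n = np.cross(dx, dy)
    return n / np.maximum(np.linalg.norm(n, axis=2, keepdims=True), 1e-9)
```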

Color coded point cloud with the orientation of the local plane normals relative to the vehicle (click on the image to play a video sequence)

Up-sampling

Our contribution is an efficient and semantically accurate algorithm for the generation of dense depth maps from sparse LIDAR point cloud data. Using a reformulation of the bilateral filter, we fill in missing measurements, preserve object edges and suppress measurement noise. Our dense depth maps can be used in various computer vision systems as a data modality, or they can be back-projected to obtain a denser point cloud. Experimental results on KITTI sequences 19 and 16:

Top: RGB camera frame; bottom: reconstructed dense depth map (click on the images to play the video sequences)
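To make the idea concrete, a naive (unoptimised) joint bilateral fill of a sparse depth map guided by the camera image is sketched below; the window size and filter sigmas are assumptions, and our actual reformulation is considerably faster.

```python
import numpy as np

def bilateral_depth_upsampling(sparse_depth, guide, radius=7, sigma_s=3.0, sigma_r=0.1):
    """Fill missing depth values with a joint bilateral filter guided by the camera image.
    sparse_depth: H x W, zeros where no lidar measurement is available.
    guide: H x W intensity image in [0, 1]. Naive implementation, parameters illustrative."""
    H, W = sparse_depth.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    dense = sparse_depth.copy()
    for i in range(radius, H - radius):
        for j in range(radius, W - radius):
            if sparse_depth[i, j] > 0:
                continue                                     # keep real measurements
            patch_d = sparse_depth[i - radius:i + radius + 1, j - radius:j + radius + 1]
            patch_g = guide[i - radius:i + radius + 1, j - radius:j + radius + 1]
            valid = patch_d > 0
            if not valid.any():
                continue
            # range weight from the guide image, spatial weight from the pixel offset
            w = spatial * np.exp(-((patch_g - guide[i, j]) ** 2) / (2 * sigma_r ** 2))
            w = w * valid
            dense[i, j] = np.sum(w * patch_d) / np.maximum(w.sum(), 1e-9)
    return dense
```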

Contact: ir. Martin Dimitrievski

Back to the top

Multimodal sensor fusion

Reliable visibility of traffic participants in all weather conditions is crucial for driving safety. In the case of human drivers, an advanced driver assistance system (ADAS) should provide the driver with an augmented representation of the vehicle surroundings, including imagery which improves visibility in adverse atmospheric conditions. For autonomous vehicles, the use of a multimodal representation of the traffic scene is even more important, since computer vision algorithms require more robust input in order to approach the performance of the human visual system. The vision system of an autonomous vehicle needs to create a complete representation of the environment, including all traffic participants, the current traffic conditions and the road signalization. Moreover, it should create a logical interpretation of the situation and the interactions between the objects in its surroundings. To achieve these goals, a core requirement for any machine vision system is reliable and robust object recognition in various atmospheric conditions. Therefore, we employ a multi-sensor setup that consists of cameras for the acquisition of different imaging modalities, and design a system for global tracking of pedestrians based on the fusion of separate tracking outputs.


Figure 1: Multimodal camera fusion for ADAS and autonomous vehicles

To develop the principles of the proposed method, our sensor subset consists of the following passive cameras: a front visible-light RGB camera (FVL), a front long-wavelength infrared camera (LWIR/thermal), and left (LVL) and right (RVL) side visible-light cameras. The result of the proposed sensor fusion is shown in Figure 1, where images from the different cameras are shown overlapped. Moreover, the proposed method performs fusion at decision level, which significantly improves the accuracy of the pedestrian tracking.
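As a simplified illustration of decision-level fusion, the sketch below merges per-camera track estimates that have already been transformed to a common ground-plane frame; the gating distance and confidence weighting are assumptions, not the rules used in the published method.

```python
import numpy as np

def fuse_tracks(per_sensor_tracks, gate=1.0):
    """Decision-level fusion sketch: per_sensor_tracks is a list (one entry per camera)
    of arrays with rows (x, y, confidence) in a common ground-plane frame.
    Nearby tracks are merged by confidence-weighted averaging (parameters assumed)."""
    tracks = [t for t in per_sensor_tracks if len(t)]
    if not tracks:
        return np.empty((0, 2))
    detections = np.vstack(tracks)
    fused, used = [], np.zeros(len(detections), dtype=bool)
    for i in range(len(detections)):
        if used[i]:
            continue
        d = np.linalg.norm(detections[:, :2] - detections[i, :2], axis=1)
        group = (d < gate) & ~used                   # detections of the same pedestrian
        used |= group
        w = detections[group, 2]
        fused.append(np.average(detections[group, :2], axis=0, weights=w))
    return np.array(fused)
```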

Contact: dr. ir. Ljubomir Jovanov, MSc Ivana Shopovska

Back to the top

References: Shopovska, I., Jovanov, L., Veelaert, P., Philips, W., Birem, M. & Lehaen, K. (2017). A Hybrid Fusion Based Frontal-Lateral Collaborative Pedestrian Detection and Tracking. In: IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16-19 October 2017.

High dynamic range video capture

Video sequences of traffic captured from moving vehicles are highly variable. As the vehicle moves, shadows, direct sunlight, headlights of approaching cars, etc. create extreme values in the image, which degrade the performance of the computer vision algorithms employed to drive the vehicle or assist the driver. In order to improve the quality of the captured video stream and increase the visibility of the scene, we employ high dynamic range imaging. To capture objects hidden in shadows and in saturated regions, we use one longer and one shorter exposure time alongside the standard one. To reduce the artefacts caused by global camera motion and the motion of objects in the scene, we employ non-rigid motion estimation. The resulting images are much more informative than images captured using only the base exposure time.
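A single-scale, Mertens-style exposure fusion of the aligned short, base and long exposures is sketched below to illustrate the merging step; the weighting terms and parameters are assumptions, and the published pipeline additionally relies on non-rigid motion compensation before merging.

```python
import numpy as np

def exposure_fusion(frames, sigma=0.2):
    """Merge motion-compensated short/base/long exposures into a single frame
    using well-exposedness and contrast weights (simplified, single-scale fusion;
    parameters illustrative). frames: list of H x W x 3 images in [0, 1], already aligned."""
    weights = []
    for img in frames:
        gray = img.mean(axis=2)
        # well-exposedness: favour pixels away from the dark and saturated extremes
        well = np.exp(-((gray - 0.5) ** 2) / (2 * sigma ** 2))
        # contrast: simple gradient magnitude
        gy, gx = np.gradient(gray)
        contrast = np.sqrt(gx ** 2 + gy ** 2)
        weights.append(well * (contrast + 1e-3))
    weights = np.stack(weights)
    weights /= weights.sum(axis=0, keepdims=True)            # normalise per pixel
    return sum(w[..., None] * img for w, img in zip(weights, frames))
```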


Figure 1: Improved visibility inside shadowed regions using high dynamic range imaging

Contact: dr. ir. Ljubomir Jovanov, MSc Ivana Shopovska

Back to the top

References: Shopovska, I., Jovanov, L., Goossens, B. & Philips, W. (2016). HDR video synthesis for vision systems in dynamic scenes. In: Proceedings of SPIE, Vol. 9971, Applications of Digital Image Processing XXXIX, p. 99710C.
