Presented By O’Reilly and Intel AI
Put AI to work
8-9 Oct 2018: Training
9-11 Oct 2018: Tutorials & Conference
London, UK

DragonFly+: An FPGA-based quad-camera visual SLAM system for autonomous vehicles

Shaoshan Liu (PerceptIn)
14:35–15:15 Wednesday, 10 October 2018
Implementing AI
Location: Westminster Suite
Secondary topics: Edge computing and Hardware

Prerequisite knowledge

What you'll learn

  • Learn how PerceptIn built specialized processors for autonomous driving


In recent years, autonomous driving has become a popular topic in the research community, industry, and even the press. Nonetheless, large-scale adoption of autonomous vehicles is held back by cost. The major contributors to that cost include lidar sensors, which cost over $80,000 per unit, and computing systems, which cost over $20,000 each.

Shaoshan Liu explains how PerceptIn built a reliable autonomous vehicle, the DragonFly car, for under $10,000. The car was built for low-speed scenarios, such as university campuses, industrial parks, and areas with limited traffic. PerceptIn’s approach starts with low-speed operation to ensure safety, which allows immediate deployment. As the technology improves and experience accumulates, the company plans to address high-speed scenarios, with the ultimate goal of matching a human driver’s performance in any driving scenario.

Instead of lidar, the DragonFly system utilizes computer vision-based sensor fusion to achieve reliable localization. Specifically, DragonFly integrates four cameras (with 720p resolution) into one hardware module, such that a pair of cameras faces the front of the vehicle and another pair of cameras faces the rear. Each pair of cameras functions like human eyes to capture spatial information of the environment from left and right two-dimensional images. The combination of the two pairs of cameras creates a 360-degree panoramic view of the environment. With this design, visual odometry should never fail since at any moment in time, you can always extract 360-degree spatial information from the environment, and there are always enough overlapping spatial regions between consecutive frames.
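As a rough illustration of how a stereo pair recovers spatial information, the sketch below applies the standard pinhole relation for a rectified stereo pair, Z = f · B / d. The focal length and baseline values are hypothetical placeholders, not DragonFly specifications, and the function name is chosen here for illustration only.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo pair.

    Uses the standard relation Z = f * B / d, where d is the horizontal
    pixel offset of the same point between the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters (assumptions, not taken from the talk).
FOCAL_LENGTH_PX = 700.0  # focal length expressed in pixels
BASELINE_M = 0.12        # distance between the two cameras of one pair

if __name__ == "__main__":
    for disparity in (70.0, 14.0, 7.0):
        depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
        print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Smaller disparities map to larger depths, which is why distant points are harder to localize precisely with a short baseline.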

To achieve affordability and reliability, PerceptIn set four basic requirements for the DragonFly system design:

  • Modular: an independent hardware module for computer-vision-based localization and map generation
  • SLAM-ready: hardware synchronization of the four cameras and the IMU
  • Low power: a total power budget of less than 10 W
  • High performance: processing of four-way 720p YUV images at more than 30 fps

Note that at 30 fps this design generates more than 100 MB of raw image data per second, which places heavy stress on the computing system. After initial profiling, PerceptIn found that the image processing frontend (e.g., image feature extraction) accounts for more than 80% of the processing time.
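The raw data rate quoted above can be sanity-checked with simple arithmetic. The sketch below assumes 720p means 1280 x 720 pixels; the abstract does not state the YUV subsampling format, so the bytes-per-pixel values are assumptions covering a few common cases.

```python
WIDTH, HEIGHT = 1280, 720  # assumed pixel dimensions of a 720p frame
CAMERAS = 4                # four-way camera input, per the abstract
FPS = 30                   # frame rate used in the abstract's estimate

# Assumed pixel formats; the talk abstract does not specify which one is used.
BYTES_PER_PIXEL = {
    "luma only (Y)": 1.0,
    "YUV 4:2:0": 1.5,
    "YUV 4:2:2": 2.0,
}


def raw_rate_mb_per_s(bytes_per_pixel, cameras=CAMERAS, fps=FPS):
    """Raw image data rate in MB/s (1 MB = 10**6 bytes)."""
    return WIDTH * HEIGHT * bytes_per_pixel * cameras * fps / 1e6


if __name__ == "__main__":
    for name, bpp in BYTES_PER_PIXEL.items():
        print(f"{name:14s}: {raw_rate_mb_per_s(bpp):7.1f} MB/s")
```

Under each of these assumed formats the rate exceeds 100 MB/s, which is consistent with the figure in the abstract.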

To meet these design goals, PerceptIn designed and implemented DragonFly+, an FPGA-based real-time localization module. The DragonFly+ system includes hardware synchronization among the four image channels and the IMU; a direct I/O architecture to reduce off-chip memory communication; and a fully pipelined architecture to accelerate the image processing frontend of the localization system. In addition, it employs parallel and multiplexing processing techniques to balance bandwidth against hardware resource consumption.

PerceptIn has evaluated the performance and power consumption of the proposed hardware and compared it against an NVIDIA Jetson TX1 GPU SoC and an Intel Core i7 processor. The results demonstrate that, for processing four-way 720p images, DragonFly+ achieves 42 fps while consuming only 2.3 W of power, exceeding the design goals. By comparison, the NVIDIA Jetson TX1 achieves 9 fps at 7 W, and the Intel Core i7 achieves 15 fps at 80 W. In other words, DragonFly+ delivers roughly 5x the throughput of the Jetson TX1 while drawing about 3x less power, and roughly 3x the throughput of the Core i7 while drawing about 34x less power.
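The comparison above can be reproduced from the reported frame rates and power figures. The sketch below computes three ratios for each baseline: throughput, power draw, and frames per second per watt. The 3x and 34x figures in the abstract line up with the ratio of power draw; the fps-per-watt ratio, a common alternative definition of power efficiency, comes out considerably larger. The function and dictionary names are illustrative.

```python
# Reported results from the abstract: (frames per second, power in watts).
RESULTS = {
    "DragonFly+": (42.0, 2.3),
    "NVIDIA Jetson TX1": (9.0, 7.0),
    "Intel Core i7": (15.0, 80.0),
}


def compare(baseline, target="DragonFly+"):
    """Return ratios of the target platform relative to a baseline platform."""
    target_fps, target_watts = RESULTS[target]
    base_fps, base_watts = RESULTS[baseline]
    return {
        # How many times more frames per second the target processes.
        "throughput_ratio": target_fps / base_fps,
        # How many times less power the target draws.
        "power_ratio": base_watts / target_watts,
        # How many times more frames per second per watt the target achieves.
        "fps_per_watt_ratio": (target_fps / target_watts) / (base_fps / base_watts),
    }


if __name__ == "__main__":
    for platform in ("NVIDIA Jetson TX1", "Intel Core i7"):
        ratios = compare(platform)
        summary = ", ".join(f"{key} = {value:.1f}x" for key, value in ratios.items())
        print(f"vs {platform}: {summary}")
```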

Shaoshan Liu


Shaoshan Liu is the cofounder and chairman of PerceptIn, a company developing a next-generation robotics platform. Previously, he worked on autonomous driving and deep learning infrastructure at Baidu USA. Shaoshan holds a PhD in computer engineering from the University of California, Irvine.