DEV Community

Cover image for 15 Best Open-Source Autonomous Driving Datasets

Posted on

15 Best Open-Source Autonomous Driving Datasets

In recent years, more and more companies and research institutions have made their autonomous driving datasets open to the public. However, the best datasets are not always easy to find, and scouring the internet for them takes time.

To help, we at SiaSearch have put together a list of the top 15 open datasets for autonomous driving. The resources below collectively contain millions of data samples, many of which are already annotated. We hope this list provides you with a solid starting point for learning more about the field, or for starting your own autonomous driving project.

Top Open Datasets for Autonomous Driving Projects

  1. A2D2 Dataset
    The Audi Autonomous Driving Dataset (A2D2) features over 41,000 labeled with 38 features. Around 2.3 TB in total, A2D2 is split by annotation type (i.e. semantic segmentation, 3D bounding box).

  2. ApolloScape Dataset
    ApolloScape is an evolving research project that aims to foster innovation across all aspects of autonomous driving, from perception to navigation and control. Via their website, users can explore a variety of simulation tools and over 100K street view frames, 80k lidar point cloud and 1000km trajectories for urban traffic.

  3. Argoverse Dataset
    The Argoverse dataset includes 3D tracking annotations for 113 scenes and over 324,000 unique vehicle trajectories for motion forecasting.

  4. Berkeley DeepDrive Dataset
    Also known as BDD 100K, the DeepDrive dataset gives users access to 100,000 annotated videos and 10 tasks to evaluate image recognition algorithms for autonomous driving. The dataset represents more than 1000 hours of driving experience with more than 100 million frames, as well as information on geographic, environmental, and weather diversity.

  5. CityScapes Dataset
    CityScapes is a large-scale dataset focused on the semantic understanding of urban street scenes in 50 German cities. It features semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories. The entire dataset includes 5,000 annotated images with fine annotations, and an additional 20,000 annotated images with coarse annotations.

  6. Comma2k19 Dataset
    This dataset includes 33 hours of commute time recorded on highway 280 in California. Each 1-minute scene was captured on a 20km section of highway driving between San Jose and San Francisco. The data was collected using comma EONs, which features a road-facing camera, phone GPS, thermometers and a 9-axis IMU.

  7. Google-Landmarks Dataset
    Published by Google in 2018, the Landmarks dataset is divided into two sets of images to evaluate recognition and retrieval of human-made and natural landmarks. The original dataset contains over 2 million images depicting 30 thousand unique landmarks from across the world. In 2019, Google published Landmarks-v2, an even larger dataset with 5 million images and 200k landmarks.

  8. KITTI Vision Benchmark Suite
    First released in 2012 by Geiger et al, the KITTI dataset was released with the intent of advancing autonomous driving research with a novel set of real-world computer vision benchmarks. One of the first ever autonomous driving datasets, KITTI boasts over 4000 academic citations and counting.

  9. Level 5 Open Data
    Published by popular rideshare app Lyft, the Level5 dataset is another great source for autonomous driving data. It includes over 55,000 human-labeled 3D annotated frames, surface map, and an underlying HD spatial semantic map that is captured by 7 cameras and up to 3 LiDAR sensors that can be used to contextualize the data.

  10. nuScenes Dataset
    Developed by Motional, the nuScenes dataset is one of the largest open-source datasets for autonomous driving. Recorded in Boston and Singapore using a full sensor suite (32-beam LiDAR, 6 360° cameras and radars), the dataset contains over 1.44 million camera images capturing a diverse range of traffic situations, driving maneuvers, and unexpected behaviors.

Looking for more datasets? Read the entire blogpost at

Top comments (0)