
Super Kai (Kazuya Ito)


Datasets for Computer Vision (5)



(1) PASCAL VOC (Pattern Analysis, Statistical Modelling, and Computational Learning Visual Object Classes) (2005):

  • has object images and annotations with 4, 10 or 20 classes, depending on the release. There are 8 releases: VOC2005, VOC2006, VOC2007, VOC2008, VOC2009, VOC2010, VOC2011 and VOC2012: *Memos:
    • VOC2005 has 2,232 images and annotations (divided among train, validation and test) with 4 classes.
    • VOC2006 has 5,304 images and annotations (1,277 for training, 1,341 for validation and 2,686 for testing) with 10 classes.
    • VOC2007 has 9,963 images and annotations (2,501 for training, 2,510 for validation and 4,952 for testing) with 20 classes.
    • VOC2008 has 5,096 images and annotations (2,111 for training, 2,221 for validation and 764 extra) with 20 classes. *The release also contains 4,133 test images, which are not counted here.
    • VOC2009 has 7,818 images and annotations (3,473 for training, 3,581 for validation and 764 extra) with 20 classes.
    • VOC2010 has 11,321 images and annotations (4,998 for training, 5,105 for validation and 1,218 extra) with 20 classes.
    • VOC2011 has 14,961 images and annotations (5,717 for training, 5,823 for validation and 3,421 extra) with 20 classes.
    • VOC2012 has 17,125 images and annotations (5,717 for training, 5,823 for validation and 5,585 extra) with 20 classes.
  • is available as VOCSegmentation() and VOCDetection() in PyTorch (torchvision).
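
The split sizes listed above can be sanity-checked, and one release loaded, with a minimal sketch like the following. It assumes torchvision is installed; `VOC_SPLITS`, `total_images` and `load_voc_detection` are hypothetical names used only for illustration. The restrictions on `year` and `image_set` reflect my understanding of torchvision (years 2007 to 2012, with "test" offered only for 2007) and are not stated in the list above.

```python
# Split sizes copied from the list above: (train, validation, test or extra).
VOC_SPLITS = {
    "2006": (1277, 1341, 2686),
    "2007": (2501, 2510, 4952),
    "2008": (2111, 2221, 764),
    "2009": (3473, 3581, 764),
    "2010": (4998, 5105, 1218),
    "2011": (5717, 5823, 3421),
    "2012": (5717, 5823, 5585),
}


def total_images(year):
    """Sum the three split sizes listed for one VOC release."""
    return sum(VOC_SPLITS[year])


def load_voc_detection(root, year="2012", image_set="train", download=False):
    """Build a torchvision VOCDetection dataset (illustrative wrapper)."""
    # Assumption: torchvision only accepts the years 2007 to 2012.
    if year not in {"2007", "2008", "2009", "2010", "2011", "2012"}:
        raise ValueError(f"unsupported year: {year!r}")
    # Assumption: the "test" image set is only offered for 2007.
    allowed = {"train", "val", "trainval"}
    if year == "2007":
        allowed.add("test")
    if image_set not in allowed:
        raise ValueError(f"unsupported image_set for {year}: {image_set!r}")
    # Imported lazily so the helpers above work without torchvision installed.
    from torchvision.datasets import VOCDetection

    return VOCDetection(root=root, year=year, image_set=image_set, download=download)
```

For example, `total_images("2007")` gives 9963, matching the figure in the list. A call such as `load_voc_detection("data", year="2007", image_set="val", download=True)` should yield (image, annotation) pairs, and `VOCSegmentation` can be swapped in the same way for segmentation masks.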


(2) SUN Database (Scene UNderstanding database) (2010):

  • has 108,754 scene images with 397 classes.
  • is also called SUN397.
  • is available as SUN397() in PyTorch (torchvision).
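
As far as I know, recent torchvision versions expose SUN397 as a single dataset without a train/test split argument, so the caller has to choose a split. The sketch below shows one way to do that. The names `split_lengths` and `load_sun397_splits`, the 80/20 ratio and the fixed seed are all assumptions made for illustration.

```python
def split_lengths(n_items, train_fraction=0.8):
    """Return (train, test) sizes that add up exactly to n_items."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    n_train = int(n_items * train_fraction)
    return n_train, n_items - n_train


def load_sun397_splits(root, train_fraction=0.8, seed=0, download=False):
    """Load SUN397 and split it randomly into train and test subsets."""
    # Imported lazily so split_lengths() works without PyTorch installed.
    import torch
    from torch.utils.data import random_split
    from torchvision.datasets import SUN397

    dataset = SUN397(root=root, download=download)
    lengths = split_lengths(len(dataset), train_fraction)
    # A seeded generator keeps the random split reproducible across runs.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, list(lengths), generator=generator)
```

With the 108,754 images mentioned above, `split_lengths(108754)` gives 87,003 training and 21,751 test images.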


(3) Kinetics Dataset (2017):

  • has short video clips of human actions. There are 3 versions: Kinetics-400, Kinetics-600 and Kinetics-700: *Memos:
    • Each video clip lasts around 10 seconds.
    • Kinetics-400 (2017) has 306,245 video clips, each labeled with one of 400 categories (classes).
    • Kinetics-600 (2018) has 495,547 video clips, each labeled with one of 600 categories.
    • Kinetics-700 (2019) has 545,317 video clips, each labeled with one of 700 categories.
  • is used for Video Classification.
  • is available as Kinetics() in PyTorch (torchvision).
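
A rough sketch of how the roughly 10-second clips relate to the `frames_per_clip` and `step_between_clips` arguments of torchvision's `Kinetics`. The window-counting formula is my understanding of how fixed-length windows are cut from a video, not something stated above, and the 25 fps figure in the usage note is an assumption. `clips_per_video` and `load_kinetics` are hypothetical helper names.

```python
def clips_per_video(n_frames, frames_per_clip, step_between_clips=1):
    """Count the fixed-length windows that fit in a video of n_frames."""
    if frames_per_clip <= 0 or step_between_clips <= 0:
        raise ValueError("frames_per_clip and step_between_clips must be positive")
    if n_frames < frames_per_clip:
        return 0
    return (n_frames - frames_per_clip) // step_between_clips + 1


def load_kinetics(root, num_classes="400", split="train",
                  frames_per_clip=16, step_between_clips=16):
    """Build a torchvision Kinetics dataset (illustrative wrapper)."""
    # The three versions listed above, passed as strings.
    if num_classes not in {"400", "600", "700"}:
        raise ValueError(f"unsupported num_classes: {num_classes!r}")
    # Imported lazily so clips_per_video() works without torchvision installed.
    from torchvision.datasets import Kinetics

    return Kinetics(
        root=root,
        frames_per_clip=frames_per_clip,
        num_classes=num_classes,
        split=split,
        step_between_clips=step_between_clips,
    )
```

Assuming 25 frames per second, a 10-second video has 250 frames, so `clips_per_video(250, 16, 16)` gives 15 non-overlapping 16-frame windows.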


(4) Cityscapes (2016):

  • has 25,000 annotated images of urban street scenes for semantic understanding, with 30 classes grouped into 8 categories. *5,000 images are fine-annotated and 20,000 images are coarse-annotated.
  • is used for Image Segmentation.
  • is available as Cityscapes() in PyTorch (torchvision). *How to set up the dataset isn't explained.
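
Since the setup is not explained, here is a hedged sketch of the folder layout that torchvision's `Cityscapes` class appears to expect, based on my reading of its behaviour: images under `leftImg8bit/<split>` and annotations under `gtFine/<split>` or `gtCoarse/<split>`, with the files obtained manually beforehand. `VALID_SPLITS`, `expected_dirs` and `load_cityscapes` are hypothetical names, and the split names per mode are an assumption.

```python
import os

# Assumption: the splits torchvision accepts for each annotation mode.
VALID_SPLITS = {
    "fine": ("train", "val", "test"),
    "coarse": ("train", "train_extra", "val"),
}


def expected_dirs(root, split="train", mode="fine"):
    """Return the (images_dir, targets_dir) pair assumed for this split."""
    if mode not in VALID_SPLITS:
        raise ValueError(f"unsupported mode: {mode!r}")
    if split not in VALID_SPLITS[mode]:
        raise ValueError(f"unsupported split for mode {mode!r}: {split!r}")
    annotation_dir = "gtFine" if mode == "fine" else "gtCoarse"
    return (
        os.path.join(root, "leftImg8bit", split),
        os.path.join(root, annotation_dir, split),
    )


def load_cityscapes(root, split="train", mode="fine", target_type="semantic"):
    """Check the assumed layout, then build a torchvision Cityscapes dataset."""
    for path in expected_dirs(root, split, mode):
        if not os.path.isdir(path):
            raise FileNotFoundError(f"missing Cityscapes folder: {path}")
    # Imported lazily so expected_dirs() works without torchvision installed.
    from torchvision.datasets import Cityscapes

    return Cityscapes(root=root, split=split, mode=mode, target_type=target_type)
```

Checking the folders first gives a clearer error than a failed dataset construction when the extracted files are in the wrong place.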

[Figure: fine-annotated images]

[Figure: coarse-annotated images]
