Datasets for Computer Vision (1)

#python #pytorch #dataset #computervision

*Memos:

My post explains Fashion-MNIST, Caltech 101, Caltech 256, CelebA, CIFAR-10 and CIFAR-100.
My post explains Oxford-IIIT Pet, Oxford 102 Flower, Stanford Cars, Places365, Flickr8k and Flickr30k.
My post explains ImageNet, LSUN and MS COCO.
My post explains Image Classification(Recognition), Object Localization, Object Detection and Image Segmentation.
My post explains Keypoint Detection(Landmark Detection), Image Matching, Object Tracking, Stereo Matching, Video Prediction, Optical Flow, Image Captioning.

(1) MNIST(Modified National Institute of Standards and Technology)(1998):

has 70,000 handwritten digit images[0~9], each connected to a label from 10 classes: *Memos:
- 60,000 for train and 10,000 for test.
- Each image has 28x28 pixels.
is used for Image Classification.
is MNIST() in PyTorch. *My post explains MNIST().
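
As a minimal sketch of the MNIST() call mentioned above, the snippet below wraps torchvision's `MNIST` in a small helper. The helper name `load_mnist`, the `"data"` root folder and the `ToTensor()` transform are assumptions made for illustration, not something the memo prescribes.

```python
# Split sizes as stated in the memo above.
MNIST_SIZES = {"train": 60_000, "test": 10_000}


def load_mnist(root="data", train=True, download=True):
    """Return the MNIST train or test split as a torchvision dataset.

    `load_mnist` and the "data" folder are hypothetical names for this sketch.
    """
    # Imported lazily so that MNIST_SIZES can be used without torchvision.
    from torchvision.datasets import MNIST
    from torchvision.transforms import ToTensor

    # ToTensor() converts each 28x28 PIL image into a float tensor.
    return MNIST(root=root, train=train, download=download, transform=ToTensor())
```

Calling `load_mnist(train=True)` should give a dataset with 60,000 samples and `load_mnist(train=False)` one with 10,000. Indexing either one returns an `(image, label)` pair.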

(2) EMNIST(Extended MNIST)(2017):

has handwritten character images(digits[0~9] and alphabet letters[A~Z][a~z]) split into 6 datasets(ByClass, ByMerge, Balanced, Letters, Digits and MNIST): *Memos:
- Each image has 28x28 pixels.
- ByClass has 814,255 character images(digits[0~9] and alphabet letters[A~Z][a~z]), each connected to a label from 62 classes. *697,932 for train and 116,323 for test.
- ByMerge has 814,255 character images(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]), each connected to a label from 47 classes. *697,932 for train and 116,323 for test.
- Balanced has 131,600 character images(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]), each connected to a label from 47 classes. *112,800 for train and 18,800 for test.
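- Letters has 145,600 alphabet letter images[A~Z][a~z] with uppercase and lowercase merged, each connected to a label from 26 classes. *124,800 for train and 20,800 for test. *EMNIST() in PyTorch lists 27 classes because index 0 is an unused 'N/A' entry.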
- Digits has 280,000 digit images[0~9], each connected to a label from 10 classes. *240,000 for train and 40,000 for test.
- MNIST has 70,000 digit images[0~9], each connected to a label from 10 classes. *60,000 for train and 10,000 for test.
is used for Image Classification.
is EMNIST() in PyTorch. *My post explains EMNIST().
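
The six splits listed above can be sketched as a lookup table together with a thin wrapper around torchvision's `EMNIST`, which selects a split through its `split` argument. The names `EMNIST_SPLITS`, `total_images` and `load_emnist` are hypothetical helpers for this example, and the sizes are simply copied from the list above.

```python
# (train, test) sizes per split, copied from the memo above.
# Keys are the lowercase split names that torchvision's EMNIST accepts.
EMNIST_SPLITS = {
    "byclass": (697_932, 116_323),
    "bymerge": (697_932, 116_323),
    "balanced": (112_800, 18_800),
    "letters": (124_800, 20_800),
    "digits": (240_000, 40_000),
    "mnist": (60_000, 10_000),
}


def total_images(split):
    """Return train + test size for one EMNIST split."""
    train, test = EMNIST_SPLITS[split]
    return train + test


def load_emnist(split, root="data", train=True, download=True):
    """Return one EMNIST split through torchvision (hypothetical helper)."""
    if split not in EMNIST_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the size table works without torchvision.
    from torchvision.datasets import EMNIST

    return EMNIST(root=root, split=split, train=train, download=download)
```

For example, `total_images("balanced")` adds 112,800 and 18,800 to give the 131,600 quoted above, which is a quick way to cross-check the list.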

(3) QMNIST(2019):

has 522,953 handwritten digit images[0~9], each connected to a label from 10 classes: *Memos:
- 60,000 for train, 60,000 for test and 402,953 of all the digits from the NIST Special Database 19.
- Each image has 28x28 pixels.
is an extended version of MNIST. *I don't know what the Q in QMNIST stands for.
is used for Image Classification.
is QMNIST() in PyTorch. *My post explains QMNIST().
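
A rough sketch of how the three subsets above map onto torchvision's `QMNIST`, which picks a subset through its `what` argument. The helper name `load_qmnist` and the `"data"` folder are assumptions for illustration. Passing `compat=False` is expected to return the extended QMNIST label information instead of a plain class index, but that is worth checking against the torchvision documentation.

```python
# Subset sizes as listed in the memo, keyed by the value passed to `what`.
QMNIST_SUBSETS = {"train": 60_000, "test": 60_000, "nist": 402_953}


def load_qmnist(what="train", root="data", compat=True, download=True):
    """Return one QMNIST subset through torchvision (hypothetical helper)."""
    if what not in QMNIST_SUBSETS:
        raise ValueError(f"unknown subset: {what!r}")
    # Imported lazily so the size table works without torchvision.
    from torchvision.datasets import QMNIST

    # compat=True keeps targets as plain class indices, like MNIST.
    return QMNIST(root=root, what=what, compat=compat, download=download)
```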

(4) ETLCDB(Electrotechnical Laboratory Character Database)(2011):

has handwritten or machine-printed character images(digits, symbols, alphabet letters and Japanese characters) split into 9 datasets(ETL1, ETL2, ETL3, ETL4, ETL5, ETL6, ETL7, ETL8 and ETL9): *Memos:
- ETL1 has 141,319 character images(digits[0~9], alphabet letters[A~Z], symbols[+-*/=()・,?’] and Katakana letters[ア~ン]), each connected to a label from 99 classes. *Each image has 64x63 pixels.
- ETL2 has 52,796 character images(digits[0~9], alphabet letters[A~Z], symbols, Katakana letters[ア~ン], Hiragana letters[あ~ん] and Kanji letters), each connected to a label from 2,184 classes. *Each image has 60x60 pixels.
- ETL3 has 9,600 character images(digits[0~9], alphabet letters[A~Z] and symbols[¥+-*/=()・,_▾]), each connected to a label from 48 classes. *Each image has 72x76 pixels.
- ETL4 has 6,120 Hiragana letter images[あ~ん], each connected to a label from 51 classes. *Each image has 72x76 pixels.
- ETL5 has 10,608 Katakana letter images[ア~ン], each connected to a label from 51 classes. *Each image has 72x76 pixels.
- ETL6 has 157,662 character images(digits[0~9], alphabet letters[A~Z][a~z], symbols and Katakana letters[ア~ン]), each connected to a label from 114 classes. *Each image has 64x63 pixels.
- ETL7(ETL7L and ETL7S) has 16,800 character images(Hiragana letters[あ~ん], Dakuten[゛] and Handakuten[゜]), each connected to a label from 48 classes. *Each image has 64x63 pixels.
- ETL8(ETL8G and ETL8B2) has 152,960 character images(Hiragana letters[あ~ん] and Kanji letters), each connected to a label from 956 classes. *Each image has 128x127 pixels.
- ETL9(ETL9G and ETL9B) has 607,200 character images(Hiragana letters[あ~ん] and JIS first level Kanji letters), each connected to a label from 3,036 classes. *Each image has 128x127 pixels.
is used for Image Classification.
isn't in PyTorch, so we need to download it from the etlcdb website.
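
Because ETLCDB has no PyTorch loader, the downloaded files have to be decoded by hand. The sketch below only covers the pixel-unpacking step, and it assumes that an image is stored as packed 4-bit grayscale values (two pixels per byte, high nibble first). That layout is an assumption for illustration and differs between ETL files, so the record format and bit depth should be taken from the official file specification. `unpack_4bit` is a hypothetical helper name.

```python
def unpack_4bit(data, width, height):
    """Turn packed 4-bit pixels into `height` rows of ints in the range 0..15.

    Assumes two pixels per byte with the high nibble first (not verified
    against every ETL file format).
    """
    expected = width * height // 2
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    pixels = []
    for byte in data:
        pixels.append(byte >> 4)    # high nibble: first pixel of the pair
        pixels.append(byte & 0x0F)  # low nibble: second pixel of the pair
    # Cut the flat pixel list into rows of `width` pixels each.
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

Under this assumption a 64x63 image, the size listed for ETL1, ETL6 and ETL7 above, would occupy 64 * 63 / 2 = 2,016 bytes.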

(5) Kuzushiji(2018):

has cursive style Japanese character images split into 3 datasets(Kuzushiji-MNIST, Kuzushiji-49 and Kuzushiji-Kanji): *Memos:
- Kuzushiji-MNIST has 70,000 Hiragana letter images, each connected to a label from 10 classes. *Each image has 28x28 pixels.
- Kuzushiji-49 has 270,912 imbalanced character images(Hiragana letters and Hiragana iteration marks), each connected to a label from 49 classes. *Each image has 28x28 pixels.
- Kuzushiji-Kanji has 140,424 imbalanced Kanji letter images, each connected to a label from 3,832 classes. *Each image has 64x64 pixels.
is used for Image Classification.
is KMNIST() in PyTorch, but it only has Kuzushiji-MNIST, so we need to download Kuzushiji-49 and Kuzushiji-Kanji from GitHub. *My post explains KMNIST().
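
Kuzushiji-49 and Kuzushiji-Kanji are described above as imbalanced, meaning that some classes have far more images than others. One common way to compensate, which the memo itself does not prescribe, is inverse-frequency class weighting. The sketch below computes such weights from a list of labels. The function name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to total / (num_classes * count).

    Rare classes get weights above 1 and frequent classes below 1.
    A perfectly balanced label list gives 1.0 for every class.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}
```

Weights like these could, for instance, be passed to a weighted loss or a weighted sampler when training on Kuzushiji-49. Whether that helps on a given model is something to verify experimentally.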

(6) Moving MNIST(2015):

has 10,000 videos: *Memos:
- Each video has 20 frames(images) with 2 moving digits.
- Each frame(image) has 64x64 pixels.
is used for Video Prediction.
is MovingMNIST() in PyTorch. *My post explains MovingMNIST().
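
For Video Prediction, each 20-frame video is typically cut into frames the model sees and frames it has to predict. The sketch below shows that step on any sliceable sequence of frames. The 10/10 cut and the helper names `split_frames` and `load_moving_mnist` are assumptions for illustration, not something the memo specifies.

```python
def split_frames(video, num_input=10):
    """Split one video (a sequence of frames) into (input, target) frames."""
    if not 0 < num_input < len(video):
        raise ValueError("num_input must leave at least one frame on each side")
    return video[:num_input], video[num_input:]


def load_moving_mnist(root="data", download=True):
    """Return Moving MNIST through torchvision (hypothetical helper)."""
    # Imported lazily so split_frames works without torchvision.
    from torchvision.datasets import MovingMNIST

    # split=None keeps all 20 frames of every video.
    return MovingMNIST(root=root, split=None, download=download)
```

With a dataset from `load_moving_mnist()`, `split_frames(dataset[0])` would give the first 10 frames as input and the remaining 10 as the prediction target.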