QMNIST in PyTorch

#python #pytorch #qmnist #dataset

*My post explains QMNIST.

QMNIST() loads the QMNIST dataset as shown below:

*Memos:

The 1st argument is root(Required-Type:str or pathlib.Path). *An absolute or relative path is possible.
The 2nd argument is what(Optional-Default:None-Type:str). *"train"(60,000 images), "test"(60,000 images), "test10k"(10,000 images), "test50k"(50,000 images) or "nist"(402,953 images) can be set to it.
The 3rd argument is compat(Optional-Default:True-Type:bool). *If it's True, the class number of each image is returned(for compatibility with the MNIST dataloader) while if it's False, a 1D tensor of the full QMNIST information is returned.
The 4th argument is train(Optional-Default:True-Type:bool): *Memos:
- It's ignored if what isn't None.
- If it's True, train data(60,000 images) is used while if it's False, test data(60,000 images) is used.
There is transform argument(Optional-Default:None-Type:callable). *It must be passed as a keyword argument(transform=).
There is target_transform argument(Optional-Default:None-Type:callable). *It must be passed as a keyword argument(target_transform=).
There is download argument(Optional-Default:False-Type:bool): *Memos:
- It must be passed as a keyword argument(download=).
- If it's True, the dataset is downloaded from the internet and extracted(unzipped) to root.
- If it's True and the dataset is already downloaded, it's extracted.
- If it's True and the dataset is already downloaded and extracted, nothing happens.
- It should be False if the dataset is already downloaded and extracted because it's faster.
- You can manually download and extract the dataset(qmnist-train-images-idx3-ubyte.gz, qmnist-train-labels-idx2-int.gz, qmnist-test-labels-idx2-int.gz, qmnist-test-images-idx3-ubyte.gz, xnist-images-idx3-ubyte.xz and xnist-labels-idx2-int.xz) from here to data/QMNIST/raw/.
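The way what and train interact in the memos above can be sketched in plain Python. This is only an illustration of the documented behavior, not torchvision's own code: resolve_split and SPLIT_SIZES are hypothetical names, and the image counts are the ones listed above.

```python
# Hypothetical helper: mirrors the rule "train is ignored if what isn't None".
# Image counts are taken from the memos above.
SPLIT_SIZES = {
    "train": 60_000,
    "test": 60_000,
    "test10k": 10_000,
    "test50k": 50_000,
    "nist": 402_953,
}

def resolve_split(what=None, train=True):
    """Return the name of the split that would be used."""
    if what is None:
        # Only when `what` is None does `train` pick the split.
        what = "train" if train else "test"
    if what not in SPLIT_SIZES:
        raise ValueError(f"Unknown split: {what!r}")
    return what

print(resolve_split())                          # train
print(resolve_split(train=False))               # test
print(resolve_split(what="train", train=False)) # train
print(resolve_split(what="test", train=True))   # test
print(SPLIT_SIZES[resolve_split(what="nist")])  # 402953
```

So what="train" with train=False still selects the train split, which matches the third and fifth QMNIST() calls in the code below.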

from torchvision.datasets import QMNIST

train_data = QMNIST(
    root="data"
)

train_data = QMNIST(
    root="data",
    what=None,
    compat=True,
    train=True,
    transform=None,
    target_transform=None,
    download=False
)

train_data = QMNIST(
    root="data",
    what="train",
    train=False
)

test_data = QMNIST(
    root="data",
    train=False
)

test_data = QMNIST(
    root="data",
    what="test",
    train=True
)

test10k_data = QMNIST(
    root="data",
    what="test10k"
)

test50k_data = QMNIST(
    root="data",
    what="test50k",
    compat=False
)

nist_data = QMNIST(
    root="data",
    what="nist"
)

l = len
l(train_data), l(test_data), l(test10k_data), l(test50k_data), l(nist_data)
# (60000, 60000, 10000, 50000, 402953)

train_data
# Dataset QMNIST
#     Number of datapoints: 60000
#     Root location: data
#     Split: train

train_data.root
# 'data'

train_data.what
# 'train'

train_data.compat
# True

train_data.train
# True

print(train_data.transform)
# None

print(train_data.target_transform)
# None

train_data.download
# <bound method QMNIST.download of Dataset QMNIST
#     Number of datapoints: 60000
#     Root location: data
#     Split: train>

len(train_data.classes)
# 10

train_data.classes
# ['0 - zero', '1 - one', '2 - two', '3 - three', '4 - four',
#  '5 - five', '6 - six', '7 - seven', '8 - eight', '9 - nine']

train_data[0]
# (<PIL.Image.Image image mode=L size=28x28>, 5)

test50k_data[0]
# (<PIL.Image.Image image mode=L size=28x28>,
#  tensor([3, 4, 2424, 51, 33, 261051, 0, 0]))

train_data[1]
# (<PIL.Image.Image image mode=L size=28x28>, 0)

test50k_data[1]
# (<PIL.Image.Image image mode=L size=28x28>,
#  tensor([8, 1, 522, 60, 38, 55979, 0, 0]))

train_data[2]
# (<PIL.Image.Image image mode=L size=28x28>, 4)

test50k_data[2]
# (<PIL.Image.Image image mode=L size=28x28>,
#  tensor([9, 4, 2496, 115, 39, 269531, 0, 0]))

train_data[3]
# (<PIL.Image.Image image mode=L size=28x28>, 1)

test50k_data[3]
# (<PIL.Image.Image image mode=L size=28x28>,
#  tensor([5, 4, 2427, 77, 35, 261428, 0, 0]))

train_data[4]
# (<PIL.Image.Image image mode=L size=28x28>, 9)

test50k_data[4]
# (<PIL.Image.Image image mode=L size=28x28>,
#  tensor([7, 4, 2524, 69, 37, 272828, 0, 0]))
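The 8 values returned with compat=False can be easier to read with names attached. The sketch below is a hypothetical helper, not part of torchvision. The field names are my own labels, and the column order (class, NIST HSF series, writer ID, digit index for the writer, NIST class code, global NIST digit index, duplicate, unused) is an assumption based on the QMNIST repository's description of its extended labels. A plain list is used here so the example runs without the dataset.

```python
# Hypothetical field names; the column order is assumed from the QMNIST
# repository's description of the extended labels.
FIELD_NAMES = (
    "digit_class",
    "hsf_series",
    "writer_id",
    "writer_digit_index",
    "nist_class_code",
    "global_digit_index",
    "duplicate",
    "unused",
)

def describe_target(target):
    """Map the 8 values of a compat=False target to named fields."""
    values = [int(v) for v in target]  # Also works for a 1D tensor.
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"Expected {len(FIELD_NAMES)} values, got {len(values)}")
    return dict(zip(FIELD_NAMES, values))

# Values copied from the test50k_data[0] output above.
info = describe_target([3, 4, 2424, 51, 33, 261051, 0, 0])
print(info["digit_class"], info["writer_id"])  # 3 2424
```

With a real sample, the target tensor can be passed directly, e.g. describe_target(test50k_data[0][1]).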

from torchvision.datasets import QMNIST

train_data = QMNIST(
    root="data",
    what="train"
)

test_data = QMNIST(
    root="data",
    what="test"
)

test10k_data = QMNIST(
    root="data",
    what="test10k"
)

test50k_data = QMNIST(
    root="data",
    what="test50k"
)

nist_data = QMNIST(
    root="data",
    what="nist"
)

import matplotlib.pyplot as plt

def show_images(data, main_title=None):
    # Show the first 5 images of a dataset in one row.
    plt.figure(figsize=(10, 4))
    plt.suptitle(t=main_title, y=0.8, fontsize=14)
    for i, (im, lab) in enumerate(data, start=1):
        plt.subplot(1, 5, i)
        plt.title(label=lab)  # The class number is the subplot title.
        plt.imshow(X=im)
        if i == 5:  # Stop after the 5th image.
            break
    plt.tight_layout()
    plt.show()

show_images(data=train_data, main_title="train_data")
show_images(data=test_data, main_title="test_data")
show_images(data=test10k_data, main_title="test10k_data")
show_images(data=test50k_data, main_title="test50k_data")
show_images(data=nist_data, main_title="nist_data")