Vision Transformerの論文 Lucas
Large-scale training induces biases
Related works
Cordonnier et al.
Convolutional neural network-> object detection, image classification
ImageGPT: unsupervised fashion
Method
Vision transformer
Transformer input: 1D sequences
Reshape images into a sequence of flattened 2D patches.
the output of the transformer encoder serves as the image representation.
Adding position embeddings [CLASS] to each patches.
Vision transformer
簡単なアーキテクチャながら高い精度を達成しているのはすごいなあと
Top comments (0)