DEV Community


Innovative Technology NVIDIA StyleGAN2

Juyae
imjuyae
・7 min read

StyleGAN2 - Official TensorFlow Implementation

A Style-Based Generator Architecture for Generative Adversarial Networks

Development Environment
I used the deep learning server environment provided by Seoul Women's University.

  • 64-bit Python 3.6 installation. I recommend Anaconda3 with numpy 1.14.3 or newer.
  • TensorFlow 1.10.0 or newer with GPU support.
  • One or more high-end NVIDIA GPUs with at least 11GB of DRAM. NVIDIA recommends a DGX-1 with 8 Tesla V100 GPUs.
  • NVIDIA driver 391.35 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.3.1 or newer.
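Before cloning the repository, it can help to check the installed versions against the list above. The following is a minimal sketch of my own (all function names are hypothetical). It also treats TensorFlow 2.x as unsupported, since the official StyleGAN code is written for TensorFlow 1.x.

```python
import sys

# Minimum versions taken from the requirement list above.
MIN_NUMPY = (1, 14, 3)
MIN_TENSORFLOW = (1, 10, 0)
FIRST_UNSUPPORTED_TENSORFLOW = (2, 0, 0)  # assumption: the official code targets TF 1.x only


def parse_version(text):
    """Turn '1.14.0' or '1.15.0rc1' into a 3-tuple of ints, ignoring suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts + [0] * (3 - len(parts)))


def numpy_ok(version):
    return parse_version(version) >= MIN_NUMPY


def tensorflow_ok(version):
    return MIN_TENSORFLOW <= parse_version(version) < FIRST_UNSUPPORTED_TENSORFLOW


def python_ok():
    """64-bit Python 3.6, as listed above."""
    return sys.version_info[:2] == (3, 6) and sys.maxsize > 2 ** 32


def check_environment():
    """Return a dict mapping each requirement to True/False for this interpreter."""
    report = {"python": python_ok()}
    try:
        import numpy
        report["numpy"] = numpy_ok(numpy.__version__)
    except ImportError:
        report["numpy"] = False
    try:
        import tensorflow
        report["tensorflow"] = tensorflow_ok(tensorflow.__version__)
    except ImportError:
        report["tensorflow"] = False
    return report
```

Calling `print(check_environment())` on the server gives a quick overview; the GPU, driver, CUDA and cuDNN requirements still have to be checked separately (for example with `nvidia-smi`).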

Do you think these people are real?

The answer is no. These people were generated by NVIDIA's generator, which allows control over different aspects of the image. The abstract of the StyleGAN2 paper summarizes the work as follows:

"The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent vectors to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably detect if an image is generated by a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality."

Resources

Material related to the original StyleGAN paper is organized as follows (path: description):

  • StyleGAN: Main folder.
    • stylegan-paper.pdf: High-quality version of the paper PDF.
    • stylegan-video.mp4: High-quality version of the result video.
    • images: Example images produced using the generator.
      • representative-images: High-quality images to be used in articles, blog posts, etc.
      • 100k-generated-images: 100,000 generated images for different amounts of truncation.
        • ffhq-1024x1024: Generated using Flickr-Faces-HQ dataset at 1024×1024.
        • bedrooms-256x256: Generated using LSUN Bedroom dataset at 256×256.
        • cars-512x384: Generated using LSUN Car dataset at 512×384.
        • cats-256x256: Generated using LSUN Cat dataset at 256×256.
    • videos: Example videos produced using the generator.
      • high-quality-video-clips: Individual segments of the result video as high-quality MP4.
    • ffhq-dataset: Raw data for the Flickr-Faces-HQ dataset.
    • networks: Pre-trained networks as pickled instances of dnnlib.tflib.Network.
      • stylegan-ffhq-1024x1024.pkl: StyleGAN trained with Flickr-Faces-HQ dataset at 1024×1024.
      • stylegan-celebahq-1024x1024.pkl: StyleGAN trained with CelebA-HQ dataset at 1024×1024.
      • stylegan-bedrooms-256x256.pkl: StyleGAN trained with LSUN Bedroom dataset at 256×256.
      • stylegan-cars-512x384.pkl: StyleGAN trained with LSUN Car dataset at 512×384.
      • stylegan-cats-256x256.pkl: StyleGAN trained with LSUN Cat dataset at 256×256.
      • metrics: Auxiliary networks for the quality and disentanglement metrics.
        • inception_v3_features.pkl: Standard Inception-v3 classifier that outputs a raw feature vector.
        • vgg16_zhang_perceptual.pkl: Standard LPIPS metric to estimate perceptual similarity.
        • celebahq-classifier-00-male.pkl: Binary classifier trained to detect a single attribute of CelebA-HQ.

Training Your Own StyleGAN Networks

Once the datasets are set up, you can train your own StyleGAN networks as follows:

  1. Edit train.py to specify the dataset and training configuration by uncommenting or editing specific lines.
  2. Run the training script with python train.py.
  3. The results are written to a newly created directory results/<ID>-<DESCRIPTION>.
  4. The training may take several days (or weeks) to complete, depending on the configuration.
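Step 3 says each run lands in its own results/<ID>-<DESCRIPTION> directory. The sketch below shows one way such numbered run directories can be allocated, so that repeated runs never overwrite each other. The helper name next_run_dir and the five-digit zero-padded ID are my own assumptions for illustration, not the exact code in the repository.

```python
import os
import re


def next_run_dir(result_dir, description):
    """Create and return result_dir/<ID>-<DESCRIPTION> (hypothetical helper).

    ID is one higher than the largest numeric prefix already present,
    so earlier runs are never overwritten.
    """
    os.makedirs(result_dir, exist_ok=True)
    pattern = re.compile(r"^(\d+)-")
    ids = []
    for name in os.listdir(result_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(result_dir, name)):
            ids.append(int(match.group(1)))
    run_id = max(ids) + 1 if ids else 0
    run_dir = os.path.join(result_dir, "%05d-%s" % (run_id, description))
    os.makedirs(run_dir)
    return run_dir
```

For example, two calls with the description "my-ffhq-run" would create results/00000-my-ffhq-run and then results/00001-my-ffhq-run.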

By default, train.py is configured to train the highest-quality StyleGAN (configuration F in Table 1) for the FFHQ dataset at 1024×1024 resolution using 8 GPUs. Note that NVIDIA used 8 GPUs in all of its experiments. Training with fewer GPUs may not produce identical results; if you wish to compare against their technique, they strongly recommend using the same number of GPUs.

Expected training times for the default configuration using Tesla V100 GPUs:

| GPUs | 1024×1024 | 512×512 | 256×256 |
|---|---|---|---|
| 1 | 41 days 4 hours | 24 days 21 hours | 14 days 22 hours |
| 2 | 21 days 22 hours | 13 days 7 hours | 9 days 5 hours |
| 4 | 11 days 8 hours | 7 days 0 hours | 4 days 21 hours |
| 8 | 6 days 14 hours | 4 days 10 hours | 3 days 8 hours |
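The training-time table above also shows how well training scales with the number of GPUs. The small script below (my own arithmetic helper, not part of the repository) converts the table entries to hours and computes the speedup over a single GPU.

```python
import re

# Expected training times from the table above (Tesla V100, default configuration).
TRAINING_TIMES = {
    "1024x1024": {1: "41 days 4 hours", 2: "21 days 22 hours", 4: "11 days 8 hours", 8: "6 days 14 hours"},
    "512x512": {1: "24 days 21 hours", 2: "13 days 7 hours", 4: "7 days 0 hours", 8: "4 days 10 hours"},
    "256x256": {1: "14 days 22 hours", 2: "9 days 5 hours", 4: "4 days 21 hours", 8: "3 days 8 hours"},
}


def to_hours(text):
    """Convert a string like '41 days 4 hours' into a number of hours."""
    days, hours = map(int, re.match(r"(\d+) days (\d+) hours", text).groups())
    return days * 24 + hours


def scaling_efficiency(resolution, gpus):
    """Return (speedup over one GPU, speedup divided by the GPU count)."""
    times = TRAINING_TIMES[resolution]
    speedup = to_hours(times[1]) / to_hours(times[gpus])
    return speedup, speedup / gpus
```

At 1024×1024, eight GPUs cut the time from 988 to 158 hours, a speedup of about 6.25× (roughly 78% of linear scaling). At 256×256 the speedup drops to about 4.5× (358 versus 80 hours), so extra GPUs pay off less at lower resolutions.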

Training takes a lot of time, so I recommend securing a capable NVIDIA CUDA environment (ideally with several high-end GPUs) before you start.

Datasets for training

The training and evaluation scripts operate on datasets stored as multi-resolution TFRecords. Each dataset is represented by a directory containing the same image data in several resolutions to enable efficient streaming. There is a separate *.tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well. By default, the scripts expect to find the datasets at datasets/<NAME>/<NAME>-<RESOLUTION>.tfrecords. The directory can be changed by editing config.py:

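```python
# config.py: base directories used by the training and evaluation scripts
result_dir = 'results'   # training runs and metric results are written here
data_dir = 'datasets'    # multi-resolution TFRecord datasets are read from here
cache_dir = 'cache'      # cached files, e.g. downloaded network pickles
```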
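To check that a converted dataset is where the scripts expect it, the sketch below lists the per-resolution files for one dataset. It assumes that dataset_tool.py encodes <RESOLUTION> as "r" followed by the two-digit log2 of the resolution (for example ffhq-r10.tfrecords for 1024×1024), with one file per level from 4×4 upward. That naming is my understanding of the tool, so compare it against your own datasets folder; the helper names are hypothetical.

```python
import os


def expected_tfrecords(data_dir, name, resolution):
    """List the per-resolution TFRecord paths for one dataset directory.

    Assumes '<NAME>-rXX.tfrecords' naming, where XX is log2 of the
    resolution, with one file per level from 4x4 up to the full size.
    """
    log2_res = resolution.bit_length() - 1
    if resolution < 4 or 2 ** log2_res != resolution:
        raise ValueError("resolution must be a power of two >= 4")
    folder = os.path.join(data_dir, name)
    return [os.path.join(folder, "%s-r%02d.tfrecords" % (name, level))
            for level in range(2, log2_res + 1)]


def missing_tfrecords(data_dir, name, resolution):
    """Return the expected files that are not present on disk."""
    return [path for path in expected_tfrecords(data_dir, name, resolution)
            if not os.path.isfile(path)]
```

For FFHQ at 1024×1024 this expects nine files, datasets/ffhq/ffhq-r02.tfrecords through datasets/ffhq/ffhq-r10.tfrecords.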

To obtain the FFHQ dataset (datasets/ffhq), please refer to the Flickr-Faces-HQ repository.

To obtain the CelebA-HQ dataset (datasets/celebahq), please refer to the Progressive GAN repository.

To obtain other datasets, including LSUN, please consult their corresponding project pages. The datasets can be converted to multi-resolution TFRecords using the provided dataset_tool.py:

```
> python dataset_tool.py create_lsun datasets/lsun-bedroom-full ~/lsun/bedroom_lmdb --resolution 256
> python dataset_tool.py create_lsun_wide datasets/lsun-car-512x384 ~/lsun/car_lmdb --width 512 --height 384
> python dataset_tool.py create_lsun datasets/lsun-cat-full ~/lsun/cat_lmdb --resolution 256
> python dataset_tool.py create_cifar10 datasets/cifar10 ~/cifar10
> python dataset_tool.py create_from_images datasets/custom-dataset ~/custom-images
```
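For a custom dataset, create_from_images is, to my understanding, strict about its input: the images must be square, all the same size, have a power-of-two side length, and be stored as RGB or grayscale. The sketch below pre-checks a folder before conversion. The helper names are hypothetical, and the constraints are my reading of the tool rather than a documented specification. Pillow is only needed for load_specs and is imported lazily.

```python
import os


def check_image_specs(specs):
    """Validate (filename, width, height, mode) tuples against the assumed
    create_from_images constraints. Returns a list of problem descriptions."""
    if not specs:
        return ["no images found"]
    problems = []
    _, ref_w, ref_h, _ = specs[0]
    for filename, width, height, mode in specs:
        if width != height:
            problems.append("%s: not square (%dx%d)" % (filename, width, height))
        if (width, height) != (ref_w, ref_h):
            problems.append("%s: size differs from first image" % filename)
        if width < 1 or width & (width - 1):
            problems.append("%s: side %d is not a power of two" % (filename, width))
        if mode not in ("L", "RGB"):
            problems.append("%s: unsupported mode %s" % (filename, mode))
    return problems


def load_specs(image_dir):
    """Read size and mode of every readable image in image_dir with Pillow."""
    from PIL import Image  # imported here so check_image_specs works without Pillow
    specs = []
    for filename in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, filename)
        try:
            with Image.open(path) as img:
                specs.append((filename, img.width, img.height, img.mode))
        except OSError:
            continue  # skip files Pillow cannot open
    return specs
```

Running `check_image_specs(load_specs(os.path.expanduser("~/custom-images")))` and getting an empty list back suggests the folder is ready for conversion.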

Disentanglement & Evaluating Quality

The quality and disentanglement metrics used in the paper can be evaluated using run_metrics.py. By default, the script evaluates the Fréchet Inception Distance (fid50k) for the pre-trained FFHQ generator and writes the results into a newly created directory under results. The exact behavior can be changed by uncommenting or editing specific lines in run_metrics.py.
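FID fits a Gaussian to Inception-v3 features (the inception_v3_features.pkl network in the resource list) of real images and of 50,000 generated images, then measures the Fréchet distance between the two Gaussians: d^2 = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The NumPy sketch below computes only this final step from precomputed feature arrays. Feature extraction is not shown, and computing the matrix square root through symmetric eigendecompositions is my own choice, not necessarily what run_metrics.py does.

```python
import numpy as np


def _sqrtm_psd(matrix):
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    values, vectors = np.linalg.eigh(matrix)
    values = np.clip(values, 0.0, None)  # remove tiny negative round-off
    return (vectors * np.sqrt(values)) @ vectors.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Tr((sigma1 sigma2)^(1/2)) is computed as Tr((A sigma2 A)^(1/2)) with
    A = sigma1^(1/2); both products have the same non-negative eigenvalues.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    root1 = _sqrtm_psd(sigma1)
    middle = root1 @ sigma2 @ root1
    middle = (middle + middle.T) / 2  # symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(real_features, fake_features):
    """FID from two (N, D) feature arrays, e.g. Inception-v3 activations."""
    mu_r, mu_f = real_features.mean(axis=0), fake_features.mean(axis=0)
    sigma_r = np.cov(real_features, rowvar=False)
    sigma_f = np.cov(fake_features, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)
```

As a sanity check, two Gaussians with means (0, 0) and (3, 4) and covariances I and 4I give 25 + (1 + 4 - 2·2)·2 = 27. Lower FID is better, which is why 4.4159 in the table below is a strong result.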

Expected evaluation time and results for the pre-trained FFHQ generator using one Tesla V100 GPU:

| Metric | Time | Result | Description |
|---|---|---|---|
| fid50k | 16 min | 4.4159 | Fréchet Inception Distance using 50,000 images. |
| ppl_zfull | 55 min | 664.8854 | Perceptual Path Length for full paths in Z. |
| ppl_wfull | 55 min | 233.3059 | Perceptual Path Length for full paths in W. |
| ppl_zend | 55 min | 666.1057 | Perceptual Path Length for path endpoints in Z. |
| ppl_wend | 55 min | 197.2266 | Perceptual Path Length for path endpoints in W. |
| ls | 10 hours | z: 165.0106, w: 3.7447 | Linear Separability in Z and W. |
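The ppl_* rows measure how much the image changes when the latent vector moves by a tiny step epsilon along an interpolation path: the average of d(G(p(t)), G(p(t + epsilon))) / epsilon^2. In Z the path is a spherical interpolation (slerp), in W a linear one (lerp); "full" samples t uniformly in [0, 1], while "end" looks only at a path endpoint (t = 0 in this sketch). In the sketch below, generate, mapping and distance are placeholders you supply; the real metric uses the StyleGAN generator and the LPIPS network (vgg16_zhang_perceptual.pkl). The value epsilon = 1e-4 and the 1%/99% outlier filtering follow my reading of the paper and code, so treat them as assumptions.

```python
import numpy as np

EPSILON = 1e-4  # step size of the finite difference along the path


def slerp(a, b, t):
    """Spherical interpolation between rows of a and b, on the unit sphere."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    cos_omega = np.clip(np.sum(a * b, axis=-1, keepdims=True), -1.0, 1.0)
    angle = t * np.arccos(cos_omega)
    ortho = b - cos_omega * a
    ortho = ortho / np.linalg.norm(ortho, axis=-1, keepdims=True)
    return a * np.cos(angle) + ortho * np.sin(angle)


def lerp(a, b, t):
    return a + (b - a) * t


def perceptual_path_length(generate, distance, latent_dim, num_samples=1000,
                           space="z", sampling="full", mapping=None, seed=0):
    """Mean of distance(G(p(t)), G(p(t + eps))) / eps^2 after outlier filtering."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal((num_samples, latent_dim))
    z1 = rng.standard_normal((num_samples, latent_dim))
    if sampling == "full":
        t = rng.uniform(0.0, 1.0, size=(num_samples, 1))
    else:
        t = np.zeros((num_samples, 1))  # "end": start of the path
    if space == "z":
        p0, p1 = slerp(z0, z1, t), slerp(z0, z1, t + EPSILON)
    else:
        if mapping is None:
            raise ValueError("W-space PPL needs a mapping network")
        w0, w1 = mapping(z0), mapping(z1)
        p0, p1 = lerp(w0, w1, t), lerp(w0, w1, t + EPSILON)
    dist = distance(generate(p0), generate(p1)) / EPSILON ** 2
    lo, hi = np.percentile(dist, 1), np.percentile(dist, 99)
    return float(dist[(dist >= lo) & (dist <= hi)].mean())
```

The four table rows correspond to the combinations space="z"/"w" and sampling="full"/"end". The much lower W values in the table reflect the claim that W is a smoother latent space than Z.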

Training curves for FFHQ config F (StyleGAN2) compared to the original StyleGAN, using 8 GPUs:

Training curves

Path Length Regularization to smooth latent space

StyleGAN2 regularizes the generator to encourage good conditioning in the mapping from latent vectors to images. The idea is that a fixed-size step in the intermediate latent space W should produce a non-zero, fixed-magnitude change in the image, regardless of direction or position. To encourage this, the generator is penalized with E_{w, y ~ N(0, I)} (||J_w^T y||_2 - a)^2, where J_w is the Jacobian of the generator with respect to w, y is a random image with normally distributed pixel intensities, and a is a running exponential moving average of the lengths ||J_w^T y||_2. In addition to improving image quality, this path length regularizer makes the generator significantly easier to invert, which makes it possible to reliably detect whether an image was generated by a particular network.
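The sketch below estimates the StyleGAN2 path length penalty, the squared difference between the length ||J_w^T y|| and a moving-average target a, for a toy generator. The real implementation obtains J_w^T y = grad_w (g(w) · y) by backpropagation in TensorFlow; here I use central finite differences so that it runs with NumPy alone. The class name, the decay of 0.01, and scaling y by 1/sqrt(image.size) are my own simplifications, not the official code.

```python
import numpy as np


class PathLengthRegularizer:
    """Toy path length penalty (||J_w^T y|| - a)^2 with a moving-average target a."""

    def __init__(self, decay=0.01):
        self.decay = decay  # how fast the target length a tracks new batches
        self.target = 0.0   # running average a of the observed path lengths

    @staticmethod
    def jacobian_vector(generate, w, y, h=1e-5):
        """Estimate J_w^T y = grad_w (g(w) . y) with central differences."""
        grad = np.zeros_like(w, dtype=float)
        for i in range(w.size):
            step = np.zeros_like(w, dtype=float)
            step[i] = h
            plus = np.dot(generate(w + step).ravel(), y.ravel())
            minus = np.dot(generate(w - step).ravel(), y.ravel())
            grad[i] = (plus - minus) / (2 * h)
        return grad

    def penalty(self, generate, ws, rng):
        """Mean penalty over a batch of latent vectors ws (shape: batch x dim)."""
        lengths = []
        for w in ws:
            image = generate(w)
            # Random image y with normally distributed pixels, scaled by 1/sqrt(#values)
            y = rng.standard_normal(image.shape) / np.sqrt(image.size)
            lengths.append(np.linalg.norm(self.jacobian_vector(generate, w, y)))
        lengths = np.array(lengths)
        # Move the target a toward the batch mean (exponential moving average).
        self.target += self.decay * (lengths.mean() - self.target)
        return float(np.mean((lengths - self.target) ** 2)), lengths
```

For a linear toy generator g(w) = M w the Jacobian is simply M, so J_w^T y = M^T y, which makes the finite-difference estimate easy to verify. In a real training loop the penalty would be added to the generator loss with some weight.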

Conclusion

StyleGAN2 was my capstone design final project. In my next post, I will cover GAN basics and the StyleGAN2 improvements proposed in NVIDIA's paper, focusing mainly on the metrics and the normalization changes. See you then!

References

https://arxiv.org/abs/1812.04948 (A Style-Based Generator Architecture for Generative Adversarial Networks)

https://arxiv.org/pdf/1912.04958.pdf (Analyzing and Improving the Image Quality of StyleGAN)

https://medium.com/analytics-vidhya/from-gan-basic-to-stylegan2-680add7abe82 (From GAN basic to StyleGAN2)
