Juyae

Posted on Jun 16, 2021

Innovative Technology NVIDIA StyleGAN2

#machinelearning #ai #python #deeplearning

StyleGAN2 - Official TensorFlow Implementation

Style Based Generator Architecture for Generative Adversarial Networks

Developmental Environment
I used deep learning server environment provided by Seoul Women's University.

64-bit Python 3.6 installation. I recommend Anaconda3 with numpy 1.14.3 or newer.
TensorFlow 1.10.0 or newer with GPU support.
One or more high-end NVIDIA GPUs with at least 11GB of DRAM. We recommend NVIDIA DGX-1 with 8 Tesla V100 GPUs.
NVIDIA driver 391.35 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.3.1 or newer.

Do you think these people are REAL ?

Answer is NO. These people are produced by NVIDIA's generator that allows control over different aspects of the image. The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent vectors to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably detect if an image is generated by a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

Resources

Material related to our paper is available via the following links:

Paper: https://arxiv.org/abs/1812.04948
Video: https://youtu.be/kSLJriaOumA
Code: https://github.com/NVlabs/stylegan
FFHQ: https://github.com/NVlabs/ffhq-dataset

Path	Description
StyleGAN	Main folder.
stylegan-paper.pdf	High-quality version of the paper PDF.
stylegan-video.mp4	High-quality version of the result video.
images	Example images produced using our generator.
representative-images	High-quality images to be used in articles, blog posts, etc.
100k-generated-images	100,000 generated images for different amounts of truncation.
ffhq-1024x1024	Generated using Flickr-Faces-HQ dataset at 1024×1024.
bedrooms-256x256	Generated using LSUN Bedroom dataset at 256×256.
cars-512x384	Generated using LSUN Car dataset at 512×384.
cats-256x256	Generated using LSUN Cat dataset at 256×256.
videos	Example videos produced using our generator.
high-quality-video-clips	Individual segments of the result video as high-quality MP4.
ffhq-dataset	Raw data for the Flickr-Faces-HQ dataset.
networks	Pre-trained networks as pickled instances of dnnlib.tflib.Network.
stylegan-ffhq-1024x1024.pkl	StyleGAN trained with Flickr-Faces-HQ dataset at 1024×1024.
stylegan-celebahq-1024x1024.pkl	StyleGAN trained with CelebA-HQ dataset at 1024×1024.
stylegan-bedrooms-256x256.pkl	StyleGAN trained with LSUN Bedroom dataset at 256×256.
stylegan-cars-512x384.pkl	StyleGAN trained with LSUN Car dataset at 512×384.
stylegan-cats-256x256.pkl	StyleGAN trained with LSUN Cat dataset at 256×256.
metrics	Auxiliary networks for the quality and disentanglement metrics.
inception_v3_features.pkl	Standard Inception-v3 classifier that outputs a raw feature vector.
vgg16_zhang_perceptual.pkl	Standard LPIPS metric to estimate perceptual similarity.
celebahq-classifier-00-male.pkl	Binary classifier trained to detect a single attribute of CelebA-HQ.

StyleGAN's Pre-Trained Networks

Once the datasets are set up, you can train your own StyleGAN networks as follows:

Edit train.py to specify the dataset and training configuration by uncommenting or editing specific lines.
Run the training script with python train.py.
The results are written to a newly created directory results/<ID>-<DESCRIPTION>.
The training may take several days (or weeks) to complete, depending on the configuration.

By default, train.py is configured to train the highest-quality StyleGAN (configuration F in Table 1) for the FFHQ dataset at 1024×1024 resolution using 8 GPUs. Please note that we have used 8 GPUs in all of our experiments. Training with fewer GPUs may not produce identical results – if you wish to compare against our technique, we strongly recommend using the same number of GPUs.

Expected training times for the default configuration using Tesla V100 GPUs:

GPUs	1024×1024	512×512	256×256
1	41 days 4 hours	24 days 21 hours	14 days 22 hours
2	21 days 22 hours	13 days 7 hours	9 days 5 hours
4	11 days 8 hours	7 days 0 hours	4 days 21 hours
8	6 days 14 hours	4 days 10 hours	3 days 8 hours

It takes lot of time for training so I recommend you to charge NVIDIA's Cuda Environment

Datasets for training

The training and evaluation scripts operate on datasets stored as multi-resolution TFRecords. Each dataset is represented by a directory containing the same image data in several resolutions to enable efficient streaming. There is a separate *.tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well. By default, the scripts expect to find the datasets at datasets/<NAME>/<NAME>-<RESOLUTION>.tfrecords. The directory can be changed by editing config.py:

result_dir = 'results'
data_dir = 'datasets'
cache_dir = 'cache'

To obtain the FFHQ dataset (datasets/ffhq), please refer to the Flickr-Faces-HQ repository.

To obtain the CelebA-HQ dataset (datasets/celebahq), please refer to the Progressive GAN repository.

To obtain other datasets, including LSUN, please consult their corresponding project pages. The datasets can be converted to multi-resolution TFRecords using the provided dataset_tool.py:

> python dataset_tool.py create_lsun datasets/lsun-bedroom-full ~/lsun/bedroom_lmdb --resolution 256
> python dataset_tool.py create_lsun_wide datasets/lsun-car-512x384 ~/lsun/car_lmdb --width 512 --height 384
> python dataset_tool.py create_lsun datasets/lsun-cat-full ~/lsun/cat_lmdb --resolution 256
> python dataset_tool.py create_cifar10 datasets/cifar10 ~/cifar10
> python dataset_tool.py create_from_images datasets/custom-dataset ~/custom-images

Disentanglement & Evaluating Quality

The quality and disentanglement metrics used in our paper can be evaluated using run_metrics.py. By default, the script will evaluate the Fréchet Inception Distance (fid50k) for the pre-trained FFHQ generator and write the results into a newly created directory under results. The exact behavior can be changed by uncommenting or editing specific lines in run_metrics.py.

Expected evaluation time and results for the pre-trained FFHQ generator using one Tesla V100 GPU:

Metric	Time	Result	Description
fid50k	16 min	4.4159	Fréchet Inception Distance using 50,000 images.
ppl_zfull	55 min	664.8854	Perceptual Path Length for full paths in Z.
ppl_wfull	55 min	233.3059	Perceptual Path Length for full paths in W.
ppl_zend	55 min	666.1057	Perceptual Path Length for path endpoints in Z.
ppl_wend	55 min	197.2266	Perceptual Path Length for path endpoints in W.
ls	10 hours	z: 165.0106 w: 3.7447	Linear Separability in Z and W.

Training curves for FFHQ config F (StyleGAN2) compared to original StyleGAN using 8 GPUs:

Path Length Regularization to smooth latent space

The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent vectors to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably detect if an image is generated by a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

Conclusion

StyleGAN2 was my CAPSTONE DESIGN FINAL PROJECT. Next time I will post GAN basic, StyleGAN2 proposed in NVIDIA paper. Priority to main metrics and normalization. See you then!

References

https://arxiv.org/abs/1812.04948 (A Style-Based Generator Architecture for Generative Adversarial Networks)

https://arxiv.org/pdf/1912.04958.pdf (Analyzing and Improving the Image Quality of StyleGAN)

https://medium.com/analytics-vidhya/from-gan-basic-to-stylegan2-680add7abe82 (From GAN basic to StyleGAN2)

DEV Community

Innovative Technology NVIDIA StyleGAN2

StyleGAN2 - Official TensorFlow Implementation

Do you think these people are REAL ?

Resources

StyleGAN's Pre-Trained Networks

Datasets for training

Disentanglement & Evaluating Quality

Path Length Regularization to smooth latent space

Conclusion

References

Top comments (0)

Read next

Convert an Excel dataset into a SQL insert statement

Best AI Tools for Programmers

Top 10 Real-World Applications of Artificial Intelligence to Watch in 2025

How Does Machine Learns (Intro to Machine Learning)