Also: Stable Diffusion and 3D
Tutorials
- https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- Another diffusion tutorial https://github.com/Project-MONAI/GenerativeModels/blob/main/tutorials/generative/2d_ddpm/2d_ddpm_compare_schedulers.ipynb
- PyTorch code with explanation link
- no Gaussian noise, masks instead; any other distortion works too: https://arxiv.org/pdf/2208.09392.pdf (rough sketch of the idea below)
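A minimal sketch of the "masks instead of Gaussian noise" idea, assuming a simple patch-masking degradation; the function, patch size, and training loss here are my own illustration, not the paper's exact operator:

```python
import torch

def mask_degrade(x0: torch.Tensor, t: int, num_steps: int, patch: int = 4) -> torch.Tensor:
    """Forward corruption without Gaussian noise: zero out a growing fraction of
    patches as t goes from 0 to num_steps (a stand-in for the paper's degradations).
    x0: (B, C, H, W) images; H and W assumed divisible by patch."""
    b, c, h, w = x0.shape
    gh, gw = h // patch, w // patch
    severity = t / num_steps                                    # fraction of patches to mask
    keep = (torch.rand(b, 1, gh, gw, device=x0.device) > severity).float()
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return x0 * mask                                            # masked pixels set to 0

# training sketch: the restoration network predicts x0 from the corrupted x_t
# x_t = mask_degrade(x0, t, num_steps=100)
# loss = F.mse_loss(model(x_t, t), x0)
```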
Papers
- diffusion probabilistic models (Sohl-Dickstein et al., 2015)
- noise-conditioned score network (NCSN; Song & Ermon, 2019)
- denoising diffusion probabilistic models (DDPM; Ho et al. 2020, example, tricks)
"Our neural network architecture follows the backbone of PixelCNN++, which is a U-Net based on a Wide ResNet" (from Denoising Diffusion Probabilistic Models)
- adding more (confusing) physics: annealed Langevin dynamics (see the sampler sketch at the end of this section)
- self-conditioning
- P2 weighting (PyTorch code)
- non-linear scheduling (see the schedule sketch at the end of this section)
- from OpenAI, bragging that diffusion beats GANs
- some people point out that the U-Net model in diffusion is taken from PixelCNN++
"condition on whole pixels, rather than R/G/B sub-pixels" (from PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications)
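A rough sketch of the annealed Langevin dynamics sampler mentioned above (NCSN style); score_model, the step size, and all constants here are hypothetical placeholders:

```python
import torch

@torch.no_grad()
def annealed_langevin_sample(score_model, shape, sigmas, n_steps_each=100, eps=2e-5):
    """Annealed Langevin dynamics (Song & Ermon, 2019).
    score_model(x, sigma) is assumed to return the score, grad_x log p_sigma(x);
    sigmas is a decreasing sequence of noise levels."""
    x = torch.rand(shape)                            # start from uniform noise
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2       # anneal step size with the noise level
        for _ in range(n_steps_each):
            noise = torch.randn_like(x)
            x = x + 0.5 * step * score_model(x, sigma) + (step ** 0.5) * noise
    return x
```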
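And for the non-linear scheduling point, a minimal sketch of the DDPM linear schedule next to the cosine schedule from Improved DDPM, plus the closed-form forward step that uses alpha_bar; variable names are my own:

```python
import math
import torch

def linear_beta_schedule(T: int) -> torch.Tensor:
    # DDPM (Ho et al., 2020): betas rise linearly from 1e-4 to 0.02
    return torch.linspace(1e-4, 0.02, T)

def cosine_alpha_bar(T: int, s: float = 0.008) -> torch.Tensor:
    # Improved DDPM (Nichol & Dhariwal, 2021): non-linear (cosine) schedule,
    # defined directly on alpha_bar_t = prod_i (1 - beta_i)
    steps = torch.arange(T + 1, dtype=torch.float32) / T
    f = torch.cos((steps + s) / (1 + s) * math.pi / 2) ** 2
    return f[1:] / f[0]

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    # closed-form forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps
    ab_t = alpha_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    return ab_t.sqrt() * x0 + (1.0 - ab_t).sqrt() * eps

# comparing the two schedules over T = 1000 steps:
# ab_linear = torch.cumprod(1.0 - linear_beta_schedule(1000), dim=0)
# ab_cosine = cosine_alpha_bar(1000)
```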
Dataset: poses
I lean toward small but coherent datasets. This makes each diffusion model precise for its own small purpose. Which means I need to work on my dataset. It already has some categories, and I will start by training <redacted> recognition models using a GAN, but to generate something interesting I need to recognize poses and generate new ones (I have seen GAN models that can do this).
Pose recognition is very helpful for robots anyway. That’s how they can learn new movements, though their bone and “muscle” structure is different. But maybe they can imitate to some degree.
Extra
It's possible to get access to a TPU v3-8 (roughly comparable to 8 V100 GPUs) through the Google TRC program.
On top of that you need super-resolution models, from IF for example, and a tutorial on running them with a small memory footprint.
Lessons
Diffusion model stalls and the loss is not improving
It's normal. Diffusion model loss can plateau around 0.1 or so. It's better not to fixate on that.
Why isn't a larger batch size reducing training time?
From this tweet, TL;DR:
- Maximum speed: the largest batch_size possible.
- Maximum generalization: a small batch_size, increased throughout training.
There is a lot of confusion about neural network batch size during training; here is what I know. The batch_size is a trade-off between training speed and generalization performance. Generally, up to a certain limit (around 8-9 samples), the smaller the batch, the better the generalization performance on the validation set. Increasing the batch_size throughout training also helps validation performance. If you change your batch_size, it is important to change the learning_rate as well, by the same ratio as the batch_size change: larger batches need a larger learning_rate, smaller batches need a smaller learning_rate.
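A minimal sketch of that scaling rule, assuming the linear relationship described above; the base values are placeholders, not recommendations:

```python
# Scale the learning rate by the same ratio as the batch size change.
base_batch_size = 32          # placeholder reference point
base_learning_rate = 1e-4     # placeholder reference point

def scaled_lr(batch_size: int) -> float:
    # doubling the batch size doubles the learning rate; halving it halves the rate
    return base_learning_rate * batch_size / base_batch_size

# usage (hypothetical optimizer setup):
# optimizer = torch.optim.Adam(model.parameters(), lr=scaled_lr(batch_size=128))
```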