Diffusion Model from Principle to Practice (Li Xinwei, Su Busheng, Xu Haoran, Yu Haiming)

2024-01-28
7.84MB

Document Introduction

★ AIGC is being applied ever more widely, and in image generation the diffusion model is one of its most important techniques.
★ Starting from the theory of the diffusion model, this book introduces the subject step by step, from the basics to more advanced material, and uses many vivid, engaging practical cases to help readers understand the details.
★ This book is suitable for AI researchers, researchers in related fields, and practitioners who need to produce images in their work and are interested in the diffusion model. It can also serve as a reference for students of computer science and related majors.

◎ Content introduction: The book has 8 chapters. It introduces the principles of the diffusion model in detail, along with important concepts and methods such as the degradation process, sampling, and DDIM inversion. It also covers Stable Diffusion, ControlNet, and audio diffusion models. Finally, the appendices present a collection of high-quality images generated by diffusion models and list related resources from the Hugging Face community.

◎ Professional book reviews:

This book systematically introduces the principles and relevant details of the diffusion model. Its rich practical cases also help readers get started with the diffusion model quickly. For anyone who wants to learn and understand the diffusion model, this book is a valuable reference.
-- Zhou Ming, founder and CEO of Lanzhou Technology, chief scientist of Sinovation Ventures, and vice president of CCF

Inspired by non-equilibrium thermodynamics, the diffusion model has quickly become a dazzling new star in the field of AIGC thanks to its sound mathematical interpretability and controllable generation diversity. Starting from "a drop of ink", this book "diffuses" the AIGC blueprint across images, text, and audio, from theory to practice. It keeps the essence for readers, removes the "noise", and restores the truest "distribution" of the knowledge system.
-- Yang Yaodong, researcher at the Institute of Artificial Intelligence of Peking University

Diffusion models have achieved amazing results in recent years and can effectively address the bottleneck in visual content generation. By reading this book carefully, you can gain a deeper understanding of the principles behind the diffusion model and put them into practice, so as to master the diffusion model and lay a solid foundation for further innovation or in-depth application. This book is worth recommending!
-- Zhong Sheng, CTO of Agora

Throughout human history, opportunities have always belonged to those who are first to seize the high ground of the future. Each of us needs to explore the mysteries of artificial intelligence in order to win a place in the coming tide of change.
-- Ma Boyong, author

"Diffusion Model from Principle to Practice" is based on Hugging Face's Diffusion Model course. It combines theory and examples to build a complete learning framework for readers. Whether you are a novice or an experienced practitioner, this practice-oriented book can help you better understand and apply the diffusion model.
-- Wang Tiezhen, head of Hugging Face China, senior engineer

With the launch of Stable Diffusion and Midjourney, text-to-image AI painting has become extremely popular. Many game character designs and online store page designs now use AI painting tools. This book systematically sorts out the principles behind AI painting and includes hands-on code. I highly recommend this book to everyone!
-- July, founder and CEO of July Online

◎ Table of contents:

Chapter 1 Introduction to the Diffusion Model
1.1 Principle of the Diffusion Model
1.1.1 Generative Models
1.1.2 Diffusion Process
1.2 Development of the Diffusion Model
1.2.1 Starting to Diffuse: Proposal and Improvement of the Basic Diffusion Model
1.2.2 Accelerating Generation: Samplers
1.2.3 Setting New Records: Diffusion Models Guided by an Explicit Classifier
1.2.4 Going Viral: Multimodal Image Generation Based on CLIP
1.2.5 "Breaking Out" Again: "Relearning" Methods for Large Models: DreamBooth, LoRA, and ControlNet
1.2.6 Opening the Era of AI Painting: Commercial Companies Propose Mature Image Generation Solutions
1.3 Applications of the Diffusion Model
1.3.1 Computer Vision
1.3.2 Time Series Data Prediction
1.3.3 Natural Language
1.3.4 Text-based Multimodality
1.3.5 Basic Science of AI

Chapter 2 Introduction to Hugging Face
2.1 Introduction to the Core Functions of Hugging Face
2.2 Hugging Face Open Source Libraries
2.3 Introduction to the Gradio Tool

Chapter 3 Building a Diffusion Model from Scratch
3.1 Environment Preparation
3.1.1 Environment Creation and Import
3.1.2 Dataset Testing
3.2 Degradation Process of the Diffusion Model
3.3 Training the Diffusion Model
3.3.1 UNet Network
3.3.2 Starting to Train the Model
3.4 Sampling Process of the Diffusion Model
3.4.1 Sampling Process
3.4.2 Differences from DDPM
3.4.3 UNet2DModel Model
3.5 Example of the Degradation Process of the Diffusion Model
3.5.1 Degradation Process
3.5.2 Final Training Objective
3.6 Expanded Knowledge
3.6.1 Time Step Adjustment
3.6.2 Key Issues of Sampling
3.7 Chapter Summary

Chapter 4 Diffusers in Practice
4.1 Environment Preparation
4.1.1 Installing the Diffusers Library
4.1.2 DreamBooth
4.1.3 Diffusers Core API
4.2 Practice: Generating Beautiful Butterfly Images
4.2.1 Downloading the Butterfly Image Collection
4.2.2 Diffusion Model Scheduler
4.2.3 Defining the Diffusion Model
4.2.4 Creating the Diffusion Model Training Loop
4.2.5 Image Generation
4.3 Expanded Knowledge
4.3.1 Uploading the Model to Hugging Face Hub
4.3.2
4.4 Chapter Summary

Chapter 5 Fine-tuning and Guidance
5.1 Environment Preparation
5.2 Loading a Pretrained Pipeline
5.3 DDIM: Faster Sampling
5.4 Fine-tuning a Diffusion Model
5.4.1 Practice: Fine-tuning
5.4.2 Fine-tuning a Model Using a Minimal Example
5.4.3 Saving and Loading a Fine-tuned Pipeline
5.5 Guiding a Diffusion Model
5.5.1 Practice: Guidance
5.5.2 CLIP Guidance
5.6 Sharing Your Custom Sampling Training
5.7 Practice: Creating a Class-Conditional Diffusion Model
5.7.1 Configuration and Data Preparation
5.7.2 Creating a Class-Conditional UNet Model
5.7.3 Training and Sampling
5.8 Chapter Summary

Chapter 6 Stable Diffusion
6.1 Basic Concepts
6.1.1 Latent Diffusion
6.1.2 Using Text as a Condition for Generation
6.1.3 Classifier-Free Guidance
6.1.4 Other Types of Conditional Generation Models: Img2Img, Inpainting, and Depth2Img
6.1.5 Fine-tuning with DreamBooth
6.2 Environment Preparation
6.3 Generating Images from Text
6.4 Stable Diffusion Pipeline
6.4.1 Variational Autoencoder
6.4.2 Tokenizer and Text Encoder
6.4.3 UNet
6.4.4 Scheduler
6.4.5 DIY Sampling Loop
6.5 Introduction to Other Pipelines
6.5.1 Img2Img
6.5.2 Inpainting
6.5.3 Depth2Image
6.6 Chapter Summary

Chapter 7 DDIM Inversion
7.1 Practice: Inversion
7.1.1 Configuration
7.1.2 Loading a Pretrained Pipeline
7.1.3 DDIM Sampling
7.1.4 Inversion
7.2 Putting It All Together
7.3 ControlNet Structure and Training Process
7.4 ControlNet Examples
7.4.1 ControlNet and Canny Edges
7.4.2 ControlNet and M-LSD Lines
7.4.3 ControlNet and HED Edges
7.4.4 ControlNet and Scribbles
7.4.5 ControlNet and Human Keypoints
7.4.6 ControlNet and Semantic Segmentation
7.5 ControlNet in Practice
7.6 Chapter Summary

Chapter 8 Audio Diffusion Model
8.1 Practice: Audio Diffusion Model
8.1.1 Setup and Import
8.1.2 Sampling from a Pretrained Audio Diffusion Model Pipeline
8.1.3 Converting from Audio to Spectrogram
8.1.4 Fine-tuning the Pipeline
8.1.5 Training Loop
8.2 Uploading the Model to Hugging Face Hub
8.3 Chapter Summary

Appendix A Beautiful Image Collection
Appendix B Hugging Face Related Resources
