Using LoRA to fine-tune on an illustration dataset: W = W0 + aΔW, where a (alpha) is the merging ratio. The GIF above shows alpha being scaled from 0 to 1. Setting alpha to 0 is the same as using the original model, and setting alpha to 1 is the same as using the fully fine-tuned model.
a: strength (the blending strength between the original SD output and the output of the model fine-tuned on the custom task)
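The merging rule above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard LoRA factorization ΔW = B·A, and the names `matmul`, `merge_lora`, `W0`, `B`, `A` and the tiny 2x2 matrices are hypothetical, not taken from this repo.

```python
# Minimal sketch of W = W0 + a * ΔW, with ΔW = B @ A (low-rank update).
# All names and values here are illustrative assumptions.

def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(xi * yi for xi, yi in zip(row, col)) for col in zip(*y)]
            for row in x]

def merge_lora(w0, b, a_mat, alpha):
    """Return W0 + alpha * (B @ A) without modifying the inputs."""
    delta = matmul(b, a_mat)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

W0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen original weight
B = [[1.0], [2.0]]             # 2x1 factor (rank 1)
A = [[0.5, -0.5]]              # 1x2 factor (rank 1)

original = merge_lora(W0, B, A, 0.0)    # alpha = 0: original model
blended = merge_lora(W0, B, A, 0.5)     # alpha = 0.5: halfway blend
fine_tuned = merge_lora(W0, B, A, 1.0)  # alpha = 1: fully fine-tuned weight
```

Intermediate values of alpha interpolate linearly between the two endpoints, which is what the scaling animation visualizes.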
Now, how would we actually use this to update a diffusion model? First, we will use Stable Diffusion from Stability AI. Their model is nicely ported through the Hugging Face API, so this repo has built various fine-tuning methods around it. In detail, there are three subtle but important distinctions between the methods that make this work.
First, there is LoRA applied to Dreambooth. The idea is to use prior-preservation class images to regularize the training process, and to use rarely occurring tokens. This preserves the model's generalization capability while keeping high fidelity. If you turn off prior preservation and train the text encoder embedding as well, it becomes naive fine-tuning.
Second, there is Textual Inversion. There is no room to apply LoRA here, but it is worth mentioning. The idea is to instantiate a new token and learn its embedding via gradient descent. This is a very powerful method, and it is worth trying if your use case is focused not on fidelity but on inverting conceptual ideas.
The last method (although originally proposed for GANs) takes the best of both worlds. It can be implemented as a strict generalization of both methods: first, apply textual inversion to get a matching token embedding; then, use that token embedding together with prior-preserving class images to fine-tune the model. This two-stage nature is what makes it a strict generalization of both methods.
Pivotal Tuning
Textual Inversion
Dreambooth
Main Features
Fine-tune Stable Diffusion models twice as fast as the Dreambooth method, using Low-rank Adaptation
Get insanely small end results (1MB ~ 6MB), easy to share and download.
Compatible with diffusers
Support for inpainting
Sometimes even better performance than full fine-tuning (extensive comparisons are left as future work)
Merge checkpoints + Build recipes by merging LoRAs together
Pipeline to fine-tune CLIP + Unet + token to gain better results.
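The small file size in the feature list follows from how a low-rank update is parameterized. As a rough sketch: a full update to a d x k weight stores d*k values, while a rank-r update ΔW = B·A stores only r*(d + k). The 768 x 768 layer size and rank 4 below are assumptions chosen for illustration, not values confirmed by this document.

```python
# Parameter count of a full weight update versus a rank-r LoRA update.
# The layer size (768 x 768) and rank (4) are illustrative assumptions.

def full_params(d, k):
    """Number of values in a dense d x k update."""
    return d * k

def lora_params(d, k, r):
    """Number of values in B (d x r) plus A (r x k)."""
    return r * (d + k)

d, k, r = 768, 768, 4
full = full_params(d, k)      # 589824
lora = lora_params(d, k, r)   # 6144
ratio = full / lora           # 96.0

print(f"full: {full}, lora: {lora}, reduction: {ratio:.0f}x")
```

For this single layer the low-rank update is 96 times smaller; the total saving for a real checkpoint depends on which layers are adapted and at what rank.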
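In this example, "a girl with long green hair is flying through the air" is a fairly accurate description, but it is not precise enough. What I want to describe is "a teenage girl with teal eyes and long teal hair in twin tails". I then delete all of the remaining description and keep "flying through the air" at the end.
So, beyond the features above, what else characterizes her? She is wearing a white blouse, a white skirt, a black sweater tied around her waist, and a blue dress; her legs are exposed, and she has a pair of teal roller skates. Behind her there is a blue sky with white clouds.
Full caption: a teenage girl with teal eyes and long teal hair in twin tails, wearing a white colored blouse, white skirt, black sweater tied around waist, blue dress, exposed legs, teal rollerskates, blue sky with white clouds, flying through the air
So why describe things so specifically here? Anything I leave out of the caption is something I am asking the LoRA to learn. If I do not describe the teal eyes, every character generated afterwards will have teal eyes. So be sure to describe everything that should remain changeable.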
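The captioning rule above (subject first, then every attribute that should stay changeable, then the action last) can be sketched as a small helper. The function name `build_caption` and its argument layout are hypothetical; the tag strings come from the full caption above, reading its roller-skate tag as "teal rollerskates".

```python
# Hypothetical helper that assembles a training caption in the order
# described above: subject, changeable attributes, then the action.

def build_caption(subject, attributes, action):
    """Join the subject, each changeable attribute, and the action with commas."""
    return ", ".join([subject, *attributes, action])

caption = build_caption(
    subject="a teenage girl with teal eyes and long teal hair in twin tails",
    attributes=[
        "wearing a white colored blouse",
        "white skirt",
        "black sweater tied around waist",
        "blue dress",
        "exposed legs",
        "teal rollerskates",
        "blue sky with white clouds",
    ],
    action="flying through the air",
)
print(caption)
```

Keeping the attributes in a list makes it easy to check, image by image, that nothing changeable has been left out of the caption.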