Customizing Your Own AI Artist with LoRA

iResearch666

Published on 2023-09-13 06:23:06

This article is included in the column: AI算法能力提高班 (AI Algorithm Skills Improvement Class)

Using LoRA to quickly fine-tune diffusion models

https://github.com/cloneofsimo/lora
https://github.com/kohya-ss/sd-scripts

Introduction

alpha_scale

Using LoRA to fine-tune on an illustration dataset: W = W0 + αΔW, where α is the merging ratio. The GIF above scales α from 0 to 1. Setting α to 0 is the same as using the original model, and setting α to 1 is the same as using the fully fine-tuned model.

α: strength (how strongly the original SD output is blended with the result fine-tuned on the custom task)
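The merge formula can be tried on a toy example. The following is a minimal sketch, assuming the update ΔW is stored as the product of two low-rank matrices, which is the usual LoRA formulation. The names `merge_lora`, `LORA_UP`, and `LORA_DOWN` are illustrative and not taken from either repository, and plain Python lists are used only so the sketch runs without dependencies.

```python
# Minimal LoRA merge sketch: W = W0 + alpha * (up @ down), on plain lists.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w0, lora_up, lora_down, alpha):
    """Return W0 + alpha * (lora_up @ lora_down) without modifying W0."""
    delta = matmul(lora_up, lora_down)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

# Toy 2x2 base weight and a rank-1 update (hypothetical values).
W0 = [[1.0, 0.0], [0.0, 1.0]]
LORA_UP = [[1.0], [2.0]]       # shape (2, 1)
LORA_DOWN = [[0.5, -0.5]]      # shape (1, 2)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, merge_lora(W0, LORA_UP, LORA_DOWN, alpha))
```

With a merging ratio of 0 the result equals the base weights, and with 1 the full update is applied, which matches the behavior described above.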
Now, how would we actually use this to update a diffusion model? First, we will use Stable Diffusion from Stability AI. Their model is nicely ported through the Huggingface API, so this repo has built various fine-tuning methods around it. In detail, there are three subtle but important distinctions between the methods that make this work.

1. Dreambooth. LoRA can be applied to Dreambooth. The idea is to use prior-preservation class images to regularize the training process, and to use low-occurring tokens. This keeps the model's generalization capability while maintaining high fidelity. If you turn off prior preservation and train the text encoder embedding as well, it becomes naive fine-tuning.
2. Textual Inversion. There is no room to apply LoRA here, but it is worth mentioning. The idea is to instantiate a new token and learn its embedding via gradient descent. This is a very powerful method, and it is worth trying if your use case is focused not on fidelity but on inverting conceptual ideas.
3. Pivotal Tuning. This last method (although originally proposed for GANs) takes the best of both worlds. You first apply textual inversion to get a matching token embedding. Then you use that token embedding together with prior-preserving class images to fine-tune the model. This two-stage nature makes it a strict generalization of both methods.
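To make the textual inversion step more concrete, here is a toy sketch, not the repository's training code. It assumes a tiny fixed matrix stands in for the frozen diffusion model and a fixed vector stands in for the training signal. The only thing gradient descent updates is the newly instantiated token embedding. All names and values are hypothetical.

```python
# Toy textual inversion: the "model" is frozen; only the new token's
# embedding vector is updated by gradient descent.

FROZEN_W = [[2.0, 0.0], [0.0, 1.0]]   # stand-in for the frozen model
TARGET = [2.0, -1.0]                  # stand-in for the training signal

def forward(embedding):
    """Apply the frozen linear model to the token embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_W]

def loss(embedding):
    """Squared error between the model output and the target."""
    return sum((o - t) ** 2 for o, t in zip(forward(embedding), TARGET))

def grad(embedding):
    """Gradient of the squared error w.r.t. the embedding: 2 * W^T (W e - t)."""
    residual = [o - t for o, t in zip(forward(embedding), TARGET)]
    return [2 * sum(FROZEN_W[i][j] * residual[i] for i in range(len(residual)))
            for j in range(len(embedding))]

def learn_token(steps=200, lr=0.1):
    """Run plain gradient descent on the embedding only."""
    embedding = [0.0, 0.0]            # newly instantiated token embedding
    for _ in range(steps):
        g = grad(embedding)
        embedding = [e - lr * gi for e, gi in zip(embedding, g)]
    return embedding

learned = learn_token()
print(learned, loss(learned))
```

In Pivotal Tuning, an embedding learned this way would then be held as the starting point while the model weights themselves are fine-tuned; that second stage is not shown here.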

Main Features

Fine-tune Stable Diffusion models twice as fast as the Dreambooth method, using Low-Rank Adaptation
Get insanely small end result (1MB ~ 6MB), easy to share and download.
Compatible with diffusers
Support for inpainting
Sometimes even better performance than full fine-tuning (but left as future work for extensive comparisons)
Merge checkpoints + Build recipes by merging LoRAs together
Pipeline to fine-tune CLIP + Unet + token to gain better results.
Out-of-the-box multi-vector pivotal tuning inversion
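The small file size follows from the low-rank structure. As a rough sketch, assuming a single weight matrix of shape d_out x d_in is adapted by two factors of rank r, the trainable parameter count drops from d_out * d_in to r * (d_out + d_in). The layer size and rank below are hypothetical and chosen only for illustration.

```python
def full_params(d_out, d_in):
    """Parameters in a full weight matrix of shape (d_out, d_in)."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: (d_out, rank) and (rank, d_in)."""
    return rank * (d_out + d_in)

d_out, d_in, rank = 768, 768, 4   # hypothetical layer size and rank
full = full_params(d_out, d_in)
low = lora_params(d_out, d_in, rank)
print(full, low, full / low)
```

For this one layer the low-rank factors hold 96 times fewer parameters than the full matrix. The actual file size depends on how many layers are adapted, the rank, and the storage precision.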

LoRA Applications

1 Character LoRA

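A character LoRA is a model trained on a specific character, such as a cartoon or video game character. It can accurately reproduce the character's look and feel, along with any key features associated with it. This is the most common type of LoRA, because generating a character without such training data tends to be tricky and inconsistent. Applying a character LoRA lets you quickly generate an authentic-looking character, which makes it well suited to AI illustration, character concept art, and even reference sheets. Depending on how the model was trained, the character may be tied to a certain outfit, a specific hairstyle, or even a particular facial expression. Some character LoRAs, however, let you put your chosen character in new outfits and settings, which makes them more versatile.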

character lora

Model used: Dreamlike Diffusion 1.0
LoRA used: The Joker | Photorealistic
Prompt used: portrait of the joker, high quality, 8k

2 Style LoRA

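A style LoRA has a lot in common with a character LoRA, but instead of being trained on a specific character or object, it focuses on an art style. This type of model is usually trained on a specific artist's work, letting you use their signature style in your own pieces. Style LoRAs can be used for anything from stylizing a reference image to creating original work in the same style. As the name suggests, these models are trained on a particular style, such as the specific look of a cartoon, watercolor, or line art. With this kind of LoRA, you can easily give your AI artwork a distinctive style.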

style lora

Model used: AnyLoRA - Checkpoint
LoRA used: Arcane Style LoRA
Prompt used: arcane style, 1girl, pink hair, long hair, one braid, white shirt, coat, yellow eyes, looking at viewer, city street

3 Pose LoRA

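Applying a pose LoRA to a generation does what it sounds like: it poses the character in a specific way. This is well suited to dynamic scenes, where you can produce particular poses and actions that are usually hard or impossible to achieve through ordinary prompt engineering. A pose LoRA focuses on the character's pose rather than its style or features. For example, if you apply a pose LoRA to a humanoid character, it will create different poses such as running, jumping, or sitting, without changing the character's features, clothing, or the style of the model you are using.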

pose lora

Model used: GhostMix
LoRA used: Shinji in a Chair | Meme
Prompt used: solo, male focus, sitting, head down, short black hair, hooded jacket, jeans, sneakers

4 Clothing LoRA

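Another useful model is the clothing LoRA. As you would expect, this type of LoRA is designed to change a character's clothing and accessories. With it, you can quickly and easily dress any character in new clothes, whether modern or historical. The best part is that these models work with any kind of character. With just one model, you can apply styles and designs from a variety of cultures. For example, if you want a scene in which a character wears traditional Chinese clothing, just apply your chosen clothing LoRA to your generation and the character will appear in traditional Chinese dress.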

clothing lora

Model used: GhostMix
LoRA used: hanfu
Prompt used: girl, blue hanfu, full body

5 Object LoRA

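Finally, there is the object LoRA. This is a broad category of LoRA models used to generate items such as furniture, plants, or even vehicles. Of course, the kinds of items you can create depend on the specific model you use and the prompt you provide. The term also covers LoRAs used to create more abstract objects, such as user interface elements for games or websites. This is useful for giving your project a more cohesive look and feel. For artists, game developers, web designers, and other creative professionals who need to create assets efficiently, an object LoRA is a valuable tool. Being able to generate objects with custom designs lets you experiment with different visuals until you find the one that fits your project.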

object lora

Model used: Szechuan Special Sauce
LoRA used: Product Design (Dark minimalism-eddiemauro)
Prompt used: futuristic kettle, a computer rendering, minimalism, 4k

Captions

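If we are training a character, the captions for all images should name the same character so that it can be recognized. For example, replace "a girl" with "Hatsune Miku".
If we are training a style, we describe everything except the style of the image, so we do not use terms such as "an illustration of" or "a photo of". Basically, we want to caption everything that we want to be able to change.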
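The two captioning rules above can be sketched as small string helpers. This is only an illustration: the function names are hypothetical, and the list of style phrases contains just the two examples mentioned in the text.

```python
STYLE_PREFIXES = ("an illustration of ", "a photo of ")  # examples from the text

def caption_for_character(caption, generic_subject, character_name):
    """Swap the generic subject for the character's name (first occurrence only)."""
    return caption.replace(generic_subject, character_name, 1)

def caption_for_style(caption):
    """Drop a leading style phrase so the style itself stays uncaptioned."""
    lowered = caption.lower()
    for prefix in STYLE_PREFIXES:
        if lowered.startswith(prefix):
            return caption[len(prefix):]
    return caption

print(caption_for_character("a girl flying through the air", "a girl", "Hatsune Miku"))
print(caption_for_style("an illustration of a girl with long green hair"))
```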

[Figure: sample training image]

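In this example, the caption "a girl with long green hair is flying through the air" is fairly accurate, but not precise enough. What I want to describe is "a teenage girl with teal eyes and long teal hair in twin tails". I then delete the rest of the description and leave "flying through the air" at the end.

So, apart from the features above, what else characterizes her? We can see that she is wearing a white blouse, a white skirt, a black sweater tied around her waist, and a blue dress, with her legs exposed, plus a pair of teal roller skates. We can also see a blue sky with white clouds.

Full caption: a teenage girl with teal eyes and long teal hair in twin tails, wearing a white colored blouse, white skirt, black sweater tied around waist, blue dress, exposed legs, teal roller skates, blue sky with white clouds, flying through the air

So why describe things so specifically here? Anything I leave out of the caption is something I am asking the LoRA to learn. If I do not describe the teal eyes, every character generated in the end will have teal eyes. So be sure to describe everything that should remain changeable.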
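The caption structure used in this example (subject first, then every changeable detail, with the action last) can be assembled programmatically. The helper below is a minimal sketch with a hypothetical name, not part of any training tool.

```python
def build_caption(subject, details, action):
    """Join subject, details, and action into one comma-separated caption."""
    parts = [subject, *details, action]
    return ", ".join(p.strip() for p in parts if p.strip())

caption = build_caption(
    "a teenage girl with teal eyes and long teal hair in twin tails",
    [
        "wearing a white colored blouse",
        "white skirt",
        "black sweater tied around waist",
        "blue dress",
        "exposed legs",
        "teal roller skates",
        "blue sky with white clouds",
    ],
    "flying through the air",
)
print(caption)
```

Keeping the details in a list makes it easy to review which attributes are captioned, and therefore which ones the LoRA is left to absorb.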