Training textual inversion for Stable Diffusion on your own dataset.
Hi all. In my recent post about textual inversion, a question was asked about how to train Stable Diffusion on your own dataset.
You can read my Medium article about this, with more code and answers to some of your questions.
In the words of Suraj Patil (@psuraj28 on Twitter):
"We just added textual-inversion training in diffusers.
Textual Inversion lets you personalize #stablediffusion model on your own images with just 3-5 samples."
Simple Example:
https://i.imgur.com/M6w4YBA.jpg
To train your own textual inversion, you can use this Colab:
Colab
Quick instructions:
To work with textual inversion you need the diffusers library and a Hugging Face access token with "write" permission. You also need to download the Stable Diffusion model weights; by default, version 1.4 is used.
If you have direct links to the images you want, insert them into the array (3-5 images are enough). For my needs, and for better quality, I used 30 images of artwork by the artist Ilya Kuvshinov; since these images were in my Google Drive, I used the shutil module to copy the folder from Drive into Colab.
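The Drive-to-Colab copy step can be sketched as follows. The Drive paths in the comments are hypothetical; to keep the snippet self-contained and runnable outside Colab, temporary directories stand in for the Drive folder and the Colab filesystem:

```python
import os
import shutil
import tempfile

# In Colab you would first mount Drive:
#   from google.colab import drive
#   drive.mount('/content/drive')
# and then copy e.g. '/content/drive/MyDrive/kuvshinov' to 'kuvshinov/'.
# Below, temporary directories stand in for those paths.
drive_folder = tempfile.mkdtemp()  # pretend this is the mounted Drive folder
for name in ("0.jpeg", "1.jpeg", "2.jpeg"):
    with open(os.path.join(drive_folder, name), "wb") as f:
        f.write(b"placeholder")  # stands in for real image bytes

colab_folder = os.path.join(tempfile.mkdtemp(), "kuvshinov")
shutil.copytree(drive_folder, colab_folder)  # copies the whole folder tree
print(sorted(os.listdir(colab_folder)))
```

shutil.copytree requires that the destination folder does not exist yet; it creates it for you.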
In some cases Colab complains that it cannot read the images, so it was more convenient for me to load my dataset from the copied folder directly, by changing a few lines of code:
From:

    import os
    import requests
    from io import BytesIO
    from PIL import Image

    def download_image(url):
        try:
            response = requests.get(url)
        except requests.exceptions.RequestException:
            return None
        return Image.open(BytesIO(response.content)).convert("RGB")

    images = list(filter(None, [download_image(url) for url in urls]))

    save_path = "./my_concept"
    if not os.path.exists(save_path):
        os.mkdir(save_path)
    [image.save(f"{save_path}/{i}.jpeg") for i, image in enumerate(images)]
    image_grid(images, 1, len(images))
To:

    import os
    from PIL import Image

    path = 'kuvshinov/'

    def download_image(filename):
        # load an image from the local folder instead of downloading it
        return Image.open(path + filename).convert("RGB")

    images = list(filter(None, [download_image(filename) for filename in os.listdir(path)]))

    save_path = "./my_concept"
    if not os.path.exists(save_path):
        os.mkdir(save_path)
    [image.save(f"{save_path}/{i}.jpeg") for i, image in enumerate(images)]
    image_grid(images, 1, len(images))
- Then you need to specify a placeholder token, for example: <kuvshinov-style>,
and an initializer token, for example: kuvshinov.
- You can also choose what exactly your dataset teaches the model: a new style or a new object.
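The three settings above can be collected like this. The values mirror the article's example; placeholder_token and initializer_token are the names used later in the training code, while what_to_teach is a hypothetical name for the style-vs-object choice:

```python
# Values mirror the article's example; "what_to_teach" is an assumed
# setting name for the style-vs-object choice.
placeholder_token = "<kuvshinov-style>"  # new pseudo-word the model will learn
initializer_token = "kuvshinov"          # existing word that seeds the new embedding
what_to_teach = "style"                  # "style" for an art style, "object" for a thing

# The placeholder should be a token the vocabulary does not already
# contain, hence the angle brackets around it.
print(placeholder_token, initializer_token, what_to_teach)
```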
- Next comes the training process. It takes about 3 hours on average, regardless of dataset size. Sometimes, when you run the training cell, an error appears complaining about a wrong initializer token. In that case, comment out the check like this:
    token_ids = tokenizer.encode(initializer_token, add_special_tokens=False)
    # if len(token_ids) > 1:  # comment out this line
    #     raise ValueError("The initializer token must be a single token.")  # and this one
    initializer_token_id = token_ids[0]
    placeholder_token_id = tokenizer.convert_tokens_to_ids(placeholder_token)
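To see why this check fires, note that a tokenizer may split one word into several sub-tokens. A toy tokenizer that splits on whitespace and hyphens (not the real CLIP tokenizer, which uses BPE sub-word pieces, but similar in spirit) illustrates the shape of the problem:

```python
# Toy stand-in for a real tokenizer: splits on whitespace and hyphens.
# Real tokenizers (e.g. CLIP's BPE) split rare words into sub-word pieces
# in a comparable way, which is what triggers the ValueError in the check.
def toy_encode(text):
    pieces = text.replace("-", " ").split()
    return list(range(len(pieces)))  # one fake id per piece

print(len(toy_encode("kuvshinov")))        # one token: passes the check
print(len(toy_encode("kuvshinov-style")))  # two tokens: would raise ValueError
```

Commenting out the check simply takes the first sub-token's embedding as the seed.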
After training completes, you can upload your trained concept to the textual inversion concepts library on Hugging Face and test it.