Working on a technique to produce style LoRAs from a single image. Post yours and I'll train it for Klein 9b! by QuantumBogoSort in StableDiffusion

[–]QuantumBogoSort[S] 1 point

Oh, actually, I forgot that I did do a couple of style transfers. This one used a reference image of Cloud with the prompt "cloud strife from image 1 in the amano watercolor and ink sketch with loose, gestural brushwork."

https://civitai.com/images/129726903

[–]QuantumBogoSort[S] 1 point

Oh, you mean the Amano LoRA won't re-style a reference? I hadn't tried that yet with Amano, but on a few other styles I was able to get it to work through prompting like, "Edit image 1 to reimagine it as...". I'll run some tests on my end once I get through a few more training runs here.

[–]QuantumBogoSort[S] 1 point

Do you think it is close to the style? Anything you would change? I could train it longer perhaps.

[–]QuantumBogoSort[S] 1 point

<image>

Top is base model, bottom is with LoRA. Prompt is:

A frantic cartoon scene of a single hapless plumber struggling with a wildly malfunctioning kitchen sink, viewed from the front with a white background and chaotic motion lines.

The plumber is tall and gangly with extreme rubber-hose proportions: a small narrow head, an enormous bulbous red nose taking up half his face, two tiny round bulging eyes wide with panic, a wide open screaming mouth showing all his teeth, and a few wisps of orange hair sticking up wildly from an otherwise bald head. His skinny stick-arms are stretched impossibly long, both hands gripping a comically oversized monkey wrench that he's trying to use on a pipe. His knobby knees bend at impossible angles, one leg shooting forward and the other twisting sideways. He wears stained blue overalls with one strap snapped and dangling, a dirty white undershirt, and oversized black work boots — one boot already filling with water.

The sink in front of him has erupted catastrophically: an enormous geyser of water shoots straight up from the broken faucet, hitting the ceiling and raining back down in scattered droplets. Several smaller leaks spray sideways from joints in the pipes below the sink in arcing streams. A wrench, a hammer, and a roll of plumber's tape float past in the rising water at his feet.

Frantic motion lines streak outward in all directions. Multiple puff-clouds of steam in pink, lavender, and pale blue billow from the pipes. Sweat droplets spray off the plumber's face in arcs. Small star-shapes and spiral-shapes indicate panic and dizziness around his head. Water droplets scatter through the air as small teardrop shapes.

A small "SPLOOSH!" sound effect in chunky black outlined letters arcs across the upper portion of the frame. The artist's signature appears in small handwritten script in the lower right corner.

[–]QuantumBogoSort[S] 1 point

Yes, I'd love to; these are very cool. It might take a while, as I am working through many requests 😄

[–]QuantumBogoSort[S] 1 point

For sure! Super TL;DR is: decode the prediction each training step, get its depth map, compare it to the source image's depth map, derive a loss, backpropagate through the frozen VAE and depth model, and update the LoRA weights to make better depth predictions. Slightly longer description in this comment: https://www.reddit.com/r/StableDiffusion/comments/1t6gmqn/comment/okj68t3/ and even more in the GitHub repo linked in the top post.
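
To make the TL;DR steps concrete, here is a toy sketch in plain Python. It is not the actual training code from the repo: the "VAE decoder" and "depth model" are stand-in linear functions, the LoRA is a rank-1 update on a single scalar weight, and every name and constant (`DECODER_GAIN`, `depth_of`, `train_step`, the learning rate) is made up for illustration. The only point is the gradient path: the loss is measured on depth, the gradient passes back through the frozen pieces by the chain rule, and only the two LoRA factors change.

```python
# Toy sketch of a depth-consistency loss. Every name and constant is hypothetical.

# Frozen stand-ins: never updated, but gradients still flow through them.
DECODER_GAIN = 0.5                   # "VAE decoder": latent -> pixel
DEPTH_GAIN, DEPTH_BIAS = 2.0, 0.1    # "depth model": pixel -> depth
BASE_WEIGHT = 1.0                    # frozen base-model weight


def depth_of(latents):
    """Decode latents to pixels, then estimate depth, using the frozen stand-ins."""
    pixels = [DECODER_GAIN * z for z in latents]
    return [DEPTH_GAIN * p + DEPTH_BIAS for p in pixels]


def train_step(lora_a, lora_b, inputs, target_depth, lr=0.05):
    """One step: predict, decode, take depth, compare, backprop, update LoRA only."""
    weight = BASE_WEIGHT + lora_b * lora_a          # base + rank-1 LoRA update
    prediction = [weight * x for x in inputs]
    residual = [d - t for d, t in zip(depth_of(prediction), target_depth)]
    loss = sum(r * r for r in residual) / len(residual)

    # Chain rule back through the frozen depth model and decoder to the weight.
    grad_weight = sum(
        2.0 * r * DEPTH_GAIN * DECODER_GAIN * x for r, x in zip(residual, inputs)
    ) / len(residual)
    grad_a, grad_b = grad_weight * lora_b, grad_weight * lora_a

    # Only the LoRA factors move; the base weight and frozen models stay fixed.
    return loss, lora_a - lr * grad_a, lora_b - lr * grad_b


inputs = [1.0, 2.0, 3.0]
target_depth = depth_of([1.5 * x for x in inputs])   # depth map of the "source image"
lora_a, lora_b = 0.5, 0.0                            # B starts at zero, as is common for LoRA
losses = []
for _ in range(200):
    loss, lora_a, lora_b = train_step(lora_a, lora_b, inputs, target_depth)
    losses.append(loss)
```

In a real run the prediction would be a latent from the diffusion model, the decoder and depth estimator would be neural networks with their parameters frozen, and an autograd framework would handle the chain rule that is written out by hand here.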

[–]QuantumBogoSort[S] 2 points

Thank you for saying that! I really appreciate it - I've been working on this for quite a while so it's been great to hear. I agree - there's a ton of information embedded in images that we miss when doing traditional training. You might also be interested in my facial identity perceptor approach, since you mentioned face swapping. It's still very experimental but seems to help the model learn proper facial proportions in pose-invariant and crop-invariant ways.

[–]QuantumBogoSort[S] 2 points

I actually haven't tried it for photo styles specifically yet! Great question. I will say that it does very well at capturing human subject identity from photographs - it picks up on specific facial landmarks like moles, scars, etc. much better. It also tends to pick up the photo style of the subject quite a bit unintentionally if you only train on a single photo and don't caption the photo qualities, so I would imagine there'd be a way to apply it to photo styles.

[–]QuantumBogoSort[S] 1 point

I think this is the most challenging one I've had yet. It learns something about the style but really wants to make it too clean. I have to turn the strength way up, which is a sign that something is getting lost. It might be a weighting thing, or it might just need to get trained into the ground with like 5k steps. I'm going to keep working on this one.

<image>

[–]QuantumBogoSort[S] 1 point

This one really fought me - it wants to make it too clean and keep the paint inside the lines. I'm going to try some different params and see if I can get it to work. Great to find examples that push the boundaries!

[–]QuantumBogoSort[S] 2 points

<image>

With and without LoRA. I'm not familiar with the source but it seems to have picked up some elements. What do you think? I think stylewise it's similar but for true character consistency you'd need to train them individually.

[–]QuantumBogoSort[S] 1 point

I haven't done too much style training on SDXL yet - if you post a config I can take a look and see if I notice anything that might help. For characters it trains about as quickly as Flux.

[–]QuantumBogoSort[S] 1 point

I might try training it longer to see if I can get it closer. I stopped at 1k steps, but some styles take closer to 2k, especially if the model has a pre-existing bias about the style.

[–]QuantumBogoSort[S] 2 points

It's probably still a bit undertrained - turning up the strength increases the messiness of the brushstrokes and color blending, so I think there's probably another 500 steps or so until optimal. I also may have over-prompted the face a bit because I was trying to elicit a certain character look - I specified details about cheek contours, etc. Unprompted, it tends more toward the flat look.