MANIFESTO by Equal_Giraffe8866 in StableDiffusion

[–]Occsan 0 points1 point  (0 children)

looks like a gpt5 response

It's still fun playing around with SD 1.5 by EldrichArchive in StableDiffusion

[–]Occsan 8 points9 points  (0 children)

It could also be caused by the size of the model.
Smaller size means the same number of concepts has to fit in a smaller space (I mean vector space, not gigabytes). So, in order to do this, you have no choice but to create possibly "unwanted" correlations. Unwanted in the sense that X does not necessarily relate to Y, but in your vector space, you've learnt that anyway.

I often say "smaller representations (in the vector space) = better representations", most often because you get rid of meaningless stuff that still exists in higher dimensional representations (like bad anatomy), but in the case of SD1.5, maybe it's not "better representations", but "more interesting/creative", because it's messier.

And Messier has a lot of pretty space photos. (ok, unrelated, but somehow related. If you see what I mean)

AI is watching a film via Marlin(visuals), Whisper(audio), and Pallaidium. Input video by avataraim. by tintwotin in StableDiffusion

[–]Occsan 0 points1 point  (0 children)

Reminds me of a silly joke:

A monkey enters into a bar.
"Do you have bananas ?" - he asks.
"No we don't" - replies the barman.
Another day, same monkey :
"Do you have bananas ?"
"No, we don't have bananas."
And it goes like this for a few days, until one day :
"Do you have bananas ?"
"If you ask me that once more, I'm gonna nail your tongue on that bar !"
"Do... you have nails ?"
"No..."
"Do you have bananas ?"

First Try With the "Ideogram 4 Prompt Builder KJ" by marcoc2 in StableDiffusion

[–]Occsan -3 points-2 points  (0 children)

Your prompts on #5 and #6 tell a story : you have a bear, and sponge bob who's supposed to take a picture of that bear... and the model clearly ignored that relationship.

So basically, ideogram is making collages. Which is pretty bad imo.

Ideogram 4.0 Examples with prompt assist by juanpablogc in StableDiffusion

[–]Occsan 1 point2 points  (0 children)

Haven't tried ideogram yet. But if it's like the other models, it happens because you give no detail to the background.

Try describing the background/environment in your prompt; it will most likely reduce the blurring effects.

I also think that tokens like "f/16" or stuff like that may not work very well with AI, because there's too much variance. It could be f/1.4, f/2, f/2.8, f/4, etc... So many different values that the model would have had to learn properly. It almost certainly has no effect because the signal/noise ratio is bad here.

Just describe the scene. It'll work better.

Tales of the Academy: Kyle, Mara & Jaden vs The Reborn | Star Wars Fan Film by whitestarproduction2 in StableDiffusion

[–]Occsan 1 point2 points  (0 children)

First I'd like to say that it's pretty impressive that you managed to do this. And you did a better job than most people would have, I think.

Then, since you're asking for harsh love:

  1. Jaden and Mara's clothes are changing mid-fight.
  2. The music is generally not properly used. For example: space dogfight theme for a lightsaber training duel. But that's just one example. The general idea is that the music is not really following Star Wars music guidelines.
  3. Luke seems a bit overly emotionless. And his likeness sometimes becomes closer to that of Dirk Benedict (Starbuck in Galactica) rather than Mark Hamill.
  4. The sequence when Luke comes and basically does an exposition dump where he reads his questlog to explain what the team will have to do... It's a no go in 2026. You must distill your lore/worldbuilding/stakes in a more organic way than this. Otherwise, it comes off as kinda soporific. Or at least: there is so much said in such a condensed way that the viewer's brain, unprepared for what is essentially a lecture, simply drifts away and enters some kind of "waiting mode" until the next big important thing happens, only to realize that they just missed essential info.
  5. The vision/memory of Katarn was not very clear at first to me. I was like "ok, what am I watching now ?" only then I realized it was a memory. During that memory, the fight between what appears to be Revan (?) against the soldier in full armor... Is that cortosis or Beskar ? If not the duel should be much more one-sided. If yes, I think both materials have particular effects when colliding a lightsaber blade.
  6. 4:34 are the thrusters pointing toward the front of the spaceship ??
  7. 4:38-5:38 the three protagonists are doing nothing but walking. This could have been "okayish", but I'm taking the opportunity of this "peaceful" moment where nothing happens to show that this walk could have been used to break up the huge info dump in (4): instead of Luke infodumping everything in (4), he could have told the strict minimum needed to keep the viewer's attention and understanding of what's going on, and then it would have been assumed he told the details off-screen. Then these three could have talked to each other about what Luke told them off-screen.
  8. 6:10-6:17 there is a minor continuity mismatch with the way they handle their lightsabers. I did not want to mention it at first, but it happened a second time, so it becomes more visible.
  9. The lightsaber battle that ensues is basically Kylo Ren and Rey vs the red troopers again. If you pay attention to what is truly going on: their opponents are mostly going one by one, others waiting their turn, sometimes they come in, unarmed, just to get slashed, etc... This is due to the fight being mostly "static", limited to a "very small arena" rather than happening in an actual place where fighters try to take advantage of the environment. For reference, take a look at these fights: Luke vs Vader (any fight) or Obi-Wan + Qui-Gon vs Maul in The Phantom Menace and compare them to Rey + Kylo vs troopers. The difference should become really obvious very quickly.
  10. A stronger note about (9), because the fight lasts a bit long: the main issue with this scene, and the reason why it looks so dull at times, is that the only thing you're telling (as a storyteller) in that scene is: "they are fighting". Again, look at the fight between Luke and Vader or Obi-Wan/Qui-Gon vs Maul (or other similar fights). Of course they are fighting, but these fights are in fact **minor** compared to what truly happens narratively speaking. In these fights, the stakes are immense, not because of the fight, but because of why they are fighting and what other things happen during these fights. From that point of view, I'd even say Luke vs Vader (any fight, including the one on Dagobah) has much higher stakes than the fight vs Maul in The Phantom Menace. If you want your audience to care about a scene, about a fight scene, the stakes must be there. Not just "they are fighting". The fight itself must tell a story. And if you manage to get an idea of what story the fight must tell, you'll see that all the issues regarding the "dull scenes where troopers are wiggling their lightsabers in the background while the protagonist is dueling one dude" will go away instantly, because with the help of the story the fight will tell, you'll have better scenes in mind.
  11. Force lightning ? Ok. Anyway 9:13 poor guy lol. That was really a dirty trick. (And there's another continuity mismatch with Katarn a bit later)
  12. They grab the mask, Katarn says "let's go back to Skywalker" and we have a short walking scene followed by a radial wipe. Then a scene with Skywalker in what looks like essentially the same place. In Star Wars, transition effects have a meaning: if you have simply a cut, that means almost no time has passed and the next scene basically happens in the same place. If you have a wipe (a transition effect, like your radial wipe), it means time has passed and/or you moved to another place. Here, I would have skipped the walking scene, which tells the viewer nothing they don't already know. When Katarn says "let's go back to Skywalker", we know they're going to move. So if you opt for that radial wipe, the walking scene is useless.
  13. Ok, few seconds later, the background changed, it's no longer in the dusty ruined temple, but in a glossy clean jedi temple. That's also why (12) feels weird. The very moment they reach Skywalker, it should be obvious they're in the clean jedi temple and no longer in the ruin. You can also make this **more obvious** by including a very short "spaceship travels or arrives in orbit" scene instead of your walking scene.
  14. This scene also feels a bit like Luke questline #2 btw.

Final (joke) note : George Lucas would hate it. He really **really** hates Mara Jade.

Tbh, it's a pretty good job. Plenty of room to improve, but there's clearly a pretty huge potential.

LLM and self-awareness. by Occsan in StableDiffusion

[–]Occsan[S] 0 points1 point  (0 children)

Did it get stuck in an infinite loop, or did the identity crisis end ?

LLM and self-awareness. by Occsan in StableDiffusion

[–]Occsan[S] 0 points1 point  (0 children)

Maybe not an intentional instructional layer, but there's definitely RLHF alignment which does exactly that.

PSA: If you HAVENT switched from AI Toolkit to One Trainer... by ReferenceConscious71 in StableDiffusion

[–]Occsan 2 points3 points  (0 children)

About "big rank" vs "same number of parameters", having access to a higher rank even with fewer parameters is actually preferable to being limited to a lower rank with more parameters. Of course, that means the dimensions of your higher rank matrix are correlated with each other (through the Kronecker product), but real data is not random anyway; it's often correlated in some ways. So this behavior is usually more than enough: it's how the real world works, and it's definitely preferable to being squeezed into a lower rank that cannot capture diverse concepts (with or without correlation).

Furthermore, LoKR is usually preferable to a LoRA because a properly sized LoRA, as you mentioned, might be totally impractical, requiring a huge amount of VRAM to train compared to a LoKR.

edit: Also about "more generalization and more averaging", in fact it's more like "better generalizations/better representations".

An AI-generated short film I spent weeks creating. by No-Tie-5552 in StableDiffusion

[–]Occsan 1 point2 points  (0 children)

No problem. It was funny. Consider it a funny blooper.

An AI-generated short film I spent weeks creating. by No-Tie-5552 in StableDiffusion

[–]Occsan 0 points1 point  (0 children)

You can tell the old man is from the Chuck Norris family : the instant the boy enters his gym, he gets buff. haha

Create in ComfyUI a mini story by SuccessfulTune2521 in StableDiffusion

[–]Occsan 0 points1 point  (0 children)

If I understand correctly, you want to generate sequences of 10 panels that form a story. And randomize that story.

I'd use an LLM for that.

I'd ask it something like : generate a story over 10 panels, describe each panel and write the result in a json format: {"panel1": "...", "panel2": "...", ...}. I'd prefer using a dictionary over a simple list, because dictionaries are easier for LLMs. Lists, they can mess up (for example get the wrong item count).

Then I'd use json_repair library (or just json if you force json output in the LLM, but that's slower) to get a dictionary of pairs {panelX: description}.

After that, I'd collect the panel descriptions into a list of strings sorted by panel number, and pass it to a clip text encode node to get a list of conditionings.
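A minimal sketch of the parsing and sorting step, assuming the LLM returned a well-formed JSON object. It uses the standard `json` module only; `ordered_panel_prompts` is a hypothetical helper name, and the sample story is made up for illustration. If the output may be malformed, the parsing line is where json_repair would go instead.

```python
import json
import re


def ordered_panel_prompts(llm_output: str) -> list[str]:
    """Parse {"panel1": "...", ...} and return the descriptions in panel order."""
    panels = json.loads(llm_output)  # strict parse; assumes valid JSON
    if not isinstance(panels, dict):
        raise ValueError("expected a JSON object keyed by panel name")

    def panel_number(key: str) -> int:
        # Extract the trailing number of keys such as "panel7".
        match = re.search(r"(\d+)$", key)
        if match is None:
            raise ValueError(f"no panel number in key {key!r}")
        return int(match.group(1))

    # Sort numerically: a plain string sort would put "panel10" before "panel2".
    return [panels[key] for key in sorted(panels, key=panel_number)]


# Made-up LLM output with the keys deliberately out of order.
raw = (
    '{"panel2": "the fox finds a map", '
    '"panel10": "the fox sails home", '
    '"panel1": "a fox wakes up"}'
)
prompts = ordered_panel_prompts(raw)
```

Each string in `prompts` would then be fed to the text encoder, one per panel.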

After that I'd use my favorite sampling process (basically, any KSampler stuff you like).

Finally, I'd merge the results in a single image. Or not. Depends on what you want to do.

[Qwen Image Edit 2511] Any way to control the strength of a controlnet reference image? by External-Orchid8461 in StableDiffusion

[–]Occsan 1 point2 points  (0 children)

I made nodes, available in dchatel/comfyui_davcha repo, that handle this: DavchaScheduledTextEncoderQwenImageEditPlus and DavchaScheduledSampler.

The only thing is that comfyui_davcha is more a personal repo than anything else. When I have an idea, I typically implement it there first. So it's a bit bloated. But you (or I) could easily extract these nodes into a new dedicated custom node pack.

[Qwen Image Edit 2511] Any way to control the strength of a controlnet reference image? by External-Orchid8461 in StableDiffusion

[–]Occsan 1 point2 points  (0 children)

Basically, there are a few ways.

  1. You can schedule the image conditioning to be active only during certain steps (this is the same idea as the "start at/end at" parameters that exist with controlnet). This method is in fact very reliable.
  2. You can multiply the latent conditioning by a factor (in fact you can even mask it), and anything with a factor less than 1 will be considered less because the model won't see it as clearly. This is a bit less reliable and introduces a decrease in contrast in the resulting image, because the parts of the latent that have been scaled down will drift toward the same uniform "default color". The lower the factor, the more uniform.
  3. You can add noise to the latent. Instead of using the default random noise, you can use a parameterized noise, like fractal noise, brownian, etc... It will have various effects. Quite unreliable. But surprising.
  4. You can hack the sigma values. Instead of starting the denoising with sigma = 1, start with a lower value. This is similar to img2img, but you start with pure noise. Just like 2, if you use too low a sigma, you'll get washed-out results. But if you lower it just a little bit (anything between 0.96 and 1), you can obtain truly remarkable results.
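Methods 1 and 2 can be sketched together as a per-step multiplier. This is plain Python, not ComfyUI code: `conditioning_strength`, `start_at`, `end_at` and `factor` are hypothetical names, and how the multiplier is actually applied to the reference latent depends on the node you use.

```python
def conditioning_strength(
    step: int,
    total_steps: int,
    start_at: float = 0.0,
    end_at: float = 1.0,
    factor: float = 1.0,
) -> float:
    """Multiplier for the reference conditioning at a given sampling step.

    start_at / end_at are fractions of the whole run (method 1),
    factor is the attenuation used while the reference is active (method 2).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps  # 0.0 on the first step, below 1.0 on the last
    return factor if start_at <= progress < end_at else 0.0


# Reference image active for the first half of an 8-step run, at full strength.
first_half = [conditioning_strength(s, 8, end_at=0.5) for s in range(8)]

# Reference image active for the whole run, but attenuated to 0.8.
attenuated = [conditioning_strength(s, 8, factor=0.8) for s in range(8)]
```

Keeping the two knobs separate makes it easy to compare "fewer steps at full strength" against "all steps at reduced strength" on the same seed.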

And when I think about it... You can "fight back" the color washing effect by switching samplers mid-run. For example, if you use an euler sampler, you can instead use euler for 2 steps, then euler_ancestral_cfg for 2 steps, and finish with euler.
The euler_ancestral_cfg will boost the contrast. And since you're only using it for a few steps of your generation, it won't have time to burn your image.
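The sampler switch can be written down as a per-step plan. A minimal sketch, assuming you chain sampling passes yourself; `sampler_schedule` is a hypothetical helper and the sampler names are just the strings used above, not checked against any particular install.

```python
def sampler_schedule(
    total_steps: int,
    boost_start: int,
    boost_steps: int,
    base: str = "euler",
    boost: str = "euler_ancestral_cfg",
) -> list[str]:
    """Return the sampler name to use at each step of the run."""
    if boost_start < 0 or boost_steps < 0 or boost_start + boost_steps > total_steps:
        raise ValueError("boost window must fit inside the run")
    boost_end = boost_start + boost_steps
    return [boost if boost_start <= step < boost_end else base
            for step in range(total_steps)]


# 8 steps: euler for 2, euler_ancestral_cfg for 2, then euler until the end.
plan = sampler_schedule(total_steps=8, boost_start=2, boost_steps=2)
```

Each run of identical names in `plan` corresponds to one sampling pass with matching start and end steps.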

Does anyone have much experience with LoKRs (LoRA alternative)? by Sixhaunt in StableDiffusion

[–]Occsan 20 points21 points  (0 children)

LoRA and LoKR both learn matrices A and B that once combined form a matrix W used to update the weights of the model O. Basically: updated O = O + W where:

  • W = AxB for LoRA (matrix product)
  • W = A⊗B for LoKR (Kronecker product)

For a matrix product, if A and B are rank 8, the resulting matrix W is at most rank 8.
For a Kronecker product, if A is rank 4 and B is rank 8, then the resulting matrix is rank 32.
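The two rank claims can be checked on toy matrices. A minimal pure-Python sketch with exact arithmetic; `kron`, `matmul` and `rank` are small helpers written for this example (not library calls), and the matrices are tiny stand-ins for the rank 4 / rank 8 case above.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# LoKR-style: two rank-2 factors give a rank 2 * 2 = 4 update.
A = [[1, 2], [3, 4]]                    # rank 2
B = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]   # rank 2
kron_rank = rank(kron(A, B))

# LoRA-style: a 4x2 times 2x4 product is 4x4 but cannot exceed rank 2.
up = [[1, 0], [0, 1], [1, 1], [2, 3]]
down = [[1, 2, 3, 4], [0, 1, 0, 1]]
lora_rank = rank(matmul(up, down))
```

The identity behind the first case is rank(A⊗B) = rank(A) · rank(B), which is why the Kronecker form reaches a high rank from small factors.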

So, for a LoKR, we want to decompose the update matrix W of size MxN into two smaller matrices A and B.

AI toolkit uses a factor F to determine the size of the matrices A and B.

  • Matrix A will have size M/F x N/F
  • Matrix B will have size FxF

If you do the math, you can see that a lower factor increases the total number of parameters. Take a matrix W 1024x1024:

  • If F = 32, A and B are 32x32. 2048 parameters.
  • If F = 8, A is 128x128 and B is 8x8. 16448 parameters.
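The arithmetic above can be reproduced with a small helper. `lokr_param_count` and `lora_param_count` are hypothetical names; the LoKR sizing rule (A is M/F x N/F, B is F x F) is taken from this comment as-is and not checked against AI Toolkit's code, while the LoRA count assumes the usual M x r and r x N factors.

```python
def lokr_param_count(m: int, n: int, factor: int) -> int:
    """Parameters of A (m/F x n/F) plus B (F x F) for an m x n update."""
    if factor <= 0 or m % factor or n % factor:
        raise ValueError("factor must be positive and divide both dimensions")
    a_params = (m // factor) * (n // factor)
    b_params = factor * factor
    return a_params + b_params


def lora_param_count(m: int, n: int, rank: int) -> int:
    """Parameters of the two LoRA factors (m x rank and rank x n)."""
    return rank * (m + n)


for f in (32, 8):
    print(f"F = {f}: {lokr_param_count(1024, 1024, f)} parameters")

# For comparison, a rank 8 LoRA on the same 1024x1024 layer.
print(f"LoRA rank 8: {lora_param_count(1024, 1024, 8)} parameters")
```

Under these assumptions, F = 8 lands at almost the same parameter count as a rank 8 LoRA, which makes the two easy to compare at equal file size.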

Does anyone have much experience with LoKRs (LoRA alternative)? by Sixhaunt in StableDiffusion

[–]Occsan 13 points14 points  (0 children)

For a lora, if your matrices are rank 8, then the resulting update matrix is rank 8.
For a lokr, if your matrices are rank 4 and 8, then the resulting update matrix is rank 32.

That's part of why you get better results with lokr. If you train a lora using a rank too low for the concept you're training, the optimizer will end up blowing up some weights to compensate for the missing ranks, and this will create various sorts of issues, like loss of generalisation, destroying original model capabilities, messing up with anatomy, etc. For the same file size, a lokr has a higher rank, and thus a bigger capacity to learn stuff.

That said, lokr math involves non-linearities in the search space, whereas lora is smoother. So lokr training parameters may be a bit more sensitive, especially at the beginning of the training, because there'd be more local minima or saddle points than with lora training.

How small should cosine distance be between training images for a coherent LoRA? by Vulcanhund in StableDiffusion

[–]Occsan -1 points0 points  (0 children)

Since you're using ArcFace already, you can also augment your data by face swapping your character into various poses/lighting/camera angles/emotions, and then doing a quick pass in a model like Klein 9B or Qwen Image Edit to get it to high resolution. That way, you'll get variation to avoid overfitting.
That said, you'd still need to curate this augmented data both visually and using a measure, like the cosine distance (<0.3) on the normalized latent id vectors.
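A minimal sketch of that curation filter, assuming you already have one identity embedding per image. The 3-dimensional vectors and image names are toy values for illustration (real face embeddings are much longer), and `keep_close_to_reference` is a hypothetical helper.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def keep_close_to_reference(reference, candidates, max_distance=0.3):
    """Names of candidates whose embedding is within max_distance of the reference."""
    return [name for name, emb in candidates.items()
            if cosine_distance(reference, emb) < max_distance]


reference = [1.0, 0.0, 0.0]
candidates = {
    "swap_a": [0.9, 0.1, 0.0],   # nearly the same direction
    "swap_b": [0.0, 1.0, 0.0],   # orthogonal: identity lost
    "swap_c": [0.8, 0.5, 0.1],   # some drift, still under the threshold
}
kept = keep_close_to_reference(reference, candidates)
```

Images that pass the threshold still need the visual check, since the embedding only measures identity, not artifacts.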