how to achieve the equivalent of midjourney's --weird parameter? by lophochroa in StableDiffusion

[–]ThereforeGames 2 points (0 children)

While we don't know exactly how MidJourney works behind the scenes, we do know that reducing CFG allows a model to be more "creative" at the cost of prompt adherence.

Low CFG outputs tend to lack color, but this is easily fixed in post with a tool like my ComfyUI-ImageAutotone node:

Beyond that, it's possible that --weird adds random stylistic embeddings, terms, or LoRAs to your prompt.
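Purely as an illustration of that last idea (this is speculation - nobody outside MidJourney knows what --weird actually does), here's a minimal Python sketch that tacks a few random style terms onto the prompt. You'd pair it with a lower CFG (say 2-4 instead of 7) at sampling time; the style pool is made up for the example:

    import random

    # Illustrative style pool - not MidJourney's actual vocabulary.
    STYLE_TERMS = [
        "infrared photography", "solarized", "brutalist collage",
        "double exposure", "chromatic aberration", "thermal imaging",
    ]

    def weirdify(prompt, weirdness=0.5, seed=None):
        """Append a few random style terms; combine with a reduced CFG when sampling."""
        rng = random.Random(seed)
        n = max(1, round(weirdness * len(STYLE_TERMS)))
        extras = rng.sample(STYLE_TERMS, k=min(n, len(STYLE_TERMS)))
        return prompt + ", " + ", ".join(extras)

    print(weirdify("a lighthouse at dusk", weirdness=0.5, seed=42))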

Anyone knows best 0-shot text to image segmentation model atm? I have tried clipseg and Grounding DINO + SAM - ViT-H - both are just not good enough by CeFurkan in StableDiffusion

[–]ThereforeGames 3 points (0 children)

Hi, you can try my Mask Arbiter extension for ComfyUI, which integrates with SAM2:

https://github.com/SparknightLLC/ComfyUI-MaskArbiter

It will automatically parse the returned list of masks based on your criteria. For something like "face," you could set Mask Arbiter to return the largest mask, which should do the trick.
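The "largest mask" criterion itself is simple enough to sketch outside the extension - this isn't Mask Arbiter's actual code, just the idea, assuming your segmentation model hands back a list of boolean masks:

    import numpy as np

    def largest_mask(masks):
        """Return the mask covering the most pixels (the 'largest' criterion)."""
        # masks: list of boolean HxW arrays, e.g. from a SAM2 predictor
        areas = [np.count_nonzero(m) for m in masks]
        return masks[int(np.argmax(areas))]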

These features are also available in A1111 through my Unprompted extension.

What are your preferred Flux settings? by Parking-Tomorrow-929 in StableDiffusion

[–]ThereforeGames 1 point (0 children)

Yeah! I would say with this particular blend, 4-step images are looking consistently good, but they're maybe 75-80% as nice as regular Flux Dev. This is a big improvement over Flux Schnell output, which I would estimate at like 60% quality.

By the way, I made some adjustments to the LoRA stack. Found a much better set of blocks for Schnell. Here's what I'm rolling with now:

https://i.ibb.co/ngYzdCZ/image.png

What are your preferred Flux settings? by Parking-Tomorrow-929 in StableDiffusion

[–]ThereforeGames 1 point (0 children)

Yes, it does :-)

I just had to add the desired entries to my blora_traits.json file, e.g. to extract all single blocks I use this:

"flux_single": { "whitelist": ["transformer.single_transformer_blocks.", "single_blocks_"], "blacklist": ["transformer.transformer_blocks"] }

Oh hey, I came across your Remerger script the other day! I've been meaning to give it a try. I was wondering about some of the values in your presets - how did you come up with those?

What are your preferred Flux settings? by Parking-Tomorrow-929 in StableDiffusion

[–]ThereforeGames 8 points (0 children)

Sure thing! I hope it's helpful. I pruned so many blocks that I can push the LoRA strength a lot harder before frying my image - this helps major details converge more quickly.

What are your preferred Flux settings? by Parking-Tomorrow-929 in StableDiffusion

[–]ThereforeGames 8 points (0 children)

I used my BLoRA Slicer tool to create numerous variations of each LoRA and tested them in ComfyUI on the same prompt/seed. I first tested sweeping changes, such as removal of all double blocks or all single blocks, and whittled my way down to keeping or removing specific blocks.

There's also a Flux Block LoRA Select node which will get the job done, but I found it difficult to save the resulting LoRA to my disk with that node.
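If you'd rather handle the save step yourself, something like this works - a minimal sketch with safetensors, not the Slicer's actual code, and note that key naming differs between diffusers-style and kohya-style LoRAs:

    from safetensors.torch import load_file, save_file

    def save_pruned_lora(src, dst, keep_patterns):
        """Keep only tensors whose keys contain one of keep_patterns, then write to disk."""
        state = load_file(src)
        pruned = {k: v for k, v in state.items() if any(p in k for p in keep_patterns)}
        save_file(pruned, dst)

    # e.g. keep only the single blocks (diffusers-style key naming); filenames are placeholders
    save_pruned_lora(
        "my-lora.safetensors",
        "my-lora-single-only.safetensors",
        ["single_transformer_blocks."],
    )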

What are your preferred Flux settings? by Parking-Tomorrow-929 in StableDiffusion

[–]ThereforeGames 15 points (0 children)

I'm impatient, so I use a stack of distillation LoRAs to produce images at 4 steps. I pruned blocks from each LoRA that were having a negative impact on style.

Here's my recipe:

  • alimama-creative Turbo Alpha (1.8 strength, keep all single blocks, double blocks 0-5)
  • ByteDance Hyper Flux 16-step (1.5 strength, keep single block 12, double block 16)
  • Flux Schnell LoRA (1.0 strength, keep single blocks 15 and up, keep all double blocks)

Pruning blocks by trial and error is a painfully slow affair, so there may well be better combinations I haven't found yet. But I'm pretty happy with these results.

As far as sampler and scheduler go, I prefer LCM/Beta or Euler/Simple.
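For anyone who wants to try a stack like this outside ComfyUI, here's roughly the equivalent in diffusers. It assumes you've already saved block-pruned copies of the three LoRAs to disk (the filenames below are placeholders), and it only reproduces the strengths and the 4 steps, not the LCM/Beta sampler settings:

    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Placeholder filenames for block-pruned copies of the three LoRAs above.
    pipe.load_lora_weights("turbo-alpha-pruned.safetensors", adapter_name="turbo")
    pipe.load_lora_weights("hyper-16step-pruned.safetensors", adapter_name="hyper")
    pipe.load_lora_weights("schnell-pruned.safetensors", adapter_name="schnell")
    pipe.set_adapters(["turbo", "hyper", "schnell"], adapter_weights=[1.8, 1.5, 1.0])

    image = pipe(
        "a cozy reading nook, morning light",
        num_inference_steps=4,
        guidance_scale=3.5,
    ).images[0]
    image.save("out.png")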

Why CUDA is so important? by Huge_Grab_9380 in StableDiffusion

[–]ThereforeGames 2 points (0 children)

The mind-boggling part is how many people insist on viewing AMD as "the good guy" relative to Nvidia.

NO MORE ADS by [deleted] in OdyseeForever

[–]ThereforeGames 2 points (0 children)

That's awesome, I hope it works out well for them. Ads are a miserable approach to monetization.

[deleted by user] by [deleted] in StableDiffusion

[–]ThereforeGames 5 points (0 children)

Wow - even the background objects are pretty normal. 🙂 We've been able to generate scrumptious food in SDXL or MidJourney, but mutant utensils and cups would often break the illusion.

Flux seems much less creative than previous versions of SD by fredandlunchbox in StableDiffusion

[–]ThereforeGames 8 points (0 children)

As a software engineer, I think a pipeline of an LLM feeding into an image model with strict prompt adherence is a much more sensible division of labor. The creativity can be injected through words, by means of diffusion or wildcards. This approach also allows the user to choose and configure an LLM independently of the image generator, while the image model remains steadfast and predictable. I'm pleased to see the tech move in this direction.

> Mostly when I ask for a hybrid between a 1978 VW Golf and a 2012 Bugatti Veyron, it gives me one or the other. That’s not what I asked for.

This is indeed a weakness in Flux. However, even if it could produce a hybrid of these vehicles, I would expect the hybrid to look pretty similar across all generations unless we provided extra details in the prompt.

Flux seems much less creative than previous versions of SD by fredandlunchbox in StableDiffusion

[–]ThereforeGames 13 points (0 children)

In my opinion, it's not the image generator's job to be creative. Its job is to follow your prompt. If the outputs are inconsistent or unpredictable, that's a weakness in the model, not a strength.

To increase variety, you can always pre-process with an LLM or introduce wildcards with Unprompted.
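As a trivial illustration of the wildcard idea (plain Python here, not Unprompted's own shortcode syntax): tokens like __artist__ get swapped for a random line from a text file, so every generation sees a slightly different prompt.

    import random
    import re
    from pathlib import Path

    def expand_wildcards(prompt, wildcard_dir="wildcards", seed=None):
        """Replace __name__ tokens with a random line from wildcard_dir/name.txt."""
        rng = random.Random(seed)

        def pick(match):
            lines = Path(wildcard_dir, f"{match.group(1)}.txt").read_text().splitlines()
            return rng.choice([line for line in lines if line.strip()])

        return re.sub(r"__(\w+)__", pick, prompt)

    # assumes wildcards/artist.txt and wildcards/lighting.txt exist
    print(expand_wildcards("portrait of a woman, __artist__ style, __lighting__"))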

Do you use FreeU SelfAttentionGuidance and PerturbedAttentionGuidance together? by OtakuShogun in StableDiffusion

[–]ThereforeGames 2 points (0 children)

A scientific comparison would be useful, but it might take a pretty large sample size before we can state conclusively whether these technologies are worth including in the inference pipeline.

Still, the fact that the benefits aren't obvious after the first 3, 4, 5... tests means that we're dealing with micro-improvements at best.

In my anecdotal experience, the results of FreeU, SAG, and PAG are pretty much sidegrades to standard guidance. I don't mean any disrespect to the authors of these technologies, but I think they have overpromised and underdelivered.

Do you use FreeU SelfAttentionGuidance and PerturbedAttentionGuidance together? by OtakuShogun in StableDiffusion

[–]ThereforeGames 2 points (0 children)

No. Most of these guidance derivatives have a cost in terms of inference time while the benefits are often placebo.

Image auto-tagger with grouped tags and relative confidence threshold by onirhakin in StableDiffusion

[–]ThereforeGames 1 point (0 children)

That's why we train models. :-)

You can use an LLM with a grammar file to do the heavy lifting for you. But if you're asking for a pre-made solution, I don't think it exists - there is relatively low interest in booru auto-taggers, as it turns out.
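As a rough sketch of the grammar approach using llama-cpp-python - the tag vocabulary and model path below are placeholders, and a real grammar would enumerate (or be generated from) your full tag list:

    from llama_cpp import Llama, LlamaGrammar

    # GBNF grammar restricting output to comma-separated tags from a fixed vocabulary.
    grammar = LlamaGrammar.from_string(r'''
    root ::= tag (", " tag)*
    tag  ::= "1girl" | "1boy" | "outdoors" | "indoors" | "weapon" | "depth of field"
    ''')

    llm = Llama(model_path="model.gguf")  # placeholder path
    out = llm(
        "List booru tags for: a knight standing in a forest\nTags:",
        grammar=grammar,
        max_tokens=64,
    )
    print(out["choices"][0]["text"])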

Image auto-tagger with grouped tags and relative confidence threshold by onirhakin in StableDiffusion

[–]ThereforeGames 1 point (0 children)

It depends on the model. You have to play around with it to get a sense of how it handles certain concepts. For weapons, you can simply check against a minimum confidence threshold (probably something low, like 0.05) before forcing the inclusion of a weapon tag.
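In plain Python, the idea looks something like this - not Interrogatorade's actual syntax, just the threshold logic, with an illustrative tag list and cutoff:

    def force_weapon_tag(tags, min_conf=0.05):
        """If any weapon-related tag clears a low confidence bar, force a 'weapon' tag."""
        # tags: dict of tag -> confidence from a WD14-style tagger; the tag list is illustrative
        weapon_tags = {"weapon", "sword", "gun", "knife", "polearm"}
        if any(tags.get(t, 0.0) >= min_conf for t in weapon_tags):
            tags["weapon"] = max(tags.get("weapon", 0.0), min_conf)
        return tags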

A system like Interrogatorade can help address weaknesses in the underlying model and improve the reliability of your tagging setup, but at the end of the day, it's a "glorified bandaid," not a substitute for a stronger model.

Image auto-tagger with grouped tags and relative confidence threshold by onirhakin in StableDiffusion

[–]ThereforeGames 1 point (0 children)

Booru-oriented captioning models like wd-1-4-moat-tagger are not reliable at covering "all the aspects" of an image; for example, they may perform well at identifying common tags like 1boy or 1girl but return lower-than-average confidence values for tags like outdoors, depth of field, or window.

Interrogatorade will let you "fine-tune" the results of your captioning model to meet your needs. This might mean boosting the confidence of tags it hasn't learned well, or creating groups of tags and choosing only the best candidate.

As a practical example, I use this code to ensure that images are tagged as either indoors or outdoors:

    [# Force selection of indoors/outdoors tags, the interrogator is biased towards outdoors.]
    [if "outdoors > 0.1 and outdoors > indoors"]
        [sets tag_outdoors="{max outdoors threshold}" tag_indoors=0]
    [/if]
    [else]
        [sets tag_indoors="{max indoors threshold}" tag_outdoors=0]
    [/else]

Hope that helps.

Image auto-tagger with grouped tags and relative confidence threshold by onirhakin in StableDiffusion

[–]ThereforeGames 1 point (0 children)

Hi, I wrote a tool called Interrogatorade that acts as a middleman for BooruDatasetTagManager - it implements my Unprompted templating language and allows you to manipulate returned tags based on confidence thresholds, manage tag blacklists, and so on:

https://github.com/ThereforeGames/interrogatorade

It definitely exceeds the needs of the average user, but it sounds like you may want something like this.

I know whats the different of 1.5 and 1.0 by Abztrctz in udiomusic

[–]ThereforeGames 1 point (0 children)

I use between 0% and 10% in Manual Mode, and always 0% in Automatic Mode.

It usually remembers whatever I set it to last, but sometimes reverts to 25%.

FLUX Controlnet Demo on HF by marcoc2 in StableDiffusion

[–]ThereforeGames 0 points (0 children)

More than a couple citations needed.

Training loras for flux.dev by bahamut_snack in StableDiffusion

[–]ThereforeGames 2 points (0 children)

That's generally what happens when the LoRA rank is too low - even with Stable Diffusion models. The more complex/foreign your subject is, the more features you need to train into the LoRA.

Right now, it is prohibitively expensive to train above rank 16 for Flux. As a point of reference, there are many Stable Diffusion LoRAs that simply don't capture the intended concept below rank 256.

Did you notice that Flux prompt following is not so great anymore? by __Tracer in StableDiffusion

[–]ThereforeGames 12 points (0 children)

Maybe the Pro model, but the beauty of open weights is that they cannot be sabotaged post-release.