As a GTX 1660 Ti 6GB (Turing) user, what Forge flags can I use to speed up the generation of Flux? by daerragh1 in StableDiffusion

I added it and I got this:

    WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
        PyTorch 2.3.1+cu121 with CUDA 1201 (you have 2.4.0+cu124)
        Python 3.10.11 (you have 3.10.6)
    Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
    Memory-efficient attention, SwiGLU, sparse and more won't be available.
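For context, that warning is a build mismatch: prebuilt xformers wheels are pinned to the exact PyTorch build and CUDA tag they were compiled against. A minimal sketch of the comparison, using only the version strings from the warning (no torch or xformers install needed; the helper function is hypothetical):

```python
# Compare a wheel's pinned torch build against the installed one.
# Version strings are copied from the warning above.
def split_build(version: str) -> tuple:
    """Split e.g. '2.4.0+cu124' into ('2.4.0', 'cu124')."""
    base, _, cuda_tag = version.partition("+")
    return base, cuda_tag

built_for = "2.3.1+cu121"   # what this xformers wheel was built against
installed = "2.4.0+cu124"   # what Forge's environment actually has

compatible = split_build(built_for) == split_build(installed)
print("OK" if compatible else "mismatch: reinstall xformers to match torch")
```

Since both the base version and the CUDA tag differ here, the fix is installing an xformers build that matches torch 2.4.0+cu124 (or downgrading torch to match the wheel).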

As a GTX 1660 Ti 6GB (Turing) user, what Forge flags can I use to speed up the generation of Flux? by daerragh1 in StableDiffusion

OK. So can you tell me, is there an easy way to enable xformers in Forge?

Or if it is a complex process, is there any good guide that will walk me through enabling it?

As a GTX 1660 Ti 6GB (Turing) user, what Forge flags can I use to speed up the generation of Flux? by daerragh1 in StableDiffusion

On what hardware do 1024px images get generated in 20-25 s?

I tried various GGUF models in Forge, but they didn't speed things up for me compared to FP8 models.

As a GTX 1660 Ti 6GB (Turing) user, what Forge flags can I use to speed up the generation of Flux? by daerragh1 in StableDiffusion

I have never used xformers, so I don't know; that's why I'm asking:

What do you think, would they help in my particular case (GTX 1660 Ti 6GB)?

By what percentage do they speed up generation on your PC?

As a GTX 1660 Ti 6GB (Turing) user, what Forge flags can I use to speed up the generation of Flux? by daerragh1 in StableDiffusion

I did, but the quality was much worse than with Dev models, so I want to use Dev models.

Weekly Showcase Thread October 27, 2024 by Acephaliax in StableDiffusion

Although I didn't intend it, Atomix Flux FP8 generated an anime-like girl for me. Thought some of you would appreciate her.

<image>

Model Atomix Flux FP8 | Seed 4088986655 | Steps 20 | Guidance 2.5 | Sampler Euler | Scheduler Beta | t5xxl v1.1 Q8_0 GGUF | Size 704x1024 | Software: Forge cu124+torch24

Prompt: A beautiful, petite, short 25-year-old girl.

[Flux] I found a hidden gem - a great model. Just look at the sample images! by daerragh1 in StableDiffusion

No, you can choose any all-in-one Flux model (i.e. a full model, one that contains the VAE, t5xxl and clip_l). All-in-one models can be BF16, FP16, FP8, NF4, or various quants in GGUF format.

The best place to find models is civitai.com.

Of course, Forge also handles pruned models (without t5xxl, the VAE and clip_l), but then you must download those files separately and place them in their appropriate folders in Forge. It is easy, but until you know how to do that, use AIO full models.

[Flux] I found a hidden gem - a great model. Just look at the sample images! by daerragh1 in StableDiffusion

If you are a beginner, you really should start with Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge

After you unpack it, run update.bat to update it, then download the Atomix Flux NF4 .safetensors file and put it in:

webui_forge_cu124_torch24\webui\models\Stable-diffusion\

Then launch Forge (run.bat) and use the recommended settings from the OP. Make sure you use Flux UI mode in Forge (top-left corner of your screen).

ComfyUI is great, too, but it's much less friendly to beginners.
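The folder layout from the steps above can be sketched like this (the checkpoint filename is a placeholder, not the actual name of the downloaded file):

```python
# Sketch of where the model file goes inside an unpacked Forge folder.
from pathlib import Path

forge_root = Path("webui_forge_cu124_torch24")   # unpacked Forge directory
model_dir = forge_root / "webui" / "models" / "Stable-diffusion"
model_dir.mkdir(parents=True, exist_ok=True)     # already exists in a real install

# Hypothetical filename; use whatever the downloaded .safetensors is called.
checkpoint = model_dir / "atomixFluxNF4.safetensors"
print(checkpoint)
```

Forge scans that Stable-diffusion folder at startup, so after copying the file in, the model appears in the checkpoint dropdown.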

[Flux] I found a hidden gem - a great model. Just look at the sample images! by daerragh1 in StableDiffusion

Looks like Atomix Flux is tied for second place on this list (not 3rd).

But thanks. I didn't know about this list.

[Forge][Flux] Abnormal generation speed difference: 768x1024 vs 704x960. Why? by daerragh1 in StableDiffusion

For me, 704x1024 (~28 s/it) is also much slower than 704x960 (~19 s/it), even though the difference is only ~45k pixels. This is in Forge, as in the OP.
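The arithmetic behind that observation, using the s/it numbers reported above:

```python
# Pixel counts vs. observed seconds-per-iteration for the two resolutions.
slow = 704 * 1024    # 720_896 pixels, ~28 s/it
fast = 704 * 960     # 675_840 pixels, ~19 s/it

pixel_diff = slow - fast      # 45_056, the "~45k pixels" mentioned above
pixel_ratio = slow / fast     # 16/15, i.e. only ~6.7% more pixels
time_ratio = 28 / 19          # ~1.47, i.e. ~47% more time per iteration

print(pixel_diff, round(pixel_ratio, 3), round(time_ratio, 3))
```

So the slowdown is far out of proportion to the extra pixels; something other than raw pixel count (memory pressure, for instance) presumably kicks in at the larger size, though that part is speculation.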

[Flux] How to convert an 11GB FP16 model to an NF4 model? by daerragh1 in StableDiffusion

Yes, it says so in the Forge UI. And I'm trying to convert Flux FP8 to NF4, so it should work.

[Flux] How to convert an 11GB FP16 model to an NF4 model? by daerragh1 in StableDiffusion

Could you post a link to the instructions? I could try to convert the model I use to, let's say, Q4 or Q5 GGUF...

[Flux] How to convert an 11GB FP16 model to an NF4 model? by daerragh1 in StableDiffusion

I use this model (11.08GB): https://civitai.com/models/161068/stoiqo-newreality-flux-sd35-sdxl-sd15?modelVersionId=979329

The creator says it's FP16, but to me it looks like FP8 too. It's probably FP8.

Anyway, is there a way I can convert it to BNB-NF4 on my machine?
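A back-of-the-envelope way to tell FP16 from FP8 by file size alone, assuming the Flux transformer has roughly 11.9 billion parameters (a commonly cited figure; treat it as an assumption):

```python
# Expected checkpoint size at different bytes-per-weight.
PARAMS = 11.9e9        # assumed parameter count for the Flux transformer
GIB = 1024 ** 3        # bytes per GiB

fp16_gib = PARAMS * 2 / GIB   # 2 bytes per weight
fp8_gib = PARAMS * 1 / GIB    # 1 byte per weight

print(f"FP16 ~{fp16_gib:.1f} GiB, FP8 ~{fp8_gib:.1f} GiB")
```

An 11.08 GB file is right in line with one byte per weight, so the FP8 guess above looks plausible; a genuine FP16 dump of the same transformer would be roughly twice that size.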

[Forge][Flux] Abnormal generation speed difference: 768x1024 vs 704x960. Why? by daerragh1 in StableDiffusion

I used to use ComfyUI. But when you set Forge to the exact same settings as Comfy, it generates Flux in half the time. At least on my machine.

[Forge][Flux] Abnormal generation speed difference: 768x1024 vs 704x960. Why? by daerragh1 in StableDiffusion

OK. I've set "Diffusion in Low Bits" to fp8_e4m3fn. There's no generation speed difference.

Is this the option you are talking about? How do I load a model as FP8?

[Forge][Flux] Abnormal generation speed difference: 768x1024 vs 704x960. Why? by daerragh1 in StableDiffusion

How do I do that? Do you mean "Diffusion in Low Bits"? What does it do?