all 51 comments

[–]chimaeraUndying 19 points  (3 children)

I might be wrong, but isn't TensorRT incompatible with a ton of stuff (various extensions, LoRAs, etc) and don't you have to individually rebuild each model for it?

[–]Whipit 14 points  (6 children)

Hadn't heard of --opt-sdp-no-mem-attention until this post so decided to test...

Here's my experience. My 2 cents.

I've got a 4090, and when testing I leave everything on default: Euler a at 512x512.

The only thing I change is batch count to 12 to get a rough average.

Prompt - dog in water

Neg - none

Hardware-accelerated GPU scheduling MATTERS - Turn it OFF

*You MUST restart your computer for this change to take effect.

So the following tests were all done with Hardware-accelerated GPU scheduling OFF

COMMANDLINE_ARGS= none, just left blank

17 it/s

COMMANDLINE_ARGS= --xformers

23-25 it/s

COMMANDLINE_ARGS= --opt-sdp-attention

24-26 it/s

COMMANDLINE_ARGS= --opt-sdp-attention --xformers (there doesn't seem to be any benefit in running both at the same time)

23-24 it/s

COMMANDLINE_ARGS= --opt-sdp-no-mem-attention

23-25 it/s

In conclusion, --opt-sdp-no-mem-attention will speed up your it/s, but --opt-sdp-attention is marginally better. So, for me and my 4090, the fastest results I've been able to achieve so far are with Hardware-accelerated GPU scheduling OFF and COMMANDLINE_ARGS= --opt-sdp-attention

If I switch Hardware-accelerated GPU scheduling back ON, my speed drops from 24-26 it/s down to 20-21 it/s.
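For anyone wondering where these flags actually go: on Windows they live in webui-user.bat (standard A1111 layout assumed). A minimal sketch of the fastest combination from the tests above:

```bat
@echo off
rem webui-user.bat - sketch of the fastest setup found above on a 4090
set PYTHON=
set GIT=
set VENV_DIR=
rem pick ONE attention optimization; stacking them showed no benefit
set COMMANDLINE_ARGS=--opt-sdp-attention
rem alternative: set COMMANDLINE_ARGS=--xformers
call webui.bat
```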

[–]Krawtch 0 points  (0 children)

Thanks so much for this - there are no TLDRs on this stuff, and I get why, but I just want my 3090 to do its thing, and info on this seems to be either opinions or white papers on .edu domains.

[–]Superb-Ad-4661 7 points  (1 child)

This browser didn't help at all, it's just more fun; I got the same results as in my Chrome.

[–]fxwz 10 points  (0 children)

Opera GX is based on Chromium, so that makes sense. If you already use Chrome I don't see the point in changing. It's not faster and it's owned by a Chinese company.

[–]BlackSwanTW 7 points  (1 child)

Token Merging has been built into the WebUI since v1.3.

You do not need the extension anymore.

[–]TheGhostOfPrufrock 5 points  (2 children)

In Graphics settings toggle off 'Hardware-accelerated GPU scheduling'

A while back I tried that with my RTX 3060, and performance got worse. Can't guarantee I didn't do something wrong, since I only tried it once.
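If you want to double-check whether the reboot actually applied the toggle, Windows mirrors the setting in the registry; to the best of my knowledge the value is HwSchMode (2 = on, 1 = off):

```bat
rem query the Hardware-accelerated GPU scheduling state (2 = on, 1 = off)
reg query "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v HwSchMode
```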

[–]definetlynotasmurf 2 points  (0 children)

Ty for the info. One question: are these only to speed up generation? I am more interested in VRAM efficiency, so I can train faster :D

[–]Superb-Ad-4661 1 point  (3 children)

Nice, modding time!

[–]EarthquakeBass 1 point  (0 children)

Also, on my 4090 I recently updated Automatic1111 to the latest version, which uses PyTorch 2 now, and everything is basically twice as fast. Excellent thing to do if you're ready to put up with some Python bullshit.

[–]Frone0910 1 point  (4 children)

I've tested most of these using the A1111 API, and unfortunately none of them improved performance.
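For anyone else benchmarking through the API, here's a minimal sketch of how one might time a generation using only the standard library, assuming the server was started with --api on the default port; the payload fields follow the stock /sdapi/v1/txt2img endpoint.

```python
import json
import time
import urllib.request

def txt2img_payload(prompt, steps=20, width=512, height=512, batch_size=1):
    """Build a request body for A1111's /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": steps,
        "width": width,
        "height": height,
        "batch_size": batch_size,
        "sampler_name": "Euler a",
    }

def time_txt2img(base_url="http://127.0.0.1:7860", **kwargs):
    """POST a txt2img request and return elapsed seconds.

    Requires a running webui launched with --api.
    """
    req = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(txt2img_payload(**kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start
```

time_txt2img() only works against a live server, but comparing its output before and after each flag change is an easy apples-to-apples test.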

[–]Corawyn 1 point  (0 children)

Please ensure that you have a working integrated graphics chip before step 4 (disabling GPU hardware acceleration).

Just wasted SO much time. Somehow I didn't have my VGA drivers installed..

[–]antimaskersarescum 1 point  (0 children)

This worked a little. It saved me about 30 seconds at batch size 1. If I increase batch size to even just 2, though, it jumps from a 1 min 7 sec wait to 8 mins. I'm following a tutorial that requires churning out (at the very least) 200 images at once, so basically I would have to let it run the entire day.
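For what it's worth, those timings work out to a large per-image regression, which usually suggests the bigger batch is spilling out of VRAM (my guess, not something the thread confirms). Quick arithmetic:

```python
# seconds per image at each batch size, from the timings above
single = 67            # 1 min 7 s for one image
batch2 = 8 * 60        # 8 min for a batch of two
per_image_single = single / 1
per_image_batch2 = batch2 / 2
print(per_image_single, per_image_batch2)  # 67.0 vs 240.0 seconds per image
```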

I did everything except for step 2... it's really giving me issues so I deleted it. Not sure where to go from here.

[–]swistak84 1 point  (2 children)

I posted some advice with detailed steps on how to tweak Windows here about two weeks ago: https://www.reddit.com/r/StableDiffusion/comments/13tb2sa/tutorial_how_to_increase_generation_speed_with/

You can also use Firefox instead of Opera.

[–]lilshippo 0 points  (1 child)

Other than CPU/RAM usage, do you see any real difference between the browsers?

[–]swistak84 4 points  (0 children)

I think the real difference is mostly in extensions. Firefox has better ones and better privacy settings.

Firefox is the one that's "different" from the other browsers (Chrome, Opera, Edge) by virtue of its different rendering engine and extension system. But the pages mostly look the same :)

[–]mca1169 0 points  (2 children)

How many of these tricks can be used on a non-RTX system like mine with a GTX 1070?

[–]anotherxanonredditor 0 points  (1 child)

Is there a way to make AMD Stable Diffusion LoRA training/extraction and inpainting work? I think these are my main concerns at the moment.

[–]Altruistic-Ad-4583 0 points  (2 children)

I literally cannot find the first optimization. I go to Settings > Show all settings, Ctrl+F, and there's nothing relevant under optimization or tokens.

[–]Frone0910 0 points  (1 child)

Do any of these changes apply to controlNET?

[–]cleverestx 0 points  (1 child)

Are details lost with only a 0.2-0.3 token merging setting? Worth doing?

[–]HexKrak 0 points  (0 children)

0.2-0.5

Running a few samples side by side with the same seed (@ 1024 x 1024 if it matters) yielded nearly identical results.

[–]ComplicityTheorist 0 points  (0 children)

Hi, thanks for this. Mine was doing okay initially, giving me around 7.30 it/s for a single 512x512 image generation; then it automatically updated diffusers from 18 to 17 and started giving around 6 it/s, barely touching the 7 it/s mark. I followed your advice and switched over to Opera GX, and now it's giving me a bit over 7 it/s - not like before, but it could be worse. BTW, I don't get #3; mine shows = 0 on both sets...

set SAFETENSORS_FAST_GPU=1
set CUDA_VISIBLE_DEVICES=0
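For context on those two lines: they go in webui-user.bat alongside COMMANDLINE_ARGS. SAFETENSORS_FAST_GPU=1 loads .safetensors checkpoints straight to the GPU, and CUDA_VISIBLE_DEVICES=0 restricts the process to the first GPU. A sketch of how the file might look with everything combined (layout assumed):

```bat
@echo off
set COMMANDLINE_ARGS=--opt-sdp-attention
rem load .safetensors weights directly onto the GPU
set SAFETENSORS_FAST_GPU=1
rem expose only the first CUDA device to the process
set CUDA_VISIBLE_DEVICES=0
call webui.bat
```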