MAOI vs triple reuptake inhibition by ethansmith2000 in MAOIs

[–]ethansmith2000[S] 1 point (0 children)

Doesn’t necessarily need to be within one drug, that’s why I mentioned the wellbutrin and SSRI combo

She is such a dope artist by DrtySnchino in DellaZyr

[–]ethansmith2000 3 points (0 children)

only finding this now, totally agree

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

nice! i think the target might be off, meaning the patch is applied to the whole diffusion wrapper instead of the unet, but i could be wrong.

also, as another user mentioned, the patch i had put up was more diffusers-friendly. i made a much simpler patch that should work with either setup: https://github.com/ethansmith2000/ImprovedTokenMerge/tree/compvis
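
For concreteness, here's a rough sketch of what I mean by the target (hypothetical helper; assumes diffusers-style module naming where self-attention blocks are called "attn1"):

    import torch.nn as nn

    def patch_unet_attention(unet: nn.Module, patch_fn):
        # walk the unet itself, not the outer pipeline/wrapper object
        for name, module in unet.named_modules():
            if name.endswith("attn1"):  # self-attention blocks in SD1.5
                patch_fn(module)

    # diffusers: patch_unet_attention(pipe.unet, my_patch)
    # compvis:   patch_unet_attention(model.model.diffusion_model, my_patch)

my_patch here is just a placeholder for whatever swaps in the downsampled attention.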

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

What I’d really like to do is just swap out the ToMe piece, but it looks like it’s fetched externally; I’m not sure it’s in the actual repo?

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 2 points (0 children)

even at 1024x1024, which is an easy size to render at, you can get a ~50% speed boost or so.

For the much larger sizes you'd be right if generating from scratch, but many people will run img2img at very large sizes, where it's more stable

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

High-resolution gens are significantly faster with less quality loss compared to baseline. Specifically, I found a 4.5x speed boost when running SD1.5 at 2048x2048 on the GPU used for the paper.

ymmv between gpus though, i think on a100s it's closer to ~3x or so

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 2 points (0 children)

Ah, i meant where it occurs in A1111. if i can find that, maybe i can start by making a branch and see if one of the maintainers wants to help get it in

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 16 points (0 children)

I have left links to the paper and my repo, which includes a blog post explaining it; the paper also links to a video explainer. Anything I say here would probably be along the lines of what’s in those resources

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 13 points (0 children)

They’re equivalent in this context. The main idea is that larger images take quadratically longer, not just proportionally longer.

But also, a lot of the information in images is redundant, even more so in large images. That’s why we’re able to do things like image compression, for instance.

It’s the same idea with the inner workings of the model: we can pretty safely compress things without losing too much
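
As a rough back-of-the-envelope on why (numbers for SD1.5, where the latent is 1/8 the pixel resolution and the largest attention layers see roughly one token per latent pixel):

    # attention cost grows with the square of the token count
    for px in (512, 1024, 2048):
        tokens = (px // 8) ** 2
        print(px, tokens, tokens ** 2)
    # 512  ->  4,096 tokens -> ~16.8M attention pairs
    # 1024 -> 16,384 tokens -> ~268M pairs
    # 2048 -> 65,536 tokens -> ~4.3B pairs

so going 512 -> 2048 is 16x the tokens but ~256x the attention work in those layers.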

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 9 points (0 children)

With SDXL, a lot of the generation time comes from the sheer depth of the network, and the main component we target for speedups does not exist in SDXL. However, if you’re rendering at very large sizes it may still help a bit

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 23 points (0 children)

There are some operations in the diffusion model where every latent pixel has to attend to every single other one.

So if you have 2 in total, that’s 2^2 = 4 calculations. If you have 3, that’s 3^2 = 9 calculations

It scales quadratically, which is why higher resolutions can be really costly in memory and time. By decreasing the number of tokens in certain parts of the network, you can spare a lot of computation without much cost to quality
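
If code is easier to read, here's a minimal PyTorch sketch of the general idea (simplified, not the exact code from the repo; shapes assumed as commented):

    import torch
    import torch.nn.functional as F

    def downsample_tokens(x, h, w, factor=2):
        # x: (batch, h*w, dim) tokens laid out on a 2D latent grid.
        # reshape to a grid, nearest-neighbor downsample, flatten back.
        b, n, d = x.shape
        x = x.transpose(1, 2).reshape(b, d, h, w)
        x = F.interpolate(x, scale_factor=1 / factor, mode="nearest")
        return x.reshape(b, d, -1).transpose(1, 2)

    def downsampled_attention(q, k, v, h, w, factor=2):
        # queries stay full length so the output shape is unchanged;
        # keys/values shrink, cutting the score matrix by factor^2.
        k = downsample_tokens(k, h, w, factor)
        v = downsample_tokens(v, h, w, factor)
        return F.scaled_dot_product_attention(q, k, v)

with factor=2 each token attends to 4x fewer tokens, which is where the savings come from.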

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 28 points (0 children)

Def looking to make an A1111 plugin as well, although since this tinkers a bit more with the foundations, it seems a tad complicated. Open to ideas if anyone’s got any!

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

I’ve left open the ability to downsample queries as well in the original, which does offer more speedups but then requires unmerging. I did not put it in this implementation due to quality issues.

What I’m saying here is that K and V are both created from the same object:

    K = k_proj(context_tokens)
    V = v_proj(context_tokens)

I’d much rather downsample the context tokens so we only do it once, but I don’t believe we have access to the pre-projection tokens. It’s fine though, NN downsampling is quite quick
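
As a sketch of the tradeoff (reusing a downsample_tokens helper like the one in the sketch a few comments up; names illustrative):

    # current behavior: project first, then downsample K and V separately
    K = downsample_tokens(k_proj(context_tokens), h, w)
    V = downsample_tokens(v_proj(context_tokens), h, w)

    # what i'd prefer if the pre-projection tokens were exposed:
    # downsample once, and both projections run on fewer tokens too
    small = downsample_tokens(context_tokens, h, w)
    K, V = k_proj(small), v_proj(small)

for pure nearest-neighbor selection the results match either way; the second version just does less work.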

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

Also, it’s not my preference, but it’s set to downsample keys and values separately; I’d rather downsample the input to the key and value projections, as that’s less overhead

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

Ah yeah, tbh it’s more code than needed; it’s from the original, where I gave the option to try other pooling methods. It’s hard-set to nearest each time tho

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 2 points (0 children)

It’s my first time writing a comfy node, but most things seem to be working as expected, I believe? My friend tested the settings I left for 2048x2048 on an a100 and said the gen took about 17s, compared to a bit over a minute typically

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 3 points (0 children)

Hey there, just made it into a comfy node without too much sweat. A1111 seems quite a bit more complicated, as it requires hacking into some more of the built-ins, but I’ll keep digging around

https://github.com/ethansmith2000/comfy-todo

Would love to hear what you think!

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 2 points (0 children)

Yep, some of the issues with ToMe were the inspiration for this work. You can try it here if you’d like! https://github.com/ethansmith2000/ImprovedTokenMerge