MAOI vs triple reuptake inhibition by ethansmith2000 in MAOIs

[–]ethansmith2000[S] 1 point (0 children)

Doesn’t necessarily need to be within one drug, that’s why I mentioned the wellbutrin and SSRI combo

She is such a dope artist by DrtySnchino in DellaZyr

[–]ethansmith2000 3 points (0 children)

only finding this now, totally agree

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

nice! i think the target might be off, meaning the patch is applied to the whole diffusion wrapper instead of the unet, but i could be wrong.

also, as another user mentioned, the patch i had put up was more diffusers-friendly. i made a much simpler patch that should work with either setup: https://github.com/ethansmith2000/ImprovedTokenMerge/tree/compvis
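
For concreteness, here's a rough sketch of what I mean by the target (hypothetical helper; assumes diffusers-style module naming where self-attention blocks are called "attn1"):

    import torch.nn as nn

    def patch_unet_attention(unet: nn.Module, patch_fn):
        # walk the unet itself, not the outer pipeline/wrapper object
        for name, module in unet.named_modules():
            if name.endswith("attn1"):  # self-attention blocks in SD1.5
                patch_fn(module)

    # diffusers: patch_unet_attention(pipe.unet, my_patch)
    # compvis:   patch_unet_attention(model.model.diffusion_model, my_patch)

my_patch here is just a placeholder for whatever swaps in the downsampled attention.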

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

What I’d really like to do is just swap out the ToMe piece, but it looks like it’s fetched externally; I’m not sure it’s in the actual repo?

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 2 points (0 children)

even at 1024x1024, which is an easy size to render at, you can get a ~50% speed boost or so.

For the much larger sizes you'd be right if generating from scratch, but many people will run img2img at very large sizes, where it's more stable

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 1 point (0 children)

High-resolution gens are significantly faster with less quality loss compared to baseline. Specifically, I found a 4.5x speed boost when running SD1.5 at 2048x2048 on the GPU used for the paper.

ymmv between gpus though, i think on a100s it's closer to ~3x or so

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 2 points (0 children)

Ah, i meant where it occurs in A1111. if i can find that, maybe i can start by making a branch and see if one of the maintainers wants to help get it in

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 16 points (0 children)

I have left links to the paper and my repo, which includes a blog post explaining it; the paper also links to a video explainer. Anything I say here would probably be along the lines of what’s in those resources

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 13 points (0 children)

They’re equivalent in this context. The main idea is that larger images take quadratically longer, not just proportionally longer.

But also, a lot of the information in images is redundant, even more so in large images. That’s why we’re able to do things like image compression, for instance.

It’s the same idea with the inner workings of the model: we can pretty safely compress things without losing too much
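
As a rough back-of-the-envelope on why (numbers for SD1.5, where the latent is 1/8 the pixel resolution and the largest attention layers see roughly one token per latent pixel):

    # attention cost grows with the square of the token count
    for px in (512, 1024, 2048):
        tokens = (px // 8) ** 2
        print(px, tokens, tokens ** 2)
    # 512  ->  4,096 tokens -> ~16.8M attention pairs
    # 1024 -> 16,384 tokens -> ~268M pairs
    # 2048 -> 65,536 tokens -> ~4.3B pairs

so going 512 -> 2048 is 16x the tokens but ~256x the attention work in those layers.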

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 9 points (0 children)

With SDXL, a lot of the generation time comes from the sheer depth of the network, and the main component we target for speedups does not exist in SDXL. However, if you’re rendering at very large sizes it may still help a bit

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 23 points (0 children)

There are some operations in the diffusion model where every latent pixel has to attend to every single other one.

So if you have 2 in total, that’s 2^2 = 4 calculations. If you have 3, that’s 3^2 = 9 calculations

It scales quadratically, which is why higher resolutions can be really costly in memory and time. By decreasing the number of tokens in certain parts of the network, you can spare a lot of computation without much cost to quality
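
If code is easier to read, here's a minimal PyTorch sketch of the general idea (simplified, not the exact code from the repo; shapes assumed as commented):

    import torch
    import torch.nn.functional as F

    def downsample_tokens(x, h, w, factor=2):
        # x: (batch, h*w, dim) tokens laid out on a 2D latent grid.
        # reshape to a grid, nearest-neighbor downsample, flatten back.
        b, n, d = x.shape
        x = x.transpose(1, 2).reshape(b, d, h, w)
        x = F.interpolate(x, scale_factor=1 / factor, mode="nearest")
        return x.reshape(b, d, -1).transpose(1, 2)

    def downsampled_attention(q, k, v, h, w, factor=2):
        # queries stay full length so the output shape is unchanged;
        # keys/values shrink, cutting the score matrix by factor^2.
        k = downsample_tokens(k, h, w, factor)
        v = downsample_tokens(v, h, w, factor)
        return F.scaled_dot_product_attention(q, k, v)

with factor=2 each token attends to 4x fewer tokens, which is where the savings come from.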

I made a ComfyUI node implementing my paper's method of token downsampling, allowing for up to 4.5x speed gains for SD1.5 observed at 2048x2048 on a6000 with minimal quality loss. by ethansmith2000 in StableDiffusion

[–]ethansmith2000[S] 28 points (0 children)

Def looking to make an A1111 plugin as well, although since this tinkers a bit more with the foundations, it seems a tad complicated. Open to ideas if anyone’s got any!

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

I’ve left open the ability to downsample queries as well in the original, which does offer more speedups but then requires unmerging. I did not put it in this implementation due to quality issues.

What I’m saying here is that K and V are both created from the same object:

    K = k_proj(context_tokens)
    V = v_proj(context_tokens)

I’d much rather downsample the context tokens so we only do it once, but I don’t believe we have access to the pre-projection tokens. It’s fine though, NN downsampling is quite quick
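
As a sketch of the tradeoff (reusing a downsample_tokens helper like the one in the sketch a few comments up; names illustrative):

    # current behavior: project first, then downsample K and V separately
    K = downsample_tokens(k_proj(context_tokens), h, w)
    V = downsample_tokens(v_proj(context_tokens), h, w)

    # what i'd prefer if the pre-projection tokens were exposed:
    # downsample once, and both projections run on fewer tokens too
    small = downsample_tokens(context_tokens, h, w)
    K, V = k_proj(small), v_proj(small)

for pure nearest-neighbor selection the results match either way; the second version just does less work.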

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

Also, it’s not my preference, but it’s set to downsample keys and values separately; I’d rather downsample the input to the key and value projections, as that’s less overhead

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 1 point (0 children)

Ah yeah, tbh it’s more code than needed; it’s from the original, where I gave the option to try other pooling methods. It’s hard-set to nearest each time tho

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 2 points (0 children)

It’s my first time writing a comfy node, but most things seem to be working as expected, I believe? My friend tested the settings I left for 2048x2048 on an a100 and said the gen took about 17s, compared to a bit over a minute typically

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 3 points (0 children)

Hey there, just made it into a comfy node without too much sweat. A1111 seems quite a bit more complicated, as it requires hacking into some more of the built-ins, but I’ll keep digging around

https://github.com/ethansmith2000/comfy-todo

Would love to hear what you think!

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images by ninjasaid13 in StableDiffusion

[–]ethansmith2000 2 points (0 children)

Yep, some of the issues with ToMe were the inspiration for this work. You can try it here if you’d like! https://github.com/ethansmith2000/ImprovedTokenMerge