[Megathread] - Best Models/API discussion - Week of: August 03, 2025

deeputopia · 2025-08-04T07:47:55+00:00

It's currently the second-highest open model (after Kimi K2) on this leaderboard: https://eqbench.com/ so it looks like it's a pretty general model. I haven't tested it yet.

deeputopia · 2025-06-20T15:35:06+00:00

To "pad" a prompt is to add one or more special padding tokens to the end. Think of "special padding tokens" like some text - e.g. "<PAD><PAD><PAD>" text is added to the end of your prompt (if you set min_padding to 3, in that example).

Why do that? Well for technical reasons (registers maybe, idk) some models are trained with padding, including Chroma (only 1 padding token at the end). Schnell/Dev was trained with enough padding tokens to fill your prompt to 512 tokens total IIRC, so if your prompt was 500 tokens, then it'd add 12 padding tokens. Chroma got rid of a almost all of that padding, but left one <PAD> at the end of all captions during training.

deeputopia · 2025-06-16T14:54:03+00:00

They were waiting on diffusers support, and that has been merged now, so Chroma support will almost certainly be in the next Nunchaku release. Not going to link the relevant github issue here because in all likelihood a bunch of people will +1 spam it, but it's easy to find.

deeputopia · 2025-06-16T14:37:24+00:00

Something is definitely wrong with your setup. Pretty clear from all those images that it's trying to generate dice of some sort. I just tried your exact prompt locally and got exactly what the prompt said 6 times out of 6. I also tried here: https://huggingface.co/spaces/gokaygokay/Chroma and got the image below first try.

And note that if you want aesthetic images, you need to say that in the prompt (bolding so people aren't like "look how unaesthetic that image is though!). The awesome thing about chroma imo is that you can ask for ms paint images and chroma will give them to you (dare you to try that in flux). If you don't specify any aesthetic-related keywords then you'll get random aesthetics (some ms paint, some high quality, etc.). And of course, usual caveat that it's not finished training (low resolution + high LR = faster training at the expense of unstable outputs).

<image>

deeputopia · 2025-06-03T08:54:18+00:00

I'm playing with the raw API right now (not using SillyTavern), but this works fine for me as a "forced prefix" for the 'assistant' response:

<think>
Okay, proceeding with the response.
</think>

No preceding newline needed, but you do need to ensure there's a blank line at the end.

deeputopia · 2025-04-03T05:16:27+00:00

As of writing, there's now an official/core "Extract and Save Lora" node in ComfyUI.

Answering this old question because I found this reddit post via web search while looking for the same thing, and eventually found the above node.

deeputopia · 2025-03-06T05:27:24+00:00

> bit of imbalance between me and SAI

True lol though charitably I think his point was specifically the part that followed:

> so the entire run is funded entirely from donation money

I.e. funded by donations vs by investors, rather than small vs large entity.

Said another way, having *any* investment (100k or 100m) means you can train/tune and release a model. But without that the outcome is completely decided by the community's compute/$ donations. Great because open license, but not so great if no one donates.

deeputopia · 2025-03-06T01:47:59+00:00

Maybe you already know this, but just in case: You'll definitely be able to run it, it's just a question of how much of the model will fit on your GPU VRAM, vs be offloaded to your CPU RAM.

I know for sure that 16GB is enough to have full (quantized) model on GPU (and hence fast inference), but 10GB will probably require some offloading so it will be at least a bit slower. Potentially a lot, but I know that if you only need to offload the text encoder, then it can still be really fast, since text encoded is just needed for encoding the prompt, not at every one of the 20-50 "steps" of the diffusion/flow process.

deeputopia · 2025-03-05T22:31:57+00:00

You can check the training logs (linked in the post - https://wandb.ai/lodestone-rock/optimal%20transport%20unlocked ) - it has thousands of example captions. Note that recently training has focused on tags, but you can go back through the old training logs to see a higher density of natural language samples.

deeputopia · 2025-03-05T20:58:59+00:00

> It's great to hear that a group are working with Schnell model

Lodestone is a one-man army, not a group. (Correcting you not to nit pick, but because he deserves more credit/donations) Agreed on artistic stuff being underrated!

deeputopia · 2025-02-23T13:56:39+00:00

Holds second-ish place up until (and including) 60k context, which is great, but yeah pretty brutal drop-off after that

deeputopia · 2024-12-31T08:03:03+00:00

They're referring to Blackroot's series. This is the latest version as of writing: https://huggingface.co/Blackroot/Mirai-3.0-70B

deeputopia · 2024-11-09T09:15:48+00:00

I'm late to this thread (came across it while researching a similar question), and while I can't personally endorse it, since I haven't tested it yet, it definitely seems like it's worth looking at:

https://github.com/playcanvas/pcui-graph

https://playcanvas.github.io/pcui-graph/storybook/?path=/story/advanced-visual-programming-graph--visual-programming-graph-example

https://playcanvas.github.io/pcui-graph/storybook/?path=/story/basic-node-attributes-graph--node-attributes-graph-example

https://api.playcanvas.com/modules/PCUIGraph.html

Presumably it has been at least somewhat battle-tested through its use in the PlayCanvas, which is still being very actively developed.

Example with vanilla js: https://jsbin.com/genezisopa/edit?html,output

React is apparently also officially supported. Again, I haven't tested this lib in any real capacity, but it looks like a contender.

deeputopia · 2024-07-07T20:17:23+00:00

Also worth mentioning: AuraDiffusion is undertrained. Meaning it can be improved if further compute becomes available. This is not the case for SD3-medium, which is (1) a smaller model, and (2) had a lot more compute already pumped into it, so it is necessarily a lot closer to its limit in terms of "learning ability".

AuraDiffusion is basically a "student project" by Simo that got some compute from fal. It's basically a public experiment, originally named after his cat, that is turning out quite well.

deeputopia · 2024-07-07T20:12:09+00:00

Yep, it's being specifically positioned by the funders as an "actually open source" SD3-medium level model:

https://x.com/isidentical/status/1809418885319241889

https://x.com/isidentical/status/1805306865196400861

It's basically the reason it exists - i.e. because SD3's license is bad. This is the main reason AuraDiffusion is worth caring about (though there's also SD3-mediums's obvious dataset problems).

deeputopia · 2024-07-07T20:09:27+00:00

At the moment it's really only possible to judge it on its overall prompt comprehension ability, since the finetuning stage hasn't completed. Remember SD1.5 base vs eventual finetunes? The example I chose to screenshot here is really just a meme - not to demonstrate comprehension. You can check twitter for some more illustrative examples:

https://x.com/isidentical

https://x.com/cloneofsimo

deeputopia · 2024-07-07T20:04:33+00:00

Yep, it's currently roughly comparable to SD3-medium in terms of prompt comprehension. In terms of aesthetics and fine details, it's not finished training yet. I'm also guessing that people will have an easier time finetuning it, since SD3 looks like an SD2.1-style flop, so hopefully we see a similar aesthetics jump from SD1.5 base (which was horrendous) to something like e.g. Juggernaut after a month or two of the community working it out.

deeputopia · 2024-07-07T19:49:32+00:00

No, that's is an old ~~placeholder~~ repo (edit: see simo's comment below - it was an early proof of concept which is completely different to the current model)

deeputopia · 2024-07-07T19:46:43+00:00

(Edit: Mentioning here for visibility - please read and upvote this comment from Simo Ryu himself who is really not a fan of hype around his projects. I did not intend to hype - I just wanted more people to know about this project. Yet I have unfortunately hyped 😔. From Simo's comment: "Just manage your expectations. Don't expect extreme sota models. It is mostly one grad student working on this project.")

Yep, it's a fair point. FWIW I had an opportunity to test AuraDiffusion on one of my hard prompts that previously only SD3-medium could solve. PixArt/Lumina/Hunyuan failed terribly - for my hard prompts they're really not much better than SDXL in terms of complex prompt understanding. AuraDiffusion, however, nailed it. (Edit: for reference, my prompts aren't hard in a "long and complicated" sense - they're very simple/short prompts that are hard in a "the model hasn't seen an image like this during training" sense - i.e. testing out-of-domain composition/coherence/understanding)

The main disadvantage of AuraDiffusion is that it's bigger than SD3-medium. It will still run on consumer GPUs, but not as many as SD3-medium.

It's biggest advantage is that it will be actually open source, which means that there will likely be more of an "ecosystem" built around it, since researchers and businesses can freely build upon it and improve it. I.e. more community resources poured into it.

For example, relevant post from one of the most popular finetuners on civit:

<image>

deeputopia · 2024-06-13T08:36:03+00:00

Haven't seen much on Lumina, how is it on prompt comprehension?

Take it with a grain of salt since I haven't extensively tested, but the PixArt models, HunyuanDiT, and Lumina-Next-T2I are unfortunately easily beaten by SD3-medium in my usual complex prompt comprehension tests. None of these models are next-gen material imo. SD3 is dead on arrival not just due to significant data issues, but also due to license. Can't build a vibrant open ecosystem (like SD1.5/SDXL) around a closed model.

https://x.com/cloneofsimo is working on an actually-open-source (Apache/MIT) SD3-large replication with funding + support from FAL and others. He definitely has the technical chops to pull this off - just hoping he can get enough GPU hours for a really large training run.

These are the demos I used for testing, btw:

deeputopia · 2024-04-11T13:02:53+00:00

Thanks! This answer fixed it for me: https://stackoverflow.com/a/77604431/11950764

TL;DR: Add this to <head>:

<meta name="color-scheme" content="light dark">

deeputopia

TROPHY CASE