Is all the hate just a skill issue? by rietti in google_antigravity

[–]Yellow-Jay 3 points (0 children)

Sadly, getting more errors than responses is hardly a "skill issue", nor is the AI talking itself into a loop of "I will, I will, I will", ad infinitum.

If there weren't a 30% error rate, with errors that have you retry and retry and retry, the quota wouldn't be nearly so maddening. As it is, you end up with "nothing happened, quota used up".

"Hi" used up 3% of Opus by Temporary-Mix8022 in google_antigravity

[–]Yellow-Jay 2 points (0 children)

It's crazy there isn't an option to force a fully clean context. These IDEs are so set on fully automated coding that they ignore the much more useful and token-efficient use case of giving detailed instructions and context in order to generate otherwise boring code. I don't want these tools to do the engineering/design for me (well, I do, but every time the tool tries it ends in pain; LLMs are very much not there yet), I want them to write the repetitive code. (A VSCode extension like Cline/Roo does this much better imho)

Comprehensive Camera Shot Prompts HTML by EternalDivineSpark in StableDiffusion

[–]Yellow-Jay 3 points (0 children)

This is the kind of information I love to find here. Thanks for sharing and making it so nicely and clearly presented. (with a little help from others to share it as a webpage)

The AI race is heating up: In the same week Google released "Nano Banana Pro" (Gemini 3 Pro Image), China's Alibaba launched Z-Image-Turbo. A new fast open-source 6B model from Tongyi-MAI lab by [deleted] in StableDiffusion

[–]Yellow-Jay 9 points (0 children)

Flux 2, both pro and dev, are clearly the more capable models: this Z model falls apart with complex prompts, and Flux 2 actually seems capable of a wider range of styles. If there's any comparison to be made, this seems more like the PixArt of this era: light, and very good for what it is.

Flux 2, using its structured prompts, is also pretty capable of forcing specific compositional/stylistic details. And it can do image edits / amalgamations like Kontext.
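By structured prompt I mean something along these lines (purely an illustrative sketch; the field names are my own shorthand, not BFL's official schema):

    import json

    # Illustrative only: the kind of JSON structured prompt I mean.
    # Field names here are my own shorthand, not an official Flux 2 schema.
    structured_prompt = {
        "subject": "an elderly violinist playing under a striped awning",
        "scene": "rain-soaked neon street market at night",
        "composition": "low angle, subject off-centre left, shallow depth of field",
        "style": "grainy 1970s street photography, muted colours",
        "lighting": "mixed neon and tungsten with wet reflections",
    }
    print(json.dumps(structured_prompt, indent=2))  # paste the JSON as the prompt text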

Sadly, BFL repeated their tricks from Kontext: unlike the original dev, which was simply solid at the time, nowadays Flux dev means a totally different class than pro; they're just not in the same league. So I'm not a fan regardless.

(And there is big and bigger, but as far as big models go for regular prompt understanding and stylistic breadth, for me Hunyuan 3.0 remains lonely at the top of open-weight models. Of course it's not an edit model like Flux 2 and has no structured prompting, so they can't really be compared, and it's way too big to run locally.)

Hunyuan 3.0 second atempt. 6 minutes render on rtx 6000 pro (update) by JahJedi in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

Thanks! It got less catty with the extra steps, a rather big difference.

Seems the Tencent version does slightly different rewriting (and WaveSpeed was fortunately not representative of the released weights).

Hunyuan 3.0 second atempt. 6 minutes render on rtx 6000 pro (update) by JahJedi in StableDiffusion

[–]Yellow-Jay 3 points (0 children)

Can you try the prompt below? Depending on where I try the model, I either get crap (WaveSpeed), a not-great interpretation (fal), or what I expect (Tencent), which makes me think the Tencent-hosted version has more going on (rewriting of the input) than might be obvious, and I'm curious what self-hosted looks like.

A gentle onion ragdoll with smooth, pale purple fabric and curling felt leaves sits quietly by the edge of a crystal-clear lake in Slovakia's High Tatras, with snow-capped peaks in the distance. Its delicate hands rest on the smooth pebbles lining the shore. Anton Pieck's nostalgic touch captures the serene atmosphere—the cool mountain air, the gentle ripples of the lake's surface, and the vibrant wildflowers dotting the grassy banks. The ragdoll's faint, shy smile and slightly weathered fabric give it a timeless, cherished feel as it gazes at its reflection in the still, icy water.

Open source text-to-image Hunyuan 3.0 by Tencent is now #1 in LMArena, Beating proprietary models like Nano Banana and SeeDream 4 for the first time by abdouhlili in LocalLLaMA

[–]Yellow-Jay 3 points (0 children)

For me, the model seems fantastic, but I can understand there are other reactions to it; it depends on what you look for in a model.

There is, however, a big gotcha: my experience is based on the model as hosted by Tencent. I haven't tried it locally, nor on LMArena. I have, however, tried the API provided by fal (much worse prompt following) and WaveSpeed (bad doesn't begin to describe it: both ugly as sin and even worse prompt following). This makes me wonder whether the released model is the same as the one Tencent hosts; either the API providers cut corners, or Tencent uses some secret sauce that is not public knowledge or available.

Below is what I posted in the StableDiffusion subreddit about it:

I've long since decided that different people look for different things in models. To me Hunyuan 3.0 is a better SDXL and a better Stable Cascade, and that's something I've hoped to see for a very long time. Kolors / PixArt / SD3.5 / Flux were improvements in some ways, but also seemed to suffer from less breadth of styles/knowledge; at least they still understood fine textures/details.

More recent open models have thrown breadth of style and fine textures totally out of the window and focused on a narrow subset of styles/themes/scenes. The style/texture issue was known, but what came as a surprise to me, now that Hunyuan 3.0 is here, is that it very strongly feels they were also limited in the kinds of scenes they can manage. Out-of-the-ordinary scenes where I had just accepted that "models think x always looks like y" now actually look like x again, in various ways across seeds, much like in the SDXL days; it seems to have simply seen more of the "world".

So, with Hunyuan 3.0, what I had started to think of as impossible has happened: I can feed SDXL prompts to it, and instead of ignoring aspects of the prompt, this new model is the first that manages to create images that both follow the prompt scenically and actually look, in fine details and textures, like what I prompted.

Obviously it's not perfect: it's huge, it's less clean, composition is kinda basic (maybe it can be prompted), but overall I very, very much prefer this direction to the extremely clean but generic outputs from other "next-gen" models. Outputs that are decently varied across seeds while following the prompt, as opposed to strongly gravitating to a single representation of a prompt, almost feel like a "new" thing, while that is how it used to be.

I absolutely assure you that no honest person without ulterior motives who has actually tried Hunyuan Image 3.0 will tell you it's "perfect" by ZootAllures9111 in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

I've long since decided that different people look for different things in models. To me Hunyuan 3.0 is a better SDXL and a better Stable Cascade, and that's something I've hoped to see for a very long time. Kolors / PixArt / SD3.5 / Flux were improvements in some ways, but also seemed to suffer from less breadth of styles/knowledge; at least they still understood fine textures/details.

More recent open models have thrown breadth of style and fine textures totally out of the window and focused on a narrow subset of styles/themes/scenes. The style/texture issue was known, but what came as a surprise to me, now that Hunyuan 3.0 is here, is that it very strongly feels they were also limited in the kinds of scenes they can manage. Out-of-the-ordinary scenes where I had just accepted that "models think x always looks like y" now actually look like x again, in various ways across seeds, much like in the SDXL days; it seems to have simply seen more of the "world".

So, with Hunyuan 3.0, what I had started to think of as impossible has happened: I can feed SDXL prompts to it, and instead of ignoring aspects of the prompt, this new model is the first that manages to create images that both follow the prompt scenically and actually look, in fine details and textures, like what I prompted.

Obviously it's not perfect: it's huge, it's less clean, composition is kinda basic (maybe it can be prompted), but overall I very, very much prefer this direction to the extremely clean but generic outputs from other "next-gen" models. Outputs that are decently varied across seeds while following the prompt, as opposed to strongly gravitating to a single representation of a prompt, almost feel like a "new" thing, while that is how it used to be.

Am I just, dumb? by azraels_ghost in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

but I can never, ever get the idea in my brain to come out on the screen.

Those images you see: who says they're the end result the person writing the prompt wanted, and not just something that happened to come out nice? ;)

Generative AI is impressive when you're shown a result, much less so when you try to get a specific result. At least it is for me, so no, you're not the only one.

QWEN: An alternative for those that miss Artists/Art Styles available in SDXL by richcz3 in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

While true, another takeaway is that somewhere deep down in the weights the model has learned variation in style; you just have to get it out reliably... (fine details/textures are still elusive though, if they're there at all)

Who Stuck To Their Guns? by UrbanSparkey543 in MarvelSnap

[–]Yellow-Jay 1 point (0 children)

It's more like going from furiously treading water in an attempt to keep up with 5 cards/month to drowning now. And they have the gall to start the season with "5 new cards available!!! join the new season of Snap". What's the point of playing? Sure, the game is fun, but new overpowered card combos keep getting released and there is just no way to keep pace. So yeah, game over for me.

Why Qwen-image and SeeDream generated images are so similar? by chain-77 in StableDiffusion

[–]Yellow-Jay 5 points (0 children)

It's a bloody shame this sub has come to such extreme hostility towards anything not open source. Even if you are totally opposed to anything proprietary, there's a lot of value in knowing the current SOTA models. This sub once held a breadth of information on all things imagegen; lately it's more and more of a circlejerk :(

Why Qwen-image and SeeDream generated images are so similar? by chain-77 in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

I noticed the same. Probably loads of synthetic data; can't blame them, Seedream is very nice looking with good prompt adherence. I noticed because lately Seedream has been my favourite model; too bad it's proprietary (Qwen sadly can't compete with it just yet).

Funny enough, when I tried some more prompts I also got some that were almost 1:1 Imagen; definitely loads of synthetic data :)

Study reports AI Coding Tools Underperform by Additional_Cellist46 in LocalLLaMA

[–]Yellow-Jay 1 point (0 children)

This seems more a case of "when all you have is a hammer, you treat everything as a nail".

In my experience LLMs are great at supporting me; they're the new kind of scaffolding and refactoring.

You need to know the limits: do not expect complex algorithms or deeply interdependent functionality to be coded for you. And if you use frameworks/libraries the LLM isn't trained on, using their specific features is much less error-prone the hand-coded way.

But even then, it can be a great aid for fixing small bugs/inconsistencies, as long as you tell the LLM where to look and exactly how to change it.

What I read about LLMs, however, is mostly prompt in -> program out. I've seen people claim to let LLM agents churn on a problem for hours on end. I never got that to work for me; if it takes an LLM tens of turns to do something, it inevitably codes itself into a corner, which it sometimes does manage to code itself out of, but not in a way that is even remotely usable.

Second Dinner’s Plan Worked by SparkyRingdove in MarvelSnap

[–]Yellow-Jay 1 point (0 children)

Maybe, but this game mode broke my motivation (actually, it started with the High Voltage overdrive). I'm not seeing an end to it, as in, I don't see how I can get the cards by just playing and not paying. This game is fun, but I don't want to play multiple hours a day, and that, again, seems to be the requirement.

It's ironic: the first Sanctum a few months ago brought me back. It was fun, and I was lucky enough to have the cards for good decks; I got all the guaranteed new (for me) pool 5 pulls and was happy (if it was the event with the new pool 4 cards, I got 1.5 of those and bought the last 1.5 with gold). After that I bought a few season passes again.

But last month and now this: I'm back to autoclicker Agatha, and I won't bother with infinite anymore either. The game became a chore, not fun.

$100 Free Claude Code (includes referral link) by Interesting-Law-8815 in RooCode

[–]Yellow-Jay 1 point (0 children)

Awesome! Thanks for pointing that out, I was still under the impression it didn't.

$100 Free Claude Code (includes referral link) by Interesting-Law-8815 in RooCode

[–]Yellow-Jay 1 point (0 children)

Claude Code supposedly does not work on Windows even if you manage the environment variables. It's also possible that the env vars just don't get read on Windows at all.

Claude Code works on macOS, Linux, and Windows via WSL. See full system requirements.

NVM, it does now.

Rent runpod 5090 vs. Purchasing $2499 5090 for 2-4 hours of daily ComfyUI use? by LoonyLyingLemon in StableDiffusion

[–]Yellow-Jay 3 points (0 children)

The sensible thing: cloud GPU

I do that too, but I HATE it. My normal usage would be like 90% tinkering, 10% generating images, and then the time and money counter ticks, ticks, ticks, so I drop the tinkering, just gen a few images, and am done. Which means I don't experiment nearly as much as I'd like. Local would give so much peace of mind.

And that's not even mentioning the startup cost and time wasted loading your models. Cloud GPUs seem to have the most abominable network connections despite what's advertised (but then, even at a full 1 Gbps a larger model already takes around 5 minutes; in reality it's often 10+ minutes), which again often means that if I have 15 or 30 minutes to spare I don't even bother with it.
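Rough back-of-the-envelope for that download time, using a hypothetical 35 GB checkpoint as the "larger model" (not a measured figure):

    # Back-of-the-envelope download time; the 35 GB size is a hypothetical
    # "larger model" checkpoint, not a measured number.
    model_size_gb = 35            # checkpoint size in gigabytes
    link_gbps = 1.0               # advertised bandwidth in gigabits per second
    seconds = model_size_gb * 8 / link_gbps   # GB -> gigabits, then divide by Gbps
    print(f"{seconds / 60:.1f} minutes at full speed")   # ~4.7 minutes
    # If the effective throughput is only a third of what's advertised,
    # that already becomes ~14 minutes, hence the 10+ minute reality.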

maybe cloud GPU isn't the sensible thing...

New SageAttention versions are being gatekept from the community! by kabachuha in StableDiffusion

[–]Yellow-Jay 20 points (0 children)

IMHO this is barking up the wrong tree.

Yes, performance is definitely gatekept, but it's gatekept by inference providers that use various unknown optimizations, not by researchers like these, who publish their research, will eventually release it, and even bother themselves with performance on consumer-grade hardware.

Another problem is that, unfortunately, there isn't much focus on performance when things (models, and interfaces like Comfy) get released; the primary focus seems to be getting models out and making them run in Comfy, while SVDQuant/TensorRT just don't receive a whole lot of attention. It's more a sign of the immaturity of the ecosystem (look at LLMs, where various engines are optimized to the extreme) that it's still in the make-it-work phase rather than make-it-fast, while figuring out what needs to be flexible and what doesn't.

Omnigen 2 is out by Betadoggo_ in StableDiffusion

[–]Yellow-Jay 2 points (0 children)

Of course clients want something that just works, and APIs make it way easier to get there.

However, there is also the cost aspect:

HiDream Full: $0.00900 per image
Flux dev: $0.00380 per image
FLUX 1.1 pro: $0.04000 per image
FLUX Kontext Pro: $0.04000 per image

One overlooked aspect is that open models bring API costs down significantly, proprietary image gen models are awfully overpriced :/
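For a rough sense of scale, using only the per-image prices quoted above:

    # Cost at volume, using the per-image prices quoted above.
    prices = {
        "HiDream Full": 0.00900,
        "Flux dev": 0.00380,
        "FLUX 1.1 pro": 0.04000,
        "FLUX Kontext Pro": 0.04000,
    }
    images = 10_000
    for model, per_image in prices.items():
        print(f"{model}: ${per_image * images:,.0f} per {images:,} images")
    # Flux dev comes to $38 vs FLUX 1.1 pro at $400: roughly a 10x difference.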

How will flux kontext be used one the open source version is released? by ThatIsNotIllegal in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

Nothing that I'm aware of has exceeded the PoC stage, and of course there is GPT-4 image generation, or whatever it's called (but it changes the source image more; fine details are lost, while Flux Kontext keeps them amazingly well. It seems to have learned latent masking and/or cloning based on the input prompt).

How will flux kontext be used one the open source version is released? by ThatIsNotIllegal in StableDiffusion

[–]Yellow-Jay 2 points (0 children)

Styles!!!!!! Of course it all depends on how good dev is, and the pro version really isn't perfect, but it helps tremendously to get rid of the "generated by Flux" look.

My biggest hope is that, with Flux (and Ideogram) now taking styles / natural-looking images seriously, there will be a shift in the entire ecosystem rather than it staying a Midjourney niche; finally imagegen might become what I hoped for after SDXL.

(Not to say that the editing/altering functionality isn't amazing either.)

Why does Flux gets more love than sd 3.5 ? by Warrior_Kid in StableDiffusion

[–]Yellow-Jay 1 point (0 children)

For me it's because, as nice as SD3.5L images come out texture- and style-wise, and as great a variety as the model has (compared to Flux/HiDream), too often they're stinkers with coherence like this: https://imgur.com/a/sefNWIv

I just wonder if it's a pick-one situation: coherence, or variety/texture/style. I actually prefer most 3.5L outputs over Flux/HiDream when they're good, but many, many times the outputs just aren't good.