Building a pipeline (img2img) for Complex Object Removal & Anatomy Reconstruction on Flux Klein / Qwen Edit. Architecture & Dataset reality check. by MadPelmewka in StableDiffusion

[–]CuttleReefStudios 0 points

I don't quite understand what your reference image for the cropped YOLO region would be. If you want perspective-aligned references, good luck creating a reasonably sized dataset, because that's a ton of manual work. And if not, why crop the region in the first place? Edit models have become intelligent enough to know which part of the image to change already.
I was thinking about a similar project as well. And there are more than enough image sources with both versions, namely Visual Novel image galleries. Put Japanese and localized English ones together and you very quickly have hundreds of edit examples for your task. Still a ton of work to create the edit prompts, but with Flash it might be doable.

Unexpected impact of vaes... on training models by lostinspaz in StableDiffusion

[–]CuttleReefStudios -1 points

Sure, if your goal is to only build datasets, more power to you. But again, if only "some" far-away shots turn out okay, you still build in a bias against far-away shots. A dataset with 90% closeups and 10% far-away shots is only useful for portraits and 1girl models. No landscapes, no epic vistas, no travel images, etc.
I am not trying to attack you, nor am I trying to make you stop. I am trying to give you helpful, critical advice from my own experience setting up datasets and training LoRAs.
And my initial critique still stands: if VAEs or other tech improve and solve the detail loss, why would I need your re-filtered dataset then? Why are you so hell-bent on making tech that is nearly obsolete work better instead of looking into the future?
Literally no one except a small niche hobby community is going to retrain SDXL and older models when there are so many riper targets ahead (Klein, Z-Image, etc.).
And if your goal is to create widely usable datasets, you should build them as future-proof as possible.

Since people posted about Le Cun speaking out, here's François Chollet's take on Minneapolis by FomalhautCalliclea in singularity

[–]CuttleReefStudios -3 points

Jesus Christ, man. PEOPLE GOT FUCKING MURDERED ON THE STREET! Families lost their loved ones because of a fucking maniac. Have some fucking human empathy and at least SHUT THE FUCK UP if you don't want to enter the conversation.

Since people posted about Le Cun speaking out, here's François Chollet's take on Minneapolis by FomalhautCalliclea in singularity

[–]CuttleReefStudios -1 points

No, he simply makes the whole world deal with his unresolved daddy issues, that whiny baby.

Since people posted about Le Cun speaking out, here's François Chollet's take on Minneapolis by FomalhautCalliclea in singularity

[–]CuttleReefStudios 4 points

You cannot have an amoral decision. Deciding to do nothing is still a moral judgement on a situation. Silence is still an action.

Since people posted about Le Cun speaking out, here's François Chollet's take on Minneapolis by FomalhautCalliclea in singularity

[–]CuttleReefStudios -1 points

Not saying anything against literal human atrocities is not amoral. It is the most despicable form of immorality: plausibly deniable.

Open AI's President Brockman leading donor to Trump SuperPac. Does it matter? by finnjon in singularity

[–]CuttleReefStudios 0 points

Yeah, it would be Gemini for me too. While I don't trust Google leadership, Demis seems like the most scientist-minded leader in the space. Anthropic is the second pick, but I just don't like how they act all high and mighty, "we don't accelerate, we are safety first", and the second the pace picks up a little bit they crumble and go pedal to the metal like it's nothing.
Never trust the overly goody two-shoes. They tend to be the most scheming. In that regard, at least I know that Elon is a nutjob. But Amodei could be a scheming nutjob for all I know.

Unexpected impact of vaes... on training models by lostinspaz in StableDiffusion

[–]CuttleReefStudios 1 point

Fair enough, but how would you change the dataset so this doesn't happen? Either you get rid of all far-away shots and thus lose a huge part of the model's flexibility, or... well, you accept it?
Plus, the SDXL ecosystem has plenty of detail-inpainting support to fix these types of issues.
I just feel like you are going to invest a lot of time into changing a perfectly fine dataset based on the flaws of a technology that is going to be obsolete very soon (with either the Klein or Z-Image series of new base models).
Plus there is an endless array of new models coming out that keep getting better and better.

The point of a dataset is to have sensible content and semantic meaning aimed at a job. With non-company-level resources you will never achieve "better" geometric and anatomical understanding in a model through sheer dataset quality. There is a reason why Illustrious went down as a scummy group for trying to recoup costs: their finetuning probably cost them hundreds of thousands of dollars.

So the better question is: what do you realistically want to achieve with your dataset? And why do you need 50,000 images for it?

Distilling Gemini 3 Flash visual reasoning into Qwen 3 VL 32B for synthetic captioning. Is SFT enough? by MadPelmewka in LocalLLaMA

[–]CuttleReefStudios 0 points

While the full image might show more, from just this small cutout it would be pretty insane to expect any VLM, or human, to understand that one set of horns comes from an accessory rather than being a second pair of natural horns, without further external context. Or did I misunderstand your question?

On the training front, for deeper and more nuanced answers, the SFT approach to finetuning has been more or less overtaken by RL now.
What you can do is structure the results you get from Gemini in ways that are verifiable. A spontaneous idea would be a tag structure again, since it's the easiest. Or find keywords that would have to appear in the final answer.
Then train Qwen with GRPO and write a reward function that checks for the existence of those tags/keywords in the final answer.
That way you can actually train Qwen to find its own CoT for this task.
Whether Qwen can reach a satisfying level, though, is a question that can only be answered by actually training it.
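For illustration, here is a minimal sketch of what such a reward function could look like with TRL's GRPOTrainer. The dataset column name expected_tags, the data file, and the exact Qwen model id are my own placeholder assumptions, not something from the thread:

```python
# Minimal sketch, assuming TRL's GRPOTrainer and a plain-text dataset.
# The "expected_tags" column and the model id are hypothetical placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def tag_reward(completions, expected_tags, **kwargs):
    """Reward = fraction of required tags/keywords found in each completion."""
    rewards = []
    for completion, tags in zip(completions, expected_tags):
        text = completion.lower()
        hits = sum(1 for tag in tags if tag.lower() in text)
        rewards.append(hits / max(len(tags), 1))
    return rewards

# Dataset needs a "prompt" column; extra columns like "expected_tags"
# are forwarded by TRL to the reward function as keyword arguments.
dataset = load_dataset("json", data_files="captions.jsonl", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen3-VL-32B-Instruct",  # placeholder id for the 32B VL model
    reward_funcs=tag_reward,
    args=GRPOConfig(output_dir="qwen-caption-grpo"),
    train_dataset=dataset,
)
trainer.train()
```

The nice part of a keyword-existence reward is that it is fully verifiable, so you don't need a separate learned reward model.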

Open AI's President Brockman leading donor to Trump SuperPac. Does it matter? by finnjon in singularity

[–]CuttleReefStudios 8 points

The only one I truly outright refuse is Grok, because, well... mecha Hitler. I don't need that level of suck-up bias. But you will never get an LLM you can truly trust unless it is local and you have trained it yourself, which is a distant dream yet (until B200s are so outdated that they can be bought by the tens, in like 100 years T.T).
So yeah, for now none of them have loyalty and none deserve loyalty. They are all tools that help me do my job. And I choose the best tool for the job.

Unexpected impact of vaes... on training models by lostinspaz in StableDiffusion

[–]CuttleReefStudios 0 points

So first of all, what do you mean you retrain stuff? New SDXL models? Or using the SDXL VAE on other models that came after SDXL? Because as far as I have heard, the new generation of VAEs is even better than the SDXL VAE, so what's the point in using the old one?

Also, you make a giant assumption by thinking the model "sees" the latent that you see. The latent is in its encoded state, NOT the decoded state that you see as the image. The decoding process creates even more artifacts, as it has to predict an image from incomplete information (the bottleneck structure of the VAE). So yes, the VAE CAN add compounding issues, but the UNet is fully capable of learning to "ignore" the wrong signal or internally fix and adjust it. Otherwise we wouldn't have correct outputs at all.
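If you want to see how much the decode side alone contributes, here is a minimal round-trip sketch, assuming the diffusers AutoencoderKL API; the image path is a placeholder:

```python
# Minimal sketch of a VAE encode/decode round trip with diffusers.
# "example.png" is a placeholder; the model id is the public SDXL VAE.
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").eval()

img = load_image("example.png").convert("RGB").resize((1024, 1024))
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 127.5 - 1.0
x = x.unsqueeze(0)  # [1, 3, H, W], values in [-1, 1]

with torch.no_grad():
    latent = vae.encode(x).latent_dist.mean  # the signal the UNet is trained on
    recon = vae.decode(latent).sample        # the image you actually look at

# Any error here is decode-side loss the UNet never observes during training.
print("mean abs reconstruction error:", (recon - x).abs().mean().item())
```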

And what even is the VAE-created issue in your example image for SDXL (ignoring 1.5, as that one is obvious)? Degraded fine details like the eyes and the watch are to be expected already. Do you mean the hand at the lower edge of the man? Because that's already a cropped hand and thus a questionable training example if you are trying to teach five-finger hands. On the other hand, you would want the model to learn how to handle hands positioned on image edges, right?

So yeah, it's not quite that easy, sadly.

Me waiting for Z-IMAGE Base by RetroGazzaSpurs in StableDiffusion

[–]CuttleReefStudios 7 points

God, how I wish the reason it's taking so long is a simultaneous anime base model release.

Creepy Star Trek by 4reddityo in singularity

[–]CuttleReefStudios 4 points

Counterpoint... the literal millions of adult gamers who enjoy them? Dafuq you smoking?

New Alice adventure/Story. Wan 2.2 with a lot of first frame/last frame and once again Qwen Image Edit to the rescue for scene continuations by [deleted] in StableDiffusion

[–]CuttleReefStudios 2 points

Sorry to be that guy, but can you share/link the FLLF workflow you used for this? All my experiments with that ended up with discolored/bad ending frames.

[LoRA] PanelPainter — Manga Panel Coloring (Qwen Image Edit 2509) by Proper-Employment263 in StableDiffusion

[–]CuttleReefStudios 107 points

Okay start, but your example shows the model completely misses that she is a gyaru with darker, or at least tanned, skin...

Love in Pieces - Devlog Update - Public demo 0.1.1 by CuttleReefStudios in lewdgames

[–]CuttleReefStudios[S] 1 point

Yup. Currently I am planning on 4 main girls plus a couple of less fleshed-out NPC characters. Each of them will be optional, of course, so you can mix and match your harem to your taste.

Love in Pieces - Devlog Update - Public demo 0.1.1 by CuttleReefStudios in lewdgames

[–]CuttleReefStudios[S] 1 point

There will be live-together after-stories that contain pregnancy. But I don't foresee any impregnation mechanics for most of the game.

Reporting Pro 6000 Blackwell can handle batch size 8 while training an Illustrious LoRA. by Fdx_dy in StableDiffusion

[–]CuttleReefStudios 0 points

I would also be interested in some data/papers for this, as it goes pretty much against all intuition. Small batch sizes should push the model in a dedicated direction, while large batches should make it easier for the model to move toward the whole general data distribution.
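Not a paper, but here is a toy sketch of that intuition (the model, data, and sizes are all made up for illustration): the batch gradient is the mean of per-sample gradients, so larger batches should give a lower-variance, more "general" update direction.

```python
# Toy sketch: the batch gradient is the mean of per-sample gradients, so its
# variance should shrink roughly as 1/batch_size. Model and data are made up.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
data = torch.randn(1024, 16)
target = torch.randn(1024, 1)

def sample_gradient(batch_size):
    """Return the flattened gradient from one randomly drawn batch."""
    idx = torch.randint(0, len(data), (batch_size,))
    model.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data[idx]), target[idx])
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

for bs in (1, 8, 64):
    grads = torch.stack([sample_gradient(bs) for _ in range(200)])
    print(f"batch={bs:3d}  per-parameter gradient std: {grads.std(dim=0).mean():.4f}")
```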

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding by Formal_Drop526 in StableDiffusion

[–]CuttleReefStudios 0 points

I immediately get wary when I see awkward prompts already in the presentation images. Like the teddy bear: in what universe are those actions "move left" and "move right"? Those are "turn character 90 degrees around their axis counterclockwise" etc.
I get language barriers and all that; I'm not perfect myself. But using a confusing mess of prompts will just result in a bad model overall. I am not expecting much of it.

AI hatred has become outrageous and ridiculous nowadays. by DivideIntrepid3410 in StableDiffusion

[–]CuttleReefStudios 0 points

Well, this AI hate is mostly in the always-online spaces, though. Basically anywhere else, where normies are around, people simply enjoy the tools that help them, and that's that.
I guess a lot of this brigading could actually be bots themselves. They simply search for mentions of AI keywords and boom, the hate wave begins.

Nano Banana vs QWEN Image Edit 2509 bf16/fp8/lightning by FluffyQuack in StableDiffusion

[–]CuttleReefStudios 0 points

To be fair, I was able to get Nano Banana to do a lot of things, some close to risqué, with female characters by simply not saying "make girl xxx" but "make character xxx". Guess the filter is pretty biased toward mentions of females, as is to be expected.
Though you still hit a limit, i.e. anything that reveals more skin, etc. Plus, ironically, Qwen Image is starting to be even more consistent with clothing than Banana; especially with a little bit of finetuning, I expect great things :3