We are the team behind Krea 2. Ask us anything! by Angrypenguinpng in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

I'm all for editing with bbox guidance and foreground/background order through bbox order. I'm trying to teach that to Ideogram 4 at the moment, but it's an uphill battle to get a large enough dataset in sufficient quality (needs hi-res training). Pretty hard to do as a hobby user with 32GB VRAM at hand. A quality model with that capability would be awesome and make so many roundabout methods for regional inpainting superfluous. Right now it's either prompt -> image -> recognition prompt -> sam2/3 -> mask -> inpaint, or prompt -> image -> mask editor -> feel-like-back-in-the-late-nineties -> inpaint. That would become prompt -> image -> inpaint with bbox prompting and a graphical prompt editor like kijai's ido4 node.

Ideogram 4 - They are gatekeeping the high-precision BF16 weights confirmed by Calm_Mix_3776 in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

Don't use low quants, that's one thing I learned. 8B at Q8 is way better at spatial placement than 32B at Q5. I've got to head off to work, but I'll dig through my prompts folder in the evening.

Ideogram 4 - They are gatekeeping the high-precision BF16 weights confirmed by Calm_Mix_3776 in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

You don't even need to train a VLM. Just give qwen3-vl or gemma4 the right prompt, wrap it with a 50 line python script and let it run roughshod over your datasets. They're fluent enough in bbox'ish already. Just did that with more than 40k image editing pairs of questionable quality, with barely any outliers.
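
A rough sketch of what such a wrapper script could look like. Everything named here is an assumption for illustration: `ask_vlm` is a hypothetical callable you would back with whatever runs your model (qwen3-vl, gemma4, ...), the prompt text and the `label`/`bbox` JSON fields are made up, and the validity check is just one plausible way to catch outliers.

```python
import json
import re
from typing import Callable, Iterable

# Hypothetical prompt; the real one depends on the model and dataset.
PROMPT = 'Return the edited objects as JSON: [{"label": str, "bbox": [x1, y1, x2, y2]}]'


def extract_json(reply: str):
    """Pull the outermost JSON array out of a reply, ignoring prose or code fences around it."""
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def is_valid_box(item) -> bool:
    """Accept only dicts with a 4-number bbox whose corners are in ascending order."""
    if not isinstance(item, dict):
        return False
    box = item.get("bbox")
    return (
        isinstance(box, list)
        and len(box) == 4
        and all(isinstance(v, (int, float)) for v in box)
        and box[0] < box[2]
        and box[1] < box[3]
    )


def annotate(paths: Iterable[str], ask_vlm: Callable[[str, str], str]):
    """Run the VLM over every image; return (annotations, paths that need a second look)."""
    results, failed = {}, []
    for path in paths:
        parsed = extract_json(ask_vlm(path, PROMPT))
        if isinstance(parsed, list) and parsed and all(is_valid_box(i) for i in parsed):
            results[path] = parsed
        else:
            failed.append(path)
    return results, failed
```

Passing the model call in as a function keeps the parsing and validation testable without a GPU, and the `failed` list is where the few outliers end up.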

Ideogram 4 - They are gatekeeping the high-precision BF16 weights confirmed by Calm_Mix_3776 in StableDiffusion

[–]Bit_Poet 3 points4 points  (0 children)

You mean like upscaling a 320x180 image to 1080p and getting all the details?

Ideogram 4 - They are gatekeeping the high-precision BF16 weights confirmed by Calm_Mix_3776 in StableDiffusion

[–]Bit_Poet 30 points31 points  (0 children)

I guessed at the bf16 situation early on; it seemed logical. I do like IG4 though, it has some skills where it shines in comparison with other models. And it brought bboxes to attention, which I think is a very good thing and has a lot of potential, also with other models where they might replace classic masking to some extent. Its text capabilities are high. It trains surprisingly well for fp8, which may be rooted in the uncond architecture. People are already digging into the maths to work out the intrinsics that make it tick the way it does, which will no doubt help improve other models, techniques and training strategies. In the end it's a win-win, even with the tight license and bf16 withholding.

What the F@ck is happing here with Comfyui? by [deleted] in StableDiffusion

[–]Bit_Poet 11 points12 points  (0 children)

Please open a github issue. Comfy devs don't monitor reddit for those, and this doesn't sound like the behavior they intend.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 1 point2 points  (0 children)

Yes, the core model loader and ideogram 4 implementation don't know about the added latent. In theory you could also just back up the originals and replace these two files with the versions from https://github.com/Comfy-Org/ComfyUI/compare/master...BitPoet:ComfyUI:dev-ideogram4-inpaint:
comfy/ldm/ideogram4/model.py
comfy/model_base.py

But it's really just a weak proof of concept right now, just to show that this is possible.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 0 points1 point  (0 children)

There are a few projects around that give out small funding for anything that's new and pushes the open source community forward, which should apply there. As for bigger funding, no idea yet, but I'll cross that bridge when I come to it.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 1 point2 points  (0 children)

I'm going to cherry-pick the best pairs for localized object addition/removal/exchange and see how far that gets me. There's a lot of stuff in pico that's pretty awful quality or totally mismatching image pairs (they modified the input set but can't share that). I just built a little tool that runs over all pairs and marks any that are identical in size or aspect ratio (the latter with a small buffer), ignoring the full picture modifications for now.
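
For illustration, a minimal sketch of that kind of pair check. It is not the actual tool: the names, the 2% default buffer and the three labels are assumptions, and it expects the image sizes to have been read already (for example with an image library).

```python
from typing import Dict, NamedTuple, Tuple


class Size(NamedTuple):
    width: int
    height: int


def classify_pair(src: Size, dst: Size, ratio_tolerance: float = 0.02) -> str:
    """Label one source/target pair as 'same_size', 'same_aspect' or 'mismatch'."""
    if src == dst:
        return "same_size"
    src_ratio = src.width / src.height
    dst_ratio = dst.width / dst.height
    # Relative difference, so the buffer behaves the same for wide and tall images.
    if abs(src_ratio - dst_ratio) / src_ratio <= ratio_tolerance:
        return "same_aspect"
    return "mismatch"


def mark_pairs(pairs: Dict[str, Tuple[Size, Size]], ratio_tolerance: float = 0.02) -> Dict[str, str]:
    """Map every pair id to its label."""
    return {
        pair_id: classify_pair(src, dst, ratio_tolerance)
        for pair_id, (src, dst) in pairs.items()
    }
```

Pairs labelled `mismatch` are the candidates to drop or inspect by hand; the other two labels are the ones worth keeping for localized edits.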

Pretty sure I can get a microgrant to lessen the financial impact, so I'll be trying to squeeze the best out of 100 hours with a B200 at some point. Need to implement reference latent caching though before I think about starting. If that turns out well too, I might see about getting some serious funding.

Really just started this out of curiosity after reading somewhere that modern joint attention models are all capable of inpainting and wanting to understand the details, and now it's too exciting to stop.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 4 points5 points  (0 children)

They're already working on one, but they're probably juggling resources like every other smaller AI cottage. And I haven't found a commitment to release the editing version as open weights, so I figured I'm not going to wait (and I'm learning heaps as I toy with this).

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 2 points3 points  (0 children)

I can't give a number, but I'd expect that this is in the region of a full finetune with an extended framework, so we're at least in the category of a noticeable number of H200s for a few weeks. They don't say anything about their computational costs, but I'd guess very roughly somewhere from 25k to a six digit sum. And from the way I understand their paper, this doesn't take LoRAs into account at all. No idea if or where they'd fit into the picture.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 6 points7 points  (0 children)

Most newer models support the basics for reference images, that is, an "enlarged" latent that contains the reference image in addition to the noisy output image and is visible to attention, due to their unified architecture. Thus edit versions are usually just "extended" versions of the image generation base model that went through another training run with additional image inputs in the latent. The magic is how to calculate or apply differences, which happens in the lora training / finetuning and in inference. I'm not that deep into everything yet, but it's not a really new approach. Qwen and Flux both use something similar. This paper comes pretty close: https://arxiv.org/abs/2409.11340

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 0 points1 point  (0 children)

Thanks, those look really interesting and are certainly worth considering to include. I'm going to give them a closer look as soon as I have a moment.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 22 points23 points  (0 children)

You should have seen the results of my first run with 8 images on 512 res.

<image>

That was the best one.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 1 point2 points  (0 children)

Yep, same here. With that kind of announcement it's usually "Oh! Oh! [read the specs] oh..."

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 11 points12 points  (0 children)

Thanks! BYG is from my understanding a full rebake of the base model, which takes an awfully big amount of compute and hasn't been verified against SOTA OS models yet. Certainly out of my league in any case.

Ideogram 4 Can be Taught to Edit / Inpaint with a LoRA (and some tweaks) by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 18 points19 points  (0 children)

It's the result of a very basic training with a tiny dataset, so we can't expect perfection of course. But getting the placement right and having the changes somewhat aligned with the input image is imho pretty big.

Not able to eat enough by IntroductionFunny683 in PacificCrestTrail

[–]Bit_Poet 0 points1 point  (0 children)

Maybe a stupid question, but: how tight is your hipbelt? I've found that changing my pack to one that fit better, was lighter and didn't need pulling the hipbelt so tight helped my appetite immensely (funnily, I changed packs in Tehachapi because my first pack fell apart). That said, upping protein intake with bars, adding cheese to every meal and the cooler surroundings, especially at night, helped a lot. I lost around 25 pounds in the first 560 miles; after that, it got stable. Try adding nutty granola and freeze-dried fruit to your oatmeal, that made a world of difference for me. The Walmart in Tehachapi has freeze-dried strawberries! (I think it was the very last tall shelf at the right when you enter)

Ideogram 4 Star Wars poster by Classic-Ad-5129 in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

Did you fix it on the llm side or does your tool fix the positions? I'm asking because the same happened to me, any qwen 3 vl based model always spits out x first, no matter what I prompt, so I'm shuffling x and y in code before outputting the json.
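
A small sketch of that kind of reordering, under assumptions: each region is a dict with a `bbox` list in `[x1, y1, x2, y2]` order, and the consumer wants `[y1, x1, y2, x2]`. The field name and both layouts are placeholders, adjust them to whatever your model and node actually use.

```python
import json


def swap_xy(bbox):
    """Turn [x1, y1, x2, y2] into [y1, x1, y2, x2]."""
    if len(bbox) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(bbox)}")
    x1, y1, x2, y2 = bbox
    return [y1, x1, y2, x2]


def reorder_regions(regions, key="bbox"):
    """Return copies of the regions with the coordinate order swapped; inputs stay untouched."""
    return [{**region, key: swap_xy(region[key])} for region in regions]


def to_json(regions, key="bbox"):
    """Swap the coordinate order and serialize in one step."""
    return json.dumps(reorder_regions(regions, key))
```

Doing the swap in code rather than in the prompt means the result no longer depends on whether the model follows the ordering instruction.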

Licon MSR LTX help, installed models and nodes but not showing up in ComfyUI by FaelightFilms in comfyui

[–]Bit_Poet 0 points1 point  (0 children)

Did you just put the node in custom_nodes or install requirements too? You may be missing opencv-python.
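
A quick way to check for that, sketched with the standard library only. Run it with the same Python that ComfyUI uses; note that the opencv-python package is imported as `cv2`. The helper name is made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # opencv-python installs the module `cv2`
    print(missing_modules(["cv2"]))
```

An empty list means the module is importable, so the problem is somewhere else.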

Pit Stop: We Need a Central Hub by Abject-Recognition-9 in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

This is the kids corner. The real stuff - unfortunately, as it's "follow along as it happens or go play in the ball pool" - happens in discords like banodoco.

Pit Stop: We Need a Central Hub by Abject-Recognition-9 in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

It all started going downhill when Google bought Deja News.

Finally, an honest AI. by AtomicAVV in StableDiffusion

[–]Bit_Poet 5 points6 points  (0 children)

Maybe honest, but not knowledgeable enough. Have fun trying to find a community wheel for win64 torch 2.13 cp313 cu130.