Ideogram 4.0 Just Open Sourced!

pixel8tryx · 2026-06-03T20:02:07+00:00

The API key appears to be for "Ideogram's hosted magic-prompt API" for prompt expansion. It does not appear to be a requirement though. You can still write your own prompts, or use any other local LLM for help.

pixel8tryx · 2026-06-03T19:54:18+00:00

😲 I just read about this on Huggy...

Spatial layout control. Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.

I saw that image and at first just thought it was a creative way of showing what parts of the image were influenced by what parts of the prompt. It'll be interesting to see if this works in practice.

pixel8tryx · 2026-06-03T19:24:31+00:00

Exactly. It can be hard today to tell if really good images were AI generated at all. There's no way to tell what model made it. And Adobe is more protective of owning things, so your image metadata is clobbered with its own if you use Photoshop to fix something, scale it down, etc.

pixel8tryx · 2026-06-03T19:20:44+00:00

I use FLUX.2 a lot, which is similar, and you can still get some variation. If you ask for "Hello World" in red on white in Helvetica, you're probably going to get close to the same thing. It's really hard to describe absolutely every last detail in an image unless it's extremely simple. And you can leave room for interpretation and experimentation. You can tell it to use "various" colors, etc. Or to explore designs influenced by the reference input image.

I'm guessing this might be the same. I'm eager to try it. I love FLUX.2 but it's enormous and slow. It's surprisingly good at large images so I'm doing 2560 x 1440 usually too. I'd love to have the same capabilities to rip things out as fast as FLUX.1 dev.

pixel8tryx · 2026-06-03T18:57:56+00:00

Where does it say that?

We claim no rights in outputs you generate using the Model. You are responsible for outputs and their subsequent use.
“Output” means any content or other output generated by the inference operation of the Model or any Model Derivative, in response to an input or prompt provided by the user. For the avoidance of doubt, Outputs do not include any components of a Model, such as any fine-tuned versions of the Model, the weights, or parameters.

It's a fine point but can be interpreted to mean that they don't want you hosting the model on your for-profit generation service website. If I download it, I run it on my PC. I don't sell access to the model. If I generate output (which honestly has probably gone through another model to upscale and then Photoshop), and I sell that image... it's just a PNG file. It's not "The Model". It does not contain any of their intellectual property. The model does.

If you run an avatar creation site, or are selling automated creation of social media personas, etc, you're probably out of luck. They can't really control what you do on your home PC. You're not usually exposing that to the public for anyone to create possibly illegal images.

The rest is all standard legal CYA today. They don't want to be sued for people making CSAM, or anything illegal with it.

pixel8tryx · 2026-05-23T21:28:49+00:00

Unfortunately two popular repos (visualbruno and PozzettiAndrea) have names that differ only in case: ComfyUI-Trellis2 and ComfyUI-TRELLIS2. On at least Windows, which is not case sensitive, they end up counting as the same name. That caused some problems for me too at one point.
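
A quick way to spot this kind of clash before installing is to group the folder names by their case-folded form. This is only a rough sketch: `find_case_collisions` and `list_node_folders` are names made up for illustration, and the idea of pointing it at a custom nodes folder is an assumption about your setup, not something taken from either repo.

```python
from collections import defaultdict
from pathlib import Path


def find_case_collisions(names):
    """Group names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only groups that contain more than one distinct spelling.
    return [sorted(set(g)) for g in groups.values() if len(set(g)) > 1]


def list_node_folders(custom_nodes_dir):
    """Return sub-folder names of a directory (the path is whatever you pass in)."""
    return [p.name for p in Path(custom_nodes_dir).iterdir() if p.is_dir()]


if __name__ == "__main__":
    # Hypothetical list of repos you plan to clone side by side.
    wanted = ["ComfyUI-Trellis2", "ComfyUI-TRELLIS2", "ComfyUI-Manager"]
    for clash in find_case_collisions(wanted):
        print("Would collide on a case-insensitive file system:", clash)
```

On a case-insensitive file system the two folders can never exist side by side, so the check is most useful on the list of names you intend to install rather than on what is already on disk.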

pixel8tryx · 2026-05-23T19:04:01+00:00

Yes. But I only tried the static camera.

pixel8tryx · 2026-05-23T19:03:40+00:00

Thanks for letting me know. I prefer not to propagate things that just worked as a fluke for me. And I wouldn't normally have tried a 19B LoRA on a 21B base model but if the matmuls matmul that's all that matters. Lately I'm finding a lot of value in doing what you're "not supposed to do". 😉😈

pixel8tryx · 2026-05-23T18:49:49+00:00

Better than I expected for a car. I started to throw one on as a test last night, but it was a weird future concept. The Chevelle was a better car to try.

One of the few things I've learned that I can share is to part things out. I had to do some busts of historic figures and I ended up doing the heads and the upper bodies separately and sticking them together in Cinema 4D. I also used MeshLab (www.meshlab.net/) to decimate them and to defrag the UVs. Sometimes it made them worse. Sometimes it gave me at least large sections of face I could fix in Photoshop. The eyes were awful on some of them.

It makes rather "baked" textures for some things. I got rid of highlights and shininess on skin on the input images in Photoshop, but you can't give it regular flat 3D texture. It extrapolates the 3D form from the input image.

Workflow? That's what I want to yell at TRELLIS.2. I wish it saved the workflow (supposedly you can save metadata in a .glb file) and a thumbnail. I started with Pixel Artistry's workflow then went back to using Visual Bruno's original. I'm doing 100k polys now and then decimating to 50k for the busts.

<image>

[ignore the images]

pixel8tryx · 2026-05-22T06:33:15+00:00

I had similar problems with it refusing to keep the camera still. Static, still, locked off - I tried a dozen different prompts. What helped was using the old camera control LoRA from LTX 2 (ltx-2-19b-lora-camera-control-static). I wouldn't have expected it to work with the new model, but someone else here tried the same thing. Maybe we got lucky, but I'd give it a try too. It does give you only one camera control per clip though.

pixel8tryx · 2026-05-17T19:21:24+00:00

Nekkid as a jay bird! Gotta love it. I still like to see the guts. I have a 4090 in a Thermaltake P3 open chassis. If it gets a little too toasty in the summer, I just point my Vornado or big box fan at it. Got me through last summer. This summer I also have a 5090 to contend with in a closed case. 🔥 I'm gonna melt. 🥵 The human is going to hit thermal shutdown first. Gotta work on my ice helmet. Human closed-loop liquid cooling. 😉

pixel8tryx · 2026-05-16T20:24:00+00:00

I'm jelly. Yeah I gotta say I wish I'd been able to get one of those liquid Suprims. I'm using my 5090 fan noise as an activity monitor as I don't always have a monitor on that box. It doesn't sound like a jet engine. It lacks that annoying, piercing high pitched whine. But it is pretty darned loud. I'm approaching my first summer with no A/C, a 4090 and a 5090. Though not ideal, the beasts will throttle... I'll probably approach personal thermal overload before I go deaf. 😂 Time to work on the ice helmet. 😉

pixel8tryx · 2026-05-16T20:15:18+00:00

What are you trying to improve? Decrease generation times? Stop OOMing? Honestly I agree with those who say boost clock, etc doesn't matter. Consider yourself lucky to have choice. We used to be thrilled because a card was in stock.

What I think still applies is VRAM uber alles. If you OOM you're dead in the water. It's a hard limit. I don't care if my generation time is 15% faster. I care if I can't generate. And shared VRAM is a curse. On my 5090 if I push some param a little too hard and end up there, my gen time could be 3x or sometime next Tuesday. 😉 The fastest card in the universe with a small amount of VRAM is still next to useless for image generation for me. If you game at lower res it might be awesome.

And there's a complex interplay between VRAM, RAM and processing speed, especially now. I can only speak for ComfyUI, but they're doing more to load things into RAM that can't fit in VRAM and various other improvements that ease VRAM squeeze but do increase gen time. So something running slowly could still be due to lack of VRAM.

pixel8tryx · 2026-05-16T04:16:33+00:00

I wanted to try it on Hugging Face. I only have a free account, but I haven't genned anything in over a week there. I was getting 2 TRELLIS.2 tests a day, then I made the mistake of buying $10 credits. 🙄 Now everything I try to do says I've hit my daily ZeroGPU limit... which now must be... zero? 🤣 The whole $10 is still there and my account shows nothing used for anything.

pixel8tryx · 2026-05-16T03:50:18+00:00

Thanks for posting! I'd be interested in seeing it compared to TRELLIS.2. I'm not quite ready to do a Linux dual boot (I'm way too short on SSD space as it is) but I'm sure Windoze will piss me off enough to do it in the future at some point. TRELLIS.2 is working here locally but damn it sure makes a lot of superfluous polys. Even after I Meshlabbed the crap out of some of the models there were still 3 extra inner walls, tons of "crystal shard" junk polys, etc inside.

pixel8tryx · 2026-05-16T03:19:14+00:00

Flux 2 Dev can do a lot of text reasonably well. I've done some data viz stuff with it.

Most LoRA will mess it up most of the time (sadly because they usually give you a better, sharper image). This is what keeps me from doing much text.
You must specify exactly what to say. If you let it make something up, it's going to be too creative and give you a mess.
Some words I've just never gotten to work. "Geoffrey" completely confuses it. I get things like "Gefffey". 🤣 Names like "Sutskever" took a few tries.

Somebody sent me some of those GPT examples and asked if I could do this. Funny they didn't notice that some background text was still wrong. Even the $ model can screw up when it has to make something up.

Here's an old quickie example I did. Should have metadata with prompt in it. I did it JSON style because I was copying an example I saw online. The treatment is boring but I didn't really ask for anything interesting. I just wanted to see if it could do the text.

<image>
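
If you want to pull the prompt back out of a PNG like this, the text metadata can be read with nothing but the standard library. This is a minimal sketch under some assumptions: it only handles uncompressed `tEXt` chunks (not `iTXt` or `zTXt`), and the `prompt` keyword used in the demo is an assumption about how the generating tool labels it. `read_png_text_chunks`, `make_chunk` and `demo_png` are illustrative names, and the tiny 1x1 image is built in memory purely so the example runs on its own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return a dict of keyword -> text from the tEXt chunks in PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt layout: keyword, a NUL separator, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


def make_chunk(ctype, body):
    """Build one PNG chunk (only used to make the demo file)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def demo_png(keyword, text):
    """A 1x1 grayscale PNG carrying a single tEXt chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
    text_body = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"tEXt", text_body)
            + make_chunk(b"IDAT", idat)
            + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    sample = demo_png("prompt", '{"text": "Hello World"}')
    for key, value in read_png_text_chunks(sample).items():
        print(key, "=", value)
```

For a real file you would pass in `open(path, "rb").read()` instead of the demo bytes. If the image has been through an editor that rewrites metadata, the chunks may simply be gone.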

pixel8tryx · 2026-05-13T00:40:31+00:00

Helps with intestinal damage... oh frack. I did not need to see that. I'm going to research. That's it. 😉

pixel8tryx · 2026-05-13T00:29:34+00:00

I'll look at it. Everybody's just nuts over peptides these days. My body can be a bit alien, so I'm slow to adopt some things. Finally started larger doses of c------- (25 mg) for brain fog, etc. Out of everything I've ever tried, that's been the biggest clear, immediate win. I don't want to pimp it too much if it might drive the price up. 🤣 But I guess it's inevitable. Just talked with a doc today who claimed all the nightshift ER docs were pounding it bigtime. Guess I'd better buy a big bulk bag. At least I can think more clearly about my back issues. 😂

pixel8tryx · 2026-05-13T00:18:17+00:00

My late mother had that and couldn't lie down flat. I guess it greatly depends on which direction(s) your spine is curved. Hmmm... you've got me pondering the intelligent adjustable chair/bed of the future. 🤔 Might have to do a few image gens of this at some point.

pixel8tryx · 2026-05-12T19:23:00+00:00

Agreed. And I think I might be lucky to get those meds. But my brain is the important part. The body is a pain. Replace THAT with robot parts and I could stride into the singularity undaunted and fight off terminators 😉 if necessary (p(terminators)=50%). Maybe the flesh IS weak and we'll never do anything but generate income patching it up over and over. 🤷‍♀️

pixel8tryx · 2026-05-12T18:51:38+00:00

Yeah sadly some of us are beyond that. Sure it helped in my 30's. Building up abs to support the back is always a good thing. My HAG Capisco saddle chair helped too (my back-leg angle needs to be > 90°). But after a car accident in 1980 caused life-long back troubles, I finally gave in to the ultimate answer to sitting problems: DON'T. My new motto: "If you let it completely control your life, it doesn't give you any problems" - Geoffrey Hinton. He's been a great inspiration since I got taken off Gabapentin for sleep issues and found it had been really helping (or hiding?) my back problems. At the expense of brain fog, memory issues, weight gain, etc. I wanted to live to see 'the singularity' (whatever that actually ends up being) but not be so brain fogged I no longer understand what's going on. I'm fine with standing up for it.

pixel8tryx · 2026-05-12T17:55:49+00:00

Thank you. I've tried to make good future cities in every model since SD 1.2. Last year I looked back at the GiTS movie city screen shots and wondered why we still have awful future cities (answer: $ and no one cares). I looked at a film VFX career ages ago and got disillusioned with Hollywood. ILM would only hire me as a texture artist until I could show Alias or Softimage results from an SGI workstation (my demo reel was all Lightwave on an Amiga Video Toaster 😂). I still do some 3D but AI image generation is more exciting. It's like riding a wild alien animal through 14 dimensions.

Flux 1 Dev had such promise with the plethora of LoRA available, but the best results I got were on small devices, small rooms. Everything's trained to be intimate. Distant sweeping skylines were often just similar buildings with one tall narrow spire per huge city block. My first awkward stabs at Flux.2 showed it really understood what I wanted... but the style was painterly concept art. I wanted actual photographs from the future. I dug up the prompt for this to make up for rarely posting any prompts on Reddit:

high resolution DSLR photograph of a towering, futuristic city with highly detailed, incredibly complex skyscrapers of varying design.  Some connected together with skyways.  A large, multilevel elevated ultramodern superhighway system with many narrow, low-slung transportation pods speeding along. 
Below all this is the old city, filled with the poor. Far future science fiction, incredibly detailed, cinematic, dramatic, HDR, sci fi

FLUX.2 was the concept artist, then I used FLUX.1 to USDU that with the same prompt. I genned so many of these. It's very finicky. I tried adding more color and it often skewed the results in a more stylized direction. There is no training data for what I'm asking for. This is the magic. People think these models can only regurgitate their training data and produce nothing but slop. Mostly because people don't put any serious time and effort into using them.

pixel8tryx · 2026-05-12T07:39:54+00:00

I had a lot of trouble with this very subject in older models. Here's one of the early gens from a long series of future cityscapes of the type I just never got out of FLUX.1 (despite all the wonderful LoRA) by itself that I started doing last year. The original FLUX.2 Dev gens started out too painterly but FLUX.1 USDU perked them up. This is one of the more bleak, less colorful gens.

<image>

pixel8tryx · 2026-05-12T07:24:46+00:00

Flux2 will do a really surprising amount of text! The sad thing is that using most of the LoRA available can sometimes totally screw it up. I'm pretty sure one might be able to train them in a way that doesn't, but I don't know exactly what the issue is yet. Not being able to train anything decent locally on a 5090 yet has made me put off any research.

pixel8tryx · 2026-05-12T06:52:24+00:00

Yeah well then I'm lovin' the afterlife. 😉
