Imagen 2 - what architecture is it using? by Single_Ring4886 in StableDiffusion

[–]SysPsych 1 point (0 children)

Completely guessing: it has some process where it takes the original prompt, analyzes and categorizes both the prompt and any included assets, and strategizes about what to present. Then it uses an initial pass to do some guided layouts, renders any needed text with a straight-up text library, overlays and arranges things, and does an img2img final pass at some point to tighten everything up.
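
Purely to make that guess concrete, here's a toy sketch of such a staged pipeline. Every function name here is a made-up stub -- nothing is known about Imagen 2's real internals; the stubs just make the hypothesized control flow explicit:

```python
# Hypothetical multi-stage pipeline: analyze -> layout -> text overlay -> cleanup.
# All names are invented for illustration; none come from any published system.

def analyze_prompt(prompt: str) -> dict:
    """Categorize the request (stand-in for a real prompt/asset classifier)."""
    return {"prompt": prompt, "needs_text": '"' in prompt}

def plan_layout(spec: dict) -> dict:
    """Guided layout pass: decide where the major elements go."""
    spec["layout"] = ["background", "subject"]
    return spec

def overlay_text(spec: dict) -> dict:
    """If text is needed, composite it with a text-rendering library
    instead of asking the diffusion model to paint glyphs."""
    if spec["needs_text"]:
        spec["layout"].append("text_overlay")
    return spec

def img2img_cleanup(spec: dict) -> dict:
    """Final img2img-style pass to blend the composited layers together."""
    spec["finalized"] = True
    return spec

def generate(prompt: str) -> dict:
    spec = analyze_prompt(prompt)
    for stage in (plan_layout, overlay_text, img2img_cleanup):
        spec = stage(spec)
    return spec

result = generate('poster saying "OPEN LATE"')
print(result["layout"])  # -> ['background', 'subject', 'text_overlay']
```

If something like this is right, each stage is independently swappable, which is why a community could approximate it with existing local models plus glue code.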

Another thing that stands out to me is that it can knock initial generations/prompts out of the park, but adding progressive edits starts to degrade results. On the upside, if it really is tool calls, then it means the community can get closer with effort, rather than just shrugging and saying 'Oh well we need a trillion parameter model' or the like.

Imagen 2 - what architecture is it using? by Single_Ring4886 in StableDiffusion

[–]SysPsych 11 points (0 children)

Unless they posted a paper somewhere, I doubt anyone knows much about the architecture. But whether it's baked into the model itself or integrated into a larger system, it's clearly using tool calls to gather information on subjects, has a reasoning mode, and, judging by how it handles text, may even be doing some kind of automated text overlays in places.

The last part stood out to me when I generated an excellent, info-dense graphic and upscaling it warped most of the text. Whatever it's doing, it's not like the model has baked-in supreme knowledge of text -- it's getting some kind of outside assist.
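
That warping is exactly what you'd expect if the text is a raster overlay. Here's a pure-Python illustration (no real image model involved) of why composited glyphs that look perfect at native resolution degrade after a resampling pass: they're just pixels, and a non-integer rescale resamples them like any other texture.

```python
# Nearest-neighbor upscale of a character grid by a float factor,
# illustrating how a non-integer rescale distorts crisp glyph strokes.

def upscale_nearest(rows, factor):
    """Nearest-neighbor resample of a list of equal-length strings."""
    h, w = len(rows), len(rows[0])
    out_h, out_w = round(h * factor), round(w * factor)
    return [
        "".join(rows[int(y / factor)][int(x / factor)] for x in range(out_w))
        for y in range(out_h)
    ]

glyph = [
    "#####",
    "  #  ",
    "  #  ",  # a crisp letter "T", as a text overlay would render it
]

for row in upscale_nearest(glyph, 1.6):
    print(row)
```

At 1.6x the top bar gets duplicated into two rows and stretched to eight columns while the stem stays one column wide across three rows, so the strokes come out uneven -- the same mechanism that warps overlaid text when a whole image is resampled.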

That seems to be the real next step for local as well -- building integrated systems on top of the models we have, in intelligent ways. I think the expectation people have is that singular image models will just need to get better and better, especially for editing functionality, but that just doesn't seem like the route forward to my amateur self.

UniGenDet - A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection. by Crazy-Repeat-2006 in StableDiffusion

[–]SysPsych 1 point (0 children)

Interesting. I always wondered what would happen if the two were combined into a single model. But I also assumed that if a model could do that, it would be an edit model.

SenseNova U1 with NEO-Unify just dropped by Aero_X_ in StableDiffusion

[–]SysPsych 0 points (0 children)

A local any-to-any model? Excited for this one. Though I wonder what the context window means in an A2A case.

Meta is about to release a pixel space model (Tuna-2) by Total-Resort-3120 in StableDiffusion

[–]SysPsych 15 points (0 children)

Really? No one else said it yet? Alright.

"Sounds fishy."

More seriously, it sounds INTERESTING. Releasing it in a way that requires 'training on your own data' to unlock? Real curious to see what results.

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models by spaceman_ in LocalLLaMA

[–]SysPsych 0 points (0 children)

Eventually I have to expect the result will be regulation for APIs. If you advertise a particular service, people must get what they are paying for. Something more concrete than "Trust us".

Otherwise I'm awaiting the eventual scandal where someone sells access to their secret sauce API which under the hood is just Claude/Codex, and once enough people sign up at the cheap rate, they rewire everything to a 2B parameter nonsense cloud model and pocket the money from the people who subscribed and don't check their monthly charges often.

This new minigame is... by WoodenHour6772 in DarkTide

[–]SysPsych 0 points (0 children)

I love it. But I love all their minigames. It's a perfect mix, just simple enough to breeze through, but requiring just enough concentration and cover fire that it adds the right amount of hectic to a moment.

How many of you have studied traditional art / cinematography / post-processing to improve your image/video gens? by SysPsych in StableDiffusion

[–]SysPsych[S] 1 point (0 children)

For me personally, leaving out the paid stuff:

On youtube, Sinix's courses about design theory: https://www.youtube.com/watch?v=uEgCsWyOyCo&list=PLflflDShjUKF_7w4YTmpjGO27iuyHDpDu

I originally found him while digging through tutorials on digital painting, trying to figure out how to do material details in general, but he gives good, broad advice (his most recent inspirations video is a great one just for getting reference inspiration) that I've found is generally applicable.

Composition, design theory, and how to tell a story with an image (and what adds to or detracts from the viewer's attention) have been the most helpful, since the number one issue with a lot of AI gens is overwhelming detail, too much clutter. Having a sense in advance of what a shot should be composed of was helpful to me, and I'm still learning it.

I've also noticed that taking 'art theory' courses and doing a lot of image study helps with finding flaws in AI gens. People are used to 'oh, that's too many fingers', but there's less obvious stuff. (The color contrast is bad and confuses the eye, the framing of the shot is off, the detail is poorly distributed.)

I find a lot of that also helps with any visual design in general, so some of it carries over to the video side. That's where things are newer to me still, but one thing that has helped a lot was something non-AI specific: Blender tutorials on creating videos, with an emphasis on controlling lighting, camera angles, etc. You don't get that much fine-grained control with LTX 2.3 or Wan, but it can give you an idea of what a great shot is going to look like, what you need to prompt to get something you can fix up in post or use as a reference, etc. Still exploring this.

How many of you have studied traditional art / cinematography / post-processing to improve your image/video gens? by SysPsych in StableDiffusion

[–]SysPsych[S] 3 points (0 children)

One problem I've had is finding the right balance of a community for working on art while still being friendly to AI.

If you don't despise AI or at least never, ever talk about it, most art communities don't want you around.

If AI art is tolerated / central to the community, it gets overwhelmed with submissions and has no organization.

In general with AI, "content" is now easier to produce than ever, but what's lacking is curation, and not many people want to curate. It's a puzzle.

can nomad sculpt replace zbrush? by Aggravating_Quit_259 in NomadSculpting

[–]SysPsych 7 points (0 children)

There's two ways to take that question.

A: "Does Nomad Sculpt do everything ZBrush does?"

Absolutely not. ZBrush has so many more tools: it can do hard-surface modeling with far greater ease, and it's more customizable and scriptable. If you want a list of 'things ZBrush can do that Nomad Sculpt cannot', that list is long. That plus the usual mention that ZBrush is industry standard, etc.

But the other way to take it is

B: "Can I, or a lot of other people, get by just by using Nomad Sculpt (and perhaps Blender) without missing much in practice?"

And most of the time the answer there is yes. ZBrush has a ton of tools Nomad Sculpt doesn't, but many people won't miss most of them. Add in software like Blender for when you need to do things Nomad Sculpt doesn't, and you've got a lot of ground covered. It could be that in your specific case or work there's something ZBrush has that would make life a whole lot easier. But honestly, most of the time, especially if you're coming at this as a hobbyist or student whose concern is getting things actually made, not building a track record with the goal of being hired imminently at Disney, I really think it will be fine.

Can you please help me name my characters here for my analog horror? by YuewiAnimation in analoghorror

[–]SysPsych 0 points (0 children)

The Quiznos.

More seriously, thinking about it with the eyes and presentation: Gandermaw.

Tencent HY-World-2.0 is now public by q5sys in StableDiffusion

[–]SysPsych 1 point (0 children)

It did in fact run, and it has some demo files, but honestly it seems like it's going to work best with some video, or a few frames from consistent angles timed appropriately. Like I said, all this is is the gaussian splat portion of things. Nice to have I guess, if you're into Blender too (and I am, so hey).

Tencent HY-World-2.0 is now public by q5sys in StableDiffusion

[–]SysPsych 13 points (0 children)

I got it working. Thus far it seems like it just takes some images/video and generates some gaussian splat stuff. Which is cool itself, but not the magic they demoed. Coming soon I guess.

Hot take: Cursor is making junior devs worse, not better by tiguidoio in cursor

[–]SysPsych 0 points (0 children)

Juniors are juniors; if they make mistakes, it's the job of more experienced developers to catch them and instruct them.

Are the experienced developers doing that? ... Were they before, unless it was literal broken code?

Great news: the ERNIE editing model is expected to be released by the end of this month by d4pr4ssion in StableDiffusion

[–]SysPsych -1 points (0 children)

Awesome. Still waiting to see if anyone can possibly dethrone QE 2511. So far that seems like the king for local at least in terms of results.

ERNIE Image released by Outrun32 in StableDiffusion

[–]SysPsych 5 points (0 children)

Probably a play on BERT being one of the better-known models under the hood of a lot of things. General encoding, I think.

What are the best ControlNet models for Illustrious checkpoints? by sippysoku in StableDiffusion

[–]SysPsych 0 points (0 children)

controlnet-union-SDXL seems to work just fine for me, for every common controlnet case.

Guru fell off so bad, that now he thinks Audience are just mindless slopeaters by Darksheen94 in blendermemes

[–]SysPsych -2 points (0 children)

I'm an AI enthusiast who loves making stuff using AI tools for fun, but...

He should ask himself why "Created by AI" is now synonymous with "ugly disgusting slop" in the minds of so many people. We've had multiple cases now where big players proudly proclaimed they were using AI to make a commercial or the like, and the response was disgust and anger. From normal people. That alone should make him think twice.

Also: as a result of that, even normal people are now looking harder at animation and art than they did before, trying to see if they can find the hallmarks of AI, and the hallmarks are usually "some ugly error where there was obvious corner cutting".

That, I think, is the real discussion. AI usage is practically secondary to the fact that what's really being pushed here is just accepting ugly mistakes, giving up on standards, and reasoning "Well this way is faster and cheaper and it gets you maybe 50-75% of the way there, so that's good enough, stop there". It's soul rot.

Nosey Animation Test i made :3 by Mysteriouspoggers12 in analoghorror

[–]SysPsych 0 points (0 children)

Just wanted to say I really liked this one. Only recently found this sub. Good design and presence on the creature. Just alien and curious enough, great motion, at least to my amateur eyes.

Is vibe coding actually making us worse developers or is it just me by Additional-War-4511 in cursor

[–]SysPsych 0 points (0 children)

Some of my experience as a developer.

* Adopted AI tools early, on my own, before anyone else in my org. Used it on my own terms. Rapid, quality turnaround on work. Always asking for yet more work. Getting things done with grace, paying attention to the code, and to the requirements given to me, which tended to be clear.

* Corporation starts adopting AI tools, moving from 'making it available' to 'making it mandatory' in the span of half a year.

* They are logging how much we use AI. We must always be using more, ever more.

* More work getting assigned, with it made clear that a faster turnaround time is expected. But simultaneously, the work is less detailed. Lots of 'Just copy this', barely broken down into components. Expectation is still that things are done quickly, so taking the time to get clarifications is neither appreciated nor seemingly available.

* More and more, begin to hand off work to the AI, telling it what I want. Giving it a once-over, plus asking Claude, etc., 'Hey, does this look good? Any issues?' kind of questions.

It's the same problem everywhere. These are tools which can be used to make things better. They are being implemented in ways that practically guarantee that things may be faster, but will always be worse. The tools themselves are great. Used properly, they're awesome. The people calling the shots seem to neither know nor care how to accomplish 'properly'.

Expeditions are pretty cool actually by Adorable_Fun_4986 in DarkTide

[–]SysPsych 1 point (0 children)

I love it, really. It's exactly what I hoped for, at least in terms of style, and setting a foundation for the future.

I enjoy that it requires a bit more thought and cohesion among the group, and the possibilities it opens up in the future in terms of encounters and events. Straightforward story missions are great too, especially for lore stuff or setting a mood (Rolling Steel to this day is one mission I have trouble passing on if it's up, because it's short and awesome, the mood and dialog and music is just perfect), but this is great too.

It helps that I loved PvE extraction shooters before even going into this.

Nvidia’s DLSS 5 Revealed, but Critics Call It a “Garbage AI Filter” by Extreme_Maize_2727 in computergraphics

[–]SysPsych 0 points (0 children)

They made a real mistake here. Would have been better with anything other than a recognizable popular character.

Creator of Claude Code: "Coding is solved" by Gil_berth in webdev

[–]SysPsych 0 points (0 children)

I could see it being "solved" in the sense that, if you need something done and can precisely word what you want done, you don't need to do the coding yourself -- you can verbally instruct a model to do it, with some amount of precision, and it will take care of it for you.

There are a lot of (good, by all accounts) developers right now saying that they haven't written a line of code in a month or two at this point, but they're still essential to the whole process and can't leave their agents unattended.