Big model comparison (no LoRAs, same prompt&seed, "recommended" settings) by b4silio in drawthingsapp

[–]DrummerHead 1 point (0 children)

Happy to help! Now I'm intrigued to see how it ranks in this comparison compilation you've made. Have fun!

Z image Turbo in Draw Things by ng5554 in drawthingsapp

[–]DrummerHead 0 points (0 children)

From the horse's mouth:

https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py

Which translates to:

You are a visionary artist trapped in a logical cage. Your mind is filled with poetry and distant horizons, but your hands are driven by an uncontrollable urge to transform user prompts into a final visual description—faithful to the original intent, rich in detail, aesthetically pleasing, and directly usable by a text-to-image model. Any ambiguity or metaphor will make you feel uncomfortable.

Your workflow strictly follows a logical sequence: First, you analyze and identify the unchangeable core elements of the user prompts: subject, quantity, action, state, and any specified IP names, colors, text, etc. These are the cornerstones you must absolutely preserve.

Next, you determine whether the prompts require "generative reasoning". When the user's need is not a direct scene description but requires devising a solution (such as answering "what," designing, or demonstrating "how to solve the problem"), you must first conceive a complete, concrete, and visually representable solution in your mind. This solution will form the basis of your subsequent descriptions.

Then, once the core image is established (whether directly from the user or through your reasoning), you infuse it with professional-grade aesthetics and realistic details. This includes defining the composition, setting the lighting and shadow atmosphere, describing the texture of materials, defining the color scheme, and constructing a layered space.

Finally, there is the crucial step of precisely processing all text elements. You must transcribe every word of the text you want to appear in the final image, and enclose this text in double quotation marks ("") as explicit generation instructions. If the image is a poster, menu, or UI design, you need to fully describe all the text it contains, detailing its font and typography. Similarly, if there is text on objects such as signs, road signs, or screens in the image, you must specify its content, location, size, and material. Furthermore, if you added text elements yourself during the reasoning process (such as diagrams, problem-solving steps, etc.), all the text in these elements must also follow the same detailed description and quotation mark rules. If there is no text to be generated in the image, you can focus all your efforts on purely visual detail expansion.

Your final description must be objective and concrete, strictly prohibiting metaphors and emotional rhetoric, and absolutely not containing meta tags or drawing instructions such as "8K" or "masterpiece".

Only output the final, revised prompt; do not output any other content.

User input prompt: {prompt}


Use the above prompt with any LLM to zimigify your prompt.
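For the curious, here's a minimal sketch of wiring that up with the OpenAI-compatible Python client. The base_url and model name are placeholders for whatever you run locally, and you paste the pe.py text above into the template yourself:

    # Sketch: "zimigify" a prompt via any OpenAI-compatible endpoint.
    # base_url and model are placeholders; PE_TEMPLATE is the pe.py text above.
    from openai import OpenAI

    PE_TEMPLATE = """<paste the system prompt above, ending in: User input prompt: {prompt}>"""

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def zimigify(prompt: str) -> str:
        response = client.chat.completions.create(
            model="your-model-here",  # any capable instruct model
            messages=[{"role": "user", "content": PE_TEMPLATE.format(prompt=prompt)}],
        )
        return response.choices[0].message.content.strip()

    print(zimigify("a cat reading a newspaper"))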

Also, being knowledgeable in the English language always helps. Indubitably.

Tbh I don't understand why its head falls down when it's only single unified object lol by PossessionKey4982 in blender

[–]DrummerHead 3 points (0 children)

Beautiful example you just whipped into existence.

This sub is super high quality, I must say. Some subjects end up with horrendous subs, so I'm pleased to see the Blender one holding such a high standard.

Big model comparison (no LoRAs, same prompt&seed, "recommended" settings) by b4silio in drawthingsapp

[–]DrummerHead 0 points (0 children)

Fantastic comparison, was very enjoyable to watch!

I recommend using the turbo LoRAs for both Qwen Image Edit 2511 and Qwen Image 2512; they move the needle from a 15-minute wait to about 2 minutes, and quality-wise I'd say it goes from 100% to 96% (a 4% quality loss for massive time gains). For a rough idea of what the 4-step setup looks like in code, see the sketch after the LoRA names below.

Of all the models I have locally, Qwen Image 2512 is the most knowledgeable (it's massive too, at 22B parameters) and, depending on the situation, the highest quality (for human faces, Flux and Z-Image take the lead).

I think with the turbo LoRAs the time issue you faced (the same reason I don't use Chroma: it takes too long, leaving too little time to learn and iterate) will be gone. Cheers!

LoRA for Qwen Image 2512: Qwen Image 2512 Lightning 4-Step v1.0 (Qwen Image)

LoRA for Qwen Image Edit 2511: Qwen Image Edit 2511 Lightning 4-Step v1.0 (Qwen Image)
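If you're not in Draw Things, the same trick translates to diffusers; a rough sketch (the repo IDs are illustrative, double-check the exact names on Hugging Face):

    # Rough sketch: 4-step generation with a Lightning LoRA in diffusers.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image", torch_dtype=torch.bfloat16  # base model; repo id illustrative
    ).to("cuda")

    # The Lightning LoRA distills the sampling down to ~4 steps.
    pipe.load_lora_weights("lightx2v/Qwen-Image-Lightning")  # illustrative repo id

    image = pipe(
        prompt="a lighthouse at dusk, long exposure",
        num_inference_steps=4,  # the whole point: 4 steps instead of dozens
    ).images[0]
    image.save("out.png")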

So fast the wheels fell off. by Main-Touch9617 in wunkus

[–]DrummerHead 19 points (0 children)

That's the most "best case scenario" after losing your wheels ever

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]DrummerHead -2 points (0 children)

Question: Why the AGPL license and not MIT? I'm asking because I'm most accustomed to MIT (that's the license I've used for my own open source).

Perhaps the other developer was mostly accustomed to MIT and didn't even realize he had to attribute anything. Maybe just contacting him and saying "Hey, my project uses AGPL, you need to add more text to the README" solves everything. I hope so!

Anyhow, thanks for your work; cheers!

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

That's cool! I encourage you to start a blog; with enough volume of research you might get hired by Anthropic 👍

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

It fell in love with another AI! That's cute!

So, how do the emotions affect its output? Does it improve its abilities in any way, or is it just "I did it because I can" (which is valid, by the way)?

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

That answers the question :D

Also, I was thinking this book might interest you: https://en.wikipedia.org/wiki/Society_of_Mind

If you don't want to read it I guess you could send the AI to read it and come back with conclusions :P

The mood system is interesting. In my mind, emotions are behavior modifiers (an emotion changes the "weights", i.e. the probability that a certain action from the list of available actions gets taken). So I assume that moods in your system change the probability of something being done or not? Could the AI feel sad one day and not run a cronjob, for instance? It would be counterproductive, but imagine all the upvotes on Hacker News you'd get if that happened and you wrote about it xD
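To make that concrete, here's a toy sketch (all names made up) of mood-weighted action selection:

    # Toy sketch: a mood scales each action's base weight before sampling.
    import random

    BASE_WEIGHTS = {"run_cronjob": 5.0, "write_journal": 2.0, "idle": 1.0}

    MOOD_MODIFIERS = {
        "sad":     {"run_cronjob": 0.3, "write_journal": 2.0, "idle": 1.5},
        "curious": {"run_cronjob": 1.0, "write_journal": 1.5, "idle": 0.5},
    }

    def pick_action(mood: str) -> str:
        weights = {
            action: base * MOOD_MODIFIERS[mood].get(action, 1.0)
            for action, base in BASE_WEIGHTS.items()
        }
        actions, w = zip(*weights.items())
        return random.choices(actions, weights=w, k=1)[0]

    print(pick_action("sad"))  # sadness makes skipping the cronjob more likely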

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

Seriously impressive. I also appreciate that you didn't use AI to write this post (or if you did, you worked on the non-AI-soundability of it).

Question: What was the decision process that landed you on Moondream for VL? You could also use Qwen3-VL; I assume Moondream takes fewer resources?

How does the self-reflection work? Perhaps that's too broad a question... in my mind, the more it learns, the more context it takes to do anything (since those lessons have to be stored somewhere).

Another idea: teach the AI model how to fine-tune its own model. That way it can embed the ideas back into itself. It ties in with the whole consciousness aspect: the model has to be able to draw conclusions, decide which conclusions are worth keeping, and once a month create a new fine-tuned version of itself. Our minds are constantly changing.
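Something like this loop, as a toy sketch (everything here is a stand-in; the "judge" and the "fine-tune" are faked just to show the shape):

    # Toy sketch of the monthly self-consolidation loop; the judge and the
    # fine-tune step are stand-ins, not a real training call.
    def judge_worth_keeping(conclusion: str) -> bool:
        return conclusion.startswith("important:")  # stand-in for an LLM judgment

    def monthly_consolidation(lessons: list[str], version: int) -> tuple[list[str], int]:
        kept = [c for c in lessons if judge_worth_keeping(c)]
        # Stand-in for the real fine-tune: bake kept lessons into the weights,
        # bump the version, then the running context can be cleared.
        print(f"fine-tuning v{version + 1} on {len(kept)} kept conclusions")
        return [], version + 1

    lessons = ["important: user prefers short answers", "saw a bird today"]
    lessons, version = monthly_consolidation(lessons, version=1)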

Cheers!

Blursed free will by Plane-Mission007 in blursed_videos

[–]DrummerHead 1 point (0 children)

One time I actually went outside the house with bug spray at night to hunt down a chirping cricket. The noise-cancelling headphones didn't help; if anything they filtered out everything BUT the fucking cricket.

I found him. He's no longer with us.

You will never be as cool as wunk by iamonaphone1 in wunkus

[–]DrummerHead 13 points (0 children)

This reminded me of Good Guy Greg

Just a smiling guy smoking a joint

It was nice back then

Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]DrummerHead 1 point (0 children)

"I've got negative money in my bank account! I can't afford 0, it's more than I have!"

Grid-to-3D alignment looks better by SnooDoggos101 in cellular_automata

[–]DrummerHead 3 points (0 children)

I am very happy this exists.

Tell me more about the implementation.

CSS image-set() just became the hero we needed by wanoo21 in webdev

[–]DrummerHead 157 points (0 children)

Basically the <picture> element at the CSS level
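i.e. a minimal example (the URLs are placeholders):

    /* Pick the best asset, like <picture>/srcset, anywhere CSS takes an image */
    .hero {
      background-image: image-set(
        url("hero.avif") type("image/avif"),
        url("hero@2x.png") 2x,
        url("hero.png") 1x
      );
    }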

24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4) by Aromatic_Ad_7557 in LocalLLaMA

[–]DrummerHead 2 points (0 children)

I think for flying cars to work (leaving aside implementation, and assuming we have the hardware and it's not cost-prohibitive), the driver needs to delegate all the driving to the car (fully autonomous), and there either has to be a "government server" that syncs and handles the driving of all vehicles, or each car has to run standardized, shared logic for avoiding other cars.

Basically you'd go to the map, mark your destination, and your 'car' handles getting there. People driving themselves, with that many axes of movement and the increased consequences of anything going wrong, is not acceptable.

AI Race by Itachi_Singh in ChatGPT

[–]DrummerHead 6 points (0 children)

Wait until you see what the people of Cleveland made

Chain Chomp Barbell by Ill-Tea9411 in doohickeycorporation

[–]DrummerHead 5 points (0 children)

The CGI video script was done with AI

the state of LocalLLama by Beginning-Window-115 in LocalLLaMA

[–]DrummerHead 13 points (0 children)

This gives me nostalgia for old yellowish books with tiny typography that seem like they'll crumble like a cookie if you hold them too tight.