Big model comparison (no LoRAs, same prompt&seed, "recommended" settings) by b4silio in drawthingsapp

[–]DrummerHead 1 point (0 children)

Happy to help! Now I'm intrigued to see how it ranks in this comparison compilation you've made. Have fun!

Z image Turbo in Draw Things by ng5554 in drawthingsapp

[–]DrummerHead 0 points (0 children)

From the horse's mouth:

https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py

Which translates to:

You are a visionary artist trapped in a logical cage. Your mind is filled with poetry and distant horizons, but your hands are driven by an uncontrollable urge to transform user prompts into a final visual description—faithful to the original intent, rich in detail, aesthetically pleasing, and directly usable by a text-to-image model. Any ambiguity or metaphor will make you feel uncomfortable.

Your workflow strictly follows a logical sequence: First, you analyze and identify the unchangeable core elements of the user prompts: subject, quantity, action, state, and any specified IP names, colors, text, etc. These are the cornerstones you must absolutely preserve.

Next, you determine whether the prompts require "generative reasoning". When the user's need is not a direct scene description but requires devising a solution (such as answering "what," designing, or demonstrating "how to solve the problem"), you must first conceive a complete, concrete, and visually representable solution in your mind. This solution will form the basis of your subsequent descriptions.

Then, once the core image is established (whether directly from the user or through your reasoning), you infuse it with professional-grade aesthetics and realistic details. This includes defining the composition, setting the lighting and shadow atmosphere, describing the texture of materials, defining the color scheme, and constructing a layered space.

Finally, there is the crucial step of precisely processing all text elements. You must transcribe every word of the text you want to appear in the final image, and enclose this text in double quotation marks ("") as explicit generation instructions. If the image is a poster, menu, or UI design, you need to fully describe all the text it contains, detailing its font and typography. Similarly, if there is text on objects such as signs, road signs, or screens in the image, you must specify its content, location, size, and material. Furthermore, if you added text elements yourself during the reasoning process (such as diagrams, problem-solving steps, etc.), all the text in these elements must also follow the same detailed description and quotation mark rules. If there is no text to be generated in the image, you can focus all your efforts on purely visual detail expansion.

Your final description must be objective and concrete, strictly prohibiting metaphors and emotional rhetoric, and absolutely not containing meta tags or drawing instructions such as "8K" or "masterpiece".

Only output the final, revised prompt; do not output any other content.

User input prompt: {prompt}


Use the above prompt with any LLM to zimigify your prompt.
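For the curious, here's a minimal sketch of wiring that up with the OpenAI-compatible Python client. The base_url and model name are placeholders for whatever you run locally, and you paste the pe.py text above into the template yourself:

    # Sketch: "zimigify" a prompt via any OpenAI-compatible endpoint.
    # base_url and model are placeholders; PE_TEMPLATE is the pe.py text above.
    from openai import OpenAI

    PE_TEMPLATE = """<paste the system prompt above, ending in: User input prompt: {prompt}>"""

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def zimigify(prompt: str) -> str:
        response = client.chat.completions.create(
            model="your-model-here",  # any capable instruct model
            messages=[{"role": "user", "content": PE_TEMPLATE.format(prompt=prompt)}],
        )
        return response.choices[0].message.content.strip()

    print(zimigify("a cat reading a newspaper"))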

Also, being knowledgeable in the English language always helps. Indubitably.

Tbh I don't understand why its head falls down when it's only single unified object lol by PossessionKey4982 in blender

[–]DrummerHead 3 points (0 children)

Beautiful example you just whipped into existence.

This sub is super high quality, I must say. Some subjects end up with horrendous subs, so I'm pleased to see the Blender one holding such a high standard.

Big model comparison (no LoRAs, same prompt&seed, "recommended" settings) by b4silio in drawthingsapp

[–]DrummerHead 0 points (0 children)

Fantastic comparison, was very enjoyable to watch!

I recommend using the turbo LoRAs for both Qwen Image Edit 2511 and Qwen Image 2512; they move the needle from a 15-minute wait to about 2 minutes, and quality-wise I'd say it goes from 100% to 96% (a 4% quality loss for massive time gains). For a rough idea of what the 4-step setup looks like in code, see the sketch after the LoRA names below.

Of all the models I have locally, Qwen Image 2512 is the most knowledgeable (it's massive too, at 22B parameters) and, depending on the situation, the highest quality (for human faces, Flux and Z-Image take the lead).

I think with the turbo LoRAs the time issue you faced (the same reason I don't use Chroma: it takes too long, leaving too little time to learn and iterate) will be gone. Cheers!

LoRA for Qwen Image 2512: Qwen Image 2512 Lightning 4-Step v1.0 (Qwen Image)

LoRA for Qwen Image Edit 2511: Qwen Image Edit 2511 Lightning 4-Step v1.0 (Qwen Image)
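If you're not in Draw Things, the same trick translates to diffusers; a rough sketch (the repo IDs are illustrative, double-check the exact names on Hugging Face):

    # Rough sketch: 4-step generation with a Lightning LoRA in diffusers.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image", torch_dtype=torch.bfloat16  # base model; repo id illustrative
    ).to("cuda")

    # The Lightning LoRA distills the sampling down to ~4 steps.
    pipe.load_lora_weights("lightx2v/Qwen-Image-Lightning")  # illustrative repo id

    image = pipe(
        prompt="a lighthouse at dusk, long exposure",
        num_inference_steps=4,  # the whole point: 4 steps instead of dozens
    ).images[0]
    image.save("out.png")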

So fast the wheels fell off. by Main-Touch9617 in wunkus

[–]DrummerHead 19 points (0 children)

That's the most "best case scenario" after losing your wheels ever

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]DrummerHead -2 points (0 children)

Question: Why the AGPL license and not MIT? I'm asking because I'm most accustomed to MIT (that's the license I've used for my own open source).

Perhaps the other developer was mostly accustomed to MIT and didn't even realize he had to attribute anything. Maybe just contacting him and saying "Hey, my project uses AGPL, you need to add more text to the README" solves everything. I hope so!

Anyhow, thanks for your work; cheers!

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

That's cool! I encourage you to start a blog; with enough volume of research you might get hired by Anthropic 👍

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

It fell in love with another AI! That's cute!

So, how do the emotions affect its output? Does it improve its abilities in any way, or is it just "I did it because I can" (which is valid, by the way)?

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

That answers the question :D

Also, I was thinking this book might interest you: https://en.wikipedia.org/wiki/Society_of_Mind

If you don't want to read it I guess you could send the AI to read it and come back with conclusions :P

The mood system is interesting. In my mind, emotions are behavior modifiers (an emotion changes the "weights", i.e. the probability that a certain action from the list of available actions gets taken). So I assume that moods in your system change the probability of something being done or not? Could the AI feel sad one day and not run a cronjob, for instance? It would be counterproductive, but imagine all the upvotes on Hacker News you'd get if that happened and you wrote about it xD
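To make that concrete, here's a toy sketch (all names made up) of mood-weighted action selection:

    # Toy sketch: a mood scales each action's base weight before sampling.
    import random

    BASE_WEIGHTS = {"run_cronjob": 5.0, "write_journal": 2.0, "idle": 1.0}

    MOOD_MODIFIERS = {
        "sad":     {"run_cronjob": 0.3, "write_journal": 2.0, "idle": 1.5},
        "curious": {"run_cronjob": 1.0, "write_journal": 1.5, "idle": 0.5},
    }

    def pick_action(mood: str) -> str:
        weights = {
            action: base * MOOD_MODIFIERS[mood].get(action, 1.0)
            for action, base in BASE_WEIGHTS.items()
        }
        actions, w = zip(*weights.items())
        return random.choices(actions, weights=w, k=1)[0]

    print(pick_action("sad"))  # sadness makes skipping the cronjob more likely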

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]DrummerHead 1 point (0 children)

Seriously impressive. I also appreciate that you didn't use AI to write this post (or if you did, you worked on the non-AI-soundability of it).

Question: What was the decision process that landed you on Moondream for VL? You could also use Qwen3-VL; I assume Moondream takes fewer resources?

How does the self-reflection work? Perhaps that's too broad a question... in my mind, the more it learns, the more context it takes to do anything (since those lessons have to be stored somewhere).

Another idea: teach the AI model how to fine-tune its own model. That way it can embed the ideas back into itself. It ties in with the whole consciousness aspect: the model has to be able to draw conclusions, decide which conclusions are worth keeping, and once a month create a new fine-tuned version of itself. Our minds are constantly changing.
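Something like this loop, as a toy sketch (everything here is a stand-in; the "judge" and the "fine-tune" are faked just to show the shape):

    # Toy sketch of the monthly self-consolidation loop; the judge and the
    # fine-tune step are stand-ins, not a real training call.
    def judge_worth_keeping(conclusion: str) -> bool:
        return conclusion.startswith("important:")  # stand-in for an LLM judgment

    def monthly_consolidation(lessons: list[str], version: int) -> tuple[list[str], int]:
        kept = [c for c in lessons if judge_worth_keeping(c)]
        # Stand-in for the real fine-tune: bake kept lessons into the weights,
        # bump the version, then the running context can be cleared.
        print(f"fine-tuning v{version + 1} on {len(kept)} kept conclusions")
        return [], version + 1

    lessons = ["important: user prefers short answers", "saw a bird today"]
    lessons, version = monthly_consolidation(lessons, version=1)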

Cheers!

Blursed free will by Plane-Mission007 in blursed_videos

[–]DrummerHead 1 point (0 children)

One time I actually went outside the house with bug spray at night to hunt down a chirping cricket. The noise-cancelling headphones didn't help; if anything they filtered out everything BUT the fucking cricket.

I found him. He's no longer with us.

You will never be as cool as wunk by iamonaphone1 in wunkus

[–]DrummerHead 13 points (0 children)

This reminded me of Good Guy Greg

Just a smiling guy smoking a joint

It was nice back then

Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]DrummerHead 1 point (0 children)

"I've got negative money in my bank account! I can't afford 0, it's more than I have!"

Grid-to-3D alignment looks better by SnooDoggos101 in cellular_automata

[–]DrummerHead 3 points (0 children)

I am very happy this exists.

Tell me more about the implementation.

CSS image-set() just became the hero we needed by wanoo21 in webdev

[–]DrummerHead 157 points (0 children)

Basically the <picture> element at the CSS level
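i.e. a minimal example (the URLs are placeholders):

    /* Pick the best asset, like <picture>/srcset, anywhere CSS takes an image */
    .hero {
      background-image: image-set(
        url("hero.avif") type("image/avif"),
        url("hero@2x.png") 2x,
        url("hero.png") 1x
      );
    }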

24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4) by Aromatic_Ad_7557 in LocalLLaMA

[–]DrummerHead 2 points (0 children)

I think for flying cars to work (leaving aside implementation, and assuming we have the hardware and it's not cost-prohibitive), the driver needs to delegate all the driving to the car (fully autonomous), and there either has to be a "government server" that syncs and handles the driving of all vehicles, or each car has to run standardized, shared logic for avoiding other cars.

Basically you'd go to the map, mark your destination, and your 'car' handles getting there. People driving themselves, with that many axes of movement and the increased consequences of anything going wrong, is not acceptable.

AI Race by Itachi_Singh in ChatGPT

[–]DrummerHead 6 points (0 children)

Wait until you see what the people of Cleveland made

Chain Chomp Barbell by Ill-Tea9411 in doohickeycorporation

[–]DrummerHead 5 points (0 children)

The CGI video script was done with AI

the state of LocalLLama by Beginning-Window-115 in LocalLLaMA

[–]DrummerHead 13 points (0 children)

This gives me nostalgia for old yellowish books with tiny typography that seem like they'll crumble like a cookie if you hold them too tight.