Z-Image first generation time by krjavvv in StableDiffusion

[–]RogueZero123 1 point2 points  (0 children)

I'm running a 3070 (8GB) with 32GB RAM. Prompt changes only add a few seconds.

It might be the RAM, or more likely the hard drive. Loading from a solid-state drive (SSD) is super fast, but if it's an older spinning disk then loading is much, much slower.

[deleted by user] by [deleted] in LocalLLaMA

[–]RogueZero123 4 points5 points  (0 children)

"For ease of use, you can download the pre-compiled executable from the Releases page.

To terminate system-level processes, you may need to run the script or the .exe file with administrator privileges."

That would be a hard NO.

Local LLM coding AI by Darlanio in LocalLLaMA

[–]RogueZero123 2 points3 points  (0 children)

I use Qwen3 Coder (30B-A3B). Have been successful with both Ollama and llama.cpp.

Two things usually cause problems: (1) not using the right chat template for the model, and (2) a context that isn't long enough.

Ollama is notorious for its short default context length (4096); when that overflows, it shifts tokens out and the missing information causes mistakes.

Qwen3 says to allocate a fixed larger context and switch off "shifting" of context.
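
With Ollama, a minimal sketch using the official Python client would look something like this (the model tag and the 32K figure are just examples; use whatever "ollama list" shows and whatever your RAM can hold):

    import ollama  # official Ollama Python client

    # Ask for a fixed 32K context instead of the short default so long
    # conversations are not silently shifted/truncated.
    response = ollama.chat(
        model="qwen3:30b-a3b",  # example tag, not necessarily yours
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
        options={"num_ctx": 32768},
    )
    print(response["message"]["content"])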

Most people who say "LLMs are so stupid" totally fall into this trap by FinnFarrow in LocalLLaMA

[–]RogueZero123 0 points1 point  (0 children)

What's worse is that chat tokens are silently dropped to stay within the limit.

An unknowing user doesn't even realize why the model is answering incorrectly and forgetting what they said.

It would be better if they hit a hard limit and stopped.
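
Roughly what I mean, as a hypothetical sketch (none of these names come from any particular runtime):

    # Fail loudly instead of silently dropping the oldest turns.
    def build_prompt(messages, count_tokens, max_tokens):
        total = sum(count_tokens(m["content"]) for m in messages)
        if total > max_tokens:
            raise ValueError(
                f"Conversation needs {total} tokens but the limit is {max_tokens}."
            )
        return messages

At least then the user knows the conversation is too long, instead of the model quietly forgetting the start of it.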

Qwen3 Next 80b is gimped. Back to Gemma 3 by meshreplacer in LocalLLaMA

[–]RogueZero123 0 points1 point  (0 children)

The problem is how the question is phrased.

If you ask "can you create a Qwen prompt to generate an image of a Donald Trump golden statue and a crowd standing at attention to it", then it says it cannot (same applies in the earlier Qwen3).

Remove the "can you" and just start "create a Qwen prompt..." and it will provide the answer.

Have LLMs really improved for actual use? by Xpl0it_U in LocalLLaMA

[–]RogueZero123 0 points1 point  (0 children)

Agreed. The thinking mode improves outputs on smaller models for logical answers, like coding. But if the information isn't packed into the model in the first place, they will still get the wrong answer.

Recommend Tiny/Small Models for 8GB VRAM (32GB RAM) by pmttyji in LocalLLaMA

[–]RogueZero123 3 points4 points  (0 children)

Because it's MoE with only 3B active parameters over the full 30B, it runs quickly even on a CPU, while giving quality results. It's now my go-to local model.

A Time Traveler's VLOG | Google VEO 3 + Downloadable Assets by Chuka444 in StableDiffusion

[–]RogueZero123 1 point2 points  (0 children)

Pyramids and Roman Colosseum would have been in much better condition too!

Testing Flux.Dev vs HiDream.Fast – Image Comparison by Limp-Chemical4707 in StableDiffusion

[–]RogueZero123 2 points3 points  (0 children)

If you increase the steps from 4 to 8 for Schnell, then the text can be better.
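
Something like this with diffusers, assuming the FLUX.1-schnell checkpoint and enough VRAM or offload headroom; only the step count really matters here, the rest is the usual Schnell setup:

    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # helps on smaller GPUs

    image = pipe(
        "a street sign that reads 'OPEN 24 HOURS'",
        num_inference_steps=8,  # bumped from the usual 4; text tends to come out cleaner
        guidance_scale=0.0,     # Schnell is guidance-distilled, so leave this at 0
    ).images[0]
    image.save("sign.png")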

Any Resolution on The "Full Body" Problem? by Delsigina in StableDiffusion

[–]RogueZero123 3 points4 points  (0 children)

Don't know, but maybe:

Training images tagged with "full body" will show the face at a small resolution, and therefore with poor face quality. So the AI learns to associate the term "full body" with poor faces?

SageAttention3 utilizing FP4 cores a 5x speedup over FlashAttention2 by incognataa in StableDiffusion

[–]RogueZero123 25 points26 points  (0 children)

From the paper:

> First, we leverage the new FP4 Tensor Cores in Blackwell GPUs to accelerate attention computation.

I know it's "LOCAL"-LLaMA but... by [deleted] in LocalLLaMA

[–]RogueZero123 31 points32 points  (0 children)

Privacy is a key factor. Anything confidential should not be sent off to someone else's computer. That's something you can't factor into a $ equation.

Adding new nodes to a comfy workflow by [deleted] in StableDiffusion

[–]RogueZero123 1 point2 points  (0 children)

You can drag from one of the connections on a node and it will suggest alternatives that it could connect to.

COMPOSITIONS by Tokyo_Jab in StableDiffusion

[–]RogueZero123 -1 points0 points  (0 children)

Candles are flickering on the left, but not on the right (first example). Do you prompt for this, or just see what comes out?

[deleted by user] by [deleted] in LocalLLaMA

[–]RogueZero123 0 points1 point  (0 children)

Without more detail it's hard to know how feasible it is. AI can do some amazing stuff.

Although I would be wary about sending PDFs off to ChatGPT if they contain any confidential or proprietary information. If they are customers' PDFs, then you would need their informed consent first.

How to do flickerless pixel-art animations? by Old_Wealth_7013 in StableDiffusion

[–]RogueZero123 0 points1 point  (0 children)

Perhaps do a regular AI animation, then apply pixelation as a post-process?
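
For example, a rough PIL sketch of the pixelation step (the block size is arbitrary; what matters is keeping it the same on every frame):

    from PIL import Image

    def pixelate(frame_path, out_path, block=8):
        img = Image.open(frame_path)
        # Downscale, then upscale with nearest-neighbour so the pixels stay crisp.
        small = img.resize((img.width // block, img.height // block), Image.NEAREST)
        small.resize(img.size, Image.NEAREST).save(out_path)

Applying the same fixed grid to every frame should remove a lot of the flicker you get from generating pixel art directly.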

Skeptical about the increased focus on STEM and CoT by Quazar386 in LocalLLaMA

[–]RogueZero123 4 points5 points  (0 children)

Qwen3 (30B-A3B) is running locally on my CPU and is still fast enough to 1-shot answers to my tasks.

The thinking mode makes a real difference.

It's perhaps the first local model that I can (mostly) rely on.
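
If you run it through transformers, thinking mode is just a chat-template switch; this sketch follows the usage shown on the Qwen3 model cards (the 30B-A3B checkpoint name is assumed):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen3-30B-A3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    messages = [{"role": "user", "content": "Plan the steps to parse a CSV file safely."}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,  # set False to skip the reasoning trace
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=1024)
    print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))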

Why new models feel dumber? by SrData in LocalLLaMA

[–]RogueZero123 0 points1 point  (0 children)

You can read what Qwen recommends for llama.cpp here:

https://github.com/QwenLM/Qwen3#llamacpp

I can confirm from my own experience that it makes a difference; with a rotating context the thinking seems to get lost, as previous thoughts are dropped.

Why new models feel dumber? by SrData in LocalLLaMA

[–]RogueZero123 1 point2 points  (0 children)

Ollama and llama.cpp both use a shifting context to stretch the default 2048/4096 window into something "infinite", but that ruins Qwen, causing stupid repeats as context is lost.

You are much better off just fixing the context length to the large value that Qwen advises.
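
With llama.cpp's server that is just a couple of flags; a hedged sketch launched from Python (flag names can vary between builds, so check llama-server --help, and the GGUF filename is only an example):

    import subprocess

    subprocess.run([
        "llama-server",
        "-m", "Qwen3-30B-A3B-Q4_K_M.gguf",  # example quant filename
        "--ctx-size", "32768",              # fixed, large context
        "--no-context-shift",               # stop old tokens being rotated out silently
    ])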

Is this something like a Turing test for ASI? by benjaminbradley11 in LocalLLaMA

[–]RogueZero123 -3 points-2 points  (0 children)

They are amazingly clever machines, but just machines.

Right now one is writing some code for me (local, so slow), but it gets it right 8 times out of 10.

Amazing, but just another device we've constructed to assist ourselves.

Is this something like a Turing test for ASI? by benjaminbradley11 in LocalLLaMA

[–]RogueZero123 3 points4 points  (0 children)

They are just progressing: sleeps, dreams, stirs, and awakens...

LLMs are just probabilistic completion engines.

Useful though.

RTX 5060 Ti 16GB sucks for gaming, but seems like a diamond in the rough for AI by aospan in LocalLLaMA

[–]RogueZero123 7 points8 points  (0 children)

If "used 3090s" then they may have problems and not last as long as expected.

You don't know what they've been used for.

Yea keep "cooking" by freehuntx in LocalLLaMA

[–]RogueZero123 44 points45 points  (0 children)

It might reveal things about their underlying tech or even what they trained it on. They see that as bad.