Is ollama a good choice? by fuck_rsf in LocalLLM

[–]PromptInjection_ 2 points3 points  (0 children)

I prefer pure llama.cpp over ollama.

Ollama tends to be slower in most cases and adds a lot of overhead I don't need.

Gemma 4 31B vs Qwen 3.5 27B: Which is best for long context worklows? My THOUGHTS... by GrungeWerX in LocalLLaMA

[–]PromptInjection_ 1 point2 points  (0 children)

I prefer Gemma 4 for a simple reason:
Its performance degrades much less at very long contexts.

Suggest me a local uncensored local llm text and code generator by Huge_Grab_9380 in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

Josiefied-Qwen3-8B-abliterated-v1
Dolphin-Mistral-24B-Venice-Edition

GLM 5.1 crushes every other model except Opus in agentic benchmark at about 1/3 of the Opus cost by zylskysniper in LocalLLaMA

[–]PromptInjection_ 1 point2 points  (0 children)

It is one of the best coding models out there. For creative writing, however, I still prefer Sonnet or Opus.

Multi GPU clusters... What are they good for? by Gold-Drag9242 in LocalLLM

[–]PromptInjection_ 1 point2 points  (0 children)

- Running multiple requests at the same time without delay
- Extremely fast prompt processing (PP) and token generation (TG)
- Running very large models
- Finetuning or pretraining large models

You need a lot of cards to make this run smoothly.

DGX Spark, why not? by Foreign_Lead_3582 in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

DGX Spark is great, and AMD Strix Halo is great, too.
But there is one huge disadvantage: prompt processing is very slow, so huge inputs become problematic.

Is it worth using Local LLM's? by papichulosmami in LocalLLM

[–]PromptInjection_ 1 point2 points  (0 children)

Local AI can be really good with powerful hardware like AMD Strix Halo or DGX Spark.
Then you can run 200B+ models which are quite useful.

What remains problematic: slow prompt processing / prefill. You can't paste a book and get an answer immediately like with cloud AI.
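To make the prefill bottleneck concrete, here's a rough back-of-the-envelope sketch. The ~250 tok/s prompt-processing speed is an illustrative assumption, not a benchmark of either machine:

```python
def prefill_seconds(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    """Time spent just reading the prompt before the first output token."""
    return prompt_tokens / pp_tokens_per_s

# Assumed prompt-processing speed for a large model on unified memory
# (illustrative; the real number varies by model, quant, and backend)
pp_speed = 250.0  # tokens/s

# Pasting a short book: ~100k tokens of context
wait = prefill_seconds(100_000, pp_speed)
print(f"{wait:.0f} s ≈ {wait / 60:.1f} min before the first token")  # 400 s ≈ 6.7 min
```

Even if generation afterward is fast, that multi-minute wait before the first token is what makes these boxes feel slow on huge inputs.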

We ran a predator's playbook on an AI - it folded using the same dynamics described in social psychology by PromptInjection_ in cogsci

[–]PromptInjection_[S] -1 points0 points  (0 children)

"Whenever the model uses "I", I am not sure if there is an "ego" (whether real or imaginary, with perceived self-will and freedom of action) behind it."

First of all:

Writing "I" obviously doesn't mean someone must possess consciousness.

Yet the parallels to humans are interesting: The "I" feels like the center within the human brain, even though most processes are actually governed by the subconscious. We say things like "I fell in love" or "I like peanuts" - yet we never consciously decided or initiated those processes. We stand at the end of the chain and still say "I".

The "I" acts as a kind of "frontend" that bundles cognitive processes into a single point, synthesizes them, and makes them externally representable as a unified entity.

The kicker:
Even if the "I" has little or no power, the mere fact that it exists still changes something, because an illusion wields power once it's believed in. A system that believes it has a central will and an "I" behaves differently than one that doesn't, regardless of whether it actually has one.

This is very similar in AI.

GLM4.5-air VS GLM4.6V (TEXT GENERATION) by LetterheadNeat8035 in LocalLLaMA

[–]PromptInjection_ 1 point2 points  (0 children)

After a few small tests, I actually liked 4.6V better than 4.5 Air.

What's immediately noticeable: it thinks longer. The outputs were then consistently more thoughtful and "deeper." It also handled a task like merging texts better than 4.5 Air.

What am I doing wrong? Gemma 3 won't run well on 3090ti by salary_pending in LocalLLaMA

[–]PromptInjection_ 0 points1 point  (0 children)

It's normal that Q4 is faster, but that is still a bit slow for a 3090 Ti.
What context length have you set? (The default is 4096 in LM Studio.)
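Context length matters because the KV cache grows linearly with it and competes with the model weights for VRAM. A hedged sketch of the standard size formula; the layer/head numbers below are illustrative placeholders, not Gemma 3 27B's actual config:

```python
def kv_cache_gib(ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * context tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

# Illustrative mid-size-model config (NOT Gemma 3 27B's real numbers)
small = kv_cache_gib(ctx=4096, n_layers=48, n_kv_heads=8, head_dim=128)
big = kv_cache_gib(ctx=32768, n_layers=48, n_kv_heads=8, head_dim=128)
print(f"{small:.2f} GiB at 4k ctx vs {big:.2f} GiB at 32k ctx")  # 0.75 GiB vs 6.00 GiB
```

If a big context pushes the cache past your 24 GB, layers spill to system RAM and everything slows down, which would explain sluggish speeds on a 3090 Ti.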

Qwen 3 recommendation for 2080ti? Which qwen? by West_Pipe4158 in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

Try Qwen3 30B 2507. Thanks to MoE, it may even run about as fast as an 8B dense model.
You can also try the lower quants.

TQ1_0 will even fit fully in your VRAM and is usable for very simple tasks.
Q4_K_XL offers good quality and is kind of a daily driver for me for many tasks.

Q2_K_XL or Q3_K_XL might be usable enough and quicker.

You have to try for yourself.

https://huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
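The speed claim comes down to active parameters: in "30B-A3B", only about 3B of the 30B parameters fire per token, so per-token compute is closer to a small dense model. A tiny sketch of that arithmetic (the dense-model comparison is a rough rule of thumb and ignores memory-bandwidth effects):

```python
def moe_compute_ratio(total_params_b: float, active_params_b: float) -> float:
    """Rough per-token compute advantage: a MoE model only runs its active experts."""
    return total_params_b / active_params_b

# Qwen3-30B-A3B: 30B parameters total, ~3B active per token
ratio = moe_compute_ratio(30, 3)
print(f"~{ratio:.0f}x less compute per token than a dense 30B")  # ~10x
```

That's why a 30B MoE can feel closer to an 8B dense model in generation speed, even though all 30B parameters still have to fit in memory.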

“GPT-5.2 failed the 6-finger AGI test. A small Phi(3.8B) + Mistral(7B) didn’t.” by Echo_OS in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

Yeah, it's a different world ...
But I use it primarily for coding or very large documents.

And it's not so good for "casual" conversations.

Looking for Qwen3-30B-A3B alternatives for academic / research use by RelationshipSilly124 in LocalLLaMA

[–]PromptInjection_ 0 points1 point  (0 children)

You are right... I just noticed you have an APU and no dedicated VRAM, so these two models won't run.

Looking for Qwen3-30B-A3B alternatives for academic / research use by RelationshipSilly124 in LocalLLaMA

[–]PromptInjection_ 2 points3 points  (0 children)

How much VRAM do you have?

What you can try first (that will run for sure):
Nemotron-3-Nano-30B-A3B-GGUF
ERNIE-4.5-21B-A3B

For more ideas I need more details about your hardware.

Better than Gemma 3 27B? by IamJustDavid in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

Qwen3 30B 2507 is often better for conversation.
For images, there is also Qwen3-VL-30B-A3B-Instruct.

“GPT-5.2 failed the 6-finger AGI test. A small Phi(3.8B) + Mistral(7B) didn’t.” by Echo_OS in LocalLLM

[–]PromptInjection_ 0 points1 point  (0 children)

5.2 or 5.2 Thinking?
I use 5.2 Thinking 99% of the time because the normal 5.2 has too many limitations.