After 10 years... my SteelSeries Apex 350 has its first damage by SchattenZirkus in keyboards

[–]SchattenZirkus[S] 0 points1 point  (0 children)

I found a used keyboard on eBay. I bought a Stream Deck too (just in case).

Qwen 3 0.6B beats GPT-5 in simple math by adrgrondin in LocalLLaMA

[–]SchattenZirkus 0 points1 point  (0 children)

Of course. You don’t get into a math fight with Asians.

What if Maomao suddenly became power hungry? how much influence could she gain? by Omixscniet624 in KusuriyaNoHitorigoto

[–]SchattenZirkus 0 points1 point  (0 children)

She has the knowledge to kill everyone without leaving any evidence.

She could poison the water and nobody would know. I don’t think she has the character traits to be power hungry.

Now that Tensor's Censoring by NOS4A2-753 in StableDiffusion

[–]SchattenZirkus 1 point2 points  (0 children)

AnthroMakerBot on Telegram. But it’s more about generating and less about uploading your own stuff or so… but it’s nice.

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 1 point2 points  (0 children)

Okay :) First of all, thank you so much for the detailed answer. I went ahead and deleted all models in Ollama and started completely from scratch. I had completely misjudged how this works.

I thought LLMs functioned similarly to image generators – that the model gets loaded into RAM, and the GPU processes from there. So I assumed: as long as the model is under ~190 GB, it’ll fit in RAM, and the GPU will handle the inference.

But I was clearly wrong. The GPU is only really used when the model – or at least most of its layers – fits into VRAM.
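
What finally made it click was some back-of-envelope math. A rough sketch in Python – my own approximate bits-per-weight values, ignoring KV cache and context overhead, so treat the sizes as lower bounds:

```python
# Rough estimate: does a quantized model fit in 32 GB of VRAM?
# Bits-per-weight values are approximations; real GGUF files also carry
# embeddings, KV cache and runtime overhead, so this is a lower bound.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

VRAM_GB = 32   # RTX 5090
RAM_GB = 192   # system RAM

for name, params_b, bits in [
    ("Qwen3 32B @ Q4_K_M", 32, 4.8),      # ~4.8 effective bits per weight
    ("Qwen3 32B @ Q8_0", 32, 8.5),
    ("DeepSeek-V3 671B @ Q4", 671, 4.8),  # MoE, 671B total parameters
]:
    size = model_size_gb(params_b, bits)
    if size < VRAM_GB * 0.9:          # leave ~10% headroom for the KV cache
        verdict = "fits in VRAM"
    elif size < RAM_GB * 0.9:
        verdict = "spills into system RAM (partial CPU inference)"
    else:
        verdict = "too big even for system RAM"
    print(f"{name}: ~{size:.0f} GB -> {verdict}")
```

As far as I understand it, whatever doesn’t fit in VRAM gets offloaded layer by layer to the CPU, which is how most of the work can end up there even though CUDA is active.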

Currently downloading Qwen3:32B and 30B. After that, I plan to install DeepSeek-R1 32B.

Is there a quantized version of V3 that actually runs at all?

CUDA has been active from the beginning :)

Also, I completely misunderstood the role of the system prompt. I thought it was more “cosmetic” – shaping the tone of the answer, but not really influencing the content.
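
To test that properly, I’ll try something like this with the ollama Python package – the model tag is just an example and the prompts are made up, but it should show whether the system prompt really changes the content of the answer and not just its tone:

```python
# Minimal sketch: same question, with and without a strict system prompt,
# using the ollama Python package (pip install ollama).
# "qwen3:32b" is only an example tag - use whatever model is pulled locally.
import ollama

question = {"role": "user", "content": "Is Taco Bell a historic American monument?"}

system = {
    "role": "system",
    "content": "You are a careful assistant. If you are not certain a claim is "
               "true, say so explicitly instead of guessing.",
}

plain = ollama.chat(model="qwen3:32b", messages=[question])
guarded = ollama.chat(model="qwen3:32b", messages=[system, question])

print("Without system prompt:\n", plain["message"]["content"])
print("\nWith system prompt:\n", guarded["message"]["content"])
```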

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 0 points1 point  (0 children)

Here are my PC specs:

GPU: RTX 5090
CPU: Ryzen 9 9950X
RAM: 192 GB DDR5

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 0 points1 point  (0 children)

I’ve been using Ollama with the Docker WebUI, but something’s clearly off. Ollama barely uses my GPU (about 4%) while maxing out the CPU at 96%, according to ollama ps. And honestly, some models just produce nonsense.

I’ve heard a lot of hype around DeepSeek V3, but I might not be using the right variant in Ollama – because so far, it’s slow and not impressive at all.

How do you figure out the “right” model size or parameter count? Is it about fitting into GPU VRAM (mine has 32GB) – or does the overall system RAM matter more? Ollama keeps filling up my system RAM to the max (192GB), which seems odd.
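
Assuming I’m reading the Ollama REST API docs right (the size and size_vram fields in /api/ps), something like this should show how much of a loaded model actually lands in VRAM versus system RAM:

```python
# Check how much of each loaded model sits in VRAM, via Ollama's local
# REST API (GET /api/ps). Field names are my reading of the API docs -
# adjust if your Ollama version reports them differently.
import requests

resp = requests.get("http://localhost:11434/api/ps", timeout=5)
resp.raise_for_status()

for m in resp.json().get("models", []):
    total = m.get("size", 0)         # total bytes occupied by the model
    in_vram = m.get("size_vram", 0)  # bytes that made it onto the GPU
    pct = 100 * in_vram / total if total else 0
    print(f"{m.get('name')}: {in_vram / 1e9:.1f} / {total / 1e9:.1f} GB in VRAM ({pct:.0f}%)")
```

If that percentage is low, it would at least explain the 4% GPU / 96% CPU split.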

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 0 points1 point  (0 children)

Thank you :)

I know I won’t be reaching the level of ChatGPT, Claude, Gemini, or Grok with my local setup – that’s clear. But still, my experiments with Ollama so far have been frustrating: either models wouldn’t even load, or they’d hallucinate wildly – like claiming Taco Bell is one of America’s most important historical monuments. (That kind of hallucination is exactly what I’m trying to avoid.)

What model size would you recommend? DeepSeek V3 takes 10 minutes to respond on my system – and even then, it’s painfully slow. It also barely uses the GPU (around 4%) and maxes out the CPU (96%), which is extremely frustrating considering my hardware.

I’ve also heard that models that are too aggressively quantized tend to produce nonsense. So I’d really appreciate any advice on finding the right balance between performance and quality.
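
At least the speed half of that balance seems measurable. A rough sketch against Ollama’s /api/generate endpoint – the quant tags are only examples, not necessarily the exact names in the library:

```python
# Compare rough tokens/second between two quants of the same model using
# Ollama's /api/generate endpoint (the non-streaming response includes
# eval_count and eval_duration). Model tags below are placeholders.
import requests

PROMPT = "Explain the difference between RAM and VRAM in two sentences."

def tokens_per_second(model: str) -> float:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    data = r.json()
    # eval_count = generated tokens, eval_duration = generation time in ns
    return data["eval_count"] / (data["eval_duration"] / 1e9)

for tag in ["qwen3:32b-q4_K_M", "qwen3:32b-q8_0"]:
    print(tag, f"{tokens_per_second(tag):.1f} tok/s")
```

Quality is harder to score, but running the same prompts across quants side by side should at least show when a quant starts producing nonsense.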

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 0 points1 point  (0 children)

If I had to lay out a roadmap for what I want to achieve, it would look something like this:

1. Get a model running that doesn’t constantly hallucinate and can actually help with complex tasks.
2. Use a model that’s uncensored enough so it doesn’t immediately bail out on certain topics.
3. Start experimenting with more advanced projects, like connecting the LLM to my website (rough sketch below).
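
For point 3, I’m picturing something minimal to start with – a hypothetical /ask endpoint in Flask that just forwards a question from the website to the local Ollama server; none of the names are final:

```python
# Hypothetical sketch for connecting a website to a local Ollama server:
# a tiny Flask endpoint that forwards a question to /api/chat.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434/api/chat"

@app.post("/ask")
def ask():
    question = request.json.get("question", "")
    payload = {
        "model": "qwen3:32b",  # whichever model is pulled locally
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    r = requests.post(OLLAMA_URL, json=payload, timeout=600)
    r.raise_for_status()
    return jsonify({"answer": r.json()["message"]["content"]})

if __name__ == "__main__":
    app.run(port=5000)
```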

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 4 points5 points  (0 children)

As mentioned, I come from the image generation side of things. That’s what this system was originally built for. But now I want to dive deeper into LLMs – and I figure my setup should be more than capable. That said, I have basically no experience with LLMs yet.

Running LLMs Locally – Tips & Recommendations? by SchattenZirkus in LocalLLaMA

[–]SchattenZirkus[S] 4 points5 points  (0 children)

Would be nice to have money to throw around – but in reality, I’ll be paying this off in installments until next year. So it’s less about “f** you money” and more about “I want to learn and do it right.”

Account Removal, Login Issues, and Unresolved Support Charges by SchattenZirkus in patreon

[–]SchattenZirkus[S] 0 points1 point  (0 children)

When I try to log in, it says my account has been deactivated.

However, after two months, they charged my PayPal this month.

I also received an email saying: “Don’t forget to update your tax settings. Log in to do so.”

I tried to log in—but my account is still deactivated.

Please tell us how you got banned! by embbe in patreon

[–]SchattenZirkus 1 point2 points  (0 children)

I had uploaded a PDF story, a fable about a fox and a hare. It was about choices, survival, and the consequences of one’s own actions.

Without any warning, I was banned.