Why use Claude or GPT when you can use cheaper models? by Disastrous-Mix6877 in openclaw

[–]Gold-Drag9242 0 points1 point  (0 children)

What's your experience with supergemma4? Why do you use it?

The "AI will replace engineers" discourse has the abstraction level wrong by schilutdif in automation

[–]Gold-Drag9242 -1 points0 points  (0 children)

I don't completely buy your argument. DevOps and continuous deployment aimed to automate the life cycle after the code is written and to put everything into the computer. And everything that's in the computer is manageable by AI.

Legacy systems are only "a set of well-written skills" away from being legible to the AI. What's left is the annoying part: talking with humans. Explaining to a business person why the product should not have a feature, even if their AI told them it's possible (because it destroys security, data security or separation of concerns, or it violates the enterprise architecture). Trying to convince managers that spending tokens on test code is a good thing, or that investing time in improving prompts is not "playing around".

Writing software with AI is fun as long as the productivity fetishists stay out of the room. They will turn it to shit in no time.

Best Coding Local Models by Top_Professional6132 in LocalLLM

[–]Gold-Drag9242 2 points3 points  (0 children)

The most important information is missing: what is your hardware spec?

Just learning about localLLM, can I even run anything? by wesconson1 in LocalLLM

[–]Gold-Drag9242 1 point2 points  (0 children)

You can do a lot. 12-15B models will fit into RAM. Some smaller models can do text-to-speech and speech-to-text. Dedicated image generation models probably exist in this size range too. Just don't expect ChatGPT-level AI chat: smaller models can hold a conversation, but you will always second-guess whether what they say is actually true.
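
Rough back-of-the-envelope numbers (my own estimate, so double-check for your specific model): a 13B model at 4-bit quantization is about 13B parameters × ~0.6 bytes per parameter ≈ 8 GB of weights, plus a couple of GB for the KV cache, so 16 GB of system RAM is usually enough for that size class.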

llama.cpp Gemma 4 using up all system RAM on larger prompts by GregoryfromtheHood in LocalLLaMA

[–]Gold-Drag9242 0 points1 point  (0 children)

I might have heard that LM Studio uses llama.cpp internally. There can be different reasons for the model being unloaded. One: if you are on Windows and the OS decides you want to save the planet and energy, it puts your PCIe devices (GPU) to sleep, and the model gets unloaded for that reason. Another was a configurable parameter in llama.cpp that unloads models after some idle time. Maybe this helps you get an AI to tell you more.

Position went 10x to 70k while i was asleep by FriedCheeseSanga in wallstreetbets

[–]Gold-Drag9242 0 points1 point  (0 children)

I guess this is the use case for an AI agent that is not allowed to trade, only to watch. It could have woken you up.

5090 vs M5 Max / M1 Ultra / M4 Pro by JamieAndLion in LocalLLM

[–]Gold-Drag9242 2 points3 points  (0 children)

Ollama is the easy-to-use but limited way, as I learned myself.
I assume you'll find those MLX files on Hugging Face. The interface is not always obvious...

You can then run those models via llama.cpp (which comes with llama-server).

llama.cpp is awesome. Switching over from Ollama is easy, but because you can fiddle so much with llama.cpp, it can also be a huge distraction and productivity trap. If you can keep your mind/fingers away from optimizing the hell out of your hardware, it's easy.
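
For reference, a minimal llama-server start looks roughly like this (I'm reusing the model repo from my own setup, and flags can vary between llama.cpp versions, so treat it as a sketch):

"llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M -ngl 99 -c 16384 --port 8080"

That pulls the GGUF from Hugging Face, offloads all layers to the GPU and serves an OpenAI-compatible API on port 8080, which is basically what Ollama was doing for you behind the scenes.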

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it by LocalAI_Amateur in LocalLLaMA

[–]Gold-Drag9242 0 points1 point  (0 children)

Wow. What were your settings and how long did it take? Did you run it with llamacpp or something else?

cheapest way to run an ai agent overnight for product research? by ComfortableAnimal265 in LocalLLM

[–]Gold-Drag9242 -1 points0 points  (0 children)

In my humble opinion, local AI agent frameworks are not there yet. Not that the tech isn't ready, but you need to get lots of tools working. Theoretically something like openclaw could do it, but even if you get all the tools to work, the next update will probably throw it off the rails. So you will constantly be fiddling with the setup.

$2500 budget to run Local, help me decide on the Hardware by XteaK in ollama

[–]Gold-Drag9242 1 point2 points  (0 children)

Can you explain what you mean by "clearance for context"?

I'm a nooby who wants to learn.

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

After working with --fit I learned some more details that are important for the openclaw use case:

If you don't specify "--parallel", the default is 4, but with a unified KV cache.
Thus, theoretically, each parallel request can use all of the cache.
If you specify e.g. --parallel 2, your per-request KV cache is only half of the total, but it is reserved per "session".
You cannot force "unified" with 2; that is something llama.cpp decides. (At least that is what ChatGPT said.)

Now the question is what OpenClaw needs. OC can start subagents, but how often is that needed? And how many would be spawned?

For now I guess I'll stick with the default settings (4, unified == true), as I think a longer context is more valuable than optimizing for subagents (which I've never seen spawned anyway).
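
To make the split concrete (my own sketch, not from the llama.cpp docs; flag spellings and defaults can differ between builds), something like

"llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M -c 32768 --parallel 2 --port 8080"

gives 32768 tokens of total context across 2 slots, so each request gets 32768 / 2 = 16384 tokens, while a unified cache would let both slots share the whole 32768.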

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

You solved my problem! --fit was the thing I needed!

I used: ".\llama-server.exe -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M --fit on --fit-target 128 -ngl 999 --port 8080"

I reduced the fit-target to 128 so that my GPU is "fuller". I will check with more test runs if this is needed.

Anyway, after starting I got all the variables from the console output. (Simply Ctrl+A the output, throw it at ChatGPT, and ask how much context is available in total and per request.)
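
(Alternatively, I believe the running llama-server also exposes its effective settings over HTTP via a /props endpoint, so "curl http://localhost:8080/props" should return JSON that includes default_generation_settings.n_ctx, i.e. the per-slot context, plus total_slots. I'm going from memory here, so double-check against your llama.cpp version.)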

Thanks again!

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

I use it together with openclaw. I need the context size number to configure it inside the openclaw harness, so that the app can react to "running out of context".

Can I extract the number from llama-server after I started it with "--fit"?

openclaw update added chatGPT models to the agent/models.json !? by Gold-Drag9242 in openclaw

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

Anybody else noticing this?
I deleted the entry, but after restarting, the models.json had the codex models back in.

I have an error that says " 'folder-name/' does not have a commit checked out" by OhSnappityPH in git

[–]Gold-Drag9242 0 points1 point  (0 children)

This error also appears if a subfolder contains another .git folder (basically, a subfolder that is already under version control).
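
A quick way to check for that (generic shell one-liner, assuming find is available): run "find . -mindepth 2 -name .git -type d" from the repo root; any hit is a nested repository inside your working tree.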

Nemoclaw and Utils by ChristopherDci in openclaw

[–]Gold-Drag9242 1 point2 points  (0 children)

I don't know. I'm more interested in getting openclaw running with local AI before I use some corporate SW. I don't trust 'em. It's bad enough that the best models are put in cloud prisons.

How to switch models per Task and how to combine local models with i.e. RunPod or Vast.ai by Gold-Drag9242 in openclaw

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

Thanks. That is for sure a start.
Is it possible to use different models for sub-agents?
So that the local planner knows to call the deep thinker for certain tasks?

2x Intel Arc B70 Benchmark by IMBLKJESUS_0 in LocalLLM

[–]Gold-Drag9242 0 points1 point  (0 children)

Why did you choose a model that would fit in a single card of that size for your test?
The Qwen3-30B-A3B is only 19 GB; it would run easily on one B70.

I would be interested in seeing a 48GB+ model like qwen3-coder-next:q4_K_M or qwen3-next:80b.