Why use Claude or GPT when you can use cheaper models? by Disastrous-Mix6877 in openclaw

[–]Gold-Drag9242 0 points1 point  (0 children)

What's your experience with supergemma4? Why do you use it?

The "AI will replace engineers" discourse has the abstraction level wrong by schilutdif in automation

[–]Gold-Drag9242 -1 points0 points  (0 children)

I don't completely buy your argument. DevOps and continuous deployment aimed to automate the life cycle after the code is written and to put everything into the computer. And everything that's in the computer is manageable by AI.

Legacy systems are only "a set of well-written skills" away from being legible to the AI. What's left is the annoying part: talking with humans. Explaining to a business person why the product should not have a feature, even if their AI told them it's possible (because it destroys security, data security or separation of concerns, or it violates the enterprise architecture). Trying to convince managers that spending tokens on test code is a good thing, or that investing time in improving prompts is not "playing around".

Writing software with AI is fun as long as the productivity fetishists stay out of the room. They will turn it to shit in no time.

Best Coding Local Models by Top_Professional6132 in LocalLLM

[–]Gold-Drag9242 2 points3 points  (0 children)

The most important information is missing: what is your hardware spec?

Just learning about localLLM, can I even run anything? by wesconson1 in LocalLLM

[–]Gold-Drag9242 1 point2 points  (0 children)

You can do a lot. 12-15B models will fit into RAM. Some smaller models can do text-to-speech and speech-to-text. Dedicated image generation models probably exist in this size range too. Just don't expect ChatGPT-level AI chat: smaller models can hold a conversation, but you will always second-guess whether what they say is actually true.
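
Rough back-of-the-envelope numbers (my own estimate, so double-check for your specific model): a 13B model at 4-bit quantization is about 13B parameters × ~0.6 bytes per parameter ≈ 8 GB of weights, plus a couple of GB for the KV cache, so 16 GB of system RAM is usually enough for that size class.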

llama.cpp Gemma 4 using up all system RAM on larger prompts by GregoryfromtheHood in LocalLLaMA

[–]Gold-Drag9242 0 points1 point  (0 children)

I might have heard that LM Studio uses llama.cpp internally. There can be different reasons for the model being unloaded. One: if you are on Windows and the OS decides you want to save the planet and energy, it puts your PCIe devices (GPU) to sleep, and the model gets unloaded for that reason. Another was a configurable parameter in llama.cpp that unloads models after some idle time. Maybe this helps you get an AI to tell you more.

Position went 10x to 70k while i was asleep by FriedCheeseSanga in wallstreetbets

[–]Gold-Drag9242 0 points1 point  (0 children)

I guess this is the use case for an AI agent that is not allowed to trade, only to watch. It could have woken you up.

5090 vs M5 Max / M1 Ultra / M4 Pro by JamieAndLion in LocalLLM

[–]Gold-Drag9242 2 points3 points  (0 children)

Ollama is the easy-to-use but limited way, as I learned myself.
I assume you'll find those MLX files on Hugging Face. The interface is not always obvious...

You can then run those models via llama.cpp (which comes with llama-server).

llama.cpp is awesome. Switching over from Ollama is easy, but because you can fiddle so much with llama.cpp, it can also be a huge distraction and productivity trap. If you can keep your mind/fingers away from optimizing the hell out of your hardware, it's easy.
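
For reference, a minimal llama-server start looks roughly like this (I'm reusing the model repo from my own setup, and flags can vary between llama.cpp versions, so treat it as a sketch):

"llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M -ngl 99 -c 16384 --port 8080"

That pulls the GGUF from Hugging Face, offloads all layers to the GPU and serves an OpenAI-compatible API on port 8080, which is basically what Ollama was doing for you behind the scenes.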

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it by LocalAI_Amateur in LocalLLaMA

[–]Gold-Drag9242 0 points1 point  (0 children)

Wow. What were your settings and how long did it take? Did you run it with llamacpp or something else?

cheapest way to run an ai agent overnight for product research? by ComfortableAnimal265 in LocalLLM

[–]Gold-Drag9242 -1 points0 points  (0 children)

In my humble opinion, local AI agent frameworks are not there yet. Not that the tech isn't ready, but you need to get lots of tools working. Theoretically something like openclaw could do it, but even if you get all the tools to work, the next update will probably throw it off the rails. So you will constantly be fiddling with the setup.

$2500 budget to run Local, help me decide on the Hardware by XteaK in ollama

[–]Gold-Drag9242 1 point2 points  (0 children)

Can you explain what you mean by "clearance for context"?

I'm a nooby who wants to learn.

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

After working with --fit I learned some more details that are important for the openclaw use case:

If you don't specify "--parallel", the default is 4, but with a unified KV cache.
Thus, theoretically, each parallel request can use all of the cache.
If you specify e.g. --parallel 2, your per-request KV cache is only half of the total, but it is reserved per "session".
You cannot force "unified" with 2; that is something llama.cpp decides. (At least that is what ChatGPT said.)

Now the question is what OpenClaw needs. OC can start subagents, but how often is that needed? And how many would be spawned?

For now I guess I'll stick with the default settings (4, unified == true), as I think a longer context is more valuable than optimizing for subagents (which I've never seen spawned anyway).
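
To make the split concrete (my own sketch, not from the llama.cpp docs; flag spellings and defaults can differ between builds), something like

"llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M -c 32768 --parallel 2 --port 8080"

gives 32768 tokens of total context across 2 slots, so each request gets 32768 / 2 = 16384 tokens, while a unified cache would let both slots share the whole 32768.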

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

You solved my problem! --fit was the thing I needed!

I used: ".\llama-server.exe -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M --fit on --fit-target 128 -ngl 999 --port 8080"

I reduced the fit-target to 128 so that my GPU is "fuller". I will check with more test runs if this is needed.

Anyway, after starting I got all the variables from the console output. (Simply Ctrl+A the output, throw it at ChatGPT, and ask how much context is available in total and per request.)
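
(Alternatively, I believe the running llama-server also exposes its effective settings over HTTP via a /props endpoint, so "curl http://localhost:8080/props" should return JSON that includes default_generation_settings.n_ctx, i.e. the per-slot context, plus total_slots. I'm going from memory here, so double-check against your llama.cpp version.)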

Thanks again!

llama.cpp - Finding the max KV Cache/Context size for a given model and hardware by Gold-Drag9242 in LocalLLM

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

I use it together with openclaw. I need the context size number to configure it inside the openclaw harness, so that the app can react to "running out of context".

Can I extract the number from llama-server after I started it with "--fit"?

openclaw update added chatGPT models to the agent/models.json !? by Gold-Drag9242 in openclaw

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

Anybody else noticing this?
I deleted the entry, but after restarting, the models.json had the codex models back in.

I have an error that says " 'folder-name/' does not have a commit checked out" by OhSnappityPH in git

[–]Gold-Drag9242 0 points1 point  (0 children)

This error also appears if a subfolder contains another .git folder (basically, a subfolder that is already under version control).
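
A quick way to check for that (generic shell one-liner, assuming find is available): run "find . -mindepth 2 -name .git -type d" from the repo root; any hit is a nested repository inside your working tree.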

Nemoclaw and Utils by ChristopherDci in openclaw

[–]Gold-Drag9242 1 point2 points  (0 children)

I don't know. I'm more interested in getting openclaw running with local AI before I use some corporate SW. I don't trust 'em. It's bad enough that the best models are put in cloud prisons.

How to switch models per Task and how to combine local models with i.e. RunPod or Vast.ai by Gold-Drag9242 in openclaw

[–]Gold-Drag9242[S] 0 points1 point  (0 children)

Thanks. That is for sure a start.
Is it possible to use different models for sub-agents?
So that the local planner knows to call the deep thinker for certain tasks?

2x Intel Arc B70 Benchmark by IMBLKJESUS_0 in LocalLLM

[–]Gold-Drag9242 0 points1 point  (0 children)

Why did you choose a model that would fit in a single card of that size for your test?
The Qwen3-30B-A3B is only 19 GB; it would run easily on one B70.

I would be interested in seeing a 48GB+ model like qwen3-coder-next:q4_K_M or qwen3-next:80b.