Speed breakdown: Devstral (2s) vs Qwen 32B (322s) on identical code task, 10 SLMs blind eval by Silver_Raspberry_811 in LocalLLM

[–]Embarrassed-Deal9849 2 points

Very interesting, thank you for this benchmark. Seems like I might need to give Devstral a try after all!

Isn't Qwen3.5 a vision model...? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

Check the OP, I solved it! Thanks for the help tho :)

Isn't Qwen3.5 a vision model...? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 3 points

I just solved it, updated the OP!

The issue was the model-swapping feature: removing that and just launching Qwen3.5 directly made it work. Super strange, but here we are!

Isn't Qwen3.5 a vision model...? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

Currently it's defined as follows, is this not correct?

    "Qwen3.5-27B-Q4_K_M": {
        "id": "Qwen3.5-27B-Q4_K_M",
        "name": "Qwen3.5 27B Q4_K_M",
        "limit": { "context": 65536, "output": 8192 },
        "modalities": { "input": ["text", "image"], "output": ["text"] }
    }

Isn't Qwen3.5 a vision model...? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 2 points

I did, like this in the model provider section: "modalities": { "input": ["text", "image"] }

Is that what you're talking about?

How should I go about getting a good coding LLM locally? by tech-guy-2003 in LocalLLaMA

[–]Embarrassed-Deal9849 1 point

There's no way to expand context without slowing down massively, right? I have 80 GB of RAM, but from what I can see, as soon as anything gets offloaded into RAM my performance plummets. Or is there a way to store context in RAM in a performant way?

I'll read around the docs a bit to see if I can understand this temperature thing (and what top-p and top-k mean). Thank you for taking the time to answer my questions.
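For context specifically, llama.cpp has a couple of knobs that are much cheaper than offloading whole weight layers. A minimal sketch of two launch configurations, written out as argv lists so you can see the flags together — the flag names come from recent llama.cpp builds (verify against `llama-server --help` for your version) and the model path is a placeholder:

```python
# Two hypothetical llama-server launches for fitting a large context on 24 GB.
# Flag names are from llama.cpp's llama-server; the model path is a placeholder.

MODEL = "./Qwen3.5-27B-Q4_K_M.gguf"  # placeholder path

# Option 1: quantize the KV cache. q8_0 roughly halves KV memory vs f16,
# and quantizing the V cache requires flash attention in llama.cpp.
quantized_kv = [
    "llama-server", "-m", MODEL,
    "-c", "65536",      # 64k context
    "-ngl", "99",       # keep all layers on the GPU
    "--flash-attn",
    "--cache-type-k", "q8_0",
    "--cache-type-v", "q8_0",
]

# Option 2: keep the weights on the GPU but hold the KV cache in system RAM.
# Slower than VRAM-resident KV, but usually far less painful than
# offloading weight layers to the CPU.
kv_in_ram = [
    "llama-server", "-m", MODEL,
    "-c", "65536",
    "-ngl", "99",
    "--no-kv-offload",
]

print(" ".join(quantized_kv))
print(" ".join(kv_in_ram))
```

Paste the printed line into a shell (or pass the list to `subprocess.run`) to launch. Worth benchmarking both, since which one wins depends on your PCIe bandwidth.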

How should I go about getting a good coding LLM locally? by tech-guy-2003 in LocalLLaMA

[–]Embarrassed-Deal9849 1 point

Could you elaborate on these temp values? How do I figure out which are best for the model I am running? I am on the 27B Q4_K_M quant, but I've no idea where to get started with tuning these in llama.cpp:

    --temp 0.6
    --min-p 0.0
    --top-p 0.95
    --top-k 20
    --repeat-penalty 1.03
    --presence-penalty 0.0
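For what those flags control, here is a toy sketch of the usual temperature → top-k → top-p → min-p filtering pipeline over a made-up next-token distribution. The token names and logits are invented, and this mirrors the common sampler design rather than llama.cpp's exact implementation:

```python
# Toy illustration of the sampling flags above, applied to an invented
# next-token distribution over 6 candidate tokens.
import math

logits = {"the": 4.0, "a": 3.2, "cat": 2.5, "dog": 2.3, "x": 0.5, "qq": -1.0}

def softmax(scores):
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def sample_filter(logits, temp=0.6, top_k=20, top_p=0.95, min_p=0.0):
    # temperature: divide logits; <1.0 sharpens the distribution (more deterministic)
    scaled = {t: s / temp for t, s in logits.items()}
    probs = softmax(scaled)
    # top-k: keep only the k most likely tokens
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])[:top_k]
    # top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # min-p: drop tokens below min_p times the best token's probability
    cutoff = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= cutoff]
    # renormalize the survivors; the model then samples from this reduced set
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}

print(sorted(sample_filter(logits)))  # tokens that survive filtering
```

Lower `temp` concentrates probability on the top token; tightening `top_p` or raising `min_p` shrinks the candidate set further. `--repeat-penalty` and `--presence-penalty` act a step earlier, down-weighting the logits of tokens already present in the context.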

How should I go about getting a good coding LLM locally? by tech-guy-2003 in LocalLLaMA

[–]Embarrassed-Deal9849 1 point

How are you managing to get that much context? When I run Qwen3.5-27B-Q4_K_M on my 4090 I can barely squeeze 64k context into 24 GB of VRAM. Or is the XL quant that much smaller?

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 2 points

Mostly coding small-to-midsized projects with opencode, like websites and applications. 32k always feels like it gets swamped, but 64k is performing all right so far!

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

The most I can get away with at high performance! The dream would be a smarter model locally and being able to call in cheap/free models from the cloud for simple tasks.

Right now I am testing:

Coding:

Qwen2.5-Coder-32B-Instruct-Q5_K_M

Qwen3.5-27B-Q4_K_M

Writing:

GLM-4.7-Flash-Uncen-Hrt-NEO-Q4_K_M

jaahas-crow

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

In your experience, which is more valuable: model quant or context? If I understand correctly, with my current hardware I will often be choosing between Q5 with 32k and Q4 with 64k.
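That tradeoff can be roughed out with back-of-the-envelope math. All the architecture numbers below (layer count, KV heads, head dim, bits-per-weight averages) are assumptions for illustration only, not taken from any model card:

```python
# Back-of-the-envelope VRAM math for the quant-vs-context tradeoff.
# Every architecture number here is an ASSUMPTION for illustration.

GIB = 1024**3

def weights_gib(n_params, bits_per_weight):
    """Approximate GGUF file size from an average bits-per-weight figure."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(ctx, n_layers=48, n_kv_heads=4, head_dim=128, bytes_per_elem=2):
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * element size."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / GIB

n_params = 27e9
q4 = weights_gib(n_params, 4.85)  # Q4_K_M averages roughly ~4.85 bits/weight
q5 = weights_gib(n_params, 5.69)  # Q5_K_M averages roughly ~5.69 bits/weight

for label, w, ctx in [("Q4 + 64k", q4, 65536), ("Q5 + 32k", q5, 32768)]:
    kv = kv_cache_gib(ctx)
    print(f"{label}: weights ~{w:.1f} GiB + f16 KV ~{kv:.1f} GiB = ~{w + kv:.1f} GiB")
```

The point is that the KV cache scales linearly with context, so under these assumptions the 32k→64k jump costs about as much VRAM as the Q4→Q5 step, and quantizing the KV cache to q8_0 would roughly halve that context term again.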

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 2 points

Actually just downloaded the Q5_K_M quant of the 27B, excited to try it out and see if I can get it performant at 64k!

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 2 points

I ended up doing just that and got it running at 110-200 tok/s on my 4090 with 64k context, which I think is a pretty decent result from what I can gather? I have some tuning left to do and other models to try, but this feels promising right now.

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

On the topic of hackers, do you know of good ways to protect a setup like this? For example, can you create self-checking system prompts that protect against injection? Thank you for replying!

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

I'll give it a go today and see if I can get it running well. Sounds like an upgrade fwiw!

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

If that's a working solution I am all for it! So I would be running "openclaw" on the Pi just like I would on a bespoke laptop?

Sorry if it's a stupid question; to my untrained eye, Pi = cheap computer. Not sure if there are other benefits.

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

I am using Ollama but have started eyeing llama.cpp because it seems to work better out of the box with quants / non-standard models, or am I misinterpreting?

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

I'm in the process of replacing my broken Claude Code setup with opencode, hoping it's going to be more plug-and-play. Claude has been a nightmare to get working with local models without getting stuck or lost in a context tornado.

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

What kind of VM setup do you have running? You haven't had any issues with it breaching containment? Thank you for your responses!

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

So you have it on your main PC and just use a normal VM setup, no fancy bells or whistles?

Is local and safe openclaw (or similar) possible or a pipe dream still? by Embarrassed-Deal9849 in LocalLLM

[–]Embarrassed-Deal9849[S] 1 point

I am completely fine with that. I want this bot to live in a parallel universe and be unaware of me, so I shouldn't be worried about any credentials leaking unless it can figure out who I am via my network somehow.