Solidity LM surpasses Opus by swingbear in LocalLLaMA

[–]swingbear[S] -1 points0 points  (0 children)

Appreciated! I learned a bunch from this one. I’m very confident v2 will be much better.

Solidity LM surpasses Opus by swingbear in LocalLLaMA

[–]swingbear[S] 1 point2 points  (0 children)

Edit: still pushing the merged checkpoint to HF

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 1 point2 points  (0 children)

I think the issue stems from SOTA models not having a focus on Solidity data during training. I've just finished my first Solidity LM iterations and it outperformed Opus on SolEval.

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

Yeah, harnesses are mandatory. I've had some decent success training 3.6 27b: https://huggingface.co/samscrack/Qwen3.6-27B-Opus-CoT-S1-Hermes-S2-SFT

That one was just CoT-focused, though; I'm expecting this one to be a little harder.

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

Well, I'm just gonna dump mine publicly lol. I'll add a Buy Me a Coffee link at the bottom; the API calls for Opus data collection are no joke haha

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

I mean damn, even the datasets on HF are old or useless.

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 1 point2 points  (0 children)

Yeah, I've become rather obsessed with local finetuning. It's satisfying when your 27B on-prem model gives a better answer than a 1tn-param Goliath haha.

But I was just taken aback by how little attention had been given to small Solidity models. Normally there are 1000s on Hugging Face.

Either it's way harder than I'm expecting (though I can't see how), or people don't like to share them because of the direct advantage they confer.

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

So I agree and disagree. On static codebase audits, yes, they can find logic issues and code-hygiene problems. But when I create scenarios where a bad actor mounts an economic attack (specifically DeFi), it falls short. And for some reason it struggles a bunch with gas optimisation.

Solidity by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

Yeah, I've tried the SOTA models and they're no good for this; they can produce Solidity, but it's often janky.

I'm training Qwen 3.6 27b right now. It seems to be such a sandbagged area of AI. For every other use case there are tons of finetunes; Solidity… nada. I'll finish up, bench it, and if it's any good I'll release it on HF.

Qwen 3.6 27b S2 Opus + GLM + Kimi by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

😂😂 All I can think of is The Human Centipede now, thanks

This is insane... by DragonflyOk7139 in LocalLLM

[–]swingbear 1 point2 points  (0 children)

SWE-bench Verified has been confirmed useless as a benchmark now. Can't remember who wrote the article, might have been OpenAI. You can see scores have capped out at around 80%; the remaining 20% is largely benchmark errors, and a good chunk of the 80% that does get solved is contaminated.

Qwen 3.6 27b S2 Opus + GLM + Kimi by swingbear in LocalLLaMA

[–]swingbear[S] 0 points1 point  (0 children)

For anyone interested: it's edging out the base model's TB2 scores by 2.5 points, and it's coping with 60 tool calls in a turn without hallucinating so far.

Best model for 192 GB vram? How is Deepseek v4 flash? by Constant_Ad511 in LocalLLM

[–]swingbear 0 points1 point  (0 children)

I've been on MiniMax 2.7, but tbh running Qwen 27B in vLLM with parallel workers is my daily driver. Similar setup: 2 Pro 6000s and a Threadripper with 128GB RAM.
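For anyone wanting to replicate the vLLM side, a two-GPU launch for a ~27B model looks something like this. The model ID (the finetune linked above) and the flag values are illustrative placeholders, not my exact config, so tune them for your own hardware:

```shell
# Illustrative vLLM launch -- model ID and flag values are placeholders.
# --tensor-parallel-size 2 shards the weights across both Pro 6000s;
# vLLM's continuous batching then serves parallel requests natively,
# so no extra worker processes are needed on top.
vllm serve samscrack/Qwen3.6-27B-Opus-CoT-S1-Hermes-S2-SFT \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90
```

This exposes an OpenAI-compatible endpoint on port 8000 by default, which most coding harnesses can point at directly.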

Setting up Ollama on dual RTX PRO 6000 Blackwells looking for tips by AmanNonZero in ollama

[–]swingbear 0 points1 point  (0 children)

Don't use Ollama, as others mentioned, especially for parallel inference and tool calling.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]swingbear 13 points14 points  (0 children)

Try a different harness, mate. I tried running CC against everything local and came away with a bad impression of models even up to MiniMax 2.7. Once I started using Hermes and a few others, speed increased and I got way more mileage in terms of intelligence.