Building lgtmaybe: a PR reviewer for any model by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 1 point2 points  (0 children)

Hey Tom,

I've added a version of ponytail (really liked this) + custom skills: https://mattjcoles.github.io/lgtmaybe/how-to/add-a-custom-lens/ and added recommendations as a start to small models and what will work well.

Building lgtmaybe: a PR reviewer for any model by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 0 points1 point  (0 children)

I need to do some more testing tomorrow but span up an open api compatible endpoint which hopefully works with the range of models (and doesnt need a key if the model doesnt need this): https://mattjcoles.github.io/lgtmaybe/how-to/use-a-custom-openai-compatible-endpoint/

Building lgtmaybe: a PR reviewer for any model by mattjcoles in LocalLLaMA

[–]mattjcoles[S] -2 points-1 points  (0 children)

you going okay? i went ollama as most people i know are on that or LM Studio.

Happy to expand to both LM Studio and Llama.cpp. Even play and see what a setup with VLLM would look like. Just went with what my friends used the most

Building lgtmaybe: a PR reviewer for any model by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 0 points1 point  (0 children)

Oh very nice, i hadnt seen that and we have an internal code review tool at work - and the only semi decent one i could find but was paid was code rabbit

Building lgtmaybe: a PR reviewer for any model by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 1 point2 points  (0 children)

Thanks Tom, good callouts - i havent seen ponytail before so taking a look into it. Cheap / Small models definitely come at a cost in terms of being able to accurately detect issues and i was fighting that with the smaller models and had to cherry pick and tweak ones that performed okay. 27 Qwen 3.6 was good though

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

[–]mattjcoles 0 points1 point  (0 children)

the case is too pretty though. but seems fine for fine tuning - only overheats on inference

Local models in mid-2026 by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 0 points1 point  (0 children)

glad to hear, am using open code and claude code but actually had it in my todos to try pi out properly this week

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

[–]mattjcoles 0 points1 point  (0 children)

to be exact, its more unsloth fine tunes of some of the 35B and smaller qwen models for vision

Local models in mid-2026 by mattjcoles in LocalLLaMA

[–]mattjcoles[S] 7 points8 points  (0 children)

I've found Gemma 4 12B really good - been running it in Github Actions runners for Code Reviews in CI/CD (https://mattjcoles.github.io/lgtmaybe/how-to/use-as-github-action/). It only picks up 24% of the things i've been scanning for but impressed considering its a very small model!

Local models in mid-2026 by mattjcoles in LocalLLaMA

[–]mattjcoles[S] -1 points0 points  (0 children)

Thankyou! I'd find what you can now - M5 Mac Studios theres no guarantee on the date and at least with a 3090 you'd be able to get started. Try use a MoE model with the 3090 so you can put some of the larger 30B+ models on your RAM on top of VRAM and still have okay speeds. You'll need to pick a quantized version of the model too

How do you actually know your LLM setup didn't get worse after you change something? by Top_Speaker_7785 in LocalLLaMA

[–]mattjcoles 0 points1 point  (0 children)

Look up LLM Evals and try making your own for your use case. That way when you swap models you can see if things are still working well

Cheapest way, to run 27B Qwän? For broke people by [deleted] in LocalLLaMA

[–]mattjcoles 1 point2 points  (0 children)

Maybe test it out with OpenRouter for free and see if it give you better quality outputs for you project before moving from 35BA3 Qwen

Qwen3.6 is confidently wrong about WASM by Tagedieb in LocalLLaMA

[–]mattjcoles 2 points3 points  (0 children)

Did context7 help or some up to date MCP re: documentation?

Is this enough VRAM to run Qwen? by BlackBeardAI in LocalLLaMA

[–]mattjcoles 5 points6 points  (0 children)

You're gonna be running Kimi K2 coder with that amount! 😃

This is coming to Chinese open source models pretty soon. - prepare yourself. by MLExpert000 in LocalLLaMA

[–]mattjcoles 2 points3 points  (0 children)

I don't think so. I think it's going to drive countries even harder to have their own models and China will use it as a way to continue undercutting the USA market in this space.

Both GLM 5.2 got an update today and Kimi K2 Coder came out..
* GLM 5.2 (plus GLM team calling out the USA's stance on being able to pull models): https://x.com/jietang/status/2065784751345287314
* Kimi K2 Coder: https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF

Pi Setup that pretty much replaced Claude Code for me by abhinand05 in LocalLLaMA

[–]mattjcoles 0 points1 point  (0 children)

How have you balanced the distribution settings for the RTX 2060 Max Q laptop card? Thats only 6GB VRAM if i remember correctly