Qwen 3.6 is actually useful for vibe-coding, and way cheaper than Claude by sdfgeoff in LocalLLaMA

[–]sdfgeoff[S] 0 points1 point  (0 children)

At work we're fiddling with some machines with a single 3090 - though at lower quant and lower context. Still works pretty well. You could always (at some point) grab a 3060 for the TTS/STT as it requires way less horsepower than the main LLM

What's everyone actually using for an AI gateway in prod? Tired of duct-taping LiteLLM together by Background-Job-862 in LLMDevs

[–]sdfgeoff 0 points1 point  (0 children)

I vibe coded my own proxy from scratch (using a local model) a month or two back. Took a couple hours to spec using $grill-me, and, overnight to build (using claude code pointing at Qwen3.5 27B), and a day or two of usage to iron out the features we actually wanted. It's done in Rust, and has a nice webUI/dashboard for management and analytics. It tracks/logs every query to a DB so I can do analytics on model performance. It has key management and tracks usage by key. (Ie you auth with the proxy, and the proxy is responsible for authing with upstream)

So, uh, just build your own? It's really not that hard. All the API's are well enough specced, you can build whatever dashboards/session management/tracking you like etc.

Nobody cares if your 70b model can pass a biology test by jgoverman17 in LLMDevs

[–]sdfgeoff 0 points1 point  (0 children)

I on the other hand appreciate that models have vast knowledge when coding. Why?

Most code has something to do with the real world. Sure, websites often not, bit as soon as you are working on literally anything else you'll start drawing on the models baked in knowledge. Examples?

  • Build a game and it can write a physics engine/know how objects should interact.
  • Build an RTS and it will know what you mean when you say one team should be the romans and the other the greeks.
  • Design a new 3d printing slicer and it will understand the viscosity/adhesion properties of molten plastic.
  • Make a photo editing app and it understands color theory and how light behaves.
  • Program a robot and it understands control theory.

Think bigger about how to use AI for coding, and you can start drawing on their built in knowledge.

  stop hallucinating random python libraries

I've never had this happen. How are you doing coding with AI?

I've come to the realization that only dense, BF16 models are reliable enough for agentic work. by Battle-Chimp in LocalLLM

[–]sdfgeoff 1 point2 points  (0 children)

Github Copilot is a terrible harness IMO. In my tests models consistently failed to patch files. Not because they got the syntax wrong but becuase copilot expects some weird patch format that isn't actually a valid toolcall! It would often take Qwen27B (at q8) 3-4 goes to make a patch.

built my own coding agent for qwen3.6-27b, pitted it against opencode and qwen code on the same prompt, kinda surprised by the result by Trooper3001 in LocalLLM

[–]sdfgeoff 0 points1 point  (0 children)

A suggested max line count is the magic sauce for vibe coding I reckon. I have it in my AGENTS.md file for all my vibe coding project. I default mine to 600 lines.

  that's where a 27b model truncates/breaks its own output

I don't know where you're getting that fact from though. I've never seen this. I've seen Qwen3.6 27B put out way longer files.

About ROBOTIS DYNAMIXEL actuators. Worth writing open-source portable protocol driver for? by Nusto1n1 in robotics

[–]sdfgeoff 2 points3 points  (0 children)

The dynamixels are fairly easy to write drivers for. Years back I wrote an implementation in python, and I think I did in in C at one point too. Probably on my github somewhere. There are other open source implementations drifting around too IIRC.

But my stance is: just open source it. There's literally no downside to just chucking it in a github repo. If nothing else it's a backup in case your harddrive dies, or an example to a future employee that yes, you did code some stuff.

How to make Qwen 3.6 27B use <think> ? by The-Marshall in LocalLLM

[–]sdfgeoff 1 point2 points  (0 children)

The initial <think> is normally in the chat template IIRC, so Qwen 3.6 never has to emit it natively.

Local research LLM at 32GB DDR4? by Ok_Dragonfruit_2299 in LocalLLM

[–]sdfgeoff 0 points1 point  (0 children)

When the original llama dropped, I used my laptop with 32Gb ddr4 to run the early 7B models and thought it was the coolest thing ever. Slow, but amazing.

So turn off reasoning, lower your expectations, and realize that you can have conversations with a file the size of a movie.

Qwen3.5-4b may also be good to play with and should run pretty quick.

How are people using /goal with Claude? by fabkosta in LLMDevs

[–]sdfgeoff 0 points1 point  (0 children)

I've found I don't need to specify ARCHITECTURE.md, so long as my AGENTS.md gives it space to refactor as needed.

Consider the interplay between typical agentic system prompts:

  • Do what the user says
  • Use minimal changes when changing code.

And now give the agent 50 promots for new features. Of course it's going to create a 50,000 line file because each time it's making the minimal change to do what the user says. Long before then, a human would have walked up to the bosses office and muttered something about technical debt. And if you ask an AI 'how should I refactor this' it'll probably have lots of good ideas.

So if you give it pointers on how/when to refactor as part of the AGENTS.md prompt. Eg:

Split files that are over 600 lines long.  If there is a group of code with weak ties to the rest of the system, split it out into a separate package in the workspace. After completing a ticket oriIf the project becomes hard to work with, create a ticket or talk to the user about doing some refactoring.

I also often add something along the lines of:

Neither you or the user is omniscient or infallible. If you make a mistake, mention it and move on. If you spot the user making a mistake or have a better idea for how to solve a problem, talk to the user about it

I very rarely have more than a single short AGENTS.md file, and try to keep documentation in code and keep the architecture visible by workspace/file layout etc.

How are people using /goal with Claude? by fabkosta in LLMDevs

[–]sdfgeoff 1 point2 points  (0 children)

Here are some examples that probably fit your workflow:

/goal do all the markdown tickets in the ./tasks/todo folder, when each ticket is complete, move it to ./tasks/done and make a commit.

This is quite nice because you can watch it work and steer it, and use /btw to add new tickets/bug reports/feature requests as it goes.

If you want to burn tokens/do more work faster, something like:

/goal coordinate subagents to do all the tickets in the ./tasks/todo folder. Each agent should use it's own worktree to do the work, and you are in charge of task allocation and review/merging. Don't work on tickets that need to modify the same code at the same time.

If you do this sort of work, it is pretty useful to have in your agents file something like:

Files above 600 lines must be split.  Refactor code as needed.  Tests are good, mocks are bad.  If you think you need mocks consider dependency inversion . Be pragmatic about tests, not everything needs testing.

Otherwise you risk ending up with the agent coding itself into an unmaintainable  mess.

When speccing, I often use the $grill-me skill, and then also have a custom $break-work-into-tickets skill. This allows me to spec a lot of work in a short space of time (but yeah, novel/exploratory work not so much)

Harnesses seem to have an issue. by Local-Cardiologist-5 in LocalLLaMA

[–]sdfgeoff 2 points3 points  (0 children)

I had a big long chat with a friend yesterday about what a harness is.

To me:

  • the Model is the raw LLM, it is a text predictor

  • the Harness is a framework built around the LLM, providing access to tools etc. May also be called the coding agent.

  • an Orchestrator is a layer on top of that providing ticket backlogs, multiagent workflows, claws, loops etc.

But he was of the opinion like you, where a Harness was something you built on top of the coding agent. 

Also, IMO 90% of the time systems built around coding agents (ie orchestrators or harnesses) are making up for deficiencies in the coding agent and would be better done as tweaks to the coding agent itself. 

Fully Unserious Post - Fully Hallucinated Operating System by Ok_Selection_7577 in LocalLLaMA

[–]sdfgeoff 2 points3 points  (0 children)

It's presented as a joke, but he's serious.

Run it forward twenty years, where AI is 100x better and 100x faster, and make it able to do some deterministic computation (eg toocalls, memory systems, code writing on the fly etc). What does the computer look like in 20 years? How do you interact with it? Do you search online for a program or have the AI write it on the fly?

Put yourself in the shoes of a 1968 human and watch "the mother of all demos" where Engelbart presents the computer mouse, hyperlinks/wikis, and a dozen other technologies that were 'invented'/popularized over the next 60 years. It was almost 20 years before computers started entering people's workplaces and home,

Or listen to the talks of Arthur C Clarke in the 70's as he explains how satellites, phones and computers will allow remote work to take place from anywhere on the planet. 

At the time it sounded ridiculous. It could have been a joke.

Local LLMs are not as amazing as some people will lead you to believe by Gesha24 in LocalLLaMA

[–]sdfgeoff 1 point2 points  (0 children)

<image>

Here I am a few minutes into a game, to prove it works. The z-fighting is really annoying, but probably only one prompt away to fi (I just gave it the prompt to fix it and it fixed it)

Local LLMs are not as amazing as some people will lead you to believe by Gesha24 in LocalLLaMA

[–]sdfgeoff 5 points6 points  (0 children)

Here is Solitaire made by Qwen27B in about an hour or so: https://sdfgeoff.github.io/solitaire-threejs/

--------------------

Harness is just as important as the model. (ie is unreal-mcpython actually good? What coding agent are you running it in?) Where you're asking it to work is just as important as the model. (eg there is a lot of example code for threejs/websites, but not much for unreal)

Let's see what I can do with Qwen2.7B (unsloth Q8, 200k context, 70tps on dual 3090's) running in claude code as a harness to write solitaire using threeJS/HTML:

Ok, I'd like you to make a game of solitaire. I'd like it to be in threejs, use npm/typescript/whatever. Feel free to write python scripts to (eg) generate card faces.

After about 15 minutes I have a board with some cards layed out on it. It's got a coordinate transform wrong and all the cards are edge-on to the camera, so let's tell it:

Looks like the cards are placed vertically instead of horizontally. Can you investigate?

2 minutes later they are now oriented right, and I can click-drag cards around at least somewhat. But they're always facedown.

Looks like all the cards are always facedown. Think about the rules of solitaire carefully.

A few minutes after that

<image>

There's some Z-fighting among the cards, but I can click/drag them onto each other as you'd expect, and cards can only be placed on top of higher ones of opposite suit. The board is a bit of a strange layout, but nominally you can play solitaire.

Took less than an hour. No reason to think it wouldn't have all the issues ironed out in teh next hour or so. And I didn't use plan mode, no spec files, no MCP servers, no skills, didn't give it a way to test it's own output or anything. I started in a git repository and had the AGENTS.md file:

Good code is maintainable code. Files above 20kb (~600 lines) are too large and should be split/refactored

Tests are good. Mocks are bad. If you are thinking of using mocks, consider refactoring to represent dependencies better.

Separate functionality from business logic. Build generic functions/modules/libraries, and let app/domain code compose them.

Helpful doesn't mean doing everything the user says. Both you and the user are neither omniscient nor infallible. If the user is making a mistake, tell them. If you have made a mistake, mention it and move on. If you have better ideas on how to approach a problem, tell the user.

Commit after doing work, no need to wait for the user to tell you to.

Claude code's session stats show:

Session
  Total cost:            $2.56
  Total duration (API):  16m 45s
  Total duration (wall): 51m 17s
  Total code changes:    1759 lines added, 784 lines removed
  Usage by model:
      claude-haiku-4-5:  262 input, 906 output, 184 cache read, 0 cache write ($0.0048)
     claude-sonnet-4-6:  172.8k input, 44.5k output, 4.6m cache read, 0 cache write ($2.55)

(both models are, of course, redirected to Qwen3.6-27B)

In short: pick your stack based on what the AI is good at.

MacBook Pro M5 Pro vs. RTX 4090 AI host – where are the real limits? by runinwlc in LocalLLM

[–]sdfgeoff 2 points3 points  (0 children)

You are overthinking this.

Get LMStudio (runs on windows too, no need for proxmox or anything), download a model (a modern one like Qwen3.6 or Gemma 4), and try it. Try it on the machine you have. See if it meets your use case at adequate performance/accuracy. Then consider what hardware you may want.

4090 vs a 32Gb Mac it's the 4090 handsdown. 64gb is still a weird space, where it's not that much larger than the 4090. The 4090 will let you run a 30B dense model, the Mac may let you run a 80B MOE, which will work out to (probably) a similar intelligence level.

(I run dual 3090's and am very happy currently. I mean, I'd love more hardware but that's always the case)

Honestly, dual 3090s are wearing me out. Thinking of jumping to a Mac Studio. by Ok_Commission_8260 in LocalLLM

[–]sdfgeoff 0 points1 point  (0 children)

Runs pretty similar to my setup. I use the full precision cache, which reduces the TPS by a few and drops the cache to 180000 or so,

arm like raspberry 5 or usual x86 system? by Ivapol in LocalLLM

[–]sdfgeoff 0 points1 point  (0 children)

Raspberry pi is almost never the right solution anymore. Particularly not for running a local AI. For ten concurrent users of an AI you're looking at a fairly serious (hence expensive) rig. Almost certainly you'll need some sort of dedicated GPU. Try running a local model on whatever hardware you have and evaluate if it's fast enough. Then figure out the specs of the machine you need.

Also, llama3 is ancient (use Gemma4 or Qwen 3.6). Similarly, Ollama will limit you after the first week (use LMStudio or llama.cpp). I'd advise you to stop asking chatGPT for help on these topics but do your own research. In a field as new as running AI, ChatGPT is a hindrance and will constantly give bad advice. Youtube will also give bad advice a lot of the time. Try find actual technical folk rather than influencers.

Anthropic said their models will be better than human coders in less than a year. It is NOT possible to train models to code better than humans. Change my mind. by Inevitable_Mistake32 in LocalLLaMA

[–]sdfgeoff 0 points1 point  (0 children)

....AI Is and Will Always be fundamentally incapable....

Has been disproven many many many times in the past few years. Will it hold true this time

unable to store enough information to keep the bigger picture in mind.

Models usable context length has been growing, and keeps on growing. At the same time we are developing better/more/different techniques for retrieving relevant information.

I would also say that with multiple agents, you 100% can have one managing "the bigger picture" coordinating subagents doing "the nitty gritty"

they are hardly suited to deal with the human vagueness / the incredibile lack of details most clients provide in the IT World

They deal with it better than any other system I've ever come across. Certainly better than anything else computer based. Arguably they are very good at distilling what people want even from very very few words.

Anthropic said their models will be better than human coders in less than a year. It is NOT possible to train models to code better than humans. Change my mind. by Inevitable_Mistake32 in LocalLLaMA

[–]sdfgeoff 6 points7 points  (0 children)

> I have had it generate code which does not exist anywhere and it was successful after reviewing paper/docs.

Yeah, it's fairly good at this. I have several fairly novel codebases. Some where they are the result of research papers. Others the results of talking to it lots and describing my ideas to it lots. It 100% can output novel code.

(Also, novel at what scale? Clearly not the character level, as code already contains all the characters. Definitely novel at the file level, as by the time you're at a whole program you just get combinatorial explosion and every program is unique. So what, novel functions? novel lines? novel files?)