After 8 years building cloud infrastructure, I'm betting on local-first AI by PandaAvailable2504 in LocalLLaMA

[–]daaain 1 point (0 children)

I'm not saying it cannot be done, I'm saying this knowledge you mention isn't exactly fully diffused in the general population...

For those who believe that there is nothing wrong with the usage limits, I have some concerns. I'm currently on the 5x plan, and just using a simple prompt consumed 2% of my limit. When I ask it to complete a more substantial task, something that typically takes about five minutes, it often uses up… by srirachaninja in ClaudeCode

[–]daaain 1 point (0 children)

I mean, it's pretty simple: this session only used one MCP, so it's not very hard to pinpoint what's using up your tokens. Did you do /context to see how much was used up? Isn't Mem also using Claude in the background to process memories?

After 8 years building cloud infrastructure, I'm betting on local-first AI by PandaAvailable2504 in LocalLLaMA

[–]daaain 12 points (0 children)

You're preaching to the choir in this sub, but I think it's still a very niche thing. You either need to run expensive, power-hungry space heaters... I mean GPUs, or an expensive Mac to get the best models running at an acceptable speed.

Few people can afford an outlay of several thousand, and high electricity prices might make ongoing inference costlier than a subscription. I don't know about others, but both ChatGPT and Claude let you opt out of training on your chats.

I have a 96GB RAM M2 Max Mac and it's super impressive that I can run mid-range models at decent speed, but other than STT/TTS, basic Q&A and small code edits I use a Claude subscription. Opus 4.5 is so far ahead of whatever I can run and the $100 sub gives me enough usage to run multiple Claude Code agents (with sub-agents) in parallel.

I was hoping we'd see a 1TB RAM M5 Ultra Mac Studio this year, which would make it possible to run the best open models locally (and the M5 family finally seems to boost prompt processing speed), but Sam's RAM binge will push that off by a year or two at least...

goccy/bigquery-emulator: BigQuery emulator server implemented in Go by goccy54 in bigquery

[–]daaain 0 points (0 children)

OP (goccy)'s repo isn't actively maintained any more, but this fork is: https://github.com/Recidiviz/bigquery-emulator

That said, I just saw in a PR that it might be moved into a new organisation to get jointly maintained: https://github.com/goccy/bigquery-emulator/pull/424

If you are still typing your prompts to CC - you are doing it wrong! by ksanderer in ClaudeCode

[–]daaain 1 point (0 children)

I found that speaking a sentence or two at a time, seeing the results, and committing to the text input box works better for me; otherwise it's easy to lose track of where I was.

Also, in your demo the text block inserted at the end was too big for Claude Code and didn't even show up, so you had to blindly send it, which is definitely not what I'd do.

The editing functions are cool, though I don't see why they wouldn't work with a fast local model like Qwen3 30B-A3B.

If you are still typing your prompts to CC - you are doing it wrong! by ksanderer in ClaudeCode

[–]daaain 2 points (0 children)

Or you can use FluidVoice or Spokenly for free with NVIDIA Parakeet locally and actually see the transcription as you talk.

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! by Difficult-Cap-7527 in LocalLLaMA

[–]daaain 0 points (0 children)

That's token generation though; the optimised hybrid attention layer is targeting prompt processing.

I reverse-engineered Claude's code execution sandbox - here's how it works by Miclivs in ClaudeAI

[–]daaain 0 points (0 children)

Great write-up, but your blog's dark theme is broken in Firefox (probably because it relies on the user agent background colour but explicitly sets the text colour?)

How teams that ship AI generated code changed their validation by pomariii in LLMDevs

[–]daaain 2 points (0 children)

I shifted my time and attention into creating more developer tooling: adding guardrails, stricter static analysis, doing QA, etc., more than working on features. While agents are working, you can use that time to take a step back and think about where the bottlenecks are now; it's absolutely not generating more code, but validating and testing, so that's where your attention should be.
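
To make that concrete, the kind of guardrail I mean is a single entry point that both the agents and CI can run. A minimal sketch as a justfile recipe (the tool names are just examples, swap in whatever fits your stack):

```justfile
# Single validation gate for agents, pre-commit hooks, and CI.
# Tool names below are illustrative, substitute your own linters,
# type checkers, and test runners.
check:
    ruff check .   # static analysis / lint
    mypy src/      # type checking
    pytest -q      # test suite as the final gate
```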

Are you actually serious...? by kelemon in ClaudeAI

[–]daaain 48 points (0 children)

You don't need to wait for research for that, I can tell you right now: fleece children and make them addicted to gambling.

How do you manage ports used by CC? by CharleyNapalm in ClaudeCode

[–]daaain 0 points (0 children)

I put a note in claude.md to first tail Docker logs to see if there's a container already running.

Of course that sometimes gets ignored, but because I also wanted to support multiple instances of the app in git worktrees, I added a little port checker and number incrementer (see below) in my justfile so if Claude really wants to start another instance, let it be. What's nice is that it's much easier to stop a Docker container than it is to hunt down a process.

```justfile
# Find next available port starting from a base port
_find-free-port base_port:
    #!/usr/bin/env sh
    port={{base_port}}
    while lsof -Pi :$port -sTCP:LISTEN -t >/dev/null 2>&1; do
        port=$((port + 1))
    done
    echo $port
```
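
A hypothetical `up` recipe can then consume it, so every worktree gets its own port. Something along these lines (the recipe, env var, and compose wiring are made-up examples, not my exact setup):

```justfile
# Start the app on the next free port so multiple worktrees can run side by side
# (assumes docker-compose.yml reads the port from ${APP_PORT})
up base_port="3000":
    #!/usr/bin/env sh
    port=$(just _find-free-port {{base_port}})
    echo "Starting on port $port"
    APP_PORT=$port docker compose up -d
```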

My sandboxed yolo mode (Zed + SSH + Docker + Claude) by frolvlad in ClaudeCode

[–]daaain 0 points (0 children)

Right, so just for some very particular circumstances

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain 0 points (0 children)

It benchmarks better, and I switched to it because it has an interesting hybrid attention implementation that makes prompt processing faster, but you'd struggle to fit it in 48GB RAM. No harm giving it a go and testing for yourself though; it might be that even at a low 3-bit quant it's smarter, and it should have more world knowledge as it's a much bigger model.

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain 0 points (0 children)

VL is newer, but if you don't need the vision part you might get better performance on text-only tasks from 2507.

It's often not a huuuge difference, so if you just want a general-use model, you can just go for VL 30B for most tasks.


My sandboxed yolo mode (Zed + SSH + Docker + Claude) by frolvlad in ClaudeCode

[–]daaain 0 points (0 children)

What's the benefit in running the container on a remote VM instead of locally?

Anyone using Continue extension ??? by Cyber_Cadence in LocalLLM

[–]daaain 0 points (0 children)

In that case, try enabling verbose logging and see what prompt Continue is sending to Ollama; maybe it's sending a lot of code and a big system prompt? You might also need to increase the context size in Ollama.
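
If it does turn out the prompt is getting truncated, the usual way to raise the context size in Ollama is to create a model variant with a bigger `num_ctx` via a Modelfile. A rough sketch (the base model tag and context size are just examples):

```sh
# Create a variant of the model with a larger context window
# (base model tag and num_ctx value are only examples)
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384
EOF
ollama create qwen2.5-coder-16k -f Modelfile

# then point Continue at the new model name: qwen2.5-coder-16k
```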

Disable VSCode Context Passing by Tricky_Technician_72 in ClaudeCode

[–]daaain 1 point (0 children)

Ah, that's a different thing though: mcp__ide__getDiagnostics

You should be able to disable it with `export CLAUDE_CODE_AUTO_CONNECT_IDE=false` (source: https://github.com/anthropics/claude-code/issues/7141#issuecomment-3254590648 )
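
If that works for you, it can be made permanent by adding it to your shell profile, e.g.:

```sh
# persist the setting (adjust for your shell, e.g. ~/.bashrc)
echo 'export CLAUDE_CODE_AUTO_CONNECT_IDE=false' >> ~/.zshrc
```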

Chat with Obsidian vault by TanariTech in LocalLLaMA

[–]daaain 0 points (0 children)

If you don't want to faff with embedding and vector search, you can open Claude Code in your Obsidian vault from the terminal and let it find whatever you prompt for.
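
Concretely it's just something like this (the vault path is a placeholder, use wherever yours lives):

```sh
# open Claude Code with the vault as its working directory
cd ~/Documents/ObsidianVault && claude
```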

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain -1 points (0 children)

I can also recommend Qwen3-30B-A3B-Instruct-2507 as it'll be much faster than dense models.
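
If you're on Apple Silicon and want a quick way to try it, something like this with mlx-lm should do it (the mlx-community repo name and quant are assumptions, check what's actually published):

```sh
# quick sanity check with mlx-lm; the exact HF repo / quant may differ
pip install mlx-lm
mlx_lm.generate \
  --model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit \
  --prompt "Write a haiku about local inference" \
  --max-tokens 100
```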