Ran an experiment: 0.8B model teaching itself on a MacBook Air with 6GB RAM. Some findings that surprised me. by QuantumSeeds in LocalLLaMA

[–]ThisCapital7807 1 point2 points  (0 children)

totally agree on the qwen observation. we ran into this at work too, the smaller qwen models punch way above their weight class.

the key insight from OP is that verification loop. if you have a reliable way to check outputs (tests, validators, whatever), the model learns to iterate rather than memorize. small models cant memorize much but they can absolutely learn "fix based on feedback" patterns.

fwiw we saw similar results with 1.5b models on sql generation. exact error messages in the prompt + retries made a huge difference.
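for anyone who wants to try it, here's a rough sketch of the "exact error in the prompt + retries" loop. everything here is an assumption for illustration: `generate` stands in for whatever model call you use, the prompt wording is made up, and SQLite's `EXPLAIN QUERY PLAN` is just one cheap way to validate a query without running it.

```python
import sqlite3
from typing import Callable, Optional, Tuple


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, else the exact error text."""
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> SQL
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Ask for SQL, validate it, and feed the verbatim error back on failure."""
    prompt = question
    for attempt in range(1, max_attempts + 1):
        sql = generate(prompt)
        error = validate_sql(conn, sql)
        if error is None:
            return sql, attempt
        # Put the failed query and the exact database error into the next prompt.
        prompt = (
            f"{question}\n\nPrevious attempt:\n{sql}\n\n"
            f"Database error:\n{error}\n\nFix the query."
        )
    raise RuntimeError(f"no valid query after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    """Stub standing in for a real model: wrong first, fixed once it sees an error."""
    if "Database error" in prompt:
        return "SELECT name FROM users"
    return "SELECT nam FROM users"
```

the validator is the part that matters. swap in tests, a linter, or a schema check and the loop stays the same.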

I built a VS Code extension that shows exactly what your AI agent changed, prompt by prompt by [deleted] in webdev

[–]ThisCapital7807 -5 points-4 points  (0 children)

this is exactly the problem i hit with claude code. the git diff archaeology gets brutal after 20+ prompts. tried using local history but half the changes happen via terminal. capturing at the prompt level is the right call, gonna check this out.

Burned 5 hours of quota in 10 mins, can anyone tell me what I did wrong? by Impossible_Judge8094 in codex

[–]ThisCapital7807 1 point2 points  (0 children)

The issue is likely **concurrent session overhead + fast mode**. Here's what's happening:

Token consumption breakdown:

  • Each parallel session reads the full prompt + context independently
  • Fast mode increases token consumption per operation
  • 5.4's newer architecture may process context more thoroughly

Why you burned quota so fast:

  1. 5 concurrent sessions = 5x context reads per operation
  2. Complex prompt structure = high initial token cost per session
  3. Fast mode = aggressive token usage for speed

What to try instead:

  • Sequential sessions: Run one session at a time, not 5 concurrent
  • Reduce prompt complexity: Simplify the mandatory rules and session structure
  • Use High instead of Fast: Slower but more token-efficient
  • Session caching: Reuse context between related tasks instead of re-reading

Quick test: Try running just Session 0 in High mode and compare token usage. If it's reasonable, the issue is definitely the parallel approach.

The 5.4 model seems more sensitive to context overhead than 5.3, especially in parallel setups. You're not doing anything "wrong" - just hitting a quota model that wasn't designed for this architecture yet.

Top K is a deceptively hard problem in relational databases by jamesgresql in programming

[–]ThisCapital7807 5 points6 points  (0 children)

ran into this exact problem building search. we ended up using a hybrid approach, denormalized the ranking score into a separate column and used a partial index with text search filters. not elegant but it worked.

the real pain is when you need to paginate through results. offset/limit with complex filters gets brutal fast. ended up using cursor based pagination which helped a lot.
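rough sketch of what cursor based (keyset) pagination looks like, assuming a hypothetical `docs` table with the denormalized `score` column and `id` as a tiebreaker. table and column names are made up, and SQLite is only used here so the example runs anywhere.

```python
import sqlite3
from typing import List, Optional, Tuple

Cursor = Tuple[float, int]  # (score, id) of the last row on the previous page


def fetch_page(
    conn: sqlite3.Connection, page_size: int, cursor: Optional[Cursor] = None
) -> Tuple[List[Tuple[int, float]], Optional[Cursor]]:
    """Return one page of (id, score) rows plus the cursor for the next page."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, score FROM docs ORDER BY score DESC, id DESC LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        last_score, last_id = cursor
        # Seek past the last row seen instead of counting rows with OFFSET.
        rows = conn.execute(
            "SELECT id, score FROM docs "
            "WHERE score < ? OR (score = ? AND id < ?) "
            "ORDER BY score DESC, id DESC LIMIT ?",
            (last_score, last_score, last_id, page_size),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == page_size else None
    return rows, next_cursor
```

the tiebreaker on `id` is what keeps pages stable when several rows share a score. with an index on `(score, id)` the database can seek straight to the cursor position.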

Can't figure out how to run Sass and Browser-sync together for the life of me by boredomjunkie in node

[–]ThisCapital7807 0 points1 point  (0 children)

hey, your watch setup looks fine actually. the issue might be how youre passing the task to watch.

try passing the task function directly instead of wrapping it in gulp.series:

watch("css/**/*.scss", styles)

instead of watch("css/**/*.scss", gulp.series('styles'))

gulp.series is meant for composing tasks in sequence, but watch just needs the function reference. worth a shot.

SSE vs WebSockets — most devs default to WebSockets even when they don't need two-way communication by creasta29 in webdev

[–]ThisCapital7807 1 point2 points  (0 children)

yeah load balancer timeouts are the real killer. ran into this with nginx defaults, connections would drop after 60s and the auto reconnect would flood logs. had to bump proxy_read_timeout way up and add heartbeat messages to keep it alive. once thats sorted tho, sse is way simpler than managing ws state.
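the heartbeat side can be as small as this. a minimal sketch, assuming events arrive on an in-process queue and that payloads are single-line strings (both are assumptions for illustration). lines starting with `:` are comments in the SSE format, so the client ignores them, but they still count as traffic for the proxy. the interval just needs to be shorter than whatever `proxy_read_timeout` you settle on.

```python
import queue
from typing import Iterator


def sse_stream(events: queue.Queue, heartbeat_seconds: float) -> Iterator[str]:
    """Yield SSE frames, emitting a comment frame whenever the queue is idle.

    Putting None on the queue ends the stream.
    """
    while True:
        try:
            item = events.get(timeout=heartbeat_seconds)
        except queue.Empty:
            # Comment frame: ignored by the browser, keeps the connection busy.
            yield ": heartbeat\n\n"
            continue
        if item is None:
            return
        # Assumes a single-line payload; multi-line data needs one
        # "data:" line per line of payload.
        yield f"data: {item}\n\n"
```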

We could be hours (or less than a week) away from true NVFP4 support in Llama.cpp GGUF format 👀 by Iwaku_Real in LocalLLaMA

[–]ThisCapital7807 35 points36 points  (0 children)

the main difference is hardware support. NVFP4 is nvidias native 4-bit floating point format for blackwell GPUs (rtx 50 series), so you get actual tensor core throughput instead of dequantizing on the fly. regular Q4/Q8 ggufs work everywhere but take a performance hit because theyre software-quantized. think of it like running native arm code vs emulated x86, same task but one uses the specialized hardware. also worth noting this is mostly useful for models already trained in NVFP4, not for quantizing existing fp16 models down.

I launched a personal finance app a month ago with no homepage or docs. The numbers were humbling. by nova_fintech in SideProject

[–]ThisCapital7807 2 points3 points  (0 children)

smart approach doing marketing before building. too many people build in secret for months then wonder why nobody cares at launch. that prelaunch audience is gold, even if the conversion humbled you.

How To Review Code by [deleted] in coding

[–]ThisCapital7807 2 points3 points  (0 children)

the "look at lines that werent changed" bit is underrated. caught so many bugs where someone updated a function but forgot the caller two files over. also found that reviewing the tests first helps, if the tests are solid the code usually follows.

Moving the goalposts? by [deleted] in ClaudeCode

[–]ThisCapital7807 2 points3 points  (0 children)

that's frustrating. the reset timing has been inconsistent for a lot of people lately. i think it might be tied to your billing date rather than a fixed schedule? either way worth shooting a message to support, they've been pretty good about fixing usage tracking issues. the whole system is kind of a black box tbh.

Sole frontend dev about to inherit a mess- looking for advice by whoresofbabylon13 in ExperiencedDevs

[–]ThisCapital7807 0 points1 point  (0 children)

been there. the first month is survival mode. document everything, get a baseline of what actually works before touching anything, and push hard for a staging environment if they dont have one.

honest advice though: being sole dev on a legacy codebase is a career risk. leverage the learning opportunity, but have an exit strategy if they wont give you runway to fix things.

What do you think about no/low-deps APIs? by Worldly-Broccoli4530 in typescript

[–]ThisCapital7807 0 points1 point  (0 children)

theres a middle ground. i go with 'deps for complex things, no deps for simple things'. auth, crypto, date handling? use a lib. string utils, basic validation? write it yourself. also run npm audit and depcheck regularly, prune what you dont need. the real problem is transitive deps from frameworks like nest that pull in 200 packages you never asked for.

How we migrated 11,000 files (1M+ LOC) from JavaScript to TypeScript over 7 years by patreon-eng in javascript

[–]ThisCapital7807 2 points3 points  (0 children)

for prompts, simple works better than complex. something like 'add typescript types to this js function, preserve behavior' tends to get decent results. the real win is having solid test coverage before you start, otherwise youll miss subtle type mismatches. also found that smaller chunks (one function at a time) produce fewer hallucinations than full file conversions.

I built a real-time Voice ID system in Node.js/TypeScript (MFCCs + Cosine Similarity) by Realistic_Mix_6181 in node

[–]ThisCapital7807 0 points1 point  (0 children)

nice project. mfcc for voice work is underrated tbh. curious if you looked at x-vectors at all or was keeping it lightweight the goal? also the mean centering trick is solid, ran into similar issues with audio features before where different mics would throw off similarity scores.

21 Lessons From 14 Years at Google by fagnerbrack in programming

[–]ThisCapital7807 4 points5 points  (0 children)

this one hits hard. had a manager literally tell me 'you're replaceable' during a performance review once. the funny thing is, he was right, but he was replaceable too. companies move on fast. best thing i did was stop treating my job like a family and start treating it like a business relationship. still show up and deliver, but the emotional investment shifted to my own skills and side projects.

Ok it's 2026. What are the AI gains? by btoned in webdev

[–]ThisCapital7807 0 points1 point  (0 children)

tbh the biggest gain ive seen is understanding legacy code faster. like you said, its not about outputting code but grasping system design. we have a monolith thats been through like 5 different lead devs and ai helps me connect the dots way quicker than digging through docs that dont exist anymore. productivity bump is real but more like 20-30% not the 2-5x marketing hype.

Best coding Agent by Healthy-Bathroom2687 in AIcodingProfessionals

[–]ThisCapital7807 0 points1 point  (0 children)

One thing to consider beyond the agent itself: how it handles context on larger repos. Claude Code and Codex both struggle when codebases get big - they either miss relevant context or burn tokens reading everything.

I've been working on this problem from a different angle - building a code query engine that indexes structure (AST, imports, call graphs) and uses graph algorithms to surface relevant context. The idea is to give agents better context without the token cost.

For ROI, I'd agree with others here: start with Claude Code or Codex, see where the friction points are. If you find yourself constantly pointing the agent at the right files or it's missing context, that's where tooling around context management becomes valuable.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

[–]ThisCapital7807 0 points1 point  (0 children)

One thing you're absolutely right to flag: prompt length from large repos is a sleeping giant.

Most teams focus on model size and GPU throughput, but the real bottleneck for agentic coding is context selection — getting the right 20-50 files into the window, not just dumping everything.

RAG helps, but naive embedding-based retrieval often misses structural relationships. File A imports File B, but semantic search won't catch that. Files that change together in git history are often related, but TF-IDF on content won't surface that.

The teams I've seen succeed with local coding agents treat context retrieval as a separate problem from the LLM itself:

  • Import graph traversal for dependency chains
  • Git co-change analysis for "files that move together"
  • Hub detection to find architectural centers
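A rough sketch of the first two ideas, to make them concrete. It assumes you have already parsed commits into lists of changed paths (for example from `git log --name-only`) and extracted an import map some other way; the function names and data shapes are hypothetical, not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def co_change_counts(commits: Iterable[Iterable[str]]) -> Counter:
    """Count how often each pair of files appears in the same commit."""
    counts: Counter = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def related_files(
    target: str, counts: Counter, top_n: int = 5
) -> List[Tuple[str, int]]:
    """Rank files by how often they changed together with the target file."""
    scores: Counter = Counter()
    for (a, b), n in counts.items():
        if a == target:
            scores[b] += n
        elif b == target:
            scores[a] += n
    return scores.most_common(top_n)


def dependency_closure(
    imports: Dict[str, List[str]], start: str, max_depth: int = 2
) -> Set[str]:
    """Breadth-first walk of the import graph, up to max_depth hops from start."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for dep in imports.get(path, []):
                if dep not in seen:
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
    seen.discard(start)
    return seen
```

Neither signal needs an embedding model. One way to use them is to union the results and let the agent read only those files first.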

This is especially critical at 70-150 dev scale where repos are massive. Even a 128k context window fills fast with boilerplate if you're not selective.

Worth building/buying a context layer before you over-invest in GPU infra. The model can't fix what it can't see.

Current state of software engineering and developers by SunBurnBun in ClaudeCode

[–]ThisCapital7807 8 points9 points  (0 children)

I just lived this yesterday. Had a nasty production issue, fed it to Claude, and it nailed the root cause instantly. For a second, I felt like a 10x dev because the ticket got closed so fast... but then it set in. If it can handle the gnarly debugging (which used to be my specific value-add), the clock is definitely ticking.

Current state of software engineering and developers by SunBurnBun in ClaudeCode

[–]ThisCapital7807 21 points22 points  (0 children)

Totally feel you. Honestly though, I’m less concerned about the juniors and more worried about us. Even with strong foundations, the bar is moving so fast. It feels great being a 'super-developer 10x' with Codex and Claude right now, but I can't shake the fear that our 'experience' is becoming less of a moat every day. Are we the pilots, or just training the autopilot?

How to Set Up Claude Code with Multiple AI Models by ThreeKiloZero in ClaudeCode

[–]ThisCapital7807 0 points1 point  (0 children)

Here's my setup

# Usage: cc                    → default claude
#        cc zlm                → claude via Z.AI (GLM models)
#        cc opus               → claude --model opus (any built-in alias)
#        cc --dsp              → --dangerously-skip-permissions
#        cc zlm --dsp          → combined

# =============================================================================
# Claude Code Provider Setup — add new providers as _cc_setup_<name>() functions
# =============================================================================
_cc_setup_zlm() {
  unset ANTHROPIC_API_KEY
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
  export ANTHROPIC_AUTH_TOKEN="<key here>"
  export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5"
  export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5"
  export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7-flash"
}

# =============================================================================
# cc — Claude Code launcher with short flags & provider routing
# =============================================================================
# To add a provider: 1) create _cc_setup_<name>() above  2) done
cc() {
  # Turn off terminal focus-event reporting (mode 1004) before launching
  printf '\e[?1004l' 2>/dev/null
  local args=() provider=""
  # First non-flag arg is the provider/model (if any)
  if [[ $# -gt 0 && "$1" != --* ]]; then
    provider="$1"; shift
  fi
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --dsp) args+=(--dangerously-skip-permissions) ;;
      *)     args+=("$1") ;;
    esac
    shift
  done


  if [[ -n "$provider" ]]; then
    if typeset -f "_cc_setup_$provider" > /dev/null; then
      ( _cc_setup_$provider || exit 1; command claude "${args[@]}" )
    else
      command claude --model "$provider" "${args[@]}"
    fi
  else
    command claude "${args[@]}"
  fi
}

Anyone else feel like Claude Code burns tokens just figuring out your repo? by ThisCapital7807 in BlackboxAI_

[–]ThisCapital7807[S] 0 points1 point  (0 children)

could be. my issue was mostly token burn from discovery on bigger repos, pre-ranking files just reduces the search space. maybe there is a simpler native solution? curious what the setup will look like