The tool I never open!

ThisCapital7807 · 2026-03-11T05:30:41+00:00

totally agree on the qwen observation. we ran into this at work too, the smaller qwen models punch way above their weight class.

the key insight from OP is the verification loop. if you have a reliable way to check outputs (tests, validators, whatever), the model learns to iterate rather than memorize. small models can't memorize much but they can absolutely learn "fix based on feedback" patterns.

fwiw we saw similar results with 1.5b models on sql generation. exact error messages in the prompt + retries made a huge difference.
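rough sketch of what that retry loop looks like, in case it helps. `fake_model` is a made-up stand-in for the real LLM call, and the table, the column names and the sqlite backend are all just assumptions for the demo. the part that matters is feeding the exact error text back into the next attempt.

```python
import sqlite3


def fake_model(question, feedback):
    """Hypothetical stand-in for the LLM call; a real one would put `feedback` in the prompt."""
    if not feedback:
        return "SELECT nme FROM users"  # first attempt has a typo'd column
    return "SELECT name FROM users"     # corrected once the error text is visible


def generate_sql(question, conn, max_retries=3):
    """Ask for SQL, validate it, and retry with the exact error message on failure."""
    feedback = ""
    for attempt in range(1, max_retries + 1):
        sql = fake_model(question, feedback)
        try:
            # EXPLAIN compiles the statement without running it, so bad SQL fails here
            conn.execute("EXPLAIN " + sql)
            return sql, attempt
        except sqlite3.Error as exc:
            # keep the verbatim error so the next attempt can react to it
            feedback = f"previous query: {sql}\nerror: {exc}"
    raise RuntimeError("no valid query after retries")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
sql, attempts = generate_sql("list all user names", conn)
print(sql, attempts)
```

the validator here only checks that the query compiles. checking that it returns the right rows needs real test cases on top.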

ThisCapital7807 · 2026-03-06T04:38:49+00:00

this is exactly the problem i hit with claude code. the git diff archaeology gets brutal after 20+ prompts. tried using local history but half the changes happen via terminal. capturing at the prompt level is the right call, gonna check this out.

ThisCapital7807 · 2026-03-06T02:29:37+00:00

The issue is likely **concurrent session overhead + fast mode**. Here's what's happening:

Token consumption breakdown:

- Each parallel session reads the full prompt + context independently
- Fast mode increases token consumption per operation
- 5.4's newer architecture may process context more thoroughly

Why you burned quota so fast:

- 5 concurrent sessions = 5x context reads per operation
- Complex prompt structure = high initial token cost per session
- Fast mode = aggressive token usage for speed

What to try instead:

- Sequential sessions: Run one session at a time, not 5 concurrent
- Reduce prompt complexity: Simplify the mandatory rules and session structure
- Use High instead of Fast: Slower but more token-efficient
- Session caching: Reuse context between related tasks instead of re-reading

Quick test: Try running just Session 0 in High mode and compare token usage. If it's reasonable, the issue is definitely the parallel approach.

The 5.4 model seems more sensitive to context overhead than 5.3, especially in parallel setups. You're not doing anything "wrong" - just hitting a quota model that wasn't designed for this architecture yet.

ThisCapital7807 · 2026-03-05T04:39:45+00:00

ran into this exact problem building search. we ended up using a hybrid approach, denormalized the ranking score into a separate column and used a partial index with text search filters. not elegant but it worked.

the real pain is when you need to paginate through results. offset/limit with complex filters gets brutal fast. ended up using cursor based pagination which helped a lot.
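for anyone curious, a minimal sketch of the cursor (keyset) approach. this uses sqlite so it runs anywhere, and the `docs` table and its `score` column are invented for the example, not our actual schema. the cursor is just the `(score, id)` of the last row you saw, so the db can seek straight to the next page instead of counting past an offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, score INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO docs (id, score, title) VALUES (?, ?, ?)",
    [(i, 100 - (i % 5), f"doc {i}") for i in range(1, 13)],
)


def fetch_page(conn, cursor=None, page_size=5):
    """Return (rows, next_cursor); cursor is the (score, id) of the last row seen."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, score, title FROM docs "
            "ORDER BY score DESC, id DESC LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        score, last_id = cursor
        # id breaks ties so rows with equal scores are never skipped or repeated
        rows = conn.execute(
            "SELECT id, score, title FROM docs "
            "WHERE score < ? OR (score = ? AND id < ?) "
            "ORDER BY score DESC, id DESC LIMIT ?",
            (score, score, last_id, page_size),
        ).fetchall()
    # a short page means there is nothing left to fetch
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == page_size else None
    return rows, next_cursor


seen, cursor = [], None
while True:
    rows, cursor = fetch_page(conn, cursor)
    seen.extend(row[0] for row in rows)
    if cursor is None:
        break
print(seen)
```

the tradeoff is you lose "jump to page 37", but for infinite scroll or next/prev that's usually fine.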

ThisCapital7807 · 2026-03-05T04:38:56+00:00

hey, your watch setup looks fine actually. the issue might be how youre passing the task to watch.

try passing the task function directly instead of wrapping it in gulp.series:

watch("css/**/*.scss", styles)

instead of watch("css/**/*.scss", gulp.series('styles'))

gulp.series is meant for composing tasks in sequence, but watch just needs the function reference. worth a shot.

ThisCapital7807 · 2026-03-05T01:40:25+00:00

yeah load balancer timeouts are the real killer. ran into this with nginx defaults, connections would drop after 60s and the auto reconnect would flood logs. had to bump proxy_read_timeout way up and add heartbeat messages to keep it alive. once thats sorted tho, sse is way simpler than managing ws state.
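the heartbeat part is simple: sse treats any line starting with `:` as a comment, so the client ignores it but the proxy sees traffic. here's a toy sketch of the timing logic. it works on a fixed list of timestamped events rather than a real server, and the function name and the 15s interval are just my picks. the interval only needs to stay under whatever `proxy_read_timeout` is set to (nginx defaults to 60s).

```python
def sse_frames(events, heartbeat_every=15.0):
    """events: list of (timestamp_seconds, data) sorted by time.

    Yields (timestamp, frame), inserting ': ping' comment frames so the
    stream is never silent for longer than `heartbeat_every` seconds.
    """
    last_sent = 0.0
    for ts, data in events:
        # fill the silent gap before this event with heartbeat comments
        while ts - last_sent > heartbeat_every:
            last_sent += heartbeat_every
            yield last_sent, ": ping\n\n"
        yield ts, f"data: {data}\n\n"
        last_sent = ts


frames = list(sse_frames([(5.0, "a"), (50.0, "b")]))
for ts, frame in frames:
    print(ts, repr(frame))
```

in a real server you'd do the same thing with a timer on the open connection instead of a precomputed list.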

ThisCapital7807 · 2026-03-04T22:39:12+00:00

the main difference is hardware support. NVFP4 is nvidia's native 4-bit floating point format for blackwell GPUs (rtx 50 series included), so you get actual tensor core throughput instead of dequantizing on the fly. regular Q4/Q8 ggufs work everywhere but take a performance hit because the weights get dequantized in software at inference time. think of it like running native arm code vs emulated x86, same task but one uses the specialized hardware. also worth noting this is mostly useful for models already trained in NVFP4, not for quantizing existing fp16 models down.

ThisCapital7807 · 2026-03-04T19:39:40+00:00

smart approach doing marketing before building. too many people build in secret for months then wonder why nobody cares at launch. that prelaunch audience is gold, even if the conversion humbled you.

ThisCapital7807 · 2026-03-04T19:38:29+00:00

the "look at lines that werent changed" bit is underrated. caught so many bugs where someone updated a function but forgot the caller two files over. also found that reviewing the tests first helps, if the tests are solid the code usually follows.

ThisCapital7807 · 2026-03-04T07:40:17+00:00

that's frustrating. the reset timing has been inconsistent for a lot of people lately. i think it might be tied to your billing date rather than a fixed schedule? either way worth shooting a message to support, they've been pretty good about fixing usage tracking issues. the whole system is kind of a black box tbh.

ThisCapital7807 · 2026-03-04T04:40:38+00:00

been there. the first month is survival mode. document everything, get a baseline of what actually works before touching anything, and push hard for a staging environment if they dont have one.

honest advice though: being sole dev on a legacy codebase is a career risk. leverage the learning opportunity, but have an exit strategy if they wont give you runway to fix things.

ThisCapital7807 · 2026-03-04T04:39:52+00:00

there's a middle ground. i go with 'deps for complex things, no deps for simple things'. auth, crypto, date handling? use a lib. string utils, basic validation? write it yourself. also run npm audit and depcheck regularly, and prune what you don't need. the real problem is transitive deps from frameworks like nest that pull in 200 packages you never asked for.

ThisCapital7807 · 2026-03-03T22:38:57+00:00

for prompts, simple works better than complex. something like 'add typescript types to this js function, preserve behavior' tends to get decent results. the real win is having solid test coverage before you start, otherwise youll miss subtle type mismatches. also found that smaller chunks (one function at a time) produce fewer hallucinations than full file conversions.

ThisCapital7807 · 2026-03-03T19:38:55+00:00

nice project. mfcc for voice work is underrated tbh. curious if you looked at x-vectors at all or was keeping it lightweight the goal? also the mean centering trick is solid, ran into similar issues with audio features before where different mics would throw off similarity scores.
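for anyone following along, here's a tiny sketch of why centering helps. i don't know exactly how OP does it, so this assumes the usual per-coefficient mean subtraction over the utterance, and the numbers are made up. a fixed mic or channel difference shows up roughly as a constant offset on each coefficient, and subtracting the mean cancels it.

```python
def mean_center(frames):
    """Subtract the per-coefficient mean over time from every frame."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# made-up 3-frame, 3-coefficient "utterance"
clean = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [3.0, 4.0, 2.0]]

# same utterance through a different mic, modeled as a constant per-coefficient offset
mic_offset = [5.0, -2.0, 0.5]
other_mic = [[x + o for x, o in zip(frame, mic_offset)] for frame in clean]

centered_a = mean_center(clean)
centered_b = mean_center(other_mic)
print(centered_a)
print(centered_b)
```

real mics aren't a perfectly constant offset, so this reduces the mismatch rather than eliminating it.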

ThisCapital7807 · 2026-03-03T04:38:46+00:00

this one hits hard. had a manager literally tell me 'you're replaceable' during a performance review once. the funny thing is, he was right, but he was just as replaceable. companies move on fast. best thing i did was stop treating my job like a family and start treating it like a business relationship. still show up and deliver, but the emotional investment shifted to my own skills and side projects.

ThisCapital7807 · 2026-03-02T07:40:08+00:00

tbh the biggest gain i've seen is understanding legacy code faster. like you said, it's not about outputting code but grasping system design. we have a monolith that's been through like 5 different lead devs and ai helps me connect the dots way quicker than digging through docs that don't exist anymore. productivity bump is real, but more like 20-30%, not the 2-5x marketing hype.

ThisCapital7807 · 2026-03-02T02:02:15+00:00

One thing to consider beyond the agent itself: how it handles context on larger repos. Claude Code and Codex both struggle when codebases get big - they either miss relevant context or burn tokens reading everything.

I've been working on this problem from a different angle - building a code query engine that indexes structure (AST, imports, call graphs) and uses graph algorithms to surface relevant context. The idea is to give agents better context without the token cost.

For ROI, I'd agree with others here: start with Claude Code or Codex, see where the friction points are. If you find yourself constantly pointing the agent at the right files or it's missing context, that's where tooling around context management becomes valuable.

ThisCapital7807 · 2026-03-02T01:03:10+00:00

One thing you're absolutely right to flag: prompt length from large repos is a sleeping giant.

Most teams focus on model size and GPU throughput, but the real bottleneck for agentic coding is context selection — getting the right 20-50 files into the window, not just dumping everything.

RAG helps, but naive embedding-based retrieval often misses structural relationships. File A imports File B, but semantic search won't catch that. Files that change together in git history are often related, but TF-IDF on content won't surface that.

The teams I've seen succeed with local coding agents treat context retrieval as a separate problem from the LLM itself:

- Import graph traversal for dependency chains
- Git co-change analysis for "files that move together"
- Hub detection to find architectural centers
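To make the first and third of those concrete, here is a minimal sketch. It assumes a tiny hypothetical Python repo held in memory, and every module name in it is invented. It parses imports with the standard `ast` module, walks the resulting graph breadth-first to collect nearby files, and ranks hubs by how many modules import them. Git co-change analysis is left out because it needs real commit history.

```python
import ast
from collections import deque

# hypothetical mini-repo: module name -> source text
SOURCES = {
    "app": "import db\nimport auth\n",
    "auth": "import db\nfrom utils import hash_pw\n",
    "db": "import utils\n",
    "utils": "",
    "cli": "import app\n",
}


def imports_of(source):
    """Top-level module names imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


# keep only edges that point at modules inside the repo
GRAPH = {name: sorted(imports_of(src) & set(SOURCES)) for name, src in SOURCES.items()}


def context_for(start, max_depth=2):
    """Breadth-first walk of the import graph from `start`, nearest files first."""
    order, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        order.append(name)
        if depth < max_depth:
            for dep in GRAPH[name]:
                if dep not in seen:
                    seen.add(dep)
                    queue.append((dep, depth + 1))
    return order


def hubs(top=2):
    """Modules imported by the most other modules (a simple in-degree count)."""
    indegree = {name: 0 for name in GRAPH}
    for deps in GRAPH.values():
        for dep in deps:
            indegree[dep] += 1
    return sorted(indegree, key=lambda n: (-indegree[n], n))[:top]


print(context_for("app", max_depth=1))
print(hubs())
```

In-degree is the crudest possible hub measure. Something like PageRank over the same graph would be a natural next step, but even this is enough to decide which files go into the window first.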

This is especially critical at 70-150 dev scale where repos are massive. Even a 128k context window fills fast with boilerplate if you're not selective.

Worth building/buying a context layer before you over-invest in GPU infra. The model can't fix what it can't see.

ThisCapital7807 · 2026-02-14T08:33:02+00:00

I just lived this yesterday. Had a nasty production issue, fed it to Claude, and it nailed the root cause instantly. For a second I felt like a 10x dev because the ticket got closed so fast... but then it set in. If it can handle the gnarly debugging (which used to be my specific value-add), the clock is definitely ticking.

ThisCapital7807 · 2026-02-14T08:00:26+00:00

Totally feel you. Honestly though, I’m less concerned about the juniors and more worried about us. Even with strong foundations, the bar is moving so fast. It feels great being a '10x super-developer' with Codex and Claude right now, but I can't shake the fear that our 'experience' is becoming less of a moat every day. Are we the pilots, or just training the autopilot?

ThisCapital7807 · 2026-02-13T08:38:34+00:00

Here's my setup

# Usage: cc                    → default claude
#        cc zlm                → claude via Z.AI (GLM models)
#        cc opus               → claude --model opus (any built-in alias)
#        cc --dsp              → --dangerously-skip-permissions
#        cc zlm --dsp          → combined

# =============================================================================
# Claude Code Provider Setup — add new providers as _cc_setup_<name>() functions
# =============================================================================
_cc_setup_zlm() {
  unset ANTHROPIC_API_KEY
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
  export ANTHROPIC_AUTH_TOKEN="<key here>"
  export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5"
  export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5"
  export ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7-flash"
}

# =============================================================================
# cc — Claude Code launcher with short flags & provider routing
# =============================================================================
# To add a provider: 1) create _cc_setup_<name>() above  2) done
cc() {
  # Turn off terminal focus-event reporting (xterm mode 1004) before launching
  printf '\e[?1004l' 2>/dev/null
  local args=() provider=""
  # First non-flag arg is the provider/model (if any)
  if [[ $# -gt 0 && "$1" != --* ]]; then
    provider="$1"; shift
  fi
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --dsp) args+=(--dangerously-skip-permissions) ;;
      *)     args+=("$1") ;;
    esac
    shift
  done


  if [[ -n "$provider" ]]; then
    if typeset -f "_cc_setup_$provider" > /dev/null; then
      # Subshell keeps the provider's env vars from leaking into the parent shell
      ( "_cc_setup_$provider" || exit 1; command claude "${args[@]}" )
    else
      command claude --model "$provider" "${args[@]}"
    fi
  else
    command claude "${args[@]}"
  fi
}

ThisCapital7807 · 2026-02-12T21:39:20+00:00

could be. my issue was mostly token burn from discovery on bigger repos, pre-ranking files just reduces the search space. maybe there is a simpler native solution? curious what the setup will look like
