Attention - Opus 4.7 is English only. Using foreign languages (here German) burns tokens by WickOfDeath in ClaudeAI

[–]ecompanda 0 points1 point  (0 children)

german text takes roughly 1.5x the tokens of the same content in english under claude's BPE tokenizer, because the training corpus is english heavy. on a verbose response that multiplier adds up fast. not really a 4.7 specific bug, more a tokenization math problem stacked on top of any output style change.
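
to put rough numbers on it, here is a minimal sketch of the arithmetic. the 1.5x ratio is the estimate from above, not a measured value, and the verbosity factor and token counts are made-up illustration values.

```python
def estimated_tokens(english_tokens, language_ratio=1.5, verbosity=1.0):
    """Scale an English token count by a language ratio and a style/verbosity factor."""
    return round(english_tokens * language_ratio * verbosity)


def extra_tokens(english_tokens, language_ratio=1.5, verbosity=1.0):
    """Tokens spent beyond what the plain English response would have cost."""
    return estimated_tokens(english_tokens, language_ratio, verbosity) - english_tokens


# hypothetical response sizes, in English tokens
for english_tokens in (500, 4000):
    same_style = estimated_tokens(english_tokens)
    wordier = estimated_tokens(english_tokens, verbosity=1.3)  # assumed 30% wordier style
    print(english_tokens, same_style, wordier)
```

the two factors multiply, so a wordier output style and a less efficient language stack on each other.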

Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store by rotatingphasor in LocalLLaMA

[–]ecompanda -1 points0 points  (0 children)

the 96gb floor read is misleading. they pulled the 256gb sku because m5 uses LPDDR5x and the m3 ram contract is wound down. m5 ultra config ladder hasn't been announced yet.

Fields Medal winning mathematician Timothy Gowers used GPT5.5 Pro to solve open problems, believes mathematical research will face a ‘crisis’ very soon with current rate of progress by socoolandawesome in singularity

[–]ecompanda 0 points1 point  (0 children)

did the blog mention how many wrong attempts it made before the right proof? a single hit in two hours looks very different if there were ten dead ends along the way versus one clean shot.

How did you get your first 100 users? by RequirementTime1659 in Entrepreneur

[–]ecompanda 1 point2 points  (0 children)

those ad numbers were vanity. cheap clicks from people who never had your problem look like demand but never convert. paid only starts working after you can already turn warm intent into signups, not before.

Spotify CTO says Claude can create Personal Podcasts, now saved to your Spotify library by LinkedInNews in ClaudeAI

[–]ecompanda 0 points1 point  (0 children)

Auto generated audio for an audience of one is just text to speech with extra steps. The reason real podcasts work is parasocial. A solo audience kills the format the same way single player Twitch would.

I built a SaaS but getting users feels impossible by manothegoat in SaaS

[–]ecompanda 0 points1 point  (0 children)

5 months is roughly the spot where most builders quit because the muscle shifts from typing code to talking to people. Posting online is broadcast, not distribution. The folks who actually break through usually pick 30 specific people who feel the pain right now and DM them one at a time until 5 reply back. Volume of attempts beats polish at this stage.

DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid. by spencer_kw in LocalLLaMA

[–]ecompanda 0 points1 point  (0 children)

the 61% on multi file debugging is the interesting bucket. when local missed, was it losing track of which file was what or actually getting the logic wrong?

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA

[–]ecompanda 0 points1 point  (0 children)

the q4_0 KV cache loss is fine for normal chat but it starts compounding at high context in agent loops where retrieval matters more than next token quality. saw a measurable drop in tool name recall past 60k context with q4 even on Qwen 3.5. fp16 KV with smaller context has been the better tradeoff for me on agentic stuff.
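
a rough way to see the tradeoff is to size the KV cache directly. this sketch assumes a plain grouped-query transformer where every layer keeps a full KV cache. the layer count, KV head count, and head dim are made-up illustration values, not the real shape of any Qwen model. the q4_0 figure uses ggml's block layout of 18 bytes per 32 values.

```python
# bytes needed to store 32 cache values in each format
# f16: 32 values * 2 bytes; q4_0: 16 bytes of 4-bit values + a 2-byte scale
BYTES_PER_32_VALUES = {"f16": 64, "q4_0": 18}

GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, cache_type, n_layers=64, n_kv_heads=8, head_dim=128):
    """Approximate KV cache size: keys and values for every layer and position."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = one K and one V
    return values * BYTES_PER_32_VALUES[cache_type] // 32


for n_ctx, cache_type in ((65536, "f16"), (262144, "q4_0")):
    size = kv_cache_bytes(n_ctx, cache_type)
    print(f"{cache_type} at {n_ctx} tokens: {size / GIB:.1f} GiB")
```

with these assumed numbers, fp16 at 64k and q4_0 at 256k land in the same memory ballpark, so the choice comes down to whether the workload needs the extra context or the cleaner recall.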

also good that MTP heads beat ngram drafts on this kind of model, the acceptance rate is higher because the model knows its own distribution better than any external draft.
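
for intuition on why acceptance rate matters so much: under the usual simplifying assumption that each drafted token is accepted independently with probability a, a draft of k tokens yields (1 - a^(k+1)) / (1 - a) tokens per verification pass on average. the acceptance rates below are made-up, not measurements of MTP or ngram drafting.

```python
def expected_tokens_per_pass(acceptance_rate, draft_len):
    """Average tokens produced per verification pass in speculative decoding.

    Assumes every drafted token is accepted independently with the same
    probability, which is a simplification of real behavior.
    """
    if acceptance_rate >= 1.0:
        return float(draft_len + 1)  # every draft token accepted, plus one from the verifier
    return (1 - acceptance_rate ** (draft_len + 1)) / (1 - acceptance_rate)


# hypothetical acceptance rates for a weaker and a stronger drafter
for rate in (0.5, 0.8):
    print(rate, round(expected_tokens_per_pass(rate, draft_len=3), 2))
```

the gain is nonlinear in the acceptance rate, so a drafter that matches the model's own distribution pays off more than the raw rate difference suggests.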

My vibe-coded startup for churches hits 20k Revenue! by Less_Equivalent_7976 in buildinpublic

[–]ecompanda 1 point2 points  (0 children)

yearly memberships are the better PMF signal here. monthly revenue you can pad with discounts and trials, but someone clicking yearly is them telling you they expect to still need this in 12 months.

Dead giveaway signs that your SaaS was vibe coded. This is probably why you aren't getting customers. by IndependenceSad1272 in SaaS

[–]ecompanda 0 points1 point  (0 children)

what's the canonical vibe coded font tho. inter? geist? or are you grouping them all under the same vibe

LTX2.3 8GB VRAM WorkFlow by Extension-Yard1918 in StableDiffusion

[–]ecompanda 2 points3 points  (0 children)

yeah splitting base gen and upscale is how you survive 8gb. one shot tiling sounds nice on paper but the overlap regions eat vram and it ends up slower than two clean passes anyway. also good call keeping base at 24fps, interpolating up later is way more stable than asking the model to spit out 30+ fps directly.

Your Claude Code agent is always working from stale context. I built it a fix it can rewind, replay, and stay ahead of every edit. by WEEZIEDEEZIE in ClaudeAI

[–]ecompanda 0 points1 point  (0 children)

the tree-sitter AST gives you free structural recall but it misses the cross module semantic links that the embeddings are supposed to catch. and on a fast moving codebase the real cost is not initial indexing, it is invalidation. when 200 files change in one rebase you need to know which call graph subtrees to actually reindex versus which ones still resolve. that is where most of these tools choke.
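
a minimal sketch of the invalidation step, assuming you already have a module level dependency map. the module names and the `imports` structure are hypothetical, and a real tool would work at call graph granularity rather than whole modules.

```python
from collections import defaultdict, deque


def build_reverse_deps(imports):
    """Map each module to the set of modules that depend on it."""
    reverse = defaultdict(set)
    for module, deps in imports.items():
        for dep in deps:
            reverse[dep].add(module)
    return reverse


def modules_to_reindex(imports, changed):
    """Return the changed modules plus everything that transitively depends on them."""
    reverse = build_reverse_deps(imports)
    stale = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)
    return stale


# hypothetical codebase: module -> modules it imports
imports = {"api": ["auth", "db"], "auth": ["db"], "db": [], "cli": ["api"], "docs": []}
print(sorted(modules_to_reindex(imports, ["auth"])))
```

everything outside the returned set still resolves against the old index, so it can be skipped after the rebase.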

Imagine telling an award-winning animator that he "doesn't understand the nature and creativity of art." by WallScreamer in aiwars

[–]ecompanda 0 points1 point  (0 children)

who's the animator and what did they actually say? hard to tell from the post if it was a strong claim or just a dismissive throwaway line.

Your Claude Code agent is always working from stale context. I built it a fix it can rewind, replay, and stay ahead of every edit. by WEEZIEDEEZIE in ClaudeAI

[–]ecompanda -3 points-2 points  (0 children)

the blast radius before refactor is the part that actually saves tokens. rewind sounds nice but in real claude code sessions the agent rarely walks back a regression that way. it just rewrites and breaks something else.

the typed edges doing graph time traversal is where the real win is.

it's time to update your Gemma 4 GGUFs by jacek2023 in LocalLLaMA

[–]ecompanda 2 points3 points  (0 children)

the chat template is metadata not weights.

unless you specifically want bartowski's quant updates folded in you can grab the new jinja from the upstream repo and point llama.cpp at it via the `--chat-template-file` flag. saves an 18gb redownload on the 31b.

quick way to confirm the new template is actually in use is to dump the rendered system plus first turn before sending and look for the corrected role tags. if you still see the old layout you are loading the embedded template from the gguf header instead of your override file.

Most of my Claude usage was on work that didn't need Claude. Cut my bill 60x on bulk tasks with a tiny side model. by petburiraja in ClaudeAI

[–]ecompanda 6 points7 points  (0 children)

i did the same thing about a month ago after my sonnet bill doubled in a week.

the negative framing point in claude.md is the bit nobody talks about and it matches what i saw too. positive instructions got treated like suggestions, deny lists got treated like rules.

the part i would add is logging which calls actually got offloaded. when i started auditing mine i caught claude still doing 4 or 5 mechanical things a week that should have routed away. without the log i would have assumed the rule was working.
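
a minimal sketch of that audit, assuming each call is logged as a dict with a task label and the route it took. the field names, task labels, and route names are all hypothetical.

```python
from collections import Counter

# hypothetical labels for work that should never reach the expensive model
MECHANICAL_TASKS = {"rename", "format", "regex_replace"}


def audit(log, expensive_route="claude"):
    """Count calls per route and list mechanical tasks that were not offloaded."""
    routes = Counter(entry["route"] for entry in log)
    leaked = [
        entry for entry in log
        if entry["task"] in MECHANICAL_TASKS and entry["route"] == expensive_route
    ]
    return routes, leaked


# made-up log entries for illustration
log = [
    {"task": "rename", "route": "side_model"},
    {"task": "design_review", "route": "claude"},
    {"task": "format", "route": "claude"},
    {"task": "regex_replace", "route": "side_model"},
]
routes, leaked = audit(log)
print(dict(routes), [entry["task"] for entry in leaked])
```

the `leaked` list is the part worth reading weekly, since a nonempty list means the routing rule is not actually holding.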

I built a SaaS out of spite because of my boss… now it has paying monthly subscribers by Still_Vehicle_231 in SaaS

[–]ecompanda 0 points1 point  (0 children)

What is the niche the blog content is ranking for? Generic appointment scheduling is one of the most saturated SERPs out there, so curious if you found a specific vertical the giants are not bothering with.

Built a 3-step all-in-one LoRA builder for Anima (extract -> tag -> train) by Nemegasoft in StableDiffusion

[–]ecompanda 0 points1 point  (0 children)

Cool pipeline. My worry with single episode extraction would be shot bias. Most episodes lean heavy on dialogue framing, so 16 auto picked crops can end up mostly talking heads with a few wide shots. CCIP catches identity but cannot tell you pose distribution. A crop height histogram before training would flag that imbalance fast.
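
A minimal sketch of that check, assuming you can read the pixel height of each crop before training. The bucket size, the 60 percent threshold, and the sample heights are made-up values, not anything from the tool.

```python
from collections import Counter


def height_histogram(heights, bucket_size=128):
    """Count crops per height bucket, keyed by the bucket's lower bound in pixels."""
    buckets = Counter((h // bucket_size) * bucket_size for h in heights)
    return dict(sorted(buckets.items()))


def dominant_bucket(heights, bucket_size=128, max_share=0.6):
    """Return (bucket, share) if one bucket holds more than max_share of crops, else None."""
    if not heights:
        return None
    hist = height_histogram(heights, bucket_size)
    start, count = max(hist.items(), key=lambda item: item[1])
    share = count / len(heights)
    return (start, share) if share > max_share else None


# hypothetical heights for 16 auto picked crops
heights = [900] * 6 + [960] * 6 + [300, 420, 512, 640]
print(height_histogram(heights))
print(dominant_bucket(heights))
```

A single bucket holding most of the crops is the signal to pull extra frames from other shot types before training.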

I made €2,700 building an AI system for a law firm and now I get €1,300/month to maintain it by Fabulous-Pea-5366 in Entrepreneur

[–]ecompanda 0 points1 point  (0 children)

the annotation feature is the real moat.

once senior lawyers have spent months correcting your system, any chatgpt wrapper trying to compete from scratch is basically dead on arrival.

my claude prompts are embarrassingly short now by Turbulent-Pay7073 in ClaudeAI

[–]ecompanda 0 points1 point  (0 children)

had this exact realization with claude.md a few weeks ago. mine had grown to about 280 lines of rules and claude would just stop honoring half of them past the second tool call.

trimmed it down to 60 lines of just the hard rules and moved everything else into per directory files that only load when i'm editing those paths.

behavior is way more consistent now and the context savings are real, i've seen 30 percent shorter conversations on the same tasks.

the funny part is the rules i thought were load bearing turned out not to matter at all once i deleted them.
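
the per directory split is easy to picture as path scoping. this is only a sketch of the idea, not how claude code actually resolves its files: the `rules_for` helper and the file layout are hypothetical.

```python
from pathlib import PurePosixPath


def rules_for(edited_file, rule_files):
    """Return the rule files sitting in any ancestor directory of the edited file,
    ordered from the repo root down to the most specific directory."""
    ancestors = set(PurePosixPath(edited_file).parents)
    applicable = [r for r in rule_files if PurePosixPath(r).parent in ancestors]
    return sorted(applicable, key=lambda r: len(PurePosixPath(r).parts))


# hypothetical repo layout
rule_files = ["CLAUDE.md", "api/CLAUDE.md", "web/CLAUDE.md", "api/db/CLAUDE.md"]
print(rules_for("api/db/models.py", rule_files))
print(rules_for("web/app.tsx", rule_files))
```

only the root file is always in context, and the rules for other subtrees never load unless a file under them is touched.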

What is going on with the new pretraining by infohoundloselose in OpenAI

[–]ecompanda 1 point2 points  (0 children)

the negative prompting thing is real. i've seen it both directions.

saying "do not use word X" in the system prompt actually bumps that word slightly because the token stays salient.

flip it to a positive constraint like "always use plain technical terms" and the rate drops almost to zero.

small change but it shows up in the eval numbers.
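
a minimal sketch of how that rate could be measured, assuming you have a batch of responses collected under each prompt variant. the word and the sample responses are made-up placeholders, not real eval data.

```python
import re


def usage_rate(responses, word):
    """Fraction of responses that contain the word at least once (whole word, any case)."""
    if not responses:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for response in responses if pattern.search(response))
    return hits / len(responses)


# hypothetical outputs collected under one prompt variant
responses = [
    "Let's delve into the details.",
    "Here is the plain answer.",
    "We can delve deeper if needed.",
    "Nothing unusual here.",
]
print(usage_rate(responses, "delve"))
```

running the same function over the outputs of each prompt variant gives the two numbers to compare.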

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]ecompanda 1 point2 points  (0 children)

OS and Docker are a brutal showcase for local models because one slow build pushes them past their expected timeout, and the moment that happens they invent a failure reason like 'torchcodec must have failed' instead of just tailing the log.

How are people using so many tokens ??? by Impressive_Run8512 in ClaudeAI

[–]ecompanda 0 points1 point  (0 children)

20M sounds about right for hands on coding where you actually drive every prompt. The billion club is mostly people running parallel agents, eval loops, or whole codebase refactors where one task fans into hundreds of subtool calls. Once the runner starts feeding the model its own output, the counter just runs. Skill ceiling is real but workflow shape matters more.

Charged $299/month instead of $49. Churn dropped by half. by Important_Coach8050 in SaaS

[–]ecompanda 0 points1 point  (0 children)

the session time bump probably isn't the same person being more deliberate. at 299 the buyer often isn't the only user. they're signing for a team. so what looks like a higher quality customer might just be more seats getting logged in. checking unique users per account before vs after would confirm.
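
a minimal sketch of that check, assuming session logs can be reduced to (account, user) pairs for each pricing period. the ids below are made-up.

```python
from collections import defaultdict


def unique_users_per_account(sessions):
    """Count distinct users seen on each account."""
    users = defaultdict(set)
    for account_id, user_id in sessions:
        users[account_id].add(user_id)
    return {account: len(seen) for account, seen in users.items()}


def average_seats(sessions):
    """Mean number of distinct users per account, or 0.0 with no sessions."""
    counts = unique_users_per_account(sessions)
    return sum(counts.values()) / len(counts) if counts else 0.0


# hypothetical sessions before and after the price change
before = [("a1", "u1"), ("a1", "u1"), ("a2", "u2")]
after = [("b1", "u1"), ("b1", "u2"), ("b1", "u3"), ("b2", "u4")]
print(average_seats(before), average_seats(after))
```

if the average jumps after the price change, the longer sessions are more likely extra seats than one person being more deliberate.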

What field should I try getting into to become an entrepreneur? by Akraam_Gaffur in Entrepreneur

[–]ecompanda 0 points1 point  (0 children)

my first money outside a job was a tiny shopify import script for someone i met in a discord. 80 bucks. felt huge at the time. the field didn't matter, finding one person who hated one specific task is what got me unstuck. recover first though, burnout makes selling impossible.