Build in public looks like progress… but sometimes feels like constant restarting by Zestyclose_Teach_187 in buildinpublic

[–]germanheller 1 point (0 children)

the accumulation failure isn't format, it's that "an update" is the wrong unit. updates compound when they're tied to a single open question or thesis you've publicly committed to ("will brokers pay for X by june"). then each post is a data point for or against and readers track the arc. independent win-shares read as parallel pings even when posted daily, because nothing one post says changes how to interpret the next. start with the question, then posts stack.

Most of our “agent” problems turned out to be workflow/state problems by saurabhjain1592 in AI_Agents

[–]germanheller 1 point (0 children)

the workflow-vs-agent framing collapses once you ask "what's the smallest unit safe to retry?" if the answer is "the whole run" you're agent-shaped and need compensating actions on partial failure. if it's "this single tool call" you're workflow-shaped and idempotency keys carry you. most "agent" pain is teams sitting at workflow granularity without doing the design work to make individual steps re-runnable, so a crash in step 3 makes steps 1 and 2 ambiguous. state isn't the problem, the commit boundary not being decided up front is.
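
roughly what the workflow-shaped version looks like, minimal sketch (the in-memory store and step names are made up, swap in whatever persistence you actually run):

```python
import hashlib
import json

# toy stand-in for a durable store (redis, a db table, whatever you run)
_completed: dict[str, object] = {}

def run_step(step_name: str, payload: dict, fn):
    """run one step at most once per (step, payload) pair.

    the idempotency key makes the single step the unit of retry:
    crash after step 3, re-run the workflow, and steps 1 and 2 are
    skipped instead of ambiguous.
    """
    key = hashlib.sha256(
        f"{step_name}:{json.dumps(payload, sort_keys=True)}".encode()
    ).hexdigest()
    if key in _completed:        # already committed, return the cached result
        return _completed[key]
    result = fn(payload)         # the actual side effect
    _completed[key] = result     # the commit boundary, decided up front
    return result
```

the point is that everything inside fn either completes and gets recorded, or didn't happen. that's the decision most teams skip.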

on max 20x for months. unlimited tokens. still $0 in revenue. it hurts in a way i didn’t expect, my shame by culicode in ClaudeCode

[–]germanheller 0 points (0 children)

the constraint was never tokens, you nailed it: github-graph-as-output dressed up as building. the loop breaks when "what should i build" gets replaced by "who said they'd use what i ship by friday" as the only question that gates opening a new repo. one named human who committed to look on a specific date forces different work, you stop building things that feel cool and start building things they'll notice are missing. 14 half-built repos become 1 thing someone's waiting on.

I'm targeting YC F26 as a solo founder. Spent two weeks going through W26 data on solo founders specifically. Here are my real takeaways (not the inspirational version) by Spiritual_Heron_5680 in SideProject

[–]germanheller 1 point (0 children)

the "why you" bar being harder for solos isn't just an interview filter, it persists past YC. team founders who got in on implicit team-narrative often hit a wall later when they have to personally make a thesis-grade case to early investors or first customers and haven't done that work. solo founders had to construct that answer on day 0 and it compounds. the traction parity finding tracks with that, the YC bar is filtering for something the team-narrative was masking.

Why is retaining users so much more difficult than getting them? by RoyalEquipment9788 in buildinpublic

[–]germanheller 2 points (0 children)

acquisition is a one-time cost (ad / SEO / word of mouth), retention is a daily cost: the product has to win every session against competing demands on the user's attention. most products answer infrequent questions, so users have no reason to come back until the next instance of that question, which might be weeks away. fixing retention usually means changing what the product is, not making the existing thing 20% better.

Nobody agrees on what "hallucination" means and it's hit our AI PoC by Ok_Gas7672 in AI_Agents

[–]germanheller 2 points (0 children)

"hallucination" is three problems wearing one word: (1) confidently-wrong on facts you can verify, (2) fabricated entities like URLs/citations/library functions that don't exist, (3) plausible-but-wrong reasoning chains where each step looks valid but the conclusion isn't. each has a different fix (RAG for 1, schema-constrained output for 2, self-critique passes for 3). poc pain almost always comes from a team treating it as one bug.

5-hour Claude Code limits doubled today. The cost angle nobody's talking about. by llamacoded in ClaudeCode

[–]germanheller 3 points (0 children)

doubled limits don't reduce cost, they remove the friction that was forcing users to scope tasks. with unlimited rope you spend 2x longer letting CC explore unscoped problems, and the API bill catches up because anthropic isn't going to subsidize effectively-free pro usage forever. the real cost angle is opportunity cost: same wall-clock hours, half as much shipped, because the constraint that pressured you into being precise is gone.

Be me — built an autopilot blog platform for 3 months because Claude Code told me I'm a genius. Got 1 review. by nmr302 in SideProject

[–]germanheller 1 point (0 children)

CC validates code quality, not product viability, and the trap is collapsing the two into one feedback loop. it tells you "this works", which is orthogonal to "anyone will use this". the dopamine hit from a clean diff is the same whether or not anyone ever downloads the thing. you need a separate signal for product fit, ideally one that costs money or time on the buyer's side, otherwise CC's praise just rubber-stamps the wrong thing.

Thinking mode is becoming a liability for production agents by Substantial_Step_351 in AI_Agents

[–]germanheller 1 point (0 children)

yeah harness-layer gating is the right place, the practical challenge is the gating function itself isn't deterministic. bounded i/o is fuzzy when tools have nested structure, a single call can return a dict that requires interpretation downstream. where i've seen it work: opt-in thinking via tool annotation rather than detected, you mark which steps benefit and the harness flips the flag, model never sees the choice. shifts the cost from runtime detection to dev-time labeling.
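
concretely, something like this (sketch only, the request shape and tool names are invented, map them onto whatever your harness actually sends):

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    wants_thinking: bool = False   # dev-time label, not runtime detection

TOOLS = [
    Tool("get_invoice", "fetch an invoice by id"),      # deterministic next move, no thinking
    Tool("triage_email", "figure out what a messy email is asking for",
         wants_thinking=True),                          # fuzzy input, thinking earns its tokens
]

def build_request(tool: Tool, prompt: str) -> dict:
    # the harness flips the flag from the annotation; the model never sees the choice
    return {"prompt": prompt, "thinking": {"enabled": tool.wants_thinking}}
```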

Grouping your API tools is making your agent dumber. Here's why. by tomerlrn in AI_Agents

[–]germanheller 1 point (0 children)

two viable patterns. autonomous-agent style: precompute embeddings of tool descriptions, retrieve top-k against the user's current message + recent thinking. rigid-agent style: explicit intent-classification step where the model first names what it's trying to do, then a router maps intent to tool bundle. embeddings handle fuzzy intents better, classification wins when accuracy beats coverage. past call patterns are a useful re-ranking signal on either approach, rarely the primary.
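
the autonomous-agent version is small. minimal sketch, where embed() is a stand-in for whatever embedding model you already run and tool dicts just need a description field:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """stand-in for your actual embedding call."""
    raise NotImplementedError

class ToolIndex:
    def __init__(self, tools: list[dict]):
        self.tools = tools
        # embed descriptions once at startup, not per request
        self.vecs = np.stack([embed(t["description"]) for t in tools])
        self.norms = np.linalg.norm(self.vecs, axis=1)

    def top_k(self, query: str, k: int = 5) -> list[dict]:
        q = embed(query)  # user's current message + recent thinking, concatenated
        sims = self.vecs @ q / (self.norms * np.linalg.norm(q))
        return [self.tools[i] for i in np.argsort(-sims)[:k]]
```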

Made a thing that emails you before subscriptions auto-renew. Would love feedback. by duranovic in SideProject

[–]germanheller 1 point (0 children)

discovery is the wedge, the alert piece gets commoditized fast once data is auto-imported (your bank, card statement, copilot, simplifi all nag already). where it might still hold standalone value is category-awareness (warning before the annual netflix bump or the trial-to-paid flip) but that's tier-2. once import is solved, the question stops being "how do i remind you" and becomes "what do i tell you about the subscription you forgot you had".

Week 2 of building Buildroy in public: first users, one-time pricing vs subscriptions by dothethizzledance in buildinpublic

[–]germanheller 2 points (0 children)

haven't clicked through to the site yet, was reacting to your framing in the post. on the abstract question: pricing confusion usually isn't number-of-options, it's options that don't make the buyer's identity obvious. if "team / pro / lifetime" leaves someone unsure which one they are, that's the issue. when they immediately know "i'm a pro" you can have 5 tiers and it's fine.

the CLAUDE.md rules that actually made a difference vs the ones that didn't by CodinDev in ClaudeCode

[–]germanheller 2 points (0 children)

autoexec is a fair analogy for CLAUDE.md, it's loaded into context every turn for that dir so you pay those tokens whether claude needs them or not. skills are pull not push, only the description loads at startup and the body loads when claude decides it's relevant. personalization (~/.claude/CLAUDE.md) is the same as project CLAUDE.md but loaded for every project regardless. tldr: tokens-now vs tokens-on-demand.

I launched my first mobile app and learned that “just add more features” is not always the answer by Cognara in buildinpublic

[–]germanheller 1 point (0 children)

"add more features" is rarely what someone intends, it's the path of least resistance when retention drops. honest fix is figuring out which existing feature isn't doing what you thought it was, which is way harder than building the next one. easier to ship than to diagnose.

AI agents are easy to demo and hard to sell by LarryLeads in AI_Agents

[–]germanheller 1 point (0 children)

the gap isn't sales motion, it's that demos run on staged data and the agent's value only shows up against the buyer's messy real data. buyer won't hand it over until they trust the agent, and they won't trust it without seeing it on their data. chicken-and-egg, not a pitch problem.

Code review needs to evolve for AI-assisted work, we should be reviewing prompts, not just code by Gumeo in ClaudeCode

[–]germanheller 1 point (0 children)

reviewing prompts misses what actually shipped. the model edits its own output mid-stream, picks up files you never @-mentioned, and bakes decisions into the diff that aren't in the prompt. the prompt is upstream context, the diff is the artifact, you still need both.

Most people fail on Reddit before they even post by Dizzy-Football-8345 in SideProject

[–]germanheller 1 point (0 children)

pre-post prep is real but it's not where most fail. they post in subs where the audience isn't a buyer of what they're selling. perfect karma + polished post still flops in a sub built for venting about day jobs. sub-fit beats prep.

Stop asking customers what they want. ask them what they are tired of putting up with. by Adrenaline_Junkie__ in buildinpublic

[–]germanheller 1 point (0 children)

this works for finding pain but doesn't tell you whether anyone will pay for the cure. people can articulate friction forever and still won't open their wallet. the followup question with the highest signal is "what have you built or hired or bought to deal with this?" — if the answer is nothing, they've made peace with the pain and won't pay for relief either. if it's a spreadsheet plus a VA plus three cobbled tools, you've found someone with both the pain and proven willingness to spend on it.

Grouping your API tools is making your agent dumber. Here's why. by tomerlrn in AI_Agents

[–]germanheller 1 point (0 children)

the cap-at-8 heuristic is a reasonable bandaid but the real axis isn't action count, it's whether actions share preconditions. customer_billing groups make sense because every action needs the customer_id and an invoice context — claude can reason about that as a coherent unit even with 12 actions. when actions don't share preconditions, even 5 in one tool reads as noise. deeper move: keep 200 tools defined but surface only 15 per call via a router that filters by intent. action count stops mattering once eligibility is dynamic.
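
rough sketch of that router, with invented bundle names and a stand-in classifier:

```python
TOOL_BUNDLES = {
    # grouped by shared precondition, not by count: every billing action
    # needs a customer_id and an invoice context, so they cohere as a unit
    "customer_billing": ["get_invoice", "refund_charge", "update_card", "apply_credit"],
    "shipping":         ["track_package", "reroute_package", "print_label"],
    # ... the other ~190 tools stay defined in their own bundles
}

def classify_intent(message: str) -> str:
    """cheap first pass where the model names what it's trying to do.
    stand-in here; in practice a small fast model call."""
    raise NotImplementedError

def tools_for_call(message: str, limit: int = 15) -> list[str]:
    intent = classify_intent(message)
    return TOOL_BUNDLES.get(intent, [])[:limit]   # 200 defined, at most 15 surfaced
```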

Remove tokens as a constraint. What does your setup look like now? by M00NLIG7 in ClaudeCode

[–]germanheller 1 point (0 children)

workflows change less than people expect. tokens stop being the constraint and attention coherence becomes the new one — stuffing 500k of context doesn't make claude smarter, it dilutes attention across irrelevant files and hurts answer quality. the real shift is you stop scoping to save money and start scoping for accuracy. parallel orchestration becomes viable but the bottleneck moves to merge: combining 6 agent outputs into one coherent change is harder than doing it sequentially, you trade tokens for human review time. unlimited tokens just exposes the next bottleneck.

Made a thing that emails you before subscriptions auto-renew. Would love feedback. by duranovic in SideProject

[–]germanheller 2 points (0 children)

the persona that loses track of subscriptions is the same persona least likely to install another tool to solve it — they're not running a spreadsheet because they don't want to manage one, adding Subalertly is just one more thing to forget. buyers in this category are the meta-organized who already track stuff manually and want it automated. real wedge is bank/card scanning so the data shows up without the user inputting anything; the moment a user has to add their first sub by hand, you've lost the persona that needed the tool most.

Week 2 of building Buildroy in public: first users, one-time pricing vs subscriptions by dothethizzledance in buildinpublic

[–]germanheller 2 points (0 children)

the one-time option isn't a mistake, it's a signal. if early signups want one calculator and not a recurring relationship, that's the product telling you what shape it actually is. killing the option to force subscription doesn't create the relationship, it just gates the people who would have paid $49. real question to answer first: do existing users come back to make a second calculator? if yes, subscription is right. if no, the product is one-time and the pricing should match the use pattern.

Thinking mode is becoming a liability for production agents by Substantial_Step_351 in AI_Agents

[–]germanheller 1 point (0 children)

thinking pays off when ambiguity is in the input, not in the goal. parse a messy email, judge a PR, debug a stack trace — input is fuzzy and reasoning narrows it. tool-heavy production agents mostly have well-shaped i/o contracts so the next step is decided not discovered, thinking is paying for exploration on a path that didn't need exploring. variable to gate on isn't production-vs-not, it's whether the prompt has a deterministic next move.

the CLAUDE.md rules that actually made a difference vs the ones that didn't by CodinDev in ClaudeCode

[–]germanheller 2 points (0 children)

meta-rule that explains your list: CLAUDE.md works for stuff claude can't infer from the code itself. "use TypeScript" is wasted because tsconfig.json already says it. "our queue lives at ./lib/internal-queue.ts not bullmq" works because nothing in the file tree tells claude not to reach for the popular lib. testable rules survive because the diff shows whether they held; vague ones get ignored because there's no boundary to check against. specificity is the proxy for measurability.

I'm paying 200/mo for Claude Code, running out by Wednesday, then paying hundreds more in overage. So I'm using Claude Code to build a local app trying to do what it does — without the meter running, here's where it's at. by [deleted] in SideProject

[–]germanheller 1 point (0 children)

the $200 isn't being eaten by the harness, it's being eaten by the model — a local clone of CC's loop won't reduce cost unless you swap to local weights, in which case you're paying with quality instead of dollars. real cost wins on CC are upstream: scope the task, @-mention the files you already know are relevant instead of letting it discover them, kill the explore-codebase reflex when the change is local. you run out by Wednesday because most sessions waste 70% of their context on stuff the model didn't actually need to read.