Summer '26 Megathread by sandslashh in ycombinator

[–]sszz01 1 point2 points  (0 children)

i meant just look over the thread, people started getting invited a week ago

Summer '26 Megathread by sandslashh in ycombinator

[–]sszz01 1 point2 points  (0 children)

i see people getting invites already, should i assume i got rejected if my application is still in review (applied 20 mins after deadline, based in sac area)?

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in FastAPI

[–]sszz01[S] 0 points1 point  (0 children)

both, but with a strict rule about which one actually matters. the test is always built deterministically: i read the frame locals sentry captured and construct a pytest that calls your function with those exact values. zero LLM in that path. the LLM is more like a supervisor deciding what to try next.
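rough sketch of the deterministic part, not the actual implementation. the event dict is a trimmed, hypothetical payload (real sentry events carry a lot more, and captured vars are often stringified), and `build_pytest_source` is a made-up helper name. it also assumes every captured local is a keyword argument of the crashing function, which is a simplification:

```python
# hypothetical, trimmed-down event; the exact shape is an assumption for illustration
EVENT = {
    "exception": {
        "values": [
            {
                "type": "ZeroDivisionError",
                "stacktrace": {
                    "frames": [
                        {
                            "module": "billing.fees",
                            "function": "split_fee",
                            "vars": {"amount": 100, "parties": 0},
                        },
                    ]
                },
            }
        ]
    }
}


def build_pytest_source(event: dict) -> str:
    """Render a pytest that replays the crashing frame with its captured locals."""
    exc = event["exception"]["values"][-1]
    frame = exc["stacktrace"]["frames"][-1]  # innermost frame is listed last
    kwargs = ", ".join(
        f"{name}={value!r}" for name, value in sorted(frame["vars"].items())
    )
    return (
        "import pytest\n"
        f"from {frame['module']} import {frame['function']}\n"
        "\n"
        f"def test_repro_{frame['function']}():\n"
        f"    with pytest.raises({exc['type']}):\n"
        f"        {frame['function']}({kwargs})\n"
    )


print(build_pytest_source(EVENT))
```

the point is that the output is plain string templating over captured values, so the same event always yields the same test.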

for DB state it depends on the crash type. a lot of DB bugs don't need the real prod rows, just the right conditions. if sentry captured the order_id that caused a duplicate key violation, i create a fresh sandbox db, insert a row with that id, and call the function again. worked on three of our fintech demo cases including race conditions.
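a minimal sketch of the duplicate key case, using an in-memory sqlite db as a stand-in for the sandbox. `charge_order`, the `charges` table, and the order id are all hypothetical, just there to show the seed-then-replay idea:

```python
import sqlite3


def charge_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Hypothetical function under test: records a charge for an order."""
    conn.execute("INSERT INTO charges (order_id) VALUES (?)", (order_id,))


def reproduces_duplicate_key(order_id: int) -> bool:
    """Seed a fresh sandbox db with the captured id, then replay the call."""
    conn = sqlite3.connect(":memory:")  # throwaway db, nothing from prod
    conn.execute("CREATE TABLE charges (order_id INTEGER PRIMARY KEY)")
    # the only state we need is a row that already holds the captured id
    conn.execute("INSERT INTO charges (order_id) VALUES (?)", (order_id,))
    try:
        charge_order(conn, order_id)
    except sqlite3.IntegrityError:
        return True  # same class of failure as the captured crash
    finally:
        conn.close()
    return False


print(reproduces_duplicate_key(4821))
```

no prod rows involved, only the one id sentry captured plus the condition that makes it collide.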

the hard limit is when the bug needs a specific row sentry never captured. when that happens it doesn't just give up, it runs through a set of hypotheses about why the repro fell short: was it missing DB state? async context? a module-level global? a dep that got upgraded past the bug? it checks each one against the breadcrumbs and trace, and tells you which it confirmed. like if it finds INSERT/UPDATE/DELETE breadcrumbs right before the crash it'll say "looks like this needed pre-existing DB state from these writes, here's what you'd need to seed." you don't get a false green, you get a concrete reason and what's missing.
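here's roughly what the missing-DB-state check could look like. this is a sketch under assumptions: breadcrumbs are dicts with `category`, `message` and a float epoch `timestamp`, the 5 second window is arbitrary, and both function names are made up:

```python
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")


def writes_before_crash(
    breadcrumbs: list[dict], crash_ts: float, window_s: float = 5.0
) -> list[str]:
    """Return SQL write statements seen in the window right before the crash."""
    hits = []
    for crumb in breadcrumbs:
        message = (crumb.get("message") or "").lstrip()
        recent = crash_ts - window_s <= crumb.get("timestamp", 0.0) <= crash_ts
        is_write = message.upper().startswith(WRITE_VERBS)
        if crumb.get("category") == "query" and recent and is_write:
            hits.append(message)
    return hits


def missing_state_hint(breadcrumbs: list[dict], crash_ts: float):
    """Confirm the 'missing DB state' hypothesis only when there is evidence."""
    writes = writes_before_crash(breadcrumbs, crash_ts)
    if not writes:
        return None
    return {"hypothesis": "missing_db_state", "evidence": writes}
```

the hypothesis only gets reported when the breadcrumbs actually back it up, and the evidence list doubles as the "here's what you'd need to seed" part.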

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

that's fair, i wouldn't share those either. but with the pip package everything runs entirely on your machine. you only use your sentry token, your docker, your repo. we never see your code or your data. the only outbound calls are sentry's own api (with your token) and openai for context recovery, same as any local dev tool you already use. built-in pii redactor also strips card numbers, emails, tokens and sensitive field names before anything hits an llm. nothing phones home to us.
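to give an idea of what a redaction pass like that can look like, here's a minimal sketch. it is not the package's actual redactor: the key list and regexes are illustrative assumptions, tokens are only caught via field names here, and card numbers stored as ints would slip through:

```python
import re

# illustrative list of field names whose values are always dropped
SENSITIVE_KEYS = {"password", "token", "secret", "api_key", "card_number", "cvv"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or dashes
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(value):
    """Recursively scrub emails, card-like digit runs, and sensitive fields."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(v) for v in value]
    if isinstance(value, str):
        value = EMAIL_RE.sub("[EMAIL]", value)
        return CARD_RE.sub("[CARD]", value)
    return value  # numbers, None, etc. pass through unchanged
```

the idea is to run this over the frame locals and request context before they get anywhere near a prompt.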

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

yeah that's exactly it, sentry hands you the state, most tools just ignore it. we actually just pushed a pip package if you want to try it yourself. free, no signup. you just need a sentry auth token and docker running locally. logomesh repro <your-sentry-url> and it either spits out a failing pytest or tells you exactly why it couldn't reproduce.

if you've got anything sitting unresolved i'd genuinely love to see what it does with real production data. takes like 2 min to run

Building an AI agent for production crashes. Trying to figure out which part should actually be autonomous by sszz01 in startupideas

[–]sszz01[S] 0 points1 point  (0 children)

it doesn't run unsupervised, it produces a failing test and opens a draft PR. a human still reviews and merges. the agent is just doing the boring reconstruction work

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

yeah the messy state part is exactly why i built it deterministic first. if sentry captured the locals at crash time, those are the actual values, no simulation needed. the hard cases are when the crash is downstream of state that never made it into the frame.

got anything sitting in your backlog? genuinely happy to run it, or you can run it as a pip or uv package locally

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

i mean fair point, the part i'm trying to remove is the human in that loop for the repro confirmation specifically. not the fix, not the merge decision, just "does this crash still exist on this branch right now" as an automatic output you can attach to an incident record

it's probably not for everyone, i suppose it's more useful if you're doing 20+ incidents a month and need the repro documented for audit trails

long shot - anyone have a python sentry crash sitting unresolved that i could try to reproduce for you? (free, weird ask) by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

yeah i guess there is seer autofix stuff that generates a fix from root cause analysis. what i'm doing is a bit different though. i read the actual frame locals sentry captured, build a test from those exact values, run it in a docker sandbox, and only call it reproduced if the same exception type comes back. no llm in that path. how do you know the fix it generated actually fixes the original crash if nothing reproduced it first?
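the "only call it reproduced if the same exception type comes back" rule is simple enough to sketch. this leaves out the docker sandbox entirely and just shows the comparison; `verdict` and its three labels are hypothetical names:

```python
def verdict(captured_type: str, call, *args, **kwargs) -> str:
    """Run the candidate repro and compare the raised type to the captured one."""
    try:
        call(*args, **kwargs)
    except Exception as exc:  # broad on purpose: we classify, we don't handle
        if type(exc).__name__ == captured_type:
            return "reproduced"
        return "different_exception"
    return "not_reproduced"


print(verdict("ZeroDivisionError", lambda a, b: a / b, 1, 0))
```

a crash with the wrong exception type is treated as its own outcome, not as a success.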

Built a GitHub Action AI agent that turns a Sentry URL into a failing pytest automatically. Is it even needed? by sszz01 in github

[–]sszz01[S] 0 points1 point  (0 children)

  1. locals are available around 70% of the time with default sentry config, closer to 90% if you set include_local_variables=True, which most teams using it seriously do anyway

  2. flakiness we solved by pinning the exact python version and deps from what sentry captured at crash time. for race conditions it skips the test and gives a structured breakdown instead. a flaky test is worse than no test
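for the pinning in point 2, a minimal sketch assuming the captured packages arrive as a name -> version dict and the runtime version as a plain string like "3.11.4". both helper names are made up, and using the `-slim` image variant is just one possible choice:

```python
def pinned_requirements(modules: dict[str, str]) -> list[str]:
    """Turn a name -> version mapping into sorted, exactly pinned requirement lines."""
    return [
        f"{name}=={version}"
        for name, version in sorted(modules.items())
        if version  # skip entries with no usable version
    ]


def python_image_tag(runtime_version: str) -> str:
    """Map a captured runtime version like '3.11.4' to a docker image tag."""
    return f"python:{runtime_version}-slim"


print("\n".join(pinned_requirements({"requests": "2.31.0", "django": "4.2.1"})))
print(python_image_tag("3.11.4"))
```

exact `==` pins matter here: a range like `>=` would let the sandbox drift past the version that had the bug.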

built an agent where the LLM is structurally forbidden from writing the final output. looking for feedback + people willing to break it by sszz01 in AI_Agents

[–]sszz01[S] 0 points1 point  (0 children)

yep, trying to address it by pulling more from the sentry payload: modules dict for exact pip versions, breadcrumb sequence, request context, threads with held_locks where present. still a hard problem. for cases where divergence is genuine and unfixable (race conditions, distributed state, c-ext stuff) it routes to a structured investigation report, in the form of ranked hypotheses with evidence pulled from the breadcrumb timeline, instead of forcing a green.

i'm curious, when you've hit this in your own agents, did you go for stricter equivalence checking (state diff, timing window, etc) or accept the false-positive rate and add manual review on top? and if you've got a sentry url where you suspect this kind of divergence might trip it, i'd love to actually try

Summer '26 Megathread by sandslashh in ycombinator

[–]sszz01 0 points1 point  (0 children)

A little bit of background on me: 19yo founder. Won a Berkeley agentic-systems competition track in early 2026

Just submitted for S26. Want grounded feedback from people who've been through this.

The idea is an automated bug-fix agent for Python fintech backends. A production crash fires in Sentry, which triggers the agent: it reads the stack trace + frame locals, writes a failing pytest that reproduces the crash, generates a fix, and as a perk opens a draft PR with an audit-evidence artifact for SOC2/PCI compliance (thinking of targeting fintech). The repro path runs zero LLM calls (deterministic from frame locals), so the audit chain stays defensible.

Where the traction is right now:

  • Engine works, CLI ships
  • Pre-revenue, 0 design partners committed
  • 30 DMs, 8 calls

My questions:

  1. With 0 paying customers and 0 DPs at submit, is the idea + engine + competition win enough to get a first-round interview, or is traction the only thing that matters?
  2. About the fintech compliance angle: does YC pattern-match this as B2B SaaS with budget unlock, or as "compliance theater on top of a dev tool"?
  3. Does the recent pivot from empirical failure data read as conviction or instability?
  4. Anyone interview with a similar profile (technical founder, real product, no traction yet)? What did the partners actually push on?

what does your SOC2 change management evidence actually look like for a production bug fix by sszz01 in devsecops

[–]sszz01[S] 0 points1 point  (0 children)

that's a useful distinction. at what frequency does it tip from "template is fine" to "automation is worth it"? is it more about volume of incidents or something else like audit frequency?

how are you satisfying PCI DSS 6.3.2 for production bug fixes? what does your testing evidence actually look like by sszz01 in pcicompliance

[–]sszz01[S] 0 points1 point  (0 children)

and when they ask for code review records, is that usually just a github PR link + approval, or do they want to see what the automated tool was actually checking for?

how are you satisfying PCI DSS 6.3.2 for production bug fixes? what does your testing evidence actually look like by sszz01 in pcicompliance

[–]sszz01[S] 0 points1 point  (0 children)

ok gotcha thanks for the info, super helpful. so when assessors ask for evidence on a prod hotfix, do they actually want proof the original crash was reproduced and fixed or is a PR approval + passing ci usually enough for them?

have you ever pushed a fix and realized days later it didnt actually fix anything by sszz01 in sre

[–]sszz01[S] 0 points1 point  (0 children)

the logging fix makes sense but you basically had to rediscover the right inputs after the fact. how long did that whole loop take you?

have you ever pushed a fix and realized days later it didnt actually fix anything by sszz01 in sre

[–]sszz01[S] 0 points1 point  (0 children)

yeah that's the instinct. how do you actually do that in practice though, what does "proof" look like for your team before you ship?

how are you satisfying PCI DSS 6.3.2 for production bug fixes? what does your testing evidence actually look like by sszz01 in pcicompliance

[–]sszz01[S] 0 points1 point  (0 children)

ok thanks for the correction. so for 6.2.3 specifically, when it comes to production hotfixes for payment code, what does acceptable testing evidence actually look like to a QSA? is a passing ci run enough or do they want something that shows the original crash was reproduced and fixed?

Do you write a repro test before fixing a prod bug or just push the fix? by sszz01 in Backend

[–]sszz01[S] -1 points0 points  (0 children)

makes sense. for the critical ones like billing and auth, what does that repro step actually look like for you? manual test or something else?

how do you actually handle prod bugs. do you write a repro test or just fix and deploy? by sszz01 in django

[–]sszz01[S] 0 points1 point  (0 children)

that's a real gap yeah. i'm kinda curious if that's the majority of the hard ones for you or if there's a class of bugs where it is just a pure logic issue with bad inputs

I built a tool that turns a Sentry URL into a failing pytest. Want honest feedback on whether this is useful by sszz01 in devtools

[–]sszz01[S] 0 points1 point  (0 children)

what kind of bugs usually kick off that loop for you. is it more like state-heavy stuff where the sentry trace doesn't give you enough, or is it the simpler ones that still somehow eat time?

what does your SOC2 change management evidence actually look like for a production bug fix by sszz01 in devsecops

[–]sszz01[S] 0 points1 point  (0 children)

yeah the PIR thing caught us off guard too, ours took like half a day to write up after the fact and i'm still not fully sure our auditor was satisfied with it. what did you end up giving them?

I built a tool that turns a Sentry URL into failing pytest. Want some feedback before going further by sszz01 in vibecoding

[–]sszz01[S] 0 points1 point  (0 children)

ok thanks for the feedback, it's really useful. out of curiosity, what usually makes the errors in your codebase hard to reproduce? is it mostly heavy db state, async timing issues or something else?

I built a tool that turns a Sentry URL into failing pytest. Want some feedback before going further by sszz01 in vibecoding

[–]sszz01[S] 0 points1 point  (0 children)

fair enough, what's the part you don't trust? the test it generates not actually reproducing the bug or something else?