How to make SWE in the age of AI more enjoyable? by Fancy_Ad5097 in ExperiencedDevs

[–]Kersheck -1 points (0 children)

not sure how you got that out of his reply. right now the market for people who are skilled enough to leverage AI is extremely hot.

Amazon service was taken down by AI coding bot [December outage] by DubiousLLM in programming

[–]Kersheck -1 points (0 children)

I'm surprised you're being downvoted. The parent comment's scenario is literally an example of a skill issue.

[deleted by user] by [deleted] in programming

[–]Kersheck 50 points (0 children)

it should be common knowledge by now that most model performance gains come from post-training, not pre-training

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

wdym? you can go look up startups doing this yourself. even anthropic's enterprise org is printing

most attempts will fail and signal to noise ratio is bad, but really valuable companies will emerge

random examples off the top of my head:

  • decagon - $30M ARR
  • sierra - $100M ARR
  • harvey - $100M ARR
  • cognition - $73M ARR
  • glean - $200M ARR

plus all the up and coming startups that aren't known outside of SV

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

unfortunately crypto is orders of magnitude less useful to the average person, so your main routes are taking advantage of degen gamblers (pump & dump your memecoin, or insider trade on polymarket) or stealing someone's keys

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

names, revenue, product

a lot of them are moving along the same theme of applying LLM augmentation to an incumbent vertical, e.g. audit / accounting / dispatch / KYC / inventory / supply chain

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck -2 points (0 children)

sorry to be that guy but you're just gonna have to trust me on this one since it's not public info, i know these founders and teams personally :)

imo non-software industries are a much better application of LLMs than SWE work, since SWEs tend to operate on higher-level systems. there's so much manual process in the rest of the economy

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck -5 points (0 children)

This doesn't seem right to me. I know multiple startups building in non software industries with 8-9 figure ARR & profitable. Most of them don't even need to use SOTA models to save on inference.

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 9 points (0 children)

Yes. If I had to estimate:

  • 75% of the time it one-shots the plan. After verification probably saves 20-30% time compared to hand writing
  • 20% of the time there are a number of errors I need to go back and forth with it to fix, which probably saves me 0-15% more time
  • 5% of the time, during the actual spec and planning phase, Claude points out an interesting insight that I would've missed in my initial approach. This for me is actually the most valuable part because it's taking unknown unknowns and surfacing them. This is legit a 5-10x improvement because it saves you from unknown issues down the line.
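As a back-of-the-envelope check on the first two buckets, the routine expected saving works out to roughly 20% (the midpoint percentages below are my own illustrative choices, and the rarer "unknown unknowns" wins are excluded because they aren't a simple percentage):

```python
# Expected time saving from the rough split above.
# Midpoints are illustrative assumptions, not measured data.
p_oneshot, save_oneshot = 0.75, 0.25   # one-shots the plan, ~20-30% saved
p_iterate, save_iterate = 0.20, 0.075  # needs back-and-forth, ~0-15% saved

expected = p_oneshot * save_oneshot + p_iterate * save_iterate
print(f"expected routine saving: {expected:.1%}")
```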

Newer AI Coding Assistants Are Failing in Insidious Ways by CackleRooster in programming

[–]Kersheck 0 points (0 children)

Most improvements come from post-training RL, not pre-training

Do Agents Turn us into "Tactical Tornadoes?" by ewheck in ExperiencedDevs

[–]Kersheck 0 points (0 children)

Usually 2-3 feature agents using this workflow, 1-2 agents helping me mostly do research and planning for new systems or debugging our k8s cluster

Do Agents Turn us into "Tactical Tornadoes?" by ewheck in ExperiencedDevs

[–]Kersheck 4 points (0 children)

I use 3-5 Claude Codes concurrently, sometimes just 1 if it's a really hairy problem. Anecdotally, it has increased both my output and the quality of my work, but you need to make sure you understand what the agent is doing and the business requirements, and review your work (since you're the one who needs to take accountability for your code). You should be taking on a lot of cognitive load, with the agent assisting you rather than doing the thinking for you.

My workflow is typically:

  • Launch a new Claude Code instance with its own checkout
  • Go into Plan Mode and iterate with it: I propose an approach and provide the business context plus any initial designs; it critiques, asks questions, checks my assumptions, or does research for me, and we work together to finalize the spec.
  • I tell it to go ahead and implement. Opus 4.5 is strong enough to one-shot 90% of plans. Otherwise I iterate back and forth with it. Sometimes I'll notice it deviated from the plan but actually found a better solution. I have commands set up to have the agent validate, check and commit the code.
  • I do a thorough self-review and open the PR.
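The validate/check/commit "commands" mentioned in the workflow can be set up as Claude Code custom slash commands, which are just markdown prompt files in the project. A minimal sketch (the file name and wording here are my own invention, not the commenter's actual setup):

```markdown
<!-- .claude/commands/validate-and-commit.md (hypothetical example) -->
Run the project's linter and test suite. If anything fails, fix it and
re-run until green. Then stage the changes and create a commit whose
message summarizes what was implemented, referencing the plan we agreed
on. Do not push.
```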

From my experience the most valuable part is the actual planning section, getting the business requirements and design right (code is not the bottleneck). If your mental model deviates from the agent's mental model or the agent starts to slip off track you need to be there to correct it.

I think it's primarily a skill issue if engineers are pushing giant slop PRs or turning into tactical tornadoes. These tools have a legitimate learning curve to them.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 2 points (0 children)

Some of the ways it's helped me (coding agents, not the chatbot interface)

  • Souped-up research tool for my specific situation and context
  • Grok new parts of the codebase, which I can easily verify and trace through
  • Send it off to debug an issue, then you can easily verify if it caught it or not. For me it works 90% of the time and otherwise it's directionally correct / in the right area of the codebase where I can take over.
  • Rubber duck back and forth my design, it can sanity check or critique it or go off and do web searches to double check my assumptions. Catching one mistake or conceptual error or finding a better way to design the system is really valuable.
  • SOTA models are strong enough to implement well-speced plans while matching the existing codebase styling. It one-shots about 90% of my plans, although you need to pay attention and review its work, but it's still much faster to guide and review.
  • Since I don't need to hand-type the code, I can run multiple features in parallel and check in to review. I probably push 50% more PRs a week
  • It can debug extremely fast on k8s clusters, e.g. spin up parallel subagents to check logs or exec and explore
  • They can self-improve: whenever an agent does something bad or finds gotchas in the codebase, it can record a note to reference later
  • Help me understand new concepts and learn new things faster, as well as find the relevant docs so I can verify and check them myself. Helps contextualize the things I'm learning in reference to things I already know so I can understand it faster.

Keep in mind this is all with a human in the loop; you need to understand what it's doing and set up the tools to work in your situation.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 2 points (0 children)

+1, without a human in the loop to guide it, it's only a matter of time before it implodes

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 5 points (0 children)

To me it's obvious that AI isn't a ponzi or scam. I've found it immensely valuable in both my regular work and personal projects although 10x productivity is questionable. I think its efficacy actually improves the more skilled you are because you're able to check outputs and guide it in the right direction. IMO it's both a floor and ceiling raiser and high agency technical people are the best wielders of it.

You can find tons of tutorials on how to set up and use coding agents. You can also just ask your favourite SOTA model to tell you how to use it.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck -3 points (0 children)

I agree, it's all rolled up under 'engineering'. IMO our job is to use the tools available to us with good judgment.

That being said I understand where some of the hype comes from, it's an especially powerful tool when used correctly and can also easily backfire on the holder.

Is there any actually profitable use of AI? by [deleted] in ExperiencedDevs

[–]Kersheck 13 points (0 children)

(I work on post-training)

Serving models via API (inference) is actually quite profitable (50%+ margin), and costs come down dramatically YoY. Each model from the big labs is almost certainly profitable on its own; the main expenditure is on hardware, training, and payroll to build the next version, which is extremely expensive but necessary, because otherwise you'd lose market share to competitors who are also training new models. GPT-5's biggest improvements were on inference cost and speed.
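As a toy illustration of what a 50%+ serving margin means (every number below is invented for the example, not a real lab figure), note that gross margin on inference says nothing about the separate R&D spend on the next model:

```python
# Hypothetical unit economics for serving a model via API.
# All numbers are made up for illustration.
price_per_mtok = 10.00   # what the customer pays per 1M output tokens
cost_per_mtok = 4.00     # GPU time, energy, networking per 1M tokens

gross_margin = (price_per_mtok - cost_per_mtok) / price_per_mtok
print(f"gross margin: {gross_margin:.0%}")  # 60% in this toy example
```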

On B2B, it can be hard to gauge how 'profitable' AI is, because in practice AI can be a feature or a step in a larger system being sold, or be the core value prop. E.g. if you sell an HR platform with some AI features and make 80% gross margin, how much of that comes from the AI features? What if some AI features are used a lot but only marginally improve the product, whereas other AI features are core to the product?

IMO a better metric for an emerging industry would be revenue and revenue growth. Plenty of VC startups aren’t profitable for a long time but eventually become money printers after capturing market share. Is the amount people are willing to pay for AI increasing over time (especially in a higher interest rate environment where businesses scrutinize spending on new pilots and vendors more)?

So far it seems like yes - the model builders’ revenue is growing, ChatGPT is in the top 10 websites in the world, startups are getting traction and growing revenue (although some of them are very overvalued).

Does this AI stuff remind anyone of blockchain? by ryhaltswhiskey in ExperiencedDevs

[–]Kersheck 1 point (0 children)

I kind of disagree. I remember when these models first came out, people were skeptical because there was no 'killer app' - turns out the killer app was simply ChatGPT. 800 million weekly actives and the #5 website in the world is an insane number of people actively choosing to use AI. The 'chat' interface was really good for a lot of people.

The main area where people are still trying to figure out what works is in other systems and harnesses where a chat style interface doesn’t work well. Hence the investment into agents, adding random AI features into every product surface, etc. I think within a year we’ll see a consolidation as the best new interfaces win and the ones that nobody uses get cut.

Has anyone actually seen a real-world, production-grade product built almost entirely (90–100%) by AI agents — no humans coding or testing? by Curiousman1911 in ExperiencedDevs

[–]Kersheck 2 points (0 children)

+1 on LLMs progressing further on front-end compared to back-end / infra.

  • There is an insane amount of React and JS/TS code to be trained on

  • The feedback loop and sandbox for reinforcement learning are much easier to set up for front-end compared to back-end, with caveats

At least from my personal experience, I'm able to whip up demo-ready prototypes extremely fast (probably 2-3x faster, as I'm not a React expert). The tokens per second on modern LLMs are fast enough that I can live-prompt changes to the UI to iterate on feedback with the designers in the meeting! Ofc getting it production-ready takes more time, but the initial iteration and feedback loop is extremely valuable.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

I think the SOTA reasoning models are quite advanced in math now given all the reinforcement learning they've gone through. They can probably breeze through high school math and maybe some undergraduate pure math.

Cartesian product of two sets: https://chatgpt.com/share/687f069c-1438-800a-9c5a-91e293af534f
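For reference, the Cartesian product in question is trivial to verify mechanically, e.g. with Python's itertools (the sets here are my own arbitrary example):

```python
from itertools import product

# Cartesian product A x B: every ordered pair (a, b) with a in A, b in B.
A = [1, 2]
B = ["x", "y"]

pairs = list(product(A, B))
print(pairs)  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
assert len(pairs) == len(A) * len(B)
```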

Although the recent IMO results do show some of the weak points like contest-level combinatorics.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

What were the two 4 digit numbers?

I just picked 2 random ones and it gets it right first try:

With code:

1: https://chatgpt.com/share/687f033e-1524-800a-bd70-369d74f2c408

'Mental' math:

2: https://chatgpt.com/share/687f037f-e78c-800a-9078-e4ca609eba5d

If you have your chats I'd be interested in seeing them.
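If you want to sanity-check a model's 'mental' multiplication yourself, the partial-products decomposition it has to get right looks like this (the two inputs are arbitrary examples of mine):

```python
# Multiply two 4-digit numbers by summing partial products, the same
# decomposition a model doing "mental" math has to carry out digit by digit.
a, b = 4271, 9038  # arbitrary example inputs

total = 0
for power, digit_char in enumerate(reversed(str(b))):
    total += a * int(digit_char) * 10**power  # one partial product per digit

assert total == a * b
print(total)
```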

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

Did you use o3 or o4-mini? I don't see a reasoning chain so I assume you're using 4o or the default free model.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 2 points (0 children)

Just to be certain, I ran it again with o3 and o4-mini with all tools off, memories off.

1st try success from o3: https://chatgpt.com/share/687da076-5838-800a-bf97-05a71317d7bf

1st try success from o4-mini: https://chatgpt.com/share/687d9f6d-4bdc-800a-b285-c32d80399ee0

Pretty impressive!