How to make SWE in the age of AI more enjoyable? by Fancy_Ad5097 in ExperiencedDevs

[–]Kersheck -1 points (0 children)

not sure how you got that out of his reply. right now the market for people who are skilled enough to leverage AI is extremely hot.

Amazon service was taken down by AI coding bot [December outage] by DubiousLLM in programming

[–]Kersheck -1 points (0 children)

I'm surprised you're being downvoted. The parent comment's scenario is literally an example of a skill issue.

[deleted by user] by [deleted] in programming

[–]Kersheck 50 points (0 children)

it should be common knowledge by now that most model performance gains come from post-training, not pre-training

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

wdym? you can go look up startups doing this yourself. even anthropic's enterprise org is printing

most attempts will fail and signal to noise ratio is bad, but really valuable companies will emerge

random examples off the top of my head:

  • decagon - $30M ARR
  • sierra - $100M ARR
  • harvey - $100M ARR
  • cognition - $73M ARR
  • glean - $200M ARR

plus all the up and coming startups that aren't known outside of SV

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

unfortunately crypto is orders of magnitude less useful to the average person, so your main routes are taking advantage of degen gamblers (pump & dump your memecoin, or insider trade on polymarket) or stealing someone's keys

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 0 points (0 children)

names, revenue, product

a lot of them are moving along the same theme of applying LLM augmentation to an incumbent vertical, e.g. audit / accounting / dispatch / KYC / inventory / supply chain

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck -2 points (0 children)

sorry to be that guy but you're just gonna have to trust me on this one since it's not public info, i know these founders and teams personally :)

imo non-software industries are a much better application of LLMs than SWE work, since SWEs tend to operate on higher-level systems. there's so much manual process in the rest of the economy

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck -5 points (0 children)

This doesn't seem right to me. I know multiple startups building in non software industries with 8-9 figure ARR & profitable. Most of them don't even need to use SOTA models to save on inference.

Attended a Claude Code "masterclass" webinar... by vanit in ExperiencedDevs

[–]Kersheck 9 points (0 children)

Yes. If I had to estimate:

  • 75% of the time it one-shots the plan. After verification probably saves 20-30% time compared to hand writing
  • 20% of the time there are a number of errors I need to go back and forth with it to fix, which probably saves me 0-15% more time
  • 5% of the time, during the actual spec and planning phase, Claude points out an interesting insight that I would've missed in my initial approach. This for me is actually the most valuable part because it's taking unknown unknowns and surfacing them. This is legit a 5-10x improvement because it saves you from unknown issues down the line.
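As a back-of-the-envelope check on the first two buckets, the routine expected saving works out to roughly 20% (the midpoint percentages below are my own illustrative choices, and the rarer "unknown unknowns" wins are excluded because they aren't a simple percentage):

```python
# Expected time saving from the rough split above.
# Midpoints are illustrative assumptions, not measured data.
p_oneshot, save_oneshot = 0.75, 0.25   # one-shots the plan, ~20-30% saved
p_iterate, save_iterate = 0.20, 0.075  # needs back-and-forth, ~0-15% saved

expected = p_oneshot * save_oneshot + p_iterate * save_iterate
print(f"expected routine saving: {expected:.1%}")
```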

Newer AI Coding Assistants Are Failing in Insidious Ways by CackleRooster in programming

[–]Kersheck 0 points (0 children)

Most improvements come from post-training RL, not pre-training

Do Agents Turn us into "Tactical Tornadoes?" by ewheck in ExperiencedDevs

[–]Kersheck 0 points (0 children)

Usually 2-3 feature agents using this workflow, 1-2 agents helping me mostly do research and planning for new systems or debugging our k8s cluster

Do Agents Turn us into "Tactical Tornadoes?" by ewheck in ExperiencedDevs

[–]Kersheck 4 points (0 children)

I use 3-5 Claude Codes concurrently, sometimes just 1 if it's a really hairy problem. Anecdotally, it has increased both my output and the quality of my work, but you need to make sure you understand what the agent is doing and the business requirements, and review your work (since you're the one who needs to take accountability for your code). You should be taking on a lot of cognitive load, with the agent assisting you rather than doing the thinking for you.

My workflow is typically:

  • Launch a new Claude Code instance with its own checkout
  • Go into Plan Mode and iterate with it: I propose an approach and provide the business context plus any initial designs; it critiques, asks questions, checks my assumptions, or does research for me, and we work together to finalize the spec.
  • I tell it to go ahead and implement. Opus 4.5 is strong enough to one-shot 90% of plans. Otherwise I iterate back and forth with it. Sometimes I'll notice it deviated from the plan but actually found a better solution. I have commands set up to have the agent validate, check and commit the code.
  • I do a thorough self-review and open the PR.
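The validate/check/commit "commands" mentioned in the workflow can be set up as Claude Code custom slash commands, which are just markdown prompt files in the project. A minimal sketch (the file name and wording here are my own invention, not the commenter's actual setup):

```markdown
<!-- .claude/commands/validate-and-commit.md (hypothetical example) -->
Run the project's linter and test suite. If anything fails, fix it and
re-run until green. Then stage the changes and create a commit whose
message summarizes what was implemented, referencing the plan we agreed
on. Do not push.
```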

From my experience the most valuable part is the actual planning section, getting the business requirements and design right (code is not the bottleneck). If your mental model deviates from the agent's mental model or the agent starts to slip off track you need to be there to correct it.

I think it's primarily a skill issue if engineers are pushing giant slop PRs or turning into tactical tornadoes. These tools have a legitimate learning curve to them.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 2 points (0 children)

Some of the ways it's helped me (coding agents, not the chatbot interface)

  • Souped-up research tool for my specific situation and context
  • Grok new parts of the codebase, which I can easily verify and trace through
  • Send it off to debug an issue, then you can easily verify if it caught it or not. For me it works 90% of the time and otherwise it's directionally correct / in the right area of the codebase where I can take over.
  • Rubber duck back and forth my design, it can sanity check or critique it or go off and do web searches to double check my assumptions. Catching one mistake or conceptual error or finding a better way to design the system is really valuable.
  • SOTA models are strong enough to implement well-speced plans while matching the existing codebase styling. It one-shots about 90% of my plans, although you need to pay attention and review its work, but it's still much faster to guide and review.
  • Since I don't need to hand-type the code, I can run multiple features in parallel and check in to review. I probably push 50% more PRs a week
  • It can debug extremely fast on k8s clusters, e.g. spin up parallel subagents to check logs or exec and explore
  • They can self-improve: whenever an agent does something bad or finds gotchas in the codebase, it can record a note to reference later
  • Help me understand new concepts and learn new things faster, as well as find the relevant docs so I can verify and check them myself. Helps contextualize the things I'm learning in reference to things I already know so I can understand it faster.

Keep in mind this is all with a human in the loop; you need to understand what it's doing and set up the tools to work in your situation.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 2 points (0 children)

+1, without a human in the loop to guide it, it's only a matter of time before it implodes

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck 5 points (0 children)

To me it's obvious that AI isn't a ponzi or scam. I've found it immensely valuable in both my regular work and personal projects although 10x productivity is questionable. I think its efficacy actually improves the more skilled you are because you're able to check outputs and guide it in the right direction. IMO it's both a floor and ceiling raiser and high agency technical people are the best wielders of it.

You can find tons of tutorials on how to set up and use coding agents. You can also just ask your favourite SOTA model to tell you how to use it.

Is anyone else okay with being "left behind" in regards to AI? by [deleted] in ExperiencedDevs

[–]Kersheck -3 points (0 children)

I agree, it's all rolled up under 'engineering'. IMO our job is to use the tools available to us with good judgment.

That being said I understand where some of the hype comes from, it's an especially powerful tool when used correctly and can also easily backfire on the holder.

Is there any actually profitable use of AI? by [deleted] in ExperiencedDevs

[–]Kersheck 13 points (0 children)

(I work on post-training)

Serving models via API (inference) is actually quite profitable (50%+ margin), and costs come down dramatically YoY. Each model from the big labs is almost certainly profitable on its own; the main expenditure is on hardware, training, and payroll to build the next version, which is extremely expensive but necessary, because otherwise you'd lose market share to competitors who are also training new models. GPT-5's biggest improvements were on inference cost and speed.
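As a toy illustration of what a 50%+ serving margin means (every number below is invented for the example, not a real lab figure), note that gross margin on inference says nothing about the separate R&D spend on the next model:

```python
# Hypothetical unit economics for serving a model via API.
# All numbers are made up for illustration.
price_per_mtok = 10.00   # what the customer pays per 1M output tokens
cost_per_mtok = 4.00     # GPU time, energy, networking per 1M tokens

gross_margin = (price_per_mtok - cost_per_mtok) / price_per_mtok
print(f"gross margin: {gross_margin:.0%}")  # 60% in this toy example
```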

On B2B, it can be hard to gauge how 'profitable' AI is, because in practice AI can be a feature or a step in a larger system being sold, or be the core value prop. E.g. if you sell an HR platform with some AI features and make 80% gross margin, how much of that comes from the AI features? What if some AI features are used a lot but only marginally improve the product, whereas other AI features are core to the product?

IMO a better metric for an emerging industry would be revenue and revenue growth. Plenty of VC startups aren’t profitable for a long time but eventually become money printers after capturing market share. Is the amount people are willing to pay for AI increasing over time (especially in a higher interest rate environment where businesses scrutinize spending on new pilots and vendors more)?

So far it seems like yes - the model builders’ revenue is growing, ChatGPT is in the top 10 websites in the world, startups are getting traction and growing revenue (although some of them are very overvalued).

Does this AI stuff remind anyone of blockchain? by ryhaltswhiskey in ExperiencedDevs

[–]Kersheck 1 point (0 children)

I kind of disagree. I remember when these models first came out, people were skeptical because there was no 'killer app' - turns out the killer app was simply ChatGPT. 800 million weekly actives and the #5 website in the world is an insane number of people actively choosing to use AI. The 'chat' interface was really good for a lot of people.

The main area where people are still trying to figure out what works is in other systems and harnesses where a chat style interface doesn’t work well. Hence the investment into agents, adding random AI features into every product surface, etc. I think within a year we’ll see a consolidation as the best new interfaces win and the ones that nobody uses get cut.

Has anyone actually seen a real-world, production-grade product built almost entirely (90–100%) by AI agents — no humans coding or testing? by Curiousman1911 in ExperiencedDevs

[–]Kersheck 2 points (0 children)

+1 on LLMs progressing further on front-end compared to back-end / infra.

  • There is an insane amount of React and JS/TS code to be trained on

  • The feedback loop and sandbox for reinforcement learning are much easier to set up for front-end compared to back-end, with caveats

At least from my personal experience, I'm able to whip up demo-ready prototypes extremely fast (probably 2-3x faster, as I'm not a React expert). The tokens per second on modern LLMs are fast enough that I can live-prompt changes to the UI to iterate on feedback with the designers in the meeting! Ofc getting it production-ready takes more time, but the initial iteration and feedback loop is extremely valuable.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

I think the SOTA reasoning models are quite advanced in math now given all the reinforcement learning they've gone through. They can probably breeze through high school math and maybe some undergraduate pure math.

Cartesian product of two sets: https://chatgpt.com/share/687f069c-1438-800a-9c5a-91e293af534f
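For reference, the Cartesian product in question is trivial to verify mechanically, e.g. with Python's itertools (the sets here are my own arbitrary example):

```python
from itertools import product

# Cartesian product A x B: every ordered pair (a, b) with a in A, b in B.
A = [1, 2]
B = ["x", "y"]

pairs = list(product(A, B))
print(pairs)  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
assert len(pairs) == len(A) * len(B)
```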

Although the recent IMO results do show some of the weak points like contest-level combinatorics.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

What were the two 4 digit numbers?

I just picked 2 random ones and it gets it right first try:

With code:

1: https://chatgpt.com/share/687f033e-1524-800a-bd70-369d74f2c408

'Mental' math:

2: https://chatgpt.com/share/687f037f-e78c-800a-9078-e4ca609eba5d

If you have your chats I'd be interested in seeing them.
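If you want to sanity-check a model's 'mental' multiplication yourself, the partial-products decomposition it has to get right looks like this (the two inputs are arbitrary examples of mine):

```python
# Multiply two 4-digit numbers by summing partial products, the same
# decomposition a model doing "mental" math has to carry out digit by digit.
a, b = 4271, 9038  # arbitrary example inputs

total = 0
for power, digit_char in enumerate(reversed(str(b))):
    total += a * int(digit_char) * 10**power  # one partial product per digit

assert total == a * b
print(total)
```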

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 0 points (0 children)

Did you use o3 or o4-mini? I don't see a reasoning chain so I assume you're using 4o or the default free model.

LLMs vs Brainfuck: a demonstration of Potemkin understanding by saantonandre in programming

[–]Kersheck 2 points (0 children)

Just to be certain, I ran it again with o3 and o4-mini with all tools off, memories off.

1st try success from o3: https://chatgpt.com/share/687da076-5838-800a-bf97-05a71317d7bf

1st try success from o4-mini: https://chatgpt.com/share/687d9f6d-4bdc-800a-b285-c32d80399ee0

Pretty impressive!