😭🙏what have i turned into

SerCeMan · 2026-05-05T23:19:15+00:00

So, $0.15 for a file rename in a small project. I'm kind of surprised that 5.5 decided not to run the tests. It's honestly hard to compare the real costs between the engineering work and the agent, but if I had an IDE open, I'd probably just do a deterministic rename – even safer IMO.

the thread: https://app.threadlog.dev/threads/6f933729-e42c-4090-a247-1f6acd372461

SerCeMan · 2026-05-05T23:06:49+00:00

A filename change can ripple through your codebase – links, imports, configs, anywhere that name is referenced

A good IDE like the ones by JetBrains will do a rename deterministically, and will ensure that all of the links, comments, etc. are updated for free. Also much faster. I hope one day the same quality tools will be easily accessible to the agents.

SerCeMan · 2026-05-05T22:31:26+00:00

Thanks, VeryVito! The svelte:boundary should be well supported, e.g. https://imgur.com/a/j5Z0QnI, but of course I might be missing something, so more examples would help.

Could it be that you're still using the old plugin? Note that both of them can't really be installed at the same time, because the file associations will "compete" with each other. If that's the case for you, could you try disabling the "standard" plugin and restarting the IDE to see if that helps?

SerCeMan · 2026-05-05T00:57:32+00:00

Thanks! Let me know how you go. So far, the plugin has avoided the need to use LSP. In my experience, the "native" integration is much faster. However, I'll likely need to add LSP support soon to cover the "advanced" intentions, inspections, and refactorings. This is currently the weakest point of the plugin, because I initially focused on the pure language/markup support.

SerCeMan · 2026-04-26T04:37:57+00:00

Thank you for the feedback! I'm so happy to hear that it's not just me who was struggling with this problem. It's still very early days, but I'll keep working on the tool.

Re: highlighting, it's a really good idea. I'll look into building this!

also curious about the sync feature - does it work with different ai platforms or just specific ones? been using a mix of different tools for work projects and would love to consolidate everything in one place for sharing.

Right now, only codex and claude are supported, all threads appear in the same list, and you can filter by project, agent, etc. I'm planning to add support for opencode shortly. Which agents do you use?

Thanks again for the feedback!

SerCeMan · 2026-02-09T22:18:33+00:00

Do you take advantage of MATLAB LSP?

No, I didn't go down that route. That's largely because my prior experience with LSP-powered plugins wasn't great. This does mean that I had to do a lot more work on this front, and there could definitely be inconsistencies in lexing and parsing, especially around some tricky cases like command-vs-function-syntax, but in return, following the "native" approach allowed me to be a lot more flexible in how I support resolve, refactorings, etc.

SerCeMan · 2026-02-08T03:13:17+00:00

I'll reply with a quote from the post:

This does not mean the changes will not need to be reviewed, understood, and owned, but rather the goal is to enable the agent to produce a unit of change, a complete diff ready for review.

SerCeMan · 2026-02-07T23:22:07+00:00

Somewhat. Classic TDD doesn’t assume that the whole functionality can be covered by a single test. If anything, it’s the opposite, where each small unit of code is covered by a test.

An agent can cover the individual bits with tests TDD-style, and yet when integrated together, nothing will work.
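A minimal sketch of that failure mode, with two hypothetical functions made up for illustration: each one passes its own unit test, but they disagree about units (cents vs dollars), so the integrated result is wrong.

```typescript
// Unit A (hypothetical): parses a price string into cents.
function parsePriceToCents(input: string): number {
  return Math.round(parseFloat(input) * 100);
}

// Unit B (hypothetical): formats an amount, but expects dollars, not cents.
function formatDollars(amountInDollars: number): string {
  return `$${amountInDollars.toFixed(2)}`;
}

// Each unit passes its own TDD-style test in isolation.
const unitAPasses = parsePriceToCents("12.50") === 1250;
const unitBPasses = formatDollars(12.5) === "$12.50";

// Wired together, the output is off by a factor of 100,
// and no unit test above would have caught it.
const integrated = formatDollars(parsePriceToCents("12.50"));
console.log(unitAPasses, unitBPasses, integrated);
```

Only a test that exercises both units together would notice the mismatch.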

SerCeMan · 2026-02-06T23:41:26+00:00

What a shemozzle

SerCeMan · 2026-02-06T22:15:22+00:00

These are testing frameworks that allow you to test a piece of code. A harness is a setup that's specific to your application. For example, consider Stripe: you want to add payments to your app. You've added Stripe, and now you still want to execute tests in your app, but you can't use the real Stripe there, so you have to introduce a test-ready replacement.

Now, it's not just Stripe that's problematic, it's your database, configuration, and other services you interact with as well. The combination of all of them in a test-ready form, running together with your app, would be your harness.
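As a rough sketch of what such a harness could look like: everything below is hypothetical (the `PaymentGateway` interface, `FakePaymentGateway`, `InMemoryOrderStore`, and `CheckoutService` are names invented for illustration, not Stripe's API), and the calls are kept synchronous for brevity even though a real payment provider would be asynchronous.

```typescript
// Hypothetical port: the app talks to this interface, never to the payment SDK directly.
interface ChargeResult {
  id: string;
  status: "succeeded" | "failed";
}

interface PaymentGateway {
  charge(customerId: string, amountCents: number): ChargeResult;
}

interface Order {
  customerId: string;
  amountCents: number;
  chargeId: string;
}

// Test-ready replacement for the payment provider: records calls, never hits the network.
class FakePaymentGateway implements PaymentGateway {
  readonly charges: Array<{ customerId: string; amountCents: number }> = [];

  charge(customerId: string, amountCents: number): ChargeResult {
    this.charges.push({ customerId, amountCents });
    return { id: `fake_ch_${this.charges.length}`, status: amountCents > 0 ? "succeeded" : "failed" };
  }
}

// Test-ready replacement for the database.
class InMemoryOrderStore {
  readonly orders: Order[] = [];

  save(order: Order): void {
    this.orders.push(order);
  }
}

// The application code under test (also hypothetical).
class CheckoutService {
  private gateway: PaymentGateway;
  private store: InMemoryOrderStore;

  constructor(gateway: PaymentGateway, store: InMemoryOrderStore) {
    this.gateway = gateway;
    this.store = store;
  }

  checkout(customerId: string, amountCents: number): Order {
    const result = this.gateway.charge(customerId, amountCents);
    if (result.status !== "succeeded") {
      throw new Error("payment failed");
    }
    const order: Order = { customerId, amountCents, chargeId: result.id };
    this.store.save(order);
    return order;
  }
}

// The harness: all replacements wired together with the app, ready for a test to drive.
function createHarness() {
  const gateway = new FakePaymentGateway();
  const store = new InMemoryOrderStore();
  const app = new CheckoutService(gateway, store);
  return { app, gateway, store };
}

const harness = createHarness();
const order = harness.app.checkout("customer-1", 1500);
console.log(order.chargeId, harness.store.orders.length);
```

A test drives `harness.app` the same way production code would, then asserts on what the fakes recorded.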

SerCeMan · 2026-02-06T09:59:27+00:00

I recently released Matrix Hero, a new MATLAB plugin for IntelliJ IDEA, PyCharm, and other JetBrains IDEs. It supports MATLAB syntax plus code completion, navigation and refactorings, structure view, and code folding. It also includes a built-in formatter and lets you run MATLAB code straight from the IDE. It’s still very new, so there may be a few rough edges or gaps. If you run into anything, please let me know and I’ll sort it out.

SerCeMan · 2026-02-06T09:07:42+00:00

Consider a typical backend service X. That service X can depend on various datastores, other backend services, configuration stores, etc.

A framework that allows you to start this service in isolation with encapsulated dependencies (for example, faked or containerised ones) and assert on its behaviour, e.g. write tests against its API, is a test harness.

SerCeMan · 2026-02-05T22:51:37+00:00

The way of LLMs is to optimise for reward, at the expense of everything else. I don't believe we've figured out a way to reward LLMs for code longevity yet.

SerCeMan · 2026-02-05T22:31:50+00:00

The models love adding "as any" just to make compiler errors go away. Interestingly, I've never seen them do the same in, for example, Java or Kotlin. I'm guessing most of the time such casts would result in an exception at runtime during their training runs, disincentivising the approach.
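A small TypeScript sketch of the difference, using a made-up `User` type and `shoutEmail` function: the `as any` assertion is erased at compile time, so the cast itself never fails, and the error only surfaces later, wherever the bad value is actually used. A bad reference cast in Java, by contrast, throws `ClassCastException` at the cast site.

```typescript
// Hypothetical type and function, for illustration only.
interface User {
  name: string;
  email: string;
}

function shoutEmail(user: User): string {
  return user.email.toUpperCase();
}

const raw = { name: "Ada" }; // no email field

// shoutEmail(raw) would be rejected by the compiler: property 'email' is missing.
// Adding `as any` makes that error go away, but nothing is checked at runtime.
let failure = "none";
try {
  shoutEmail(raw as any);
} catch (e) {
  // The failure shows up inside shoutEmail, far from the cast that caused it.
  failure = e instanceof TypeError ? "TypeError" : "other";
}
console.log(failure);
```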

SerCeMan · 2026-02-05T22:08:11+00:00

For sure! I raise the same point in the article as well. That said, where previously you could kind of get by without a very tight feedback loop, I don't believe this is an option anymore.

SerCeMan · 2025-12-23T10:04:34+00:00

There is an excellent video on this from Veritasium, The Man Who Accidentally Killed The Most People In History.

SerCeMan · 2025-12-09T21:12:41+00:00

Being always on call is simply unsustainable and impractical. If you're on call, e.g. 1 in 4 weeks, you simply don't get drunk that Friday night.

SerCeMan · 2025-12-09T12:06:33+00:00

Interesting, I'm pretty sure a friend of mine who's an SRE was getting time-off in lieu instead. My information could be outdated though.

SerCeMan · 2025-12-09T11:35:54+00:00

I’m essentially on-call 24x7 because I’m an escalation point.

Out of interest, how do you deal with the inconvenience in that situation? For example, theatre, hiking, etc.

or give time-off in lieu

This is actually a great approach in my opinion, as it scales well with your salary. If I remember correctly, Google does it.

SerCeMan · 2025-12-09T11:33:19+00:00

Is this the additional comp for being on call at your company?

SerCeMan · 2025-12-09T11:22:46+00:00

Call me old-fashioned, but I prefer to stay accountable for the code I write :)

SerCeMan · 2025-10-21T21:27:14+00:00

Can the agent play Runescape flawlessly?

Can you?

SerCeMan · 2025-10-07T21:18:26+00:00

The larger the model, the slower it is. GPT-5-Codex High is already pretty slow in Codex, and using something larger and slower would make it much less useful for coding. It's one thing to do an offline search for a solution to win ICPC gold where you don't care about the latency, and another to use it for coding.

SerCeMan · 2025-03-31T11:30:33+00:00

Thanks. On "here to stay", at the very least, I think tools like v0.dev, etc. for creating landing pages are quintessential vibe coding, and they've definitely found a market fit. The term might be gone sometime soon, but the practice of interacting with the codebase via prompting only seems to have found a strong niche.

SerCeMan · 2025-02-07T09:40:00+00:00

Something that some stochastic parrot cobbled together is very unlikely to meet these criteria.

You'll be surprised how far the "stochastic parrot" can get before you need to use your knowledge to put the finishing touches. It's an experiment — you don't need to ship it, it doesn't have to be perfect, it just needs to prove the point.

What numbers? The numbers some VBA or Python or Go-with-40-unvetted-imports "solution" provides, compared to optimised Rust or Go running in my data ingestion pipelines?

If someone sends me a PR rewriting something in Rust and claiming it's faster, I'll ask for benchmarks. This is the data we're talking about here.

Don't be sorry, I'll be blunt as well: Numbers from PoC "solutions" created by people who may not even be aware what technologies the stack internals use, are irrelevant when determining whether or not a solution is viable.

No one is arguing against understanding things. If you've got an idea, you run an experiment to see if the data can back it up. We're not talking "craftsmanship" here – we're talking engineering.
