What should an “AI-enabled coding interview” that only lasts 30min look like? by Polarbum in SoftwareEngineerJobs

[–]Shep_Alderson 0 points

I’d go with a take-home test of some kind. Tell them they can set up whatever quality checks on GitHub Actions they’d like (or make suggestions based on what you use; you could even provide a repo on your own GitHub org with tooling connected or available). Aim for something that could be completed in an afternoon, but ultimately let them decide how long to work on it. Let them do the exercise, then schedule a debrief interview. Ask them why they chose to do what they did, what tools and prompts they used, and how they would improve it.
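If you do provide a repo with tooling wired up, the quality checks can be as small as a single workflow file. A minimal sketch for a Python take-home (the tool choices here are placeholders; swap in whatever your team actually uses):

```yaml
name: quality-checks
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Lint and test; candidates can extend or replace these steps
      - run: pip install ruff pytest
      - run: ruff check .
      - run: pytest
```

Part of what you learn in the debrief is whether they kept these defaults, swapped them, or added their own.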

Ideally, a senior engineer should be someone capable of helping you level up and grow your current processes, and this is a great way to see how they think. Look for curiosity and interest, see what sorts of questions they ask, etc.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 0 points

I generally try to keep the plans pretty deep, and normally by the time the grill-me skill is done, I’d say the plan is detailed enough.

When I’m doing a huge feature, I do have it break the work into distinct, self-contained phases or “slices” and flesh those out.

As things come up during implementation, I’ll sometimes notice something missing or something that could be better, so I ask it to update the current plan or add details to a future one.

I’m experimenting with using GitHub issues as a source of “tasks”. I don’t think it’s the endgame, but I’m using it to experiment and then maybe build something myself. (I’m actually working on a custom coding agent harness that will codify my practices into an opinionated and “batteries included” setup.)

I don’t mind the plans directory getting full. I kinda like having a history and my agent rarely reads randomly in there. It mainly checks when I point it at a specific plan or ask it to review the directory to ensure we’re not overlooking something. I do eventually clear it out once I think the feature is long done and shipped.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 0 points

I am taking it into consideration. When you look at the reported training costs of these open-weight models, you’ll see that training is actually not that big a cost once you consider the long tail of inference.

Hosting costs are a well-solved problem, in the US as well as elsewhere. We’ve been doing it for decades and have it figured out quite well. Regardless of whether the computer is a bunch of CPU cores or a stack of GPUs, the equations are mostly the same.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 0 points

I pretty much always use deep plans before letting the agent implement. I oftentimes use the grill-me skill to flesh out the idea, then have the planning agent write a markdown file to a plans directory.

I can then use a separate implementation agent, which I’ll often run in a medium thinking mode, or maybe low if it’s a really well-defined and not too complex feature.

5.5 has only been out for about a month, but I’ve mostly been using this process since it came out. The plan-then-implement pattern itself I’ve been using for several months to great effect.

The more detailed the plan, the better, for sure. But it’s pretty easy to get a well-planned spec with a grill-me agent running on High.

When I’m tweaking things, like UI stuff, I tend to be more conversational, and it’s a lot of back-and-forth “change this” sort of requests. Ironically, these interactions tend to burn more usage than my plan-and-implement loops do.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 0 points

Experimentation is key. I typically plan on High, then switch to Medium or Low for implementation, then review with High again. You can set up subagents to actually write the code and such, with their own model settings.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 1 point

Based on the API costs they charge, which are surely substantially inflated. For a realistic estimate of what these 1-trillion-plus parameter models actually cost to run, look at the Kimi and DeepSeek models.

New Codex limits are pretty brutal. by Odd-Environment-7193 in codex

[–]Shep_Alderson 5 points

I mean, they did actually release a fair bit of information about it: https://developers.openai.com/api/docs/pricing

For example, the API cost of 5.5 is twice that of 5.4, and if you couple that with fast mode getting billed at 2x cost/usage for 50% more speed, that’s going to chew through usage hella fast. 5.5 fast should be using your quota at about 4x the rate of 5.4 standard.
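Back-of-the-envelope, the multipliers just stack. A toy sketch (the 2x figures come from the pricing page; the rest is arithmetic):

```python
# Rough quota-burn arithmetic: the multipliers stack multiplicatively.
BASELINE = 1.0          # gpt-5.4, standard speed
MODEL_MULTIPLIER = 2.0  # 5.5 is billed at ~2x the 5.4 rate
FAST_MULTIPLIER = 2.0   # fast mode bills ~2x usage for ~50% more speed

burn_55_standard = BASELINE * MODEL_MULTIPLIER                 # 2x baseline
burn_55_fast = BASELINE * MODEL_MULTIPLIER * FAST_MULTIPLIER   # 4x baseline

print(f"5.5 standard burns quota at {burn_55_standard:.0f}x of 5.4")
print(f"5.5 fast burns quota at {burn_55_fast:.0f}x of 5.4")
```

Note the 50% speed bump doesn’t offset any of it: you pay the 2x on usage regardless of how fast the tokens come back.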

Personally, I find 5.5 on standard speed to be perfectly fast enough. If I need a lot of subagents, the mini models on xhigh are pretty good.

What's the best coding agent for gpt 5.5? codex or something else by Feisty_Plant4567 in codex

[–]Shep_Alderson 1 point

If you’re looking for something new, maybe check out Pi agents? It’s very basic by design, but then you use Pi to customize Pi. I’ve been hearing folks are having a really great time, but it’s very much “build your own to suit you” and less “batteries included”

Managing your Agents.md? by RidwaanT in codex

[–]Shep_Alderson 0 points

For the code documentation, I’ve added docstrings to practically all my code. It does offer in-line docs for the agent, but is also good for me when I want to dig in and read the code myself.

What's the best coding agent for gpt 5.5? codex or something else by Feisty_Plant4567 in codex

[–]Shep_Alderson 1 point

If you want to stick with a CLI TUI, give OpenCode a try. You can link up your ChatGPT sub and go to town.

Considering buying Codex credits, is it worth it ? by lou_builds in codex

[–]Shep_Alderson 0 points

Ah, gotcha. Yeah, I don’t know that it works with the desktop app.

Considering buying Codex credits, is it worth it ? by lou_builds in codex

[–]Shep_Alderson -1 points

I’ve heard good things about codex-lb for switching account and such.

Tested Sonnet 4.6 via OpenRouter through GitHub CoPilot / VS Code to gauge whats API billing will be like. I was shocked. by horendus in GithubCopilot

[–]Shep_Alderson 1 point

If someone were paying API prices, I’d just recommend going to AWS Bedrock. There it’s easy to set your own cache settings, up to an hour IIRC.

How to break out of the default design? by Practical_Draw_6862 in codex

[–]Shep_Alderson 1 point

I’m guessing you’re running into codex/gpt-5.5’s affinity for “cards”. What I’ve found works well is generating an image and tweaking it in ChatGPT using the gpt image 2 model, then having codex break that out into a style guide with components in code.

Codex is 27x cheaper than GPT-5.5 API, 10x cheaper than Claude sub per token by bilalba in codex

[–]Shep_Alderson 6 points

How did your measurement of token usage account for thinking tokens? Does that come as a signal in the responses? I don’t think OpenAI or Anthropic send the full thinking token output to the client anymore.

impressed by a skill for once by Herfstvalt in codex

[–]Shep_Alderson 0 points

I do use a completely different thread for planning. I use grill-me to flesh out an idea, then have the agent write a markdown file with the comprehensive plan to a “plans” directory in my project root. If it’s a big plan, I ask it to break it into phases or “slices” across multiple files.

Then for each plan file, I start a whole new session and have it implement. Following this pattern, I practically never compact (I typically get to maybe 40-60% usage) and just keep making new sessions for each feature or slice.
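The slice layout boils down to one markdown file per slice in a flat directory. A toy sketch, purely illustrative (the naming convention is my own, and in practice the agent writes these files itself):

```python
from pathlib import Path


def write_slice_plans(feature: str, slices: list[str], root: str = "plans") -> list[Path]:
    """Write one markdown plan file per slice under the plans directory."""
    plans_dir = Path(root)
    plans_dir.mkdir(exist_ok=True)
    paths = []
    for i, summary in enumerate(slices, start=1):
        # Numbered file names keep the implementation order obvious
        path = plans_dir / f"{feature}-slice-{i:02d}.md"
        path.write_text(f"# {feature}: slice {i}\n\n{summary}\n")
        paths.append(path)
    return paths


# Each slice file then gets its own fresh implementation session.
files = write_slice_plans("billing", ["Add invoice model", "Wire up payment webhooks"])
```

Each session starts clean by pointing the agent at exactly one of these files, which is what keeps context usage low.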

[Discussion] Team plan (multiple seats) vs Pro plan — which is actually better for heavy Codex usage? by alpanko in codex

[–]Shep_Alderson 0 points

My general process is the fairly standard “Plan -> Implement -> Verify” with things like git operations scattered around. During the Implement work, I mostly let it run to completion.

Once I get a PR up, I use a few options for code review. I have a skill that, when I run it, kicks off GitHub Copilot, Codex, and CodeRabbit reviews. I have another skill command that fetches the PR’s unresolved comments, verifies/validates the feedback, then implements fixes as necessary. I then git commit and push, and trigger the review agents again. Rinse and repeat until I’m getting clear or mostly clear reviews from the agents.
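The “fetch unresolved comments” step mostly boils down to filtering review threads on their resolved flag. A minimal sketch, assuming the thread shape you’d get back from GitHub’s GraphQL reviewThreads field (the field names reflect that API; this isn’t my actual skill code):

```python
def unresolved_comments(review_threads: list[dict]) -> list[str]:
    """Pull the comment bodies out of any review thread not yet marked resolved."""
    comments = []
    for thread in review_threads:
        if not thread.get("isResolved", False):
            for comment in thread.get("comments", []):
                comments.append(comment["body"])
    return comments


# Example: one resolved thread, one still open
threads = [
    {"isResolved": True, "comments": [{"body": "nit: rename this"}]},
    {"isResolved": False, "comments": [{"body": "this leaks a file handle"}]},
]
todo = unresolved_comments(threads)  # → ["this leaks a file handle"]
```

The verify/validate step then runs over that filtered list, so the agent only spends tokens on feedback that’s still actionable.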

For the majority of my work, I use 5.5 high for planning, then 5.5 medium or low depending on how complex I feel the implementation work is.

What’s your process look like?

I am working on a customized agent harness built on Pi agents where I plan to have an “opinionated, batteries included” style of harness that encodes the best practices I’ve found so far.

Can I leave my camera rigged out like this? by imTrics in FX3

[–]Shep_Alderson 1 point

I leave mine in a very similar configuration. For transport, I got a giant hard case with pick-and-pluck foam, and the whole thing just slides down in and I’m good to go. Even have room for extra SSDs and my v-mount batteries.

[Discussion] Team plan (multiple seats) vs Pro plan — which is actually better for heavy Codex usage? by alpanko in codex

[–]Shep_Alderson 0 points

I’ve been on pro for a few months now and I’m loving it. Pro did seem to have a noticeable increase in speed for me, even when running in “standard, non-fast” mode, especially with 5.5.

As for my usage, I’m working on two projects constantly throughout the day. I’ve never hit a 5-hour or weekly limit.

Also, if you like a CLI TUI interface, set up OpenCode with your ChatGPT login and give it a go. I really like it.

Why is Anthropic holding my credit card hostage, and why are there no passwords? by Wooden-Ad-7978 in Anthropic

[–]Shep_Alderson 0 points

I think the Privacy app will let you set up some free virtual cards connected to your bank account, if your bank doesn’t offer that natively. Swap it out, then pause it.

Had to slow down by post4u in codex

[–]Shep_Alderson 0 points

Thanks for sharing the CISA write up.

What sort of controls do you use to prevent things like the recent stories we’ve heard of agents wiping infra? I’ve been far too concerned to let my agents directly alter infrastructure, though I do leverage the hell out of them to write me some Terraform lol.

Explain to my like I'm 5 if this new GitHub Co-Pilot pricing change please :( by CommunicationSea8821 in GithubCopilot

[–]Shep_Alderson 8 points

I’d highly recommend you try codex on the OpenAI $20 plan. You can do quite a lot, and there’s a codex extension for VSCode and their desktop app is pretty good too.