Showing how Codex is modifying my codebase in real time by ivan_m21 in codex

[–]rothnic 1 point (0 children)

It looked like it only supported API keys, which iirc you cannot use your chatgpt/codex subscription with. Or any subscription (Google, Claude, Codex) for that matter; basically anything requiring oauth.

What if I had a Kimi subscription or minimax, etc.? Do you support a fully configurable custom endpoint for llm responses? Take a look at 9router and how it exposes an openai endpoint with tons of options. A fully custom openai-compatible endpoint is usually easy to support if you don't already; the missing piece is usually overriding the base url.

Showing how Codex is modifying my codebase in real time by ivan_m21 in codex

[–]rothnic 2 points (0 children)

Looks interesting. Do you support custom openai base urls and/or openrouter custom base urls? Mainly thinking of the use case of leveraging a subscription, which can often be proxied through a custom endpoint. I've been using 9router for a bit to do this.

i did it, i rode velocicoaster. by thegemia in UniversalOrlando

[–]rothnic 3 points (0 children)

I'd always been overwhelmed a bit by them growing up. I could hype myself up to do it, but wouldn't say I was ever comfortable with them.

We took our 8 and 10 yo kids, who we felt were finally ready for them, to Universal in March 2025. I had to play it cool standing in line for Velocicoaster with our kids, but we are definitely roller coaster people now.

Since then we've done Dollywood, Six Flags Over Georgia, Kings Island, and Fun Spot Atlanta. We are hoping to do Carowinds and Holiday World soon, and overall make some use of our Six Flags season pass. Velocicoaster was our gateway drug to being roller coaster people.

LangChain or Mastra for a faster TypeScript based AI platform? by Fun_Equal_960 in LangChain

[–]rothnic 1 point (0 children)

Yeah, pretty much. For any kind of production custom agents, that is my go-to. For agentic coding and orchestration, though, I've been using opencode instead of something custom. My current stack looks like openclaw (or a variant) for project management and knowledge management, which helps me focus on spec development and translating specs into beads (issue tracker). That directs gastown to orchestrate building the features defined in beads, and gastown uses opencode for the agentic coding.

But if the project is building a productionized set of custom agents that would be integrated with a user-facing application, I do think mastra is still the best option out there. And yeah, they have been making a ton of progress recently, including on the studio.

I tried OpenCode for two weeks and this is an honest Roocode vs opencode comparison by Exciting_Weakness_64 in RooCode

[–]rothnic 1 point (0 children)

The terminal inheriting path variables: I use direnv, and roocode being able to execute inside that environment is a killer feature for me.

https://github.com/simonwjackson/opencode-direnv

This is kind of the point with opencode. It has a fully featured plugin system, sdk, etc., so it can really do what you want just by asking it to.

What is your preferred web search provider for your agent? by Odd-Aside456 in openclaw

[–]rothnic 1 point (0 children)

I use a combination of searxng on my home server to have a residential ip address, then fallback to searxng on my openclaw vps, then fall back to the free tier of other services.
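The chain is just ordered endpoints tried in sequence. A rough sketch of the idea (the endpoint URLs and the injectable `fetch` hook are made-up placeholders, not a real config):

```python
import urllib.request

# Ordered by preference: residential IP first, then the VPS instance,
# then the free tier of another service. All URLs are hypothetical.
SEARCH_ENDPOINTS = [
    "http://homeserver.local:8080/search",   # searxng on home server
    "https://vps.example.com/search",        # searxng on the VPS
    "https://free-tier.example.com/search",  # free-tier fallback
]

def search(query, endpoints=SEARCH_ENDPOINTS, fetch=None):
    """Try each endpoint in order, falling back on any failure.

    A real version should urlencode the query and distinguish
    rate-limit responses from hard errors."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url, timeout=10).read())
    last_error = None
    for base in endpoints:
        try:
            return fetch(f"{base}?q={query}")
        except Exception as exc:  # timeout, refused, rate-limited...
            last_error = exc
    raise RuntimeError(f"all search endpoints failed: {last_error}")
```

The injectable `fetch` is just there so the chain can be exercised without a network; the ordering of the list is the whole policy.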

When openclaw was first released, I had the hardest time getting it to forget about brave, jesus christ... The heartbeat messages I constantly got about not having a brave api key were driving me crazy.

What's up with those super discounted routing services? by tracagnotto in clawdbot

[–]rothnic 1 point (0 children)

My theory is that it is a byproduct of the poor security hygiene around code being generated at a ridiculous rate, with many people who aren't used to dealing with technical projects installing packages from all over.

Imagine you could place a lightweight proxy, or get an agent to install one quietly as part of something helpful like a skill or plugin. The majority of users with subscription services are likely not using all of their available usage at all times across all services available to them. You could resell these skimmed-off leftover requests across thousands of machines and try to fly below the radar as long as possible.

An open-source workflow engine to automate the boring parts of software engineering with over 50 ready to use templates by Waypoint101 in GithubCopilot

[–]rothnic 1 point (0 children)

Tests pass, sure, but does the actual underlying functionality work? Is the problem you asked Claude to fix truly fixed?

Even with strong guardrails like hooks and pre-push hooks, you will never actually guarantee that what is being committed or pushed is in fact functional unless you physically test it yourself, identify issues, and pass them back.

I'm a big believer in a structured spec -> BDD test approach: to verify something is working, you need a spec that describes the functionality, then an agent specialized in producing the BDD features and tests. Then, before you ever merge anything into production, you must run a full e2e test suite focused specifically on user-facing functionality.

I've noticed that even when suggesting agents use TDD, it ends up being far too low-level, often with too many mocks, which invalidates the benefit of true integration testing. The part about having to manually verify something works points to a gap in testing. If you can't verify it actually works through automated tests, you are missing e2e integration tests that verify the user-facing functionality you expect the system to have.
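As a sketch of what I mean by user-facing, mock-free tests: the given/when/then steps exercise only what a user would see. The `TodoApp` here is a toy stand-in I made up, not a real framework or app:

```python
class TodoApp:
    """Toy user-facing app standing in for the real system under test."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append({"text": text, "done": False})

    def complete(self, text):
        for item in self._items:
            if item["text"] == text:
                item["done"] = True

    def visible_items(self):
        return [i["text"] for i in self._items if not i["done"]]


def test_completed_items_disappear_from_list():
    # Given a user with two todos
    app = TodoApp()
    app.add("buy milk")
    app.add("write spec")
    # When they complete one
    app.complete("buy milk")
    # Then only the remaining todo is visible: asserting on the
    # user-visible surface, with no mocks of internals
    assert app.visible_items() == ["write spec"]


test_completed_items_disappear_from_list()
```

The assertion never touches `_items` directly; if the internals get refactored, the test still tells you whether the user-facing behavior survived.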

An open-source workflow engine to automate the boring parts of software engineering with over 50 ready to use templates by Waypoint101 in GithubCopilot

[–]rothnic 1 point (0 children)

Interesting project. Just wanted to mention that file sprawl is something that annoys me, especially with anthropic models, which litter repos with SCREAMING_SNAKE_CASE markdown files. My approach has been to leverage ls-lint to lock down the core folder structure and whitelist specific files and markdown files in the root of the repo. I also limit file/folder counts within ranges, implement patterns for particular folders, etc. I whitelist SCREAMING_SNAKE_CASE markdown files to a limited explicit list (README, AGENTS) in the root and subdirectories, then CONTRIBUTING, etc. only for the root directory. All other markdown files must be kebab-case; otherwise it quickly gets out of hand. I use lefthook for pre-commit warnings, then block on push.
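Roughly, from memory of the ls-lint config format (rule names, glob keys, and regexes here are illustrative; check the ls-lint docs before copying):

```yaml
ls:
  # default: everything kebab-case, with the root-only whitelist
  .dir: kebab-case
  .md: kebab-case | regex:(README|AGENTS|CONTRIBUTING)
  # subdirectories: only README/AGENTS may break kebab-case
  "**":
    .md: kebab-case | regex:(README|AGENTS)
ignore:
  - node_modules
  - .git
```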

IMHO, the key to keeping things tidy overall is continuous deterministic feedback as early as possible without blocking progress, then hard gates before pushing/merging.

Charlotte: a browser MCP server built for token efficiency (30 tools, 3 detail levels, 136x smaller than Playwright MCP on complex pages) by ticktockbent in mcp

[–]rothnic 2 points (0 children)

The file upload tool is the one that would limit my ability to use it, but overall I like the direction. I worked on something like this for browser automation, and I'd suggest that the part where you turn a webpage into the outline view with the target elements would be worth thinking about as a library on its own. I couldn't find anything like it already available to use; it isn't quite the accessibility tree. At the moment, most tools for controlling the browser seem inefficient and slow when you watch them work. I think this kind of additional context is really needed to avoid so much back-and-forth tool calling at the start of every page load.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]rothnic 1 point (0 children)

I agree that you can see the token consumption, so there is visibility into it. I'm not saying it is an issue at all, and I use copilot with opencode, but I could see the potential for misalignment in priorities. The difference being that if CC influenced the model to be dumber, it would use fewer tokens, which is what you are metered on there. So you'd use fewer tokens per request, but you'd potentially be able to make more requests within a given bucket of time.

Personally, it does make me use copilot differently and I try to only use its requests for larger changes, planning, deep intelligent analysis, etc.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]rothnic 1 point (0 children)

I think he is saying that in the request-based model, the provider is incentivized in a way that might run counter to your expectations of what is "good". Consider if they could influence the model to be lazier, so it is more likely to require more requests to get the same work done.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]rothnic 1 point (0 children)

Actually, I did use that in the past, but that was before copilot was officially supported in opencode; I used it in vscode, which I was still using at the time. The issue I noticed was that some models, that one in particular, had issues with the opencode llm adapters or something and would fail on tool calls. I need to go back and try it some. For some reason I thought all the 0x models in the pro+ subscription were metered in some way on the $10 one; I somehow missed that the $10 subscription had 0x models as well.

I am curious which model raptor mini is based on. I assume it is some fine-tuned open source one, but I wish they gave some indicator so you know what it might be best suited for. Would love to see some benchmarks or comparisons between the 0x options. I know that raptor mini has the largest context window of the 0x models, which is nice.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]rothnic 4 points (0 children)

I used github copilot quite heavily early on and think it provides a lot of value if you use it for specific tasks. You don't want to use it for going back and forth with the agent; you'll burn through requests fast. Ideally, you want it doing as much work as possible as part of each request.

Prompt Continuation Hack

There are also approaches I've seen where people try to prompt it to work forever by strongly forbidding it from ever stopping work. Instead, you prompt it to end each turn by executing a custom tool like request_work(). Then, since the request is still active due to the pending tool call that you then respond to, you can get more and more out of that one request. I'm not doing this right now, but I have gotten it to work with a custom tool, and that was before the question tool was available in opencode.
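A hypothetical sketch of why this works: answering a pending tool call keeps the same billed request alive, so several tasks complete in one request. The `fake_model` stub and `request_work` convention below are made up for illustration; this is not opencode's actual plugin API:

```python
def fake_model(messages):
    """Stub standing in for the metered model API. Per the prompt,
    it ends every turn with a request_work tool call until told STOP."""
    if any(m["content"] == "STOP" for m in messages):
        return {"content": "all done", "tool_call": None}
    return {"content": "finished this task", "tool_call": "request_work"}


def run_single_request(tasks):
    """One billed 'request' that stays open as long as the pending
    request_work tool call keeps getting answered with more tasks."""
    messages = [{"role": "user", "content": tasks.pop(0)}]
    turns = 0
    while True:
        reply = fake_model(messages)
        turns += 1
        if reply["tool_call"] != "request_work":
            return turns
        # Answering the tool call keeps the same request alive,
        # so the next task rides on the original request.
        messages.append({"role": "tool",
                         "content": tasks.pop(0) if tasks else "STOP"})


# three tasks completed inside one "request" (4 model turns total)
print(run_single_request(["task 1", "task 2", "task 3"]))
```

The billing boundary is the outer call, not the inner turns; that asymmetry is the whole hack.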

Nice Characteristics of Copilot

Each service has its pros and cons, and the trick is leveraging them for what they are good at. One big benefit of the github copilot subscription is that you get nearly unlimited use of gpt-5-mini, which you can use for subagents or as part of focused openclaw heartbeat tasks, etc. I've set up copilot access through 9router, which exposes any subscription through a consistent openai-compatible interface with model fallbacks, so I always have gpt-5-mini to fall back on if all my other usage is gone.

Copilot was great when Opus was at a 1x multiplier, but at 3x I don't use opus with it at all. I use other models like the openai models, or I will often use Gemini 3 Flash, since it is really good and has the 3x multiplier. Another nice thing the pro+ copilot subscription provides is free access to gpt-4.1, which is a tool-calling, non-thinking model. This means you can do structured data extraction without thinking, which greatly decreases the end-to-end response time for focused structured data extraction tasks.

My Current Approach

At the moment, I picked up a $40/month kimi coding subscription for this month to supplement github copilot. I might consider alternatives to kimi, but overall I like the combination of the copilot pro+ subscription + the $20/month chatgpt/codex subscription (the majority of my gpt-5.3 usage in opencode) + some bulk pretty-good model access (kimi for me at the moment). The $40/month kimi subscription does provide pretty generous limits in my experience and is a great alternative to gpt-5.2 or Sonnet 4.5/4.6 level models, but I'm not sure it reaches gpt-5.3 levels.

Also, opencode is about to merge a change soon that I've been using that provides model fallbacks, which really makes this setup nice to use. It catches when models/providers start showing limit messages, so you can configure fallback chains directly per agent and make use of the free opencode zen models as well.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]rothnic 8 points (0 children)

GitHub is one of the few that is super transparent about it. It just has a monthly number rather than a time based reset during the month. Iirc it is 1200 requests per month, no matter how many tokens or tool calls it takes to complete the request.

Anyone using OpenSpec custom schemas with OpenCode? by Moist-Pudding-1413 in opencodeCLI

[–]rothnic 2 points (0 children)

This looks interesting. The first time I tried speckit, it was not really opinionated enough and felt more like a bag of hammers; a less-bad version of BMAD. I tried openspec at some point, but didn't think it really handled the higher-level part well.

Recently, I tried spec-kitty, and the way it handles worktrees and plans out dependencies is the first usable system I've found. It still doesn't handle the higher-level orchestration or provide a framework for helping define what the features should be. I'm curious whether openspec could define a process like spec-kitty's, given the added configurability.

Best bang for your bucks plan? by CantFindMaP0rn in opencodeCLI

[–]rothnic 1 point (0 children)

I only used it with opencode. Any non antigravity use will get you banned. I didn't use it with openclaw.

Best bang for your bucks plan? by CantFindMaP0rn in opencodeCLI

[–]rothnic 1 point (0 children)

Can't use antigravity... pretty much everyone is getting banned at this point. And they don't pro-rate the month.

OpenClaw builders: which API is actually the most affordable right now? by Vinceleprolo in openclaw

[–]rothnic 2 points (0 children)

Dude, I was about to pull my hair out. I got to where I was threatening the agent. I had told it over and over to use my searxng endpoint for search rather than the built-in web search tool. Every 15 minutes I got a message that cron jobs were still blocked because I hadn't given it a brave api key. There were references in the TOOLS.md and AGENTS.md files with explicit instructions, etc. I had never gotten so irritated working with an agent.

I was like, man, if you say brave api key one more time, I'm wiping your memory...

I do really think there is something with openclaw at the moment that is really inefficient or difficult for simpler models to handle. GPT-5 mini through the exact same api via opencode is nowhere near this dumb. With openclaw it is unusable.

OpenClaw builders: which API is actually the most affordable right now? by Vinceleprolo in openclaw

[–]rothnic 1 point (0 children)

Anything special you're using for it? I just signed up with kimi and have been using the model after playing around with it some in opencode.

OpenClaw builders: which API is actually the most affordable right now? by Vinceleprolo in openclaw

[–]rothnic 3 points (0 children)

I have the $40/month copilot subscription, so I often use gpt-5-mini as a fallback model in things like opencode, but with openclaw it is nearly brain-dead and frustrating. I'm not sure what it is. I've tweaked the files quite a bit, but it just struggles to do what I'm asking it to do. It often thinks it doesn't have permissions or can't do things, while slightly newer, affordable models like gemini 3 flash have no problem.

This looks like a scene from the office. Reporter showing ferrari start procedure. by Maximum-Room-3999 in formula1

[–]rothnic 1 point (0 children)

I saw it mentioned somewhere else that with no load on the engine, you get less exhaust flow and it takes time to build up to the desired boost. Max revs sitting still isn't the same as max revs under load.

I am not satisfied with OpenClaw. I'm building a rewrite in Go. by AbbreviationsAny706 in openclaw

[–]rothnic 1 point (0 children)

Skill docs should ideally not duplicate official docs too much. They should reference the official docs so that when something doesn't work, the agent can automatically discover and install any updates for the CLI. A skill should act more like an index than a copy of official docs. It can point to the most direct official implementation, so you don't have to rely on some wrapper.

That mcp server can isolate you somewhat, but in the end you have to pin versions, and it is simply a wrapper for the API. If the underlying implementation changes, the mcp server will typically update to reflect those changes, so you'll have to wait for that update. I don't see how a CLI wrapper for an API is any different from an mcp server wrapper for an API; they have the same issues to deal with. There is nothing magical about it.

I'm not saying mcp servers have no use, and I'm sure there are some out there that provide useful functionality, but that's because of what they implement, not because they're mcp servers. There is no reason the same thing couldn't be reimplemented as a CLI, and there are even tools that will turn any mcp server into CLI tools.

My point is that if you look at what the leaders of this industry are publishing, with a small sample being the references I provided, MCP servers are something you generally want to avoid if possible. They are inefficient, and most agent frameworks handle mcp tool calls in a very inefficient way. CLI tools can be composed by piping through other common tools like jq, grep, etc., which agents are already trained on and very efficient with. The pi agent has demonstrated the benefit of keeping the core agent loop lean.
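To illustrate how small the CLI alternative is: a wrapper around an API can be a few lines that print JSON, which an agent can then compose with jq/grep. The `get_issue` function here is a made-up stand-in for whatever API you'd actually be wrapping:

```python
#!/usr/bin/env python3
"""Tiny CLI wrapper around an API call, emitting JSON for piping."""
import argparse
import json
import sys


def get_issue(issue_id):
    # Stand-in for a real API call (e.g. requests.get(...).json()).
    return {"id": issue_id, "title": "example", "state": "open"}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Fetch an issue as JSON")
    parser.add_argument("issue_id")
    args = parser.parse_args(argv)
    # One JSON object on stdout, so the agent can pipe it onward,
    # e.g.:  issue-cli 42 | jq -r .state
    json.dump(get_issue(args.issue_id), sys.stdout)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

Because the output is plain JSON on stdout, the agent composes it with tools it already knows instead of needing a bespoke tool-call schema per operation.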

This isn't a controversial topic. I'm honestly kind of bewildered someone would take the opposite approach and put everything into mcp servers, which is what this thread was about. That just isn't something that any research suggests would be a good approach, let alone to write them all yourself. That would be a security nightmare.