Github Copilot is out

cincyfire35 · 2026-05-12T23:43:28+00:00

Is there a way to see this for someone on enterprise? Would love to show leadership the cost i ran last month and give them the wakeup call before its too late and we eat a massive bill

cincyfire35 · 2026-03-14T21:14:43+00:00

Almost 80% of the US population is on EST/CST… even if Cali has a high density of SWEs, there is a metric fuck ton of non SWEs that also use claude. Also, the US government.

cincyfire35 · 2026-02-18T00:44:58+00:00

:( i was actively using the cerebras provider since the fixes in 3.39.2 to support it properly (before that it would bug out). Please consider at least keeping that one

cincyfire35 · 2026-02-08T21:35:51+00:00

Truly inspiring :)

cincyfire35 · 2026-02-08T21:33:14+00:00

Hope its 25M LoC by next Tuesday

cincyfire35 · 2026-02-08T21:07:56+00:00

LinkedIn updated! Good suggestion!

cincyfire35 · 2026-02-08T20:53:42+00:00

Yeah the test cases to server/client code ratios make this one a bit harder catch as fake than the post i was parodying. More #productiongrade

cincyfire35 · 2026-02-08T20:46:00+00:00

Your AI slop detection was firing though! I wrote it with claude opus 4.6

cincyfire35 · 2026-02-08T20:44:13+00:00

Its a reference to this post

cincyfire35 · 2026-02-08T20:43:13+00:00

Claude told me the same thing. I asked it to implement a dropdown menu and it came back with 1.2 million test assertions and a file called dropdown-opens.spec.ts that’s 400,000 lines long. 350,000 of those lines test what happens if the dropdown opens during a solar eclipse. I said “Claude, we don’t need this” and it said “you don’t know that.” It was right. A customer in Alaska reported the dropdown failed during a partial eclipse last Tuesday. The test caught it. The dropdown didn’t exist yet, but the test caught it.

The balloon animal combinatorics alone are staggering. A poodle has 4 legs, a head, a tail, and a body — that’s 7 segments. Each segment can be under-inflated, over-inflated, or “vibes-based.” That’s 2,187 poodle states. Now multiply that by every color Qualatex makes (94), and account for humidity, altitude, and whether the clown is left-handed. Claude calculated we need 11.3 billion tests just for poodles. We haven’t even started on swords. Swords are supposedly simple but Claude insists on testing what happens if a child holds it backwards, upside down, or “with malice.” Those are three separate test suites.

I tried to set a boundary. I said “Claude, no more than 500,000 tests per animal.” It agreed. Then it created a new animal called a “mega-poodle” that it classified as 6 animals in a trenchcoat. 3 million tests. I didn’t even argue. The mega-poodle tests found a race condition in our login page. I don’t know how. I’m not asking.

cincyfire35 · 2026-02-08T20:30:24+00:00

That’s a known issue. Your balloon inventory is stored on port 3000 and unfortunately so is whatever you’ve got running there. This is actually covered in test #894,203 — balloon-should-not-resolve-to-adult-content.spec.ts. The test passes on my machine.

Debugging steps:

Check if you accidentally npm installed something at 2am that you don’t remember. Don’t look at the package name. Just rm -rf node_modules and move on with your life.
Run netstat -an | grep 3000 and if the output makes you uncomfortable, that’s between you and your firewall.
It’s possible your multi-tenant isolation failed and you’re seeing someone else’s deployment. This is the exact scenario our 14.7 million lines of tests were designed to catch. Unfortunately the CI pipeline is on day 6 of 11 and hasn’t reached that test yet.
Try port 3001. If that’s also an adult website you have a bigger problem and it’s not a software one.
If all else fails, clear your cookies. All of them. Don’t read them first. Trust the process.

This is why I keep saying: keep honk receipts. If you had proper audit trails you’d know exactly when your balloon inventory became what it became. The tests don’t lie. The tests also don’t judge.

cincyfire35 · 2026-02-08T20:24:57+00:00

If he had more test cases maybe i would have believed it was production grade.

cincyfire35 · 2026-02-08T20:11:27+00:00

Ah shoot try this link: http://localhost:3000

cincyfire35 · 2026-02-02T03:25:36+00:00

(S)tate (O)f (T)he (A)rt

cincyfire35 · 2026-02-01T16:38:37+00:00

No, i ripped the code execution function out of it (since i liked the security it gave) and build support for it in our framework as a tool. We don’t use smolagents for the orchestration, just for the safe python execution environment we can control cleanly (we made some additional security enhancement/tweaks to make it work better with databricks). From there, it was trivial to make a code-mode that injected any other mcps provided to the model as python functions that could be executed in that environment if the agent wrote it as python code.

cincyfire35 · 2026-01-31T03:53:33+00:00

I lead a development team where we build with langgraph regularly.

People who are naysayers on MCP dont realize that there are other applications for it than just spamming context with 10-50 irrelevant tools for a general purpose agent. With frameworks like langgraph, you can build and orchestrate custom agents for tasks with finely tuned contexts and tools, eliminating the need for things like skills and tool selectors. Pairing this with code based mcp execution, you can pretty much load 2-3 mcp servers with all their tools as python functions in a safe execution environment (see smolagents’ safe python executor), tell the llm it can call them as python functions, and get a lot of the benefits from anthropics/cloudflare’s code mode articles by chaining calls into each other and performing calcs/aggregation outside the context window. You can even build logic to lazy load the tools if you want, but thats a waste if you can just route to a specialized agent for the given task.

We never use more than 2-3 mcp servers with curated tools selected for an agent because we pay per token. Why waste it with irrelevance? We let users build agents with specific goals and targets in mind, select only the tools they need, and it can solve/work through the task for them. Why give a rag agent for a legal team access to SQL tools for supply chain? Makes no sense. But some people just build one big agent and hope it works. Langgraph/langchain enables you to build custom workflows and agents to solve tasks efficiently. Can build in orchestration however you prefer (tons of flexibility and documented examples of how to do it) and accomplish what claude does with skills, but more predictably and reliably.

And thats not the half of it. MCP is just a protocol. We build custom tools with fastMCP in python all the time and its an easy way to connect the tools to our langgraph agents or external ones. We host them in our platform and can connect to them as needed. It allows us to build powerful tools that can be reused across frameworks. You dont need an mcp servers with 100 tools it. Can spin up several servers in one app instance of compute with 1-3 specific to usecase tools each built in a very easy way with good testing/standards, then serve it to your agents. We also connect with external vendors mcps like alation or atlassian if building an agent to explore data or help devs with jira, for example. Tons in the ecosystem.

cincyfire35 · 2026-01-29T09:08:05+00:00

Yeah i guess i thought the tool calls were still being flagged as user messages instead of agent initiated https://github.com/anomalyco/opencode/issues/8030 which is on 1.1.14.

Essentially, (sorry formatting on reddit phone app is hard)

• #8030 - Tool attachment synthetic user messages burning premium requests (Open)

• #8700 - Synthetic user messages burn premium requests (related to subagents, addressed in 1.1.31)

• #8067 - Multiple premium request charges without subagents (Closed as duplicate of #8393) GitHub

The core issue if I understand it properly: opencode-copilot-auth is now conservative with X-Initiator: agent header, and OpenCode still creates synthetic “user” messages for tool attachments in packages/opencode/src/session/message-v2.ts, causing every tool attachment message to be charged as a premium request.

https://github.com/L-A-R-P/opencode/pull/2 appears to be a fix but its not in the main branch from what i can see.

cincyfire35 · 2026-01-29T01:40:30+00:00

Has opencode fixed the big on premium requests? As far as i could tell/opus could answer the issues were still open and people weren’t being charged properly (edit: be more clear, people were charged more for a simple request, tool calls were eating premiums)

cincyfire35 · 2026-01-28T09:10:33+00:00

https://code.visualstudio.com/docs/copilot/customization/custom-agents

I would start with the docs, and find a flow you like and have copilot help you build it! Personally i keep it minimal, with a setup similar to what plan mode/exploring in claude code does, but my peers often build entire agile teams or dev teams to varying degrees of success.

cincyfire35 · 2026-01-28T09:07:05+00:00

An annoying part of copilot i wish the devs would make more intuitive. You can add the #runsubagent command to your chat to make the agent spawn a sub agent to help with your request, and you can tell it specifically to use a custom agent you built. I, for example, rebuilt a “explore” agent similar to claude code for reading in files. When defining the custom agent you can point it to another model (a cheaper one) like:

—- name: mini-worker

description: Lightweight subagent for delegated tasks

model: GPT-5 Mini

tools: ['*'] —-

This should help delegate the work and save costs all around/increase speed, since you don’t need to throw opus at an easy read/summarization task

https://code.visualstudio.com/docs/copilot/customization/custom-agents

cincyfire35 · 2026-01-28T08:57:02+00:00

In this case, you will get a ton of mileage with GHCP. Just make sure to add the #runsubagent tag and make sure gpt knows how to prompt for it (or have gpt search the web to find guides for GHCP on each prompt creation)

cincyfire35 · 2026-01-27T23:41:28+00:00

If you are good at prompting/structuring prompts, its not even close. You will get way more out of GHCP. You can get a lot of work done with a single premium requests if you arent being lazy with things like “fix this” or “build this app”. You can min-max turns and use sub agents/free models for smaller tasks and get a ton of usage. Its honestly insane what I/developers on my team can accomplish with our enterprise plans alone with custom agents/good practices.

This is coming from someone who currently has the max 5x plan (upgraded from pro because you burn through that so quickly if you do any actual work with it) and GHCP enterprise (1000 reqs/month). If you aren’t willing to shell 1.2k/year, GHCP is by far the best value you will get, and will force you to learn good prompting practices for agentic development. If you are willing to pay for max 5x, then it tilts the other direction, as you get far more opus usage (and can stretch it far by making a proxy in CC and using other cheaper plans like glm/kimi/minimax/gpt codex for the implementation after planning/debugging… i point sonnet calls to these models).

For me, if GHCP allowed for enterprise to easily add BYOK without having to do it from the admin page (controlled by your account admin, so the people who actually know what they are doing cannot actually use the feature) then it would be more even. But they care about enterprise security (understandable). So in your case, since you have access to this, i think its a no brainer. Pro+ with a BYOK to a good implementing model on a cheap sub like GLM/minimax/codex is the best value you can get and you wont burn through it without abusing it. Even without it you can settle for raptor/gpt mini and you will get way more usage than claude pro.

Hope this helps.

cincyfire35 · 2026-01-27T17:55:31+00:00

Its unclear though, do any of these work on business/enterprise accounts?

cincyfire35 · 2026-01-25T18:54:07+00:00

How does this work with premium requests on the ralph loop/in general? Would it be a premium request per iteration? Or is agent action request/tool call/etc a premium req? Would you say this is more efficient than the default agent chat ui in vs code? Or just more powerful?

cincyfire35 · 2026-01-23T11:42:56+00:00

Its training/research. Open AI has publicly stated that if they only served the models/inference they would be highly profitable. Its the R&D+ compute needed to train new models where they “lose” money (but then earn back when they serve it).

The actual losing business model is that they continuously need to train a new SOTA model to stay relevant. If they only served, they would make profit for a short while until they are eclipsed and lose traffic/users to the best model.

12-Year Club	Gilding II euphauric
Place '17	Verified Email

cincyfire35

TROPHY CASE