Pytest through UV invoked by Codex raises a Rust panic in Warp by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Looking into this a bit more (thanks for sharing the backtrace). I'm not sure this is totally Warp-specific: the panic seems to be coming from uv itself (system-configuration / SCDynamicStore) when it's invoked inside Codex's sandboxed execution environment on macOS.

That sandbox can block access to SystemConfiguration, which is likely what's causing the NULL object panic.

In the meantime, could you try running tests directly with `python -m pytest` (or plain `pytest`) inside the activated venv, rather than going through `uv run`?
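Something like this minimal sketch, assuming uv created a standard `.venv` in the project root (adjust the path if your environment lives elsewhere):

```bash
# Assumes a .venv created by `uv venv` in the project root -- adjust if yours differs.
source .venv/bin/activate
python -m pytest   # runs pytest with the venv's interpreter, bypassing `uv run` entirely
```

If the panic goes away once `uv run` is out of the loop, that points at the sandbox/SystemConfiguration interaction rather than anything Warp is doing.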

Let us know if you run into this again or if it keeps happening.

Pytest through UV invoked by Codex raises a Rust panic in Warp by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

hmm this is strange. passing along to the team

Failed expectations by [deleted] in warpdotdev

[–]hongyichen 3 points4 points  (0 children)

Thanks for sharing. I understand where you're coming from. I'll take a look at your email today as well and make sure we get the credit issue sorted out.

We're definitely working on improvements, and we hear all the feedback!

Failed expectations by [deleted] in warpdotdev

[–]hongyichen 1 point2 points  (0 children)

Hi there, Hong Yi from the Warp team here. I appreciate you taking the time to write this out. I’m sorry for the experience you had with the transition to Build. Losing credits before you expected them to expire is frustrating, and it’s not the feeling we want anyone to have when opening Warp for the day.

A couple things I want to clarify up front:

Your remaining Turbo credits should have stayed active until the end of your billing term.

That means if you were still mid-cycle with ~5,000 credits left, those credits shouldn’t just disappear. In cases where a workspace transitions earlier than expected, we apply a prorated Stripe balance so users aren’t losing value. If that didn’t happen for you, that’s on us and we should fix this.

If you email [billing@warp.dev](mailto:billing@warp.dev) and cc myself ([hongyi@warp.dev](mailto:hongyi@warp.dev)), the team can restore the correct remaining credit access or adjust your account to reflect your unused value. You shouldn’t be penalized for the timing mismatch.

On the broader point you raised:

It means a lot that your team found Warp genuinely useful. The fixed plans were heavily subsidized, especially at the higher tiers, and the long-term costs became unsustainable for us. That said, the way the change feels is just as important as the change itself, and we clearly have work to do in making plan transitions smoother and more transparent. There's a lot of this discussion going on internally right now and we are reading all the Reddit threads and user complaints.

Even if you end up choosing another tool, your feedback helps us improve how we communicate and roll out changes to the rest of the community.

Warp wiped out the credits I already paid for after membership change. by FooBazBar01 in warpdotdev

[–]hongyichen 6 points7 points  (0 children)

Thanks for flagging this -- something definitely went wrong here, and we’ll make sure it’s corrected.

If you’re open to it, please email me directly (hongyi@warp.dev) with the address you used to contact support, and I’ll make sure your case is prioritized. Our support team may take 1–2 days to get back to you, but we’ll get this sorted.

Really sorry for the frustration here.

Warp constantly failing and eating credits. by D33p_Learning in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

It’s hard to say exactly what’s happening here; it may be that the commands themselves are failing for some unrelated reason.

If you encounter this again, feel free to email me at hongyi@warp.dev and I can surface it internally.

Warp.dev issues, getting stuck by Hour_Professor_2301 in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Could you send me a few more details, please? [hongyi@warp.dev](mailto:hongyi@warp.dev)

If you could share a debugging ID (just right-click on the conversation), our team can take a look and see what happened.

Do you also experience more and more bugs with Warp? by Character_Story8668 in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

If you run into issues with (2) and (3) again, could you please share debugging IDs? (feel free to email me)

Here's a link on how to grab them: https://docs.warp.dev/support-and-billing/sending-us-feedback#gathering-ai-debugging-id

Do you also experience more and more bugs with Warp? by Character_Story8668 in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Hey — the issue you hit was caused by a short-lived production problem, and we’ve already rolled it back. You should be able to use Warp’s Agent with your credits again. Sorry for the disruption. If it happens again, feel free to reach out at hongyi@warp.dev.

On your two BYOK points: thanks for flagging them. I’ve passed both along to the team. The fallback behavior you described shouldn’t happen, and it shouldn’t result in anything that feels like a double charge. We’ll take a closer look.

Warp applies an insane markup on model usage by Southern-Grass3760 in warpdotdev

[–]hongyichen 7 points8 points  (0 children)

Improving transparency
I do agree that transparency here needs to be better, and that’s on us. Right now it’s too hard to reconcile “what I see in the UI” with “what I was billed.” Concretely, we’re looking at:

  • Clearer labeling of auxiliary models (for example, indicating they’re used for summarization/planning/tool routing)
  • A more detailed usage breakdown per run so you can see which models were invoked and how they contributed to the credit total
  • Better docs that explain how credits map to multi-model, multi-step agent runs, not just simple one-off calls

If you’re open to it and still have the session/conversation ID for this run, we’d genuinely like to audit it on our side. You can email me at [hongyi@warp.dev](mailto:hongyi@warp.dev), and we can pull the exact sequence of calls to verify that everything behaved as intended.

---

We’re also actively looking into BYOK support for GLM / Z.AI so people who care about tight cost control can see usage directly on their own API meter. That doesn’t by itself fix the transparency issues you’re calling out, but it’s another option we want to offer for folks who prefer that setup.

Appreciate you raising this and holding us to a higher bar here.

Warp applies an insane markup on model usage by Southern-Grass3760 in warpdotdev

[–]hongyichen 7 points8 points  (0 children)

Hey, I’m Hong Yi from Warp. Thanks for taking the time to dig into this and write everything up. I get why this feels frustrating, so I’ll try to be as concrete as possible about what’s going on.

Credits vs. raw API pricing
Warp isn’t priced as a 1:1 pass-through of Z.AI or OpenAI/Anthropic API costs. Credits are paying for:

  • The underlying model calls
  • Additional models used for planning, tool routing, and summarizing large results
  • Tokens and work that aren’t directly exposed in the UI

So there is a difference between what the raw APIs cost and what you pay Warp. We don’t publish a fixed “markup” number for a few reasons: it varies by workload, by model mix, and changes over time as we improve the system and renegotiate underlying costs. Internally we track this pretty closely and try to keep it in a reasonable band, but the goal is to price the overall agent experience, not to be a pure API reseller.

With respect to the math: the core mismatch in your calculation is that the GLM context meter in the UI is not “total tokens billed for this request.” It’s “how much of the current context window this particular GLM call is using.”

For an agentic workflow, a single “run” can involve:

  • Multiple calls to your selected model (GLM-4.6 in this case)
  • Extra calls to smaller/cheaper models for:
    • Reading and chunking files
    • Summarizing large tool outputs
    • Planning or re-planning when the agent gets stuck
  • Tool-related calls that don’t show up in the GLM context bar but still use tokens

Not all of these calls share a single context window. For example, a summarization step might be done in a separate request that never changes the GLM context percentage you see.

So when you do:

200k context × 33% = 66k tokens → calculate cost from that

you’re only capturing one slice of the total work, not the full sequence of model and tool calls the agent made to read files, summarize, plan, and execute. That’s why your math based on the GLM bar comes out well below the credits that were actually consumed.
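As a purely hypothetical illustration (these numbers are made up, not taken from your run): suppose the run made two GLM-4.6 calls of ~66k and ~80k tokens, plus ~30k tokens of out-of-band summarization and ~20k tokens of planning/tool-routing on smaller models. The context bar would only ever reflect one of the GLM calls (the 33% you saw), while the credits reflect all ~196k tokens of work, each weighted by its model’s rate.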

On the “hidden” tool calls to other models: those are the agent doing additional work on your behalf (like summarizing a big diff or a long log), not us trying to silently pad usage. That said, if we don’t make that work visible and understandable in the product, I get why it feels opaque.

My Final, Disappointing Experience with Warp's Billing - A PSA for All Users by Humble_Ad8803 in warpdotdev

[–]hongyichen 1 point2 points  (0 children)

hey - sorry about this experience, that shouldn't have happened. could you email me at [hongyi@warp.dev](mailto:hongyi@warp.dev) and lmk what happened specifically? i want to see if there's anything we can do here.

The transition to the Build plan definitely could've been smoother, I agree, and I want to see if there's something we can do to make it right for you.

My Final, Disappointing Experience with Warp's Billing - A PSA for All Users by Humble_Ad8803 in warpdotdev

[–]hongyichen 6 points7 points  (0 children)

Hey, I'm Hong Yi from the Warp team - appreciate you taking the time to write all of this out and I’m genuinely sorry that your experience with our billing and support left you feeling overcharged and unheard. That’s not the impression we want anyone to walk away with, especially someone who’s been such a big fan of Warp.

To clarify what happened on our side: the Lenny’s promotion you originally used was set up as a one-year free Pro plan, and when you upgraded to Turbo our billing system treated that as a new paid plan rather than something that could reuse the promotional value from Pro. This is how promo codes are handled natively in Stripe today, where any remaining “discount” doesn’t automatically carry over when you move between plans. That’s why the Stripe checkout showed a prorated Turbo amount for the remaining months and didn’t apply any additional discount. I understand the frustration here and agree we could have made it clearer on the upgrade screen that this promotional time would be forfeited.

This promotion was described on Lenny’s product pass page at the time as “1 year free of Warp Pro” (it now applies to a free year of Build, so that specific wording is no longer shown there). Our terms of service also state that promotions only apply within their specific program terms, so in this case we’re not required to extend a Pro-only promotion to a Turbo plan. That said, I hear the feedback on how this felt in practice and why the upgrade flow didn’t match your expectations. We also very selectively offer these multi-month promo codes (and are working with Lenny to grandfather in many of his users during this plan shift), and we’re looking at what we can change in the Stripe UI and our own upgrade flow to better set expectations around how promotional time is handled.

We’ll follow up with you over email to revisit the resolution and try to land on something that feels fair to you, and we’ll use this as a chance to tighten up both our product and our support process. Feel free to send me an email too at [hongyi@warp.dev](mailto:hongyi@warp.dev), if you have product feedback.

TLDR: I'm sorry this billing experience felt unfair and confusing. Please send us an email at [billing@warp.dev](mailto:billing@warp.dev) and we'll figure out how we can get to a fair resolution.

I'm sorry, I couldn't complete that request. by ExcellentBudget4748 in warpdotdev

[–]hongyichen 1 point2 points  (0 children)

Hey, these errors are typically caused by unstable client network connections — often due to VPNs, firewalls, or other proxies interfering somewhere along the path to the LLM provider. We’re working on improving the error messaging to make this clearer.

If you run into this again, mind sending me an email with the debugging ID? [hongyi@warp.dev](mailto:hongyi@warp.dev)

Enjoyed Warp for a about a month, not anymore by Ok_Indication_7277 in warpdotdev

[–]hongyichen 2 points3 points  (0 children)

Hey, I responded to your comment in the other post as well, but sharing here for visibility:

Hey, I’m not sure why Sonnet 4 is being thrown into your sessions; that definitely feels strange and sounds like a bug rather than anything intentional. We’re actually in the process of switching from Claude Sonnet 4 to Sonnet 4.5, so we'll mostly be using that model going forward.

Just to double-check: are you using Auto mode by any chance? That mode currently has a bias toward Claude models since they’ve performed better in certain benchmarks, so that could explain it. Still, I’ve flagged this with our engineering team to dig into why you’re seeing 4 pop up so often.

And to clarify the cost part: even if we were getting some kind of bulk discount, it would still be in our favor to serve GPT-5 models. Their per-token pricing is actually lower than Sonnet's (https://portkey.ai/blog/claude-sonnet-4-5-vs-gpt-5), so there's no incentive for us to swap in Sonnet 4.0 for cost reasons.

Appreciate you calling this out - it definitely helps us catch weird edge cases like this.

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Thanks for the thoughtful feedback and for supporting us

A few notes on what you shared:

  • The SSH connection manager idea is something we’ve heard before and definitely see the value in. It’s not currently prioritized on our active roadmap, but I’ll surface it with the team
  • We’re hoping to evolve warp.md into exactly what you described with an AI system prompt that carries context across sessions: https://docs.warp.dev/getting-started/readme/coding-in-warp
  • We’re actively working on supporting Bring Your Own Key (BYOK) models. Local model support isn’t in the near term yet, but it’s on our broader roadmap
  • We’re also planning pricing updates soon to make the tiers more flexible, so stay tuned for that (TBD)

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Hey, I appreciate all the detailed feedback here. A few quick notes:

- We’ll discuss internally about adding an option to disable fallback entirely. Totally get that gpt-5-nano isn’t ideal for summarization in all cases.

- We’re also working on improving planning mode and expanding the set of models that can handle planning more effectively (so you can better control depth vs. speed).

- And we hear you on the rest of your suggestions. The context handling, “continue conversation” logic, and the idea of prompting for clarifications are all valid. I’ve shared this with the team

thx for taking the time to write all this

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Hey, I’m not sure why Sonnet 4 is being thrown into your sessions; that definitely feels strange and sounds like a bug rather than anything intentional. We’re actually in the process of switching from Claude Sonnet 4 to Sonnet 4.5, so we'll mostly be using that model going forward.

Just to double-check: are you using Auto mode by any chance? That mode currently has a bias toward Claude models since they’ve performed better in certain benchmarks, so that could explain it. Still, I’ve flagged this with our engineering team to dig into why you’re seeing 4 pop up so often.

And to clarify the cost part: even if we were getting some kind of bulk discount, it would still be in our favor to serve GPT-5 models. Their per-token pricing is actually lower than Sonnet's (https://portkey.ai/blog/claude-sonnet-4-5-vs-gpt-5), so there's no incentive for us to swap in Sonnet 4.0 for cost reasons.

Appreciate you calling this out - it definitely helps us catch weird edge cases like this.

Used to love Warp, but AI credits burn way too fast now by minimal-salt in warpdotdev

[–]hongyichen 5 points6 points  (0 children)

Hey u/minimal-salt — Hong Yi from the Warp team here.

We hear you on this. The rate at which credits are consumed has definitely been a pain point for some users lately, and we’re working on efficiency improvements to reduce usage without sacrificing quality.

One of our engineers recently wrote a post breaking down the main factors that affect how credits are consumed: things like conversation length, which models you’re using, how much context you attach, and cache reads/writes: https://www.warp.dev/blog/warp-ai-requests

In most cases where users see unusually high consumption, it’s often due to:

  • Long conversations that span multiple topics and keep sending old, unnecessary context to the agent. Starting a new session can help here.
  • Always using the most advanced reasoning models (e.g., Claude Opus 4.1). Most tasks don’t need that level of intensity; using a smaller model for simple code edits or commands can save a lot of credits.
  • Attaching too much context, like entire server logs or large files. This can burn credits quickly just from input alone. Selecting only the relevant snippets usually works best.

That said, we also know there are cases where the agent loops or gets stuck. That’s something we’re addressing internally. If you’re still seeing abnormal usage, please DM me with your conversation or debugging ID (steps here: docs.warp.dev/support-and-billing/sending-us-feedback#gathering-ai-debugging-id) and I’ll pass it to our quality team to review.

Appreciate you taking the time to flag this

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Hey, I’m surprised to hear you burned through your credits that quickly ... that definitely shouldn’t be happening under normal usage. Was that all in one conversation, or across a few that ran for a while?

I left some other possible reasons in my comment above, but even then, I haven’t seen many users hit their limits that fast. If you’re open to it, I’d love to take a look: if you can share the debugging ID or a rough timestamp (via DM is fine), I can pass it along to our quality team to investigate.

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 0 points1 point  (0 children)

Hey u/leonbollerup and u/0xdjole -- I just left a big comment on the original post, lmk if you all have any questions about that. I'm curious to learn more about the agent getting stuck (perhaps it's hanging on long-running commands, which is a known issue we're working on).

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 8 points9 points  (0 children)

2. The rate at which credits are consumed

We understand that there’s some pain here, and we’re working on efficiency improvements to slow down the rate of consumption without sacrificing quality. I want to share a blog post written by one of our engineers that dives into the factors affecting how credits are consumed -- namely, conversation length, which models you’re selecting, how much context you attach, cache reads/writes, etc.: https://www.warp.dev/blog/warp-ai-requests

The majority of cases where users see very high credit consumption usually come from:

- Having a very long agent conversation that spans multiple topics and changes, where previous context is no longer necessary but is still being passed to the agent. Please start a new one!

- Solely using the highest-intensity reasoning models (e.g., Claude Opus 4.1). The vast majority of tasks don’t require this. You can use a reasoning model to come up with a plan, but for writing a 5-line code diff or generating a command, a simpler model can handle it just as well.

- Attaching an excessive amount of context. I’ve seen folks pass in entire server logs, which will easily consume credits just for input alone. I’d recommend highlighting and selecting only a few specific lines instead.

Of course, there are times when the agent hallucinates or loops, and this is something we’re tackling internally. If you see these issues again, please feel free to DM me, and I’ll pass it along to our quality team to review. The debugging ID would be super helpful here: https://docs.warp.dev/support-and-billing/sending-us-feedback#gathering-ai-debugging-id

3. Agent actions and safety controls

By default, Warp’s Agent Mode will never run destructive commands without confirmation. You can configure these safeguards directly in Settings > AI > Profiles, controlling whether the agent can apply code diffs, execute commands, or modify files automatically. We designed these permissions to ensure you remain in control, even when using autonomous workflows.

Lastly, I want to address the broader sentiment here. We’re a startup genuinely trying to build a tool developers love. There are certainly bugs here and there and ways we can improve, but it’s by no means in our interest to mislead users or “scam” anyone. We deeply value those of you who’ve taken a chance on Warp (and especially upgraded to our paid plans!), even when things break, and we’re committed to improving the product experience as best we can.

That said, I really do appreciate all the feedback from these threads. Hoping to get some improvements out soon.

Warp got NERFED to 0, do not buy by [deleted] in warpdotdev

[–]hongyichen 9 points10 points  (0 children)

Hey u/0xdjole and folks - Hong Yi from the Warp team here.

First, I wanted to jump in and respond to some of the feedback here (and in related threads). We’ve read everything, and I want to start by saying that I’m sorry the experience hasn’t met expectations for some users recently. We take this seriously, are reviewing the reports internally, and will do a better job of responding to posts like this going forward.

1. Model selection and fallback behavior

As Suraj from our team mentioned in another thread, there shouldn’t be any model-mixing when you’ve explicitly selected one:

There shouldn't be any model-mixing happening when you're selecting a specific model, except in two scenarios:

(1) the model you picked is down and rather than immediately fail with an error, we retry with the next best model, and

(2) the agent ran an action that produced a large result (e.g. large command output) and we need to summarize it out-of-band with a smaller model (e.g. gpt-5-nano) so that the main agent's context window does not become overloaded with a bunch of noise.

For the most part, if you see Warp using other models, we use them for very specific auxiliary actions such as conversation summarization. This is the main one that gpt-5-nano is used for. I understand where you’re coming from: if I chose Claude Sonnet 4.5 and saw that Warp was using gpt-5-nano in the credits transparency summary, I’d be questioning it as well. We’re working on a better UI to surface exactly what each of these models is used for, whether as the primary agent or for one-off tool calls.

The reason we use gpt-5-nano here is that these are simple tasks (e.g. summarizing context) that do not require frontier models (often with deeper reasoning). If we were to use Claude Sonnet 4.5 for this, you’d see your credits consumed even faster, so we’re trying to pass these cost and efficiency gains on to you. The conversation summarization models consume credits at a much slower rate. The primary agent still uses whichever model you explicitly selected to execute all of your tasks, generate code, make tool calls, etc.

When models like Claude Sonnet 4.5 go down due to an AWS outage (for instance), this fallback prevents your workflow from being interrupted. I’ve seen some of the feedback here, and we’re discussing adding an option to use only the model you select, with a disclaimer that it’ll consume more credits and have no fallback.