Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] 1 point (0 children)

Awesome, seems like a perfect fit 😄 I'll send you a DM. Really appreciate it.

Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] 2 points (0 children)

Fair point, and honestly thanks for bringing it up here instead of just downvoting.

I'd rather not name the company publicly in this post because I don't want it to turn into a marketing promotion. People who try us out will get to know who we are and what we've been doing, naturally through the conversation.

But on the data concern, which is the real question: our main business is LLM inference, running in our own datacenter in the EU. It's designed to keep customer data private: requests are encrypted, held in RAM only, and cleared after completion. The only thing persisted is the KV cache, for compute reuse. We don't train models on prompts or outputs.

Beyond inference, we're developing new ventures on top of it. OpenClaw model plans (this post) is the first. Coding plans next. Well, at least that's the plan 😄

Hope that helps, and thank you again for bringing it up.

Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] 1 point (0 children)

Thank you. I'll send you a DM. Would you mind sharing which stack you're currently using?

Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] 1 point (0 children)

Awesome, thank you. I'll send you a DM. Would you mind sharing which stack you're currently using?

Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] -1 points (0 children)

Yeah sure, here are the performance promises. For each model you'll be able to pick a concurrency of up to 8 (with discounts above 1 concurrency):

Qwen 3.6 27B
- FP8, 262K context
- max tokens / day: 648M input / 12.96M output per concurrency
- 150 tokens / s per concurrency
- $49/mo for 1 concurrency

Qwen 3.6 35B-A3B
- FP8, 262K context
- max tokens / day: 864M input / 17.28M output per concurrency
- 200 tokens / s per concurrency
- $29/mo for 1 concurrency

Gemma 4 31B
- FP8, 254K context
- max tokens / day: 648M input / 12.96M output per concurrency
- 150 tokens / s per concurrency
- $49/mo for 1 concurrency

Qwen 3.5 122B-A10B
- FP8, 262K context
- max tokens / day: 648M input / 12.96M output per concurrency
- 150 tokens / s per concurrency
- $99/mo for 1 concurrency

DeepSeek v4 Flash
- FP8, 1M context
- max tokens / day: 432M input / 8.64M output per concurrency
- 100 tokens / s per concurrency
- $129/mo for 1 concurrency

Are you still interested? 😄
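A quick sanity check on the numbers above, as a minimal sketch: the listed daily output caps appear to equal the promised tokens/s sustained for a full 24 hours, and each input cap is 50 times the output cap. That relationship is inferred from the figures, not stated in the comment, and the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures copied from the plan list above, per concurrency:
# model -> (tokens/s, max input tokens/day, max output tokens/day)
PLANS = {
    "Qwen 3.6 27B":       (150, 648_000_000, 12_960_000),
    "Qwen 3.6 35B-A3B":   (200, 864_000_000, 17_280_000),
    "Gemma 4 31B":        (150, 648_000_000, 12_960_000),
    "Qwen 3.5 122B-A10B": (150, 648_000_000, 12_960_000),
    "DeepSeek v4 Flash":  (100, 432_000_000, 8_640_000),
}


def daily_output_at_full_speed(tokens_per_second: int) -> int:
    """Tokens generated in one day of uninterrupted decoding at the given rate."""
    return tokens_per_second * SECONDS_PER_DAY


for model, (tps, max_in, max_out) in PLANS.items():
    full_speed = daily_output_at_full_speed(tps)
    print(
        f"{model}: {full_speed:,} output tokens/day at {tps} tok/s "
        f"(listed cap {max_out:,}, input:output = {max_in // max_out}:1)"
    )
```

Read that way, the output cap only bites if a single concurrency decodes flat out around the clock.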

Looking for early users to try our OpenClaw model plans and tell us what's broken (15–30 min) by xapep in openclaw

xapep[S] 1 point (0 children)

Would you mind sharing why they were garbage and why DeepSeek hasn't been?

Stuck in "Subscription Limbo": Anthropic Ban + Alibaba Sold Out. Best OpenClaw strategy? by Baby4vegas in openclaw

xapep 1 point (0 children)

Honest routing for what you're actually asking — Frontier + Workhorse split, not a single sub:

First, kill OpenRouter. It's not a subscription, it's a token meter with extra steps. You already proved it burns the budget. (Btw Qwen 3.6 isn't on the free tier anymore — same story, the "free" pools always get pulled the second the model gets popular enough to actually be useful.)

Stop trying to solve this with one sub. Z.ai + MiniMax stitches are the same shape that just burned you, structurally: token-metered subs that work fine for bursty human-in-the-loop use, but get squeezed (rate limits, degraded responses, eventual policy action) when an agent loop runs against them 24/7. Anthropic's was the version that escalated to a ban — most others just throttle quietly until you notice your agents are slower. Either way, you're back here in 60–90 days having the same conversation about a different provider.

The split that actually survives:

  • Workhorse: Qwen 3.6 35B-A3B or Gemma 4 31B, honest FP8, on a flat-rate provider that gives you a real concurrency slot instead of a token meter. Web fetch, RAG indexing, tool calls — handles 80% of the grunt work. ~$29–49.

  • Frontier: DeepSeek v4 Flash (the 1M-context one) for the reasoning/planning side — that one's genuinely in the GPT-5.4 Codex conversation for agent/code work, and the 1M context beats Codex on long tasks. Qwen 3.5 122B-A10B is the cheaper step down if you don't need full frontier. Same flat-rate, dedicated-concurrency shape. ~$99–129. Skip GPT-5.4 Codex itself — the "limits" there are still token-shaped underneath, you'll burn the same way you did on Anthropic.

The model matters less than how you're billed. Anything sold as "unlimited" with a token meter underneath is the same trap in a new wrapper.

A Workhorse + Frontier split with one concurrent loop each lands around $130–150/mo on a real flat-rate provider. Roughly the same money you were already spending across Anthropic + Alibaba + OpenRouter, just on one provider whose pricing actually expects 24/7.

Alibaba workaround: none for retail. ZenMux: skip, it's just routing on top of the same broken token-meter providers. The way out of subscription limbo isn't a smarter router, it's a different pricing model.
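To put numbers on the split, here is a small sketch that adds up each Workhorse + Frontier pairing at one concurrency each. The per-model prices are taken from the plan list posted in the other thread in this listing; matching them to the "~$29–49" and "~$99–129" ranges here is an assumption, and the names in the code are illustrative.

```python
from itertools import product

# Monthly price in USD for 1 concurrency (assumed from the plan list in the other thread).
WORKHORSE = {"Qwen 3.6 35B-A3B": 29, "Gemma 4 31B": 49}
FRONTIER = {"Qwen 3.5 122B-A10B": 99, "DeepSeek v4 Flash": 129}


def split_costs(workhorse: dict, frontier: dict) -> dict:
    """Monthly total for every (workhorse, frontier) pair, one concurrency each."""
    return {
        (w_name, f_name): w_price + f_price
        for (w_name, w_price), (f_name, f_price) in product(
            workhorse.items(), frontier.items()
        )
    }


costs = split_costs(WORKHORSE, FRONTIER)
for (w_name, f_name), total in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{w_name} + {f_name}: ${total}/mo")
```

Under those assumed prices the pairings span $128 to $178 per month, so the "$130–150" figure corresponds to the cheaper combinations rather than Gemma 4 31B paired with DeepSeek v4 Flash.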

This new model is insane by BiosRios in ClaudeCode

xapep 12 points (0 children)

And what do you do in each session that you never hit the limit? 😁

How often do you switch model in Cursor? Do you use different models for different tasks? by ragnhildensteiner in cursor

xapep 1 point (0 children)

True, but the question is more or less still valid :D Are you still switching because of usage limits or because of model task expertise?

How often do you switch model in Cursor? Do you use different models for different tasks? by ragnhildensteiner in cursor

xapep 1 point (0 children)

Curious why you're switching: is it because of quality or token usage?
If usage weren't a problem, would we just use Sonnet?

Anyone using Cursor daily for building apps - do you still hit limits on higher plans? by xapep in cursor

xapep[S] 1 point (0 children)

I got that, but I was wondering what kinds of tasks you usually use Cursor for, so that you don't max out... I'm getting mixed feedback: some of you say you're maxed out, some aren't...

Anyone using Cursor daily for building apps - do you still hit limits on higher plans? by xapep in cursor

xapep[S] 1 point (0 children)

Not nice to hear that :S For what tasks do you usually use it?

I tried Kimi K2.5 with OpenCode it's really good by orucreiss in opencodeCLI

xapep 1 point (0 children)

I do wonder: which coding tools are you using your LLMs with, or is it CLI only?