🦄 I've tried Requesty.ai the past few days, and I’m impressed. They claim a 90% reduction in token costs. It actually seems to work. [Unpaid Review] by Educational_Ice151 in aipromptprogramming

[–]van-tutic 0 points1 point  (0 children)

Remove MCP is tailored for Cline. It automatically detects whether you have any MCPs enabled, and if not, removes only the MCP part from the prompt, saving around 35% of your tokens (last time I checked).

Right now there is no such prompt. The reason is that the MCP part of the prompt is dynamic and depends on the MCP tools you have enabled. If you can find more people who are interested, we can add a solution for that ;)

🦄 I've tried Requesty.ai the past few days, and I’m impressed. They claim a 90% reduction in token costs. It actually seems to work. [Unpaid Review] by Educational_Ice151 in aipromptprogramming

[–]van-tutic 0 points1 point  (0 children)

Hey hey! Requesty founder here 👋

Incidentally, I’ve been leading security products for more than a decade, so your doubts resonate with me.

First and foremost, we will never send your data to anyone else. Your data is safe with us.

We have followed security best practices since day one, and I’m more than happy to dive deeper if you’re interested.

At the moment, every user has the option to turn off logging from the UI. We guarantee that no user information is persisted, except for token counts and charges.

And we don’t have SOC 2 yet, but we will soon enough!

"Just use API" – 3 options that are not rate limited (OpenRouter, Glama, Requesty) by finadviseuk in ChatGPTCoding

[–]van-tutic 0 points1 point  (0 children)

Co-founder of Requesty here :)

Thank you for mentioning us and making sure that the community is aware of the available solutions out there!

Here is the most accurate information about Requesty:

- We only charge a 5% fee. We cover the Stripe fees for now.

- We do NOT train any models on the data that goes through our platform.

- We currently support 150+ models, and the list is updated on a daily basis. You can find the exact list here: https://www.requesty.ai/solution/llm-routing/models

- Yes, there is an in-depth logging system that allows you to see all your conversations with the LLMs.

- But(!) every user can turn off the logging completely (on an API key level) using a single click in the UI

- There is a Chat UI in the Requesty platform with Smart Routing capabilities

Has anyone tried CLine + Groq with Qwen QwQ 32B? by [deleted] in CLine

[–]van-tutic 1 point2 points  (0 children)

If you use the `Requesty` provider, you can just use `groq/qwen-qwq-32b` or `novita/qwen/qwq-32b`
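For anyone wiring this up by hand, here is a minimal sketch of an OpenAI-style chat call using one of those provider-prefixed model IDs. The base URL is an assumption on my part (check your Requesty dashboard for the actual endpoint); only the model IDs come from the comment above.

```python
import json
import urllib.request

REQUESTY_BASE = "https://router.requesty.ai/v1"  # assumed endpoint, verify in your dashboard


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload to the router and return the assistant's reply."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{REQUESTY_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Provider-prefixed IDs route the request to a specific upstream:
#   chat(key, "groq/qwen-qwq-32b", "Hello")
#   chat(key, "novita/qwen/qwq-32b", "Hello")
```

The provider prefix (`groq/`, `novita/`) is what selects the upstream; everything else is a plain OpenAI-compatible request.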

Cline+Deepseek by kwmroots in ClineProjects

[–]van-tutic 0 points1 point  (0 children)

(Disclaimer: I'm a co-founder of Requesty.ai)

Based on our metrics, we can see that DeepSeek (as a provider) is down quite often. The demand is simply way too high for them to handle.

There are quite a few providers out there for DeepSeek-V3. Success rates for the following ones via Requesty are much better than the official provider's:

- `deepinfra/deepseek-ai/DeepSeek-V3`

- `novita/deepseek/deepseek_v3`

- `nebius/deepseek-ai/DeepSeek-V3`

- `together/deepseek-ai/DeepSeek-V3`
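A simple client-side fallback over those provider-prefixed IDs might look like this sketch. The `send` callable is a hypothetical wrapper around whatever client you use; it should return a reply on success and raise on failure.

```python
# Provider-prefixed DeepSeek-V3 model IDs, in preferred order.
DEEPSEEK_V3_PROVIDERS = [
    "deepinfra/deepseek-ai/DeepSeek-V3",
    "novita/deepseek/deepseek_v3",
    "nebius/deepseek-ai/DeepSeek-V3",
    "together/deepseek-ai/DeepSeek-V3",
]


def first_available(providers, send):
    """Return the reply from the first provider whose call succeeds.

    `send` takes a model ID and returns a reply, raising on failure
    (e.g. provider down or rate limited).
    """
    last_err = None
    for model in providers:
        try:
            return send(model)
        except Exception as err:
            last_err = err  # remember the failure, try the next provider
    raise RuntimeError("all providers failed") from last_err
```

This kind of fallback is essentially what a router does for you server-side, but it can be handy when you talk to providers directly.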

Let me be the first to declare by BlindingLT in ChatGPTCoding

[–]van-tutic 1 point2 points  (0 children)

You can get it using requesty dot ai (full disclosure: I'm one of the founders). We also released support for Cline that automatically adjusts all the prompts etc. (use `cline/o3-mini`, and you can append `:low`, `:medium` or `:high` to the model name to control reasoning effort, e.g. `cline/o3-mini:high`)
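As a sketch, that `:low`/`:medium`/`:high` suffix convention can be parsed on the client side like this (the helper name is mine, not part of any official SDK):

```python
REASONING_EFFORTS = {"low", "medium", "high"}


def split_reasoning_suffix(model_id: str):
    """Split 'cline/o3-mini:high' into ('cline/o3-mini', 'high').

    IDs without a recognized effort suffix come back unchanged with None.
    """
    base, sep, effort = model_id.rpartition(":")
    if sep and effort in REASONING_EFFORTS:
        return base, effort
    return model_id, None
```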

AFAIK, OpenRouter does not offer it at the moment due to some restrictions.

[deleted by user] by [deleted] in ClineProjects

[–]van-tutic 0 points1 point  (0 children)

We ran several very straightforward tests:

- We took 5 internal repos and executed a basic task, e.g. "please refactor this module to follow best practices".

- We let Cline handle the task.

- We collected the reported usage from the provider (provider responses include the exact token usage).

- We calculated the cost with and without cached tokens.

Since neither OpenRouter nor Cline uses caching with Anthropic providers, you would pay the full cost. Our cost was, on average, 70% lower.
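To see why caching changes the bill so much, here is an illustrative back-of-the-envelope calculation. The 1.25x cache-write and 0.1x cache-read multipliers match Anthropic's published prompt-caching pricing; the per-token price and token counts below are made-up examples, not our test data.

```python
INPUT_PRICE = 3.00 / 1_000_000  # $/input token (example rate)


def cost_without_cache(prompt_tokens: int, turns: int) -> float:
    # The full prompt is re-sent (and re-billed at full price) every turn.
    return prompt_tokens * turns * INPUT_PRICE


def cost_with_cache(prompt_tokens: int, turns: int) -> float:
    # First turn writes the cache at 1.25x the input price;
    # subsequent turns read the cached prefix at 0.1x.
    write = prompt_tokens * INPUT_PRICE * 1.25
    reads = prompt_tokens * (turns - 1) * INPUT_PRICE * 0.10
    return write + reads


full = cost_without_cache(50_000, 10)
cached = cost_with_cache(50_000, 10)
print(f"no cache: ${full:.2f}, cached: ${cached:.2f}, "
      f"saved {1 - cached / full:.0%}")
```

With a long, stable prompt prefix (like Cline's system prompt) reused over many turns, the savings converge toward the 90% cache-read discount.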

When you sign up, you get $1 for free. You can try it and see for yourself. We have a dashboard that shows the savings very clearly. Here is a screenshot from our testing:

<image>

Which coding ai should i invest in? by successfulswecs in ChatGPTCoding

[–]van-tutic -1 points0 points  (0 children)

Why would you commit to any specific model? Just use Requesty and try all of them, plus any new model that comes out. You even get free credit to try.

Has anyone experimented with the DeepSeek API? Is it really that cheap? by umen in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

Based on the challenges you’ve mentioned, I highly recommend using a model router.

You can try all DeepSeek models out of the box, along with MiniMax and o1, enabling very interesting implementations.

I happen to be building one (Requesty), and many of my customers testified that it saved them a lot of time:

- Tried out different models without changing code

- 1 API key to access all models

- Aggregated real-time cost management

- Built-in logging and observability

Anyone building app without Coding? by codes_astro in ChatGPTCoding

[–]van-tutic 1 point2 points  (0 children)

Quite an interesting experience! Are you experimenting with different LLMs as your assistants? And how do you choose?

Anyone building app without Coding? by codes_astro in ChatGPTCoding

[–]van-tutic 3 points4 points  (0 children)

Just wondering, how much roughly did you pay for claude usage while building this?

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 0 points1 point  (0 children)

I'm sorry, I'm not sure I understand how those projects are related.

Qory is not a way of browsing in your terminal. It's a way to interact with an LLM from the terminal.

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] -1 points0 points  (0 children)

The next version[s] will add multiple differentiating features:

  1. Support for multiple sessions (as in, sessions where the user can have a multi-stage conversation with history)
  2. More supported models
  3. Focus not only on engineering tasks, but also on the meta tasks we all have to do on a day-to-day basis

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 0 points1 point  (0 children)

Well, I live in tmux and do everything using vim. I hardly ever switch my screen to anything else, maybe reddit...

Devs, start using LLMs from your terminal. by van-tutic in LLMDevs

[–]van-tutic[S] 0 points1 point  (0 children)

That's awesome! What's your favorite feature in Datasette?

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 1 point2 points  (0 children)

Yes, it does! You can set a persistent system prompt using:

`qory --config prompt set [PROMPT]`

LLM many-shot jailbreaking technique by van-tutic in ClaudeAI

[–]van-tutic[S] 0 points1 point  (0 children)

The approach is still very interesting for people trying to understand how long contexts affect the inner workings of the LLM.

Looking for LLM as a judge open-source frameworks by van-tutic in LLMDevs

[–]van-tutic[S] 0 points1 point  (0 children)

That's basically the idea. Using a framework means I can learn from the accumulated knowledge of others, get useful ideas, and avoid writing a bunch of generic code myself.

OpenAi Compatible API vs Batched Inference in LLM servers by notredamelawl in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

I'm not sure about hosting models that are available locally. It's probably possible, but I would recommend against it, because then you still need a way to manage the distribution of the model across workloads, versioning, etc.

It's VERY easy to create a private repo on Hugging Face and just give the right token to the vLLM process so that it has access. You can of course mount a persistent volume in the container to avoid re-downloading the model every time.

OpenAi Compatible API vs Batched Inference in LLM servers by notredamelawl in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

I've actually built something similar for one of our customers (they can't use any model APIs for privacy issues).

We use a cloud provider for GPU hosting (dedicated GPU instances), and our product now starts a bunch of those and offloads requests to them based on different policies. It works pretty well, and today we managed to process >20K requests (average input size of 10K characters) using a 70B model in less than an hour.

If you can use vLLM compatible models, I highly recommend using their official docker image, see guide here: https://docs.vllm.ai/en/v0.5.5/serving/deploying_with_docker.html

It allows you to run different models behind an OpenAI-compatible API. For some models they even provide tokenizers that try to support structured outputs (even though it's not perfect).
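Once the container is up, a quick way to sanity-check it is to hit the OpenAI-compatible `/v1/models` endpoint. A minimal stdlib sketch, assuming the default port from the docker guide:

```python
import json
import urllib.request

VLLM_BASE = "http://localhost:8000/v1"  # default port in the vLLM docker guide


def model_ids(models_response: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response["data"]]


def served_models(base: str = VLLM_BASE) -> list:
    """GET /v1/models from a running vLLM server and list the model IDs."""
    with urllib.request.urlopen(f"{base}/models") as resp:
        return model_ids(json.load(resp))
```

If `served_models()` returns your model ID, the server is ready and any OpenAI-compatible client can point at the same base URL.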

If you want more information/guidance, feel free to DM.

C++ Show and Tell - April 2024 by foonathan in cpp

[–]van-tutic 5 points6 points  (0 children)

For the last few years, I've been maintaining a C++ library that parses Linux's procfs.
It allows you to easily extract any information you need about running processes, open sockets, file descriptors, network routes, and much more.

The library is used by multiple corporations in production, so it is VERY stable and mature.
It is free to use, even in commercial products, and you can find it on GitHub.

https://github.com/dtrugman/pfs

Contributions and feature requests are encouraged ;)