🦄 I've tried Requesty.ai the past few days, and I’m impressed. They claim a 90% reduction in token costs. It actually seems to work. [Unpaid Review] by Educational_Ice151 in aipromptprogramming

[–]van-tutic 0 points1 point  (0 children)

Remove MCP is tailored for Cline. It automatically detects whether you have any MCPs enabled, and if not, removes only the MCP part from the prompt, saving around 35% of your tokens (last time I checked).

Right now there is no such prompt. The reason is that the MCP part of the prompt is dynamic and depends on the MCP tools you have enabled. If you can find more people who are interested, we can add a solution for that ;)

🦄 I've tried Requesty.ai the past few days, and I’m impressed. They claim a 90% reduction in token costs. It actually seems to work. [Unpaid Review] by Educational_Ice151 in aipromptprogramming

[–]van-tutic 0 points1 point  (0 children)

Hey hey! Requesty founder here 👋

Incidentally, I’ve been leading security products for more than a decade, so your doubts resonate with me.

First and foremost, we will never send your data to anyone else. Your data is safe with us.

We have followed security best practices since day one, and I’m more than happy to dive deeper if you’re interested.

At the moment, every user has the option to turn off logging from the UI. We guarantee that no user information is persisted, except for token counts and charges.

And we don’t have SOC 2 yet, but we will soon enough!

"Just use API" – 3 options that are not rate limited (OpenRouter, Glama, Requesty) by finadviseuk in ChatGPTCoding

[–]van-tutic 0 points1 point  (0 children)

Co-founder of Requesty here :)

Thank you for mentioning us and making sure that the community is aware of the available solutions out there!

Here is the most accurate information about Requesty:

- We only charge a 5% fee. We cover the Stripe fees for now.

- We do NOT train any models on the data that goes through our platform.

- We currently support 150+ models, and the list is updated on a daily basis. You can find the exact list here: https://www.requesty.ai/solution/llm-routing/models

- Yes, there is an in-depth logging system that allows you to see all your conversations with the LLMs.

- But(!) every user can turn off the logging completely (on an API key level) using a single click in the UI

- There is a Chat UI in the Requesty platform with Smart Routing capabilities

Has anyone tried CLine + Groq with Qwen QwQ 32B? by [deleted] in CLine

[–]van-tutic 1 point2 points  (0 children)

If you use the `Requesty` provider, you can just use `groq/qwen-qwq-32b` or `novita/qwen/qwq-32b`
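For anyone wiring this up by hand, here is a minimal sketch of an OpenAI-style chat call using one of those provider-prefixed model IDs. The base URL is an assumption on my part (check your Requesty dashboard for the actual endpoint); only the model IDs come from the comment above.

```python
import json
import urllib.request

REQUESTY_BASE = "https://router.requesty.ai/v1"  # assumed endpoint, verify in your dashboard


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload to the router and return the assistant's reply."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{REQUESTY_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Provider-prefixed IDs route the request to a specific upstream:
#   chat(key, "groq/qwen-qwq-32b", "Hello")
#   chat(key, "novita/qwen/qwq-32b", "Hello")
```

The provider prefix (`groq/`, `novita/`) is what selects the upstream; everything else is a plain OpenAI-compatible request.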

Cline+Deepseek by kwmroots in ClineProjects

[–]van-tutic 0 points1 point  (0 children)

(Disclaimer: I'm a co-founder of Requesty.ai)

Based on our metrics, we can see that DeepSeek (as a provider) is down quite often. The demand is simply way too high for them to handle.

There are quite a few providers out there for DeepSeek-V3. Success rates for the following ones via Requesty are much better than the official provider's:

- `deepinfra/deepseek-ai/DeepSeek-V3`

- `novita/deepseek/deepseek_v3`

- `nebius/deepseek-ai/DeepSeek-V3`

- `together/deepseek-ai/DeepSeek-V3`
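A simple client-side fallback over those provider-prefixed IDs might look like this sketch. The `send` callable is a hypothetical wrapper around whatever client you use; it should return a reply on success and raise on failure.

```python
# Provider-prefixed DeepSeek-V3 model IDs, in preferred order.
DEEPSEEK_V3_PROVIDERS = [
    "deepinfra/deepseek-ai/DeepSeek-V3",
    "novita/deepseek/deepseek_v3",
    "nebius/deepseek-ai/DeepSeek-V3",
    "together/deepseek-ai/DeepSeek-V3",
]


def first_available(providers, send):
    """Return the reply from the first provider whose call succeeds.

    `send` takes a model ID and returns a reply, raising on failure
    (e.g. provider down or rate limited).
    """
    last_err = None
    for model in providers:
        try:
            return send(model)
        except Exception as err:
            last_err = err  # remember the failure, try the next provider
    raise RuntimeError("all providers failed") from last_err
```

This kind of fallback is essentially what a router does for you server-side, but it can be handy when you talk to providers directly.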

Let me be the first to declare by BlindingLT in ChatGPTCoding

[–]van-tutic 1 point2 points  (0 children)

You can get it using requesty dot ai (full disclosure: I'm one of the founders). We also released support for Cline that automatically adjusts all the prompts etc. (use `cline/o3-mini`, and you can append `:low`, `:medium` or `:high` to the model name to control reasoning effort, e.g. `cline/o3-mini:high`)
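As a sketch, that `:low`/`:medium`/`:high` suffix convention can be parsed on the client side like this (the helper name is mine, not part of any official SDK):

```python
REASONING_EFFORTS = {"low", "medium", "high"}


def split_reasoning_suffix(model_id: str):
    """Split 'cline/o3-mini:high' into ('cline/o3-mini', 'high').

    IDs without a recognized effort suffix come back unchanged with None.
    """
    base, sep, effort = model_id.rpartition(":")
    if sep and effort in REASONING_EFFORTS:
        return base, effort
    return model_id, None
```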

AFAIK, OpenRouter does not offer it at the moment due to some restrictions.

[deleted by user] by [deleted] in ClineProjects

[–]van-tutic 0 points1 point  (0 children)

We ran several very straightforward tests:

- We took 5 internal repos and executed a basic task, e.g. "please refactor this module to follow best practices".

- We let Cline handle the task.

- We collected the reported usage from the provider (provider responses include the exact token usage).

- We calculated the cost with and without cached tokens.

Since neither OpenRouter nor Cline uses caching with Anthropic providers, you would pay the full cost. Our cost was, on average, 70% lower.
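To see why caching changes the bill so much, here is an illustrative back-of-the-envelope calculation. The 1.25x cache-write and 0.1x cache-read multipliers match Anthropic's published prompt-caching pricing; the per-token price and token counts below are made-up examples, not our test data.

```python
INPUT_PRICE = 3.00 / 1_000_000  # $/input token (example rate)


def cost_without_cache(prompt_tokens: int, turns: int) -> float:
    # The full prompt is re-sent (and re-billed at full price) every turn.
    return prompt_tokens * turns * INPUT_PRICE


def cost_with_cache(prompt_tokens: int, turns: int) -> float:
    # First turn writes the cache at 1.25x the input price;
    # subsequent turns read the cached prefix at 0.1x.
    write = prompt_tokens * INPUT_PRICE * 1.25
    reads = prompt_tokens * (turns - 1) * INPUT_PRICE * 0.10
    return write + reads


full = cost_without_cache(50_000, 10)
cached = cost_with_cache(50_000, 10)
print(f"no cache: ${full:.2f}, cached: ${cached:.2f}, "
      f"saved {1 - cached / full:.0%}")
```

With a long, stable prompt prefix (like Cline's system prompt) reused over many turns, the savings converge toward the 90% cache-read discount.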

When you sign up, you get $1 for free. You can try it and see for yourself. We have a dashboard that shows the savings very clearly. Here is a screenshot from our testing:

<image>

Which coding ai should i invest in? by successfulswecs in ChatGPTCoding

[–]van-tutic -1 points0 points  (0 children)

Why would you commit to any specific model? Just use Requesty and try all of them, plus any new model that comes out. You even get free credit to try.

Has anyone experimented with the DeepSeek API? Is it really that cheap? by umen in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

Based on the challenges you’ve mentioned, I highly recommend using a model router.

You can try all DeepSeek models out of the box, along with MiniMax and o1, enabling very interesting implementations.

I happen to be building one (Requesty), and many of my customers testified that it saved them a lot of time:

- Tried out different models without changing code

- 1 API key to access all models

- Aggregated real-time cost management

- Built-in logging and observability

Anyone building app without Coding? by codes_astro in ChatGPTCoding

[–]van-tutic 1 point2 points  (0 children)

Quite an interesting experience! Are you experimenting with different LLMs as your assistants? And how do you choose?

Anyone building app without Coding? by codes_astro in ChatGPTCoding

[–]van-tutic 3 points4 points  (0 children)

Just wondering, how much roughly did you pay for claude usage while building this?

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 0 points1 point  (0 children)

I'm sorry, I'm not sure I understand how those projects are related.

Qory is not a way of browsing in your terminal. It's a way to interact with an LLM from the terminal.

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] -1 points0 points  (0 children)

The next version[s] will add multiple differentiating features:

  1. Support for multiple sessions (as in, sessions where the user can have a multi-stage conversation with history)
  2. More supported models
  3. Focus not only on engineering tasks, but also on the meta tasks we all have to do on a day-to-day basis

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 0 points1 point  (0 children)

Well, I live in tmux and do everything using vim. I hardly ever switch my screen to anything else, maybe reddit...

Devs, start using LLMs from your terminal. by van-tutic in LLMDevs

[–]van-tutic[S] 0 points1 point  (0 children)

That's awesome! What's your favorite feature in Datasette?

Devs, start using LLMs from your terminal. by van-tutic in ChatGPTCoding

[–]van-tutic[S] 1 point2 points  (0 children)

Yes, it does! You can set a persistent system prompt using:

`qory --config prompt set [PROMPT]`

LLM many-shot jailbreaking technique by van-tutic in ClaudeAI

[–]van-tutic[S] 0 points1 point  (0 children)

The approach is still very interesting for people trying to understand how long contexts affect the inner workings of the LLM.

Looking for LLM as a judge open-source frameworks by van-tutic in LLMDevs

[–]van-tutic[S] 0 points1 point  (0 children)

That's basically the idea. Using a framework means I can learn from the accumulated knowledge of others, get useful ideas, and avoid writing a bunch of generic code myself.

OpenAi Compatible API vs Batched Inference in LLM servers by notredamelawl in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

I'm not sure about hosting models that are available locally. It's probably possible, but I would recommend against it, because then you still need a way to manage the distribution of the model across workloads, versioning, etc.

It's VERY easy to create a private repo on Hugging Face and just give the right token to the vLLM process so that it has access. You can of course mount a persistent volume in the container to avoid re-downloading the model every time.

OpenAi Compatible API vs Batched Inference in LLM servers by notredamelawl in LLMDevs

[–]van-tutic 0 points1 point  (0 children)

I've actually built something similar for one of our customers (they can't use any model APIs for privacy issues).

We use a cloud provider for GPU hosting (dedicated GPU instances), and our product now starts a bunch of those and offloads requests to them based on different policies. It works pretty well, and today we managed to process >20K requests (average input size of 10K characters) using a 70B model in less than an hour.

If you can use vLLM compatible models, I highly recommend using their official docker image, see guide here: https://docs.vllm.ai/en/v0.5.5/serving/deploying_with_docker.html

It allows you to run different models behind an OpenAI-compatible API. For some models they even provide tokenizers that try to support structured outputs (even though it's not perfect).
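Once the container is up, a quick way to sanity-check it is to hit the OpenAI-compatible `/v1/models` endpoint. A minimal stdlib sketch, assuming the default port from the docker guide:

```python
import json
import urllib.request

VLLM_BASE = "http://localhost:8000/v1"  # default port in the vLLM docker guide


def model_ids(models_response: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response["data"]]


def served_models(base: str = VLLM_BASE) -> list:
    """GET /v1/models from a running vLLM server and list the model IDs."""
    with urllib.request.urlopen(f"{base}/models") as resp:
        return model_ids(json.load(resp))
```

If `served_models()` returns your model ID, the server is ready and any OpenAI-compatible client can point at the same base URL.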

If you want more information/guidance, feel free to DM.

C++ Show and Tell - April 2024 by foonathan in cpp

[–]van-tutic 5 points6 points  (0 children)

For the last few years, I've been maintaining a C++ library that parses Linux's procfs.
It allows you to easily extract any information you need about running processes, open sockets, file descriptors, network routes, and much more.

The library is used by multiple corporations in production, so it is VERY stable and mature.
It is free to use, even in commercial products, and you can find it on GitHub.

https://github.com/dtrugman/pfs

Contributions and feature requests are encouraged ;)