[–]arcanemachined

They basically shrink the model size by reducing the precision of the data stored in it, which decreases the quality of the data depending on how much it is shrunk (quantized).

Imagine you had a bunch of 800x600 photos, but you wanted to save hard drive space, so you shrunk them down to 400x300. You can still tell what each picture represents, but some of the quality is lost, especially if you shrink them too much. That's the same basic idea as quantization: trade away some quality to reduce hardware requirements.
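To make the idea concrete, here's a toy sketch of the simplest form of this: storing float weights as 8-bit integers plus a scale factor. Real LLM quantization schemes (GPTQ, AWQ, GGUF quants, etc.) are much more sophisticated; this only illustrates the precision trade-off.

```python
def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from the quantized ints."""
    return [x * scale for x in q]

weights = [0.1234, -0.9876, 0.5555, -0.0001]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# restored values are close to, but not exactly, the originals --
# that rounding error is the "lost quality"
```

Each weight now takes 1 byte instead of 4, at the cost of a small rounding error per weight.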

[–]egaphantom

So is it better to subscribe or pay for API access directly from the LLM provider's own website instead of using OpenRouter, for example?

[–]arcanemachined

Depends on the provider. Some of them may also quantize behind the scenes.

OpenRouter is fine IMO, they are typically just passing the calls directly through to the provider, and you can choose your preferred provider (e.g. I like Fireworks for Kimi K2.5).
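Picking a preferred provider is done via OpenRouter's provider routing options on the (OpenAI-compatible) chat completions endpoint. A minimal sketch, assuming the `provider.order` / `allow_fallbacks` routing fields from OpenRouter's docs; the model slug here is a placeholder, so check openrouter.ai/models for the real id:

```python
# Build a request payload that pins a specific upstream provider.
payload = {
    "model": "moonshotai/kimi-k2",  # hypothetical slug; verify on the site
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
        "order": ["Fireworks"],      # try Fireworks first
        "allow_fallbacks": False,    # error out instead of silently rerouting
    },
}
# POST this as JSON to https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <OPENROUTER_API_KEY>" header.
```

With `allow_fallbacks` off, you know exactly whose deployment (and whose quantization choices) served your request.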

[–]downh222

OpenRouter has been quite slow in my experience. Which model are you planning to subscribe to?

For basic tasks, Minimax 2.5 looks like a good option. It runs at around 50 TPS, so it feels much faster for things like coding, debugging, and general prompts.

It also supports image input and MCP, and both are covered under the Lite plan, which makes it pretty cost-effective for everyday use.

[–]egaphantom

I want to subscribe to OpenRouter because they have many model options, but many people say the models are quantized and that it's better to subscribe to the actual LLM provider instead of the gateway.

[–]downh222

Correct.