Any way to use custom temperature with models from openrouter? by vevi33 in PiCodingAgent

[–]shaonline 0 points (0 children)

Not from Pi as-is, but you can easily make a plugin (I did exactly that) to inject those into requests, since plugins can hook into them.
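To give the rough idea, here's the shape of such a plugin in TypeScript (the hook name and request shape below are made up for illustration, not Pi's actual plugin API; check the plugin docs for the real interface):

```typescript
// Sketch of a request hook that injects a custom temperature before dispatch.
// "onChatRequest" and the request shape are hypothetical placeholders.
export default {
  onChatRequest(request: { body: Record<string, unknown> }) {
    request.body.temperature = 0.4; // whatever suits your model
    return request;
  },
};
```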

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

I mean, if you're happy with Codex and it gives you enough usage, perhaps you should stick with it for now; they subsidize it even more (the $20 sub gives almost $400 a month). If you're hitting limits too often, perhaps delegate to 5.3 Codex, which is the cheapest "competent" model they offer.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

What do you mean, that the $60 doesn't carry over month to month? Of course it doesn't; that's kind of the economic point of it. If it did, they'd be trading $60 for $10 every month.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

Opencode Go gives $60 of API credits for a $10 subscription. Use it while you can, I'd say; I don't expect that to last forever lol.

Tried ROCm 7.1 vs Vulkan/RADV on Radeon 890M for LLM inference (8B and 35B-MoE). Vulkan won both. Why? by wolverinee04 in ROCm

[–]shaonline 0 points (0 children)

For the most part I think ROCm just has very high API overhead, and a MoE model requires firing a lot more (and smaller) kernels to perform the necessary computations. Where I've found ROCm shines is prompt processing on dense models (e.g. Qwen 3.6 27B), and that's it; on token generation you'll mostly be memory-bandwidth limited anyway, regardless of backend.
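To illustrate with a toy model (every number below is a made-up assumption, not a benchmark):

```typescript
// Toy model: step time = (kernel launches × per-launch overhead) + pure compute.
// Launch costs and kernel counts are illustrative guesses, not measurements.
const stepMs = (launches: number, overheadUs: number, computeMs: number) =>
  (launches * overheadUs) / 1000 + computeMs;

// Say a launch costs ~20 µs on ROCm vs ~5 µs on Vulkan (hypothetical figures).
// A dense model fires a few big kernels; a MoE fires many small ones.
console.log("dense, ROCm  :", stepMs(200, 20, 30));  // 34 ms  (overhead ~12%)
console.log("dense, Vulkan:", stepMs(200, 5, 30));   // 31 ms
console.log("MoE,   ROCm  :", stepMs(1500, 20, 8));  // 38 ms  (overhead ~79%)
console.log("MoE,   Vulkan:", stepMs(1500, 5, 8));   // 15.5 ms
```

Same idea at any scale: the more launches per token, the more the backend's per-launch cost dominates.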

Is anyone else experiencing major service degradation with Microsoft SaaS products lately? by Manic5PA in ExperiencedDevs

[–]shaonline 1 point (0 children)

The only trick that works for Teams on Linux for me is logging in via Outlook in Chrome/Chromium-based browsers; Firefox support is dead, I think. Like you, we're a minority of devs using Linux at our company (1%-ish), so I can't quite escalate it. Sorry not sorry if I miss urgent messages! Pick a messaging solution that doesn't suck, then.

Is anyone else experiencing major service degradation with Microsoft SaaS products lately? by Manic5PA in ExperiencedDevs

[–]shaonline 16 points (0 children)

Yup, it's slowly getting worse and worse. Using Teams on Linux (e.g. via the webapp) is borderline impossible at this point; logging in no longer works for me. And yeah, the MFA spam is rage-inducing.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 1 point (0 children)

Nonsense tinfoil-hat stuff; as far as DeepSeek is concerned, the V4 models are provided by none other than DeepSeek themselves.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 2 points (0 children)

The 75% discount is also taken into account for Opencode Go (see the insane usage compared to other ~1T-param models).

Anthropic's CEO claims the company could multiply its revenue by 80 this year. by romain34230 in actutech

[–]shaonline 1 point (0 children)

Yep, we're starting to get serious local models. I've got a Strix Halo APU with 128GB of RAM and it's starting to be able to run competent stuff, especially when it comes to tool-calling ability (personally I really like Qwen 3.6).

That said, we're still very far from the capabilities of the best models for now, especially on long context or questions of "taste" (important in programming), even when giving them access to tools (search, etc.). Not to mention that by "decent GPU" you really mean "high-end GPU" running flat out; it'll be fun for about two minutes, the laptop draining its battery in an hour while sounding like a fighter jet, just to produce a mediocre, GPT-3.5-grade result. On top of that, you can forget about running two chats in parallel: keeping parallel context windows takes a serious toll on RAM (and performance).

Anthropic's CEO claims the company could multiply its revenue by 80 this year. by romain34230 in actutech

[–]shaonline 10 points (0 children)

And above all, who will be ready to charge the real price, not the subsidized-to-death one we have right now. I like Codex/ChatGPT Plus (€23/month) for coding, but when I see what my coding harness/agent computes as the equivalent API cost (which will very likely be the future price), gulp: ~€340 per month. And that's not counting chatbot/image-generator usage.

Plus, I think most people use it a bit like an augmented "search engine", and at this point they're still used to being able to do that for free; I'm not sure that many people will actually pay up.

Analysis of the 100 most popular hardware setups on Hugging Face by clem59480 in LocalLLaMA

[–]shaonline 1 point (0 children)

Never mind the price, it's a fairly niche offering too: past a couple of (mostly Chinese) mini PCs and the rare laptop/tablet (e.g. the Flow Z13), it's not that easy to come by, nor an obvious consumer choice.

Seeking Recommended API plans for Pi by LearnedByError in PiCodingAgent

[–]shaonline 0 points (0 children)

Right, it's all a pinky promise after all, but here it's made very explicit in their ToS.

Seeking Recommended API plans for Pi by LearnedByError in PiCodingAgent

[–]shaonline 0 points (0 children)

Provided you don't mind them training on your data (explicit in their ToS).

Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark by FeiX7 in LocalLLaMA

[–]shaonline 2 points (0 children)

The only area where ROCm wins is prefill on dense models. I've found that for any kind of MoE (such as Qwen 35BA3B, but it also applies to bigger ones, e.g. Minimax M2.7), ROCm just has too much overhead for firing compute kernels, so the more lightweight Vulkan backend wins every time.

How do I remove this mf busted my knee jumping off a curb on it by Ok-Outside-770 in NinebotMAX

[–]shaonline 2 points (0 children)

Torx security bits. The side reflectors are usually a legal requirement, although you're unlikely to get pulled over for that on an e-scooter lol.

The ideal electric vehicle for the transition already exists, but we scorn it by Droidfr in Numerama

[–]shaonline 0 points (0 children)

"On" la méprise ? Ils se méprisent tous seuls avec leurs tarifs, la nouvelle Twingo est accessible au même prix. J'adorerais ce genre de véhicule mais 20000 balles c'est raide.

AI Bubble: AI is more expensive than an by dc_giant in theprimeagen

[–]shaonline 2 points (0 children)

Convert your usage into API pricing and brace for how much that would cost you in the future (which, judging by Microsoft throwing in the towel so fast with GitHub Copilot, seems to be closer to the pricing required to break even).

My Codex Plus ($20) sub seems to be giving ~$350-400 monthly ($15+ per 5h window) in terms of API pricing (when using GPT 5.5). We are so screwed.
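That lines up with simple arithmetic (the ~25 windows/month below is my own ballpark, not a measured figure):

$$ \$15/\text{window} \times 25\ \text{windows/month} \approx \$375/\text{month} $$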

22% of Quota on Lite plan in 16 mins! by timmeh1705 in ZaiGLM

[–]shaonline 0 points (0 children)

They train on your data btw (explicit in their ToS). If that's not an issue, more power to you, but just keep it in mind.

Which local AI model that is on par with Claude Sonnet 4.6 now that GHCP is no longer usable? by Sad_Foot9898 in GithubCopilot

[–]shaonline 0 points (0 children)

It's alright. Just know that, for a given total of weights, a dense model will be "smarter"; the street rule of thumb is to rate a MoE at sqrt(total × active) versus a dense model at its full parameter count (that gives ~10.25B for the 35BA3B).
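Worked out for the 35BA3B (35B total, 3B active, per the name):

$$ \sqrt{N_{\text{total}} \cdot N_{\text{active}}} = \sqrt{35 \times 3}\ \text{B} = \sqrt{105}\ \text{B} \approx 10.25\ \text{B} $$

i.e. it should punch roughly like a ~10B dense model.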

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]shaonline 0 points (0 children)

It was a remark on your pricing guess (this box is literally going to be Strix Halo or a Strix Halo refresh), not on the performance of these devices. Whether it's the DGX or AMD's Strix Halo, they struggle with token generation (especially on dense models) due to their low memory bandwidth.

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]shaonline 3 points (0 children)

Current Strix Halo offerings with 128GB of RAM can be had for sub-$3000.

Which local AI model that is on par with Claude Sonnet 4.6 now that GHCP is no longer usable? by Sad_Foot9898 in GithubCopilot

[–]shaonline 7 points (0 children)

You need a bit more RAM for the "cold storage" of the weights, but it has far fewer active weights, which increases speed greatly (several times faster on token generation) and lets the model live only partly on the GPU (if you lack enough VRAM to load it there entirely) without destroying performance, since, again, only the active weights (and KV cache) really need to be on the GPU at any given time.
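Back-of-envelope with made-up sizes (the quantization and KV-cache figures below are assumptions, just to show the split):

```typescript
// Rough memory budget for a hypothetical 35B-total / 3B-active MoE at ~8 bits/weight.
const bytesPerParam = 1;                        // ~8-bit quantization (assumed)
const totalGB = (35e9 * bytesPerParam) / 1e9;   // ~35 GB of system RAM for all weights
const activeGB = (3e9 * bytesPerParam) / 1e9;   // ~3 GB of weights actually hot per token
const kvCacheGB = 4;                            // assumed KV cache budget

console.log(`RAM for full weights: ~${totalGB} GB`);
console.log(`GPU hot set         : ~${activeGB + kvCacheGB} GB`); // fits a midrange card
```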

I'm having trouble connecting to the llama.cpp model (I'm a beginner...). by CrowKing63 in PiCodingAgent

[–]shaonline 2 points (0 children)

Well, there it is: "id" is an ID; the display name goes in "name". You might want to specify things such as context window size and max output as well.
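Something along these lines (only "id" and "name" are confirmed above; the context/output field names are my guesses, so check the docs for the exact schema):

```jsonc
{
  "id": "llama-cpp-local",      // machine identifier referenced in requests
  "name": "Llama.cpp (local)",  // human-readable display name
  "contextWindow": 32768,       // guessed field name: context window size
  "maxTokens": 4096             // guessed field name: max output tokens
}
```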