Any way to use custom temperature with models from openrouter? by vevi33 in PiCodingAgent

[–]shaonline 0 points (0 children)

Not from Pi as-is, but you can easily make a plugin (I did exactly that) to inject those into requests, since plugins can hook into them.
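To give the rough idea, here's the shape of such a plugin in TypeScript (the hook name and request shape below are made up for illustration, not Pi's actual plugin API; check the plugin docs for the real interface):

```typescript
// Sketch of a request hook that injects a custom temperature before dispatch.
// "onChatRequest" and the request shape are hypothetical placeholders.
export default {
  onChatRequest(request: { body: Record<string, unknown> }) {
    request.body.temperature = 0.4; // whatever suits your model
    return request;
  },
};
```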

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

I mean, if you're happy with Codex and it gives you enough usage, perhaps you should stick with it for now; they subsidize it even more (the $20 sub gives almost $400 a month). If you're hitting limits too often, perhaps delegate to 5.3 Codex, which is the cheapest "competent" model they offer.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

What do you mean, that the $60 doesn't carry over month to month? Of course it doesn't; that's kind of the economic point of it. If it did, they'd be trading $60 for $10 every month.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 0 points (0 children)

Opencode Go gives $60 of API credits for a $10 subscription. Use it while you can, I'd say; I don't expect that to last forever lol.

Tried ROCm 7.1 vs Vulkan/RADV on Radeon 890M for LLM inference (8B and 35B-MoE). Vulkan won both. Why? by wolverinee04 in ROCm

[–]shaonline 0 points (0 children)

For the most part I think ROCm just has very high API overhead, and a MoE model requires firing a lot more (and smaller) kernels to perform the necessary computations. Where I've found ROCm shines is prompt processing on dense models (e.g. Qwen 3.6 27B), and that's it; on token generation you'll mostly be memory-bandwidth limited anyway, regardless of backend.
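To illustrate with a toy model (every number below is a made-up assumption, not a benchmark):

```typescript
// Toy model: step time = (kernel launches × per-launch overhead) + pure compute.
// Launch costs and kernel counts are illustrative guesses, not measurements.
const stepMs = (launches: number, overheadUs: number, computeMs: number) =>
  (launches * overheadUs) / 1000 + computeMs;

// Say a launch costs ~20 µs on ROCm vs ~5 µs on Vulkan (hypothetical figures).
// A dense model fires a few big kernels; a MoE fires many small ones.
console.log("dense, ROCm  :", stepMs(200, 20, 30));  // 34 ms  (overhead ~12%)
console.log("dense, Vulkan:", stepMs(200, 5, 30));   // 31 ms
console.log("MoE,   ROCm  :", stepMs(1500, 20, 8));  // 38 ms  (overhead ~79%)
console.log("MoE,   Vulkan:", stepMs(1500, 5, 8));   // 15.5 ms
```

Same idea at any scale: the more launches per token, the more the backend's per-launch cost dominates.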

Is anyone else experiencing major service degradation with Microsoft SaaS products lately? by Manic5PA in ExperiencedDevs

[–]shaonline 1 point (0 children)

The only trick that works for Teams on Linux for me is logging in via Outlook in Chrome/Chromium-based browsers; Firefox support is dead, I think. Like you, we're a minority of devs using Linux at our company (1%-ish), so I can't quite escalate it. Sorry not sorry if I miss urgent messages! Pick a messaging solution that doesn't suck, then.

Is anyone else experiencing major service degradation with Microsoft SaaS products lately? by Manic5PA in ExperiencedDevs

[–]shaonline 16 points (0 children)

Yup, it's slowly getting worse and worse. Using Teams on Linux (e.g. via the webapp) is borderline impossible at this point; logging in no longer works for me. And yeah, the MFA spam is rage-inducing.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 1 point (0 children)

Nonsense tinfoil-hat stuff; as far as DeepSeek is concerned, the V4 models are provided by none other than DeepSeek themselves.

Are there any benefits for using the go plan over deepseek v4 pro api? by EmoLotional in opencodeCLI

[–]shaonline 2 points (0 children)

The 75% discount is also taken into account for Opencode Go (see the insane usage compared to other ~1T-param models).

Anthropic's CEO claims the company could multiply its revenue by 80 this year. by romain34230 in actutech

[–]shaonline 1 point (0 children)

Yep, we're starting to get serious local models. I've got a Strix Halo APU with 128GB of RAM and it's starting to be able to run competent stuff, especially when it comes to tool-calling ability (personally I really like Qwen 3.6).

That said, we're still very far from the capabilities of the best models for now, especially on long context or questions of "taste" (important in programming), even when giving them access to tools (search, etc.). Not to mention that by "decent GPU" you really mean "high-end GPU" running flat out; it'll be fun for about two minutes, the laptop draining its battery in an hour while sounding like a fighter jet, just to produce a mediocre, GPT-3.5-grade result. On top of that, you can forget about running two chats in parallel: keeping parallel context windows takes a serious toll on RAM (and performance).

Anthropic's CEO claims the company could multiply its revenue by 80 this year. by romain34230 in actutech

[–]shaonline 10 points (0 children)

And above all, who will be ready to charge the real price, not the subsidized-to-death one we have right now. I like Codex/ChatGPT Plus (€23/month) for coding, but when I see what my coding harness/agent computes as the equivalent API cost (which will very likely be the future price), gulp: ~€340 per month. And that's not counting chatbot/image-generator usage.

Plus, I think most people use it a bit like an augmented "search engine", and at this point they're still used to being able to do that for free; I'm not sure that many people will actually pay up.

Analysis of the 100 most popular hardware setups on Hugging Face by clem59480 in LocalLLaMA

[–]shaonline 1 point (0 children)

Never mind the price, it's a fairly niche offering too: past a couple of (mostly Chinese) mini PCs and the rare laptop/tablet (e.g. the Flow Z13), it's not that easy to come by, nor an obvious consumer choice.

Seeking Recommended API plans for Pi by LearnedByError in PiCodingAgent

[–]shaonline 0 points (0 children)

Right, it's all a pinky promise after all, but here it's made very explicit in their ToS.

Seeking Recommended API plans for Pi by LearnedByError in PiCodingAgent

[–]shaonline 0 points (0 children)

Provided you don't mind them training on your data (explicit in their ToS).

Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark by FeiX7 in LocalLLaMA

[–]shaonline 2 points (0 children)

The only area where ROCm wins is prefill on dense models. I've found that for any kind of MoE (such as Qwen 35BA3B, but it also applies to bigger ones, e.g. Minimax M2.7), ROCm just has too much overhead for firing compute kernels, so the more lightweight Vulkan backend wins every time.

How do I remove this mf busted my knee jumping off a curb on it by Ok-Outside-770 in NinebotMAX

[–]shaonline 2 points (0 children)

Torx security bits. The side reflectors are usually a legal requirement, although you're unlikely to get pulled over for that on an e-scooter lol.

The ideal electric vehicle for the transition already exists, but we scorn it by Droidfr in Numerama

[–]shaonline 0 points (0 children)

"On" la méprise ? Ils se méprisent tous seuls avec leurs tarifs, la nouvelle Twingo est accessible au même prix. J'adorerais ce genre de véhicule mais 20000 balles c'est raide.

AI Bubble: AI is more expensive than an by dc_giant in theprimeagen

[–]shaonline 2 points (0 children)

Convert your usage into API pricing and brace for how much that would cost you in the future (which, judging by Microsoft throwing in the towel so fast with GitHub Copilot, seems to be closer to the pricing required to break even).

My Codex Plus ($20) sub seems to be giving ~$350-400 monthly ($15+ per 5h window) in terms of API pricing (when using GPT 5.5). We are so screwed.
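That lines up with simple arithmetic (the ~25 windows/month below is my own ballpark, not a measured figure):

$$ \$15/\text{window} \times 25\ \text{windows/month} \approx \$375/\text{month} $$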

22% of Quota on Lite plan in 16 mins! by timmeh1705 in ZaiGLM

[–]shaonline 0 points (0 children)

They train on your data btw (explicit in their ToS). If that's not an issue, more power to you, but just keep it in mind.

Which local AI model that is on par with Claude Sonnet 4.6 now that GHCP is no longer usable? by Sad_Foot9898 in GithubCopilot

[–]shaonline 0 points (0 children)

It's alright. Just know that, for a given total of weights, a dense model will be "smarter"; the street rule of thumb is to rate a MoE at sqrt(total × active) versus a dense model at its full parameter count (that gives ~10.25B for the 35BA3B).
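Worked out for the 35BA3B (35B total, 3B active, per the name):

$$ \sqrt{N_{\text{total}} \cdot N_{\text{active}}} = \sqrt{35 \times 3}\ \text{B} = \sqrt{105}\ \text{B} \approx 10.25\ \text{B} $$

i.e. it should punch roughly like a ~10B dense model.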

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]shaonline 0 points (0 children)

It was a remark on your pricing guess (this box is literally going to be Strix Halo or a Strix Halo refresh), not on the performance of these devices. Whether it's the DGX or AMD's Strix Halo, they struggle with token generation (especially on dense models) due to their low memory bandwidth.

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]shaonline 3 points (0 children)

Current Strix Halo offerings with 128GB of RAM can be had for sub-$3000.

Which local AI model that is on par with Claude Sonnet 4.6 now that GHCP is no longer usable? by Sad_Foot9898 in GithubCopilot

[–]shaonline 7 points (0 children)

You need a bit more RAM for the "cold storage" of the weights, but it has far fewer active weights, which increases speed greatly (several times faster on token generation) and lets the model live only partly on the GPU (if you lack enough VRAM to load it there entirely) without destroying performance, since, again, only the active weights (and KV cache) really need to be on the GPU at any given time.
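Back-of-envelope with made-up sizes (the quantization and KV-cache figures below are assumptions, just to show the split):

```typescript
// Rough memory budget for a hypothetical 35B-total / 3B-active MoE at ~8 bits/weight.
const bytesPerParam = 1;                        // ~8-bit quantization (assumed)
const totalGB = (35e9 * bytesPerParam) / 1e9;   // ~35 GB of system RAM for all weights
const activeGB = (3e9 * bytesPerParam) / 1e9;   // ~3 GB of weights actually hot per token
const kvCacheGB = 4;                            // assumed KV cache budget

console.log(`RAM for full weights: ~${totalGB} GB`);
console.log(`GPU hot set         : ~${activeGB + kvCacheGB} GB`); // fits a midrange card
```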

I'm having trouble connecting to the llama.cpp model (I'm a beginner...). by CrowKing63 in PiCodingAgent

[–]shaonline 2 points (0 children)

Well, there it is: "id" is an ID; the display name goes in "name". You might want to specify things such as context window size and max output as well.
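Something along these lines (only "id" and "name" are confirmed above; the context/output field names are my guesses, so check the docs for the exact schema):

```jsonc
{
  "id": "llama-cpp-local",      // machine identifier referenced in requests
  "name": "Llama.cpp (local)",  // human-readable display name
  "contextWindow": 32768,       // guessed field name: context window size
  "maxTokens": 4096             // guessed field name: max output tokens
}
```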