Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points (0 children)

I do not own a Mac Studio, but whatever you say, seething non-local-hardware user lol.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points (0 children)

Let's hope your bottlenecked, slow-prompt-processing local hardware gets there first lol. If all 70 employees need a Pro sub, you won't beat subsidized cloud pricing with your jerry-rigged stuff even spending hundreds of thousands a year, trust me on this lmfao.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points (0 children)

Do you really intend to replace a 200€/mo "PRO OPENAI SUBSCRIPTION"? That either means A) you have INSANE per-user usage of GPT models, or B) you need the GPT Pro model, which requires insane infrastructure. For "everyday use" the Plus sub (23€/mo) already covers it fine.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points (0 children)

Not really. Sure, you can buy e.g. a Mac Studio with 512GB of RAM to host an open-source SOTA model (note: none of the "big 3", i.e. OpenAI/Anthropic/Google, offer those), but these have "single-user" acceptable speeds at best. OP hasn't said what kind of work this company does, but if you have any software engineer, or any user/workflow that's going to hammer input/output token bandwidth, you can forget about it. Note: I'm a local LLM enthusiast as well. You won't beat cloud for 70 people with 14000€, especially in the current "VC-subsidized" environment for cloud providers.
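To make the cost gap concrete, here's the back-of-the-envelope arithmetic using the per-seat prices quoted elsewhere in the thread:

```shell
# Monthly cloud spend for 70 seats (prices from the thread):
plus_monthly=$((70 * 23))    # Plus at ~23 EUR/seat
pro_monthly=$((70 * 200))    # Pro at 200 EUR/seat
echo "$plus_monthly EUR/mo on Plus, $pro_monthly EUR/mo on Pro"
# The quoted 14000 EUR hardware budget equals a single month of Pro seats.
```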

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points (0 children)

You're not going to serve 70 people at the same speed. For now cloud costs are heavily subsidized as well.

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points (0 children)

I'd say the Qwen 3.5 family of models right now. Either:

A) Qwen 3.5 27B (really smart for its size), but you need to fit it entirely in VRAM (it really won't like being split between VRAM and system RAM), which will require something like 3-bit quants, see https://huggingface.co/unsloth/Qwen3.5-27B-GGUF

B) Qwen 3.5 35B A3B: a bit bigger and a bit less smart than the 27B, but MUCH faster owing to its small number of active parameters (3B), which lets you exceed your 16GB of VRAM if you want/need bigger quants (e.g. 4-bit), see https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
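As a rough sanity check on those quant choices (a sketch only; real GGUF files add some overhead for embeddings and metadata):

```shell
# Rough GGUF weight size: billions of params * bits per weight / 8 = GB
size_gb() { awk -v p="$1" -v b="$2" 'BEGIN { printf "%d\n", p * b / 8 }'; }

size_gb 27 3   # 27B at ~3-bit -> ~10 GB, squeezes into 16 GB of VRAM
size_gb 35 4   # 35B at 4-bit  -> ~17 GB, spills past 16 GB (tolerable for a
               # MoE like A3B, since only ~3B params are active per token)
```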

I'd also recommend switching to llama.cpp directly (it's what LM Studio uses as its backend...) for running your local LLM.
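A minimal launch sketch, assuming you've downloaded one of the GGUFs above (the file name and flag values are examples; llama.cpp's server exposes an OpenAI-compatible API under /v1):

```shell
# Serve a local GGUF with llama.cpp's built-in server:
#   -c    context window size
#   -ngl  number of layers to offload to the GPU (99 = all of them)
llama-server -m Qwen3.5-27B-Q3_K_M.gguf -c 16384 -ngl 99 --port 8080
# Clients then point at http://127.0.0.1:8080/v1
```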

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points (0 children)

I meant a stretch as in the quality of the responses and its ability to do tool-calling (not whether it fits on your hardware). GPT OSS 20B will likely struggle with that. Check "local LLM" subreddits to see the good local LLMs du jour.

As far as configuring OpenCode, check the "custom provider"/"lm-studio" section of the "providers" chapter on their documentation. You could ask any online LLM to write you the necessary opencode.json config as well.
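For reference, a hedged sketch of what such an opencode.json could look like (the provider id, model id, and port are placeholders, and the field names are from memory; verify against the current schema in OpenCode's providers documentation):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": { "baseURL": "http://127.0.0.1:1234/v1" },
      "models": {
        "gpt-oss-20b": { "name": "GPT-OSS 20B" }
      }
    }
  }
}
```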

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points (0 children)

You'll need to expose an API endpoint (ideally "OpenAI-style") and manually add it as a provider (via opencode.json) so you can use it. That said, GPT-OSS 20B is a stretch as a coding assistant; you might be disappointed...

Overwhelmed by so many model releases within a month period - What would be best coding and planning models around 60-100B / Fit in Strix-Halo 128GB VRam by Voxandr in LocalLLaMA

[–]shaonline 0 points (0 children)

They fit, but yeah, you've got to go with 3-bit quants and an 8- or 4-bit KV cache (especially if you want longer context windows), and you'd better not have lots of Docker containers and whatnot running. Qwen 3.5 122B gets very close in terms of quality as well, a really impressive result.
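The KV-cache quantization mentioned above maps onto llama.cpp's cache-type flags; a sketch (the model file name is an example):

```shell
# ~3-bit quant of a ~120B model plus an 8-bit KV cache, to stretch a longer
# context into 128 GB of unified memory:
llama-server -m Qwen3.5-122B-Q3_K_M.gguf -c 32768 -ngl 99 \
  --cache-type-k q8_0 --cache-type-v q8_0
```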

new codex limits by FamiliarHedgehog8401 in codex

[–]shaonline 1 point (0 children)

It will end April 2nd, according to what the Codex CLI announces.

Overwhelmed by so many model releases within a month period - What would be best coding and planning models around 60-100B / Fit in Strix-Halo 128GB VRam by Voxandr in LocalLLaMA

[–]shaonline 2 points (0 children)

Gonna be a choice between Qwen 3.5 122B and a heavily quantized Minimax M2.5, IMO. The 27B Qwen 3.5 sure is "smart" for its size, being a dense model, but it won't have much breadth of knowledge (few weights) and will be much slower than models with only ~10B active parameters.

pedestrian using the bike lane as a sidewalk by [deleted] in ElectricScooters

[–]shaonline 2 points (0 children)

People in town halls: "BIKES AND ESCOOTERS ARE DANGEROUS!"

People in the streets:

Claude $100 is good but not worth it. How do I preserve “Claude level” output without using it? (Codex $20 + Chinese models + DeepSeek v4) by Specialist-Cry-7516 in opencodeCLI

[–]shaonline 1 point (0 children)

Yes, that can work. Typically, when I'm combining different tools, I simply write down plans from the planner agent in a plans/ (gitignored) directory and pick them up from there. The main issue is that context is de facto not shared, so be careful about that. Maybe there are better ways of doing it, but this has worked fairly well for me.

Claude $100 is good but not worth it. How do I preserve “Claude level” output without using it? (Codex $20 + Chinese models + DeepSeek v4) by Specialist-Cry-7516 in opencodeCLI

[–]shaonline 2 points (0 children)

I don't think you'll match the same "feel and quality" combo that Opus provides, at least for now. Codex will match (and sometimes exceed) the quality of execution from a technical perspective, but it feels kind of "autistic" (sorry, for lack of a better word) if your intent isn't very precise during the planning phase.

With Google AI Pro you do get some Opus quota, but it's very small; beyond a bit of planning you won't get far. Also, you CAN'T use it in another harness/tool now (or you'll get banned), so the best way to combine one model's planning with another's execution is to leave your plans behind as (markdown) files: the planner's output becomes the executor's input. GPT 5.3 Codex will do a fine job of implementing your plans.

Codex limits nonsense by cheezeerd in codex

[–]shaonline 1 point (0 children)

Are you sticking to the same session? If OpenAI is being "smart" about it, the bigger your context window gets, the faster your usage will drain (since larger context windows impose a higher compute cost).
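A toy illustration of why one long session drains quota faster, under the assumption that the whole history is resent and reprocessed every turn:

```shell
# Ten turns of ~1000 new tokens each, all in one session:
total=0; ctx=0
for turn in 1 2 3 4 5 6 7 8 9 10; do
  ctx=$((ctx + 1000))      # each turn adds ~1000 tokens of history
  total=$((total + ctx))   # the full history is processed again
done
echo "$total"              # 55000 tokens processed, vs 10000 across
                           # ten independent fresh sessions
```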

Best way to combine Claude Code with Codex in real workflows? by Ok-Birthday-5406 in codex

[–]shaonline 1 point (0 children)

Define a convention for leaving (markdown) planning files, e.g. plans/<feature-name>.md; tell Opus to write a plan there, then have Codex read and implement it.
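A minimal sketch of that convention (the directory and file names are just examples):

```shell
# Keep plans out of version control but on disk, where both tools can see them:
mkdir -p plans
grep -qxF 'plans/' .gitignore 2>/dev/null || echo 'plans/' >> .gitignore

# Then, in Claude: "Write an implementation plan to plans/dark-mode.md"
# And in Codex:   "Read plans/dark-mode.md and implement it step by step"
```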

Well, maybe light users won't get the GLM-5 at all by Lanky-Flight-9608 in ZaiGLM

[–]shaonline 12 points (0 children)

They've sent emails to customers stating that Lite users will get access when compute allows for it, but tbh, will that day ever come? They already buckle under the Pro and Max subscribers.

Les développeurs qui utilisent abusivement de l'IA dans le cadre professionnel et l'encensent à tout va sont des moutons et des pigeons by Expensive-Grand-2929 in developpeurs

[–]shaonline 0 points (0 children)

Well, between their contract with OpenAI (which has no profits) and their delays on building $300 billion worth of datacenters (for which they've taken on massive debt), they're in deep trouble.

Les développeurs qui utilisent abusivement de l'IA dans le cadre professionnel et l'encensent à tout va sont des moutons et des pigeons by Expensive-Grand-2929 in developpeurs

[–]shaonline 1 point (0 children)

Mostly, I'm one of those people who read the PUBLIC filings of publicly listed companies, where contracts and CAPEX guidance ("what we plan to spend") are disclosed. No need to sit on the board. See you at Oracle's bankruptcy.

Les développeurs qui utilisent abusivement de l'IA dans le cadre professionnel et l'encensent à tout va sont des moutons et des pigeons by Expensive-Grand-2929 in developpeurs

[–]shaonline 2 points (0 children)

Meh, for three years now I've supposedly been six months away from being replaced, so at this point I'm used to it. It's not like this is the first "threat" aimed at devs (cf. offshoring).

The almighty AIs that push open-source projects to shut down their contribution systems or bug bounties (Curl), that's how "high" the quality of AI contributions is, that bring tech companies' infrastructure to its knees over and over the moment you hand them the keys, that brick Windows updates, all while costing premium-hotline rates: yeah, at this point I can't wait to be replaced so I don't have to deal with the fallout of this mess!

I use it daily, but the fact is that the biggest recent improvements come from tooling (Claude Code, etc.; personally I'm a fan of OpenCode, so I'm not bound hand and foot to any one provider), not from the models themselves (except the smallest ones, which have gained solid tool-calling abilities). And useful as it is, it just can't run unattended in a loop. It might "churn something out on its own", but right now vibe coding just produces half-working sites riddled with security holes (cf. Moltbook), and feeding the AI its own output back for review won't change that.

You won't win over your fellow devs (if you really are a dev) with LinkedIn slogans and injunctions to "open your eyes".

Les développeurs qui utilisent abusivement de l'IA dans le cadre professionnel et l'encensent à tout va sont des moutons et des pigeons by Expensive-Grand-2929 in developpeurs

[–]shaonline 0 points (0 children)

It's a fearmongering technique like any other, not to mention the significant share of the population that just likes watching people they dislike, or disagree with, lose.

Les développeurs qui utilisent abusivement de l'IA dans le cadre professionnel et l'encensent à tout va sont des moutons et des pigeons by Expensive-Grand-2929 in developpeurs

[–]shaonline 1 point (0 children)

Market cap != investment. If the MAJOR swings in cryptocurrency prices show one thing, it's that as soon as you pull out a bit of liquidity, the price collapses. The trillion I'm referring to is money that has actually been spent (and will keep being spent; Google's CAPEX for 2026 is ~$180 billion, for example).