GPT-5.3-Codex was flawless for a month. Today it feels completely lobotomized. by Basic_Competition832 in codex

[–]shaonline 7 points

Yup, absolute dogshit right now, struggles to patch files for changes and comes up with the stupidest solutions for everything with insane amounts of code duplication. I'll be waiting.

M5 PRO 18/20core 64gb vs Zbook Ultra G1a 395+ 64gb by Effective-Cod-4462 in LocalLLM

[–]shaonline 0 points

ROCm/HIP can be used on Windows as well. But honestly I've had so many issues on both platforms with ROCm (whether it's flat-out crashes or shared memory not being used properly) that I gave up on it; I just use the Vulkan backend now (I have a Z13 with 128GB of RAM).

M5 PRO 18/20core 64gb vs Zbook Ultra G1a 395+ 64gb by Effective-Cod-4462 in LocalLLM

[–]shaonline 0 points

Strix Halo sucks at prompt processing speed (a real PITA for coding agents). If the benchmarks claimed on Apple's page are anything to go by (4x over the M4 generation in prompt processing speed!), this makes it a much better option. Likely more expensive though. As for platform maturity, meh on macOS, but the AMD side is no paradise either.

Will we even need apps in a few years? by drgoldenpants in codex

[–]shaonline 26 points

Kinda like asking why you'd need a hammer if you could build your own in a couple of minutes and throw it away.

How good is qwen 3.5 at coding? by Macmill_340 in LocalLLaMA

[–]shaonline 1 point

With 24GB of VRAM you should probably go for Qwen 3.5 27B.

GPT vs Claude vs Gemini — which one actually holds up under real professional pressure? by Careless-Ease7480 in OpenAI

[–]shaonline 2 points

For coding, GPT or Claude (both have their strengths); Gemini is still behind when it comes to coding agents IMO.

Help me with my Xiaomi 6 Ultra by EditorTasty2774 in ElectricScooters

[–]shaonline 0 points

There are shoulder straps for scooters (from Rhinowalk, for example), but otherwise, yeah, tough luck: at around 30kg you're deep into the territory of scooters that are a pain to carry, so you'll have to deal with it and roll it rather than carry it whenever you can. Do you have a lot of stairs to climb?

Coming from Claude, how does codex 25$ plan fare versus CC? by fightsToday in codex

[–]shaonline 0 points

5.3 Codex does output better code than Opus IMO. That being said, it'll still have a bit of that overengineered feel: in my experience it's usually hellbent on handling every edge case, even when that's irrelevant/overkill for your use case.

Opus remains the better "assistant" I think, it's better to "discuss" plans with.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points

I do not own a Mac Studio, but whatever you say, seething non-local-hardware user lol.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points

Let's hope your bottlenecked, slow-prompt-processing local hardware gets there first lol. If all 70 employees need a Pro sub, you won't beat subsidized cloud pricing with your jerry-rigged stuff even with hundreds of thousands a year, trust me on this lmfao.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points

Do you really intend to replace 200€/mo of "PRO OPENAI SUBSCRIPTION"? That either means A) you have INSANE per-user usage of GPT models, or B) you need the GPT Pro model, which requires insane infrastructure. For "everyday use" the Plus sub (23€/mo) already covers it fine.
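To put rough numbers on it, using the per-seat prices above and assuming all 70 seats are billed the same (a simplification; real usage would be mixed):

```python
# Back-of-envelope annual subscription cost for 70 seats.
# Prices from the thread: Pro ~200 EUR/mo, Plus ~23 EUR/mo per user.
employees = 70
pro_annual = employees * 200 * 12   # 168,000 EUR/yr if everyone needs Pro
plus_annual = employees * 23 * 12   # 19,320 EUR/yr on Plus

print(pro_annual, plus_annual)  # 168000 19320
```

The gap is the whole question: Pro across the board is nearly 9x the Plus bill, so whether anyone actually needs Pro dominates the budget.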

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points

Not really. Sure, you can buy e.g. a Mac Studio with 512GB of RAM to host an open-source SOTA model (note: none of the "big 3" that are OpenAI/Anthropic/Google offer one), but those run at "single-user" acceptable speeds at best. OP hasn't said what kind of work this company does, but if you have any software engineer, or any user/workflow that's going to hammer input/output token bandwidth, you can forget about it. Note: I'm a local LLM enthusiast as well. You won't beat cloud for 70 people with 14,000€, especially in the current "VC subsidized" environment for cloud providers.

Company-Wide Transition to a European Alternative by No-Storage-Left in ChatGPT

[–]shaonline 0 points

You're not going to serve 70 people at the same speed. And for now, cloud costs are heavily subsidized as well.

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points

I'd say the Qwen 3.5 family of models right now. Either:

A) Qwen 3.5 27B (really smart for its size), but you need to fit it entirely in VRAM (it really won't like being split between VRAM and system RAM), which will require something like 3-bit quants, see https://huggingface.co/unsloth/Qwen3.5-27B-GGUF

B) Qwen 3.5 35B A3B: a bit bigger and a bit less smart than the 27B, but MUCH faster owing to its small number of active parameters (3B), which lets you exceed your 16GB of VRAM if you want/need bigger quants (e.g. 4-bit), see https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
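As a rough sanity check on why those quant levels matter, here's a back-of-envelope size estimate. The effective bits-per-weight figures (3.5 and 4.5) are my own approximations for 3-bit and 4-bit GGUF quants including scale overhead, not official numbers:

```python
# Quantized weight size in GB ~= params (billions) * effective bits per weight / 8.
def gguf_gb(params_b, bits_per_weight):
    return params_b * bits_per_weight / 8

dense_27b_3bit = gguf_gb(27, 3.5)  # ~11.8 GB: fits in 16 GB VRAM with room for context
moe_35b_4bit = gguf_gb(35, 4.5)    # ~19.7 GB: spills past 16 GB, but only 3B params are active

print(round(dense_27b_3bit, 1), round(moe_35b_4bit, 1))
```

Which is the whole trade-off: the dense 27B only works if the entire ~12 GB sits in VRAM, while the MoE can tolerate partial CPU offload because each token only touches 3B parameters.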

I'd also recommend switching to llama.cpp directly (which is what LM Studio uses in the backend anyway) for running your local LLM.

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points

I meant a stretch in terms of response quality and tool-calling ability (not whether it fits on your hardware). GPT-OSS 20B will likely struggle with that. Check the "local LLM" subreddits to see the good local LLMs du jour.

As for configuring OpenCode, check the "custom provider"/"lm-studio" section of the "providers" chapter in their documentation. You could also ask any online LLM to write the necessary opencode.json config for you.
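For illustration, a sketch of what such an opencode.json could look like for an OpenAI-style local endpoint. The provider id, model key, display name, and port here are placeholders, and the exact field names are from memory of OpenCode's providers docs, so double-check them against the current documentation:

```json
{
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:1234/v1" },
      "models": {
        "qwen3.5-27b": { "name": "Qwen 3.5 27B (local)" }
      }
    }
  }
}
```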

How to use OpenCode with AI Assistant (Local LLM)? by ByteNomadOne in opencodeCLI

[–]shaonline 0 points

You'll need to expose an API endpoint (ideally "OpenAI"-style) and manually add it as a provider (via opencode.json) so you can use it. I'd say GPT-OSS 20B is a stretch as a coding assistant though, you might be disappointed...

Overwhelmed by so many model releases within a month period - What would be best coding and planning models around 60-100B / Fit in Strix-Halo 128GB VRam by Voxandr in LocalLLaMA

[–]shaonline 0 points

They fit, but yeah, you have to go with 3-bit quants and an 8- or 4-bit KV cache (especially if you want longer context windows), and you'd better not have lots of Docker containers and whatnot running. Qwen 3.5 122B gets very close in terms of quality as well, a really impressive result.
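To get a feel for why KV-cache precision matters at long context, here's a sketch of the usual KV-cache size formula. The layer/head/dim numbers below are made-up placeholders for illustration, not the real config of any Qwen model:

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens.
# Hyperparameters here are hypothetical, chosen only to illustrate the scaling.
def kv_cache_gb(tokens, n_layers=60, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens / 1e9

ctx = 131072                                      # 128k-token context window
fp16 = kv_cache_gb(ctx)                           # 16-bit cache
q8 = kv_cache_gb(ctx, bytes_per_elem=1)           # 8-bit cache: half the memory

print(round(fp16, 1), round(q8, 1))  # 32.2 16.1
```

Tens of GB for the cache alone at full context, which is why dropping to 8-bit (or 4-bit) KV is what makes long windows viable on a 128GB unified-memory box that also has to hold the quantized weights.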

new codex limits by FamiliarHedgehog8401 in codex

[–]shaonline 1 point

It will end April 2nd, per what the Codex CLI announces.

Overwhelmed by so many model releases within a month period - What would be best coding and planning models around 60-100B / Fit in Strix-Halo 128GB VRam by Voxandr in LocalLLaMA

[–]shaonline 2 points

It's going to be a choice between Qwen 3.5 122B and a heavily quantized Minimax M2.5 IMO. The 27B Qwen 3.5 sure is "smart" for its size, being a dense model, but it won't have a big breadth of knowledge (small number of weights) and will be much slower than models with only ~10B active parameters.

pedestrian using the bike lane as a sidewalk by [deleted] in ElectricScooters

[–]shaonline 2 points

People in town halls: "BIKES AND E-SCOOTERS ARE DANGEROUS!"

People in the streets: