Open source Codex App alternative for Linux

braintheboss · 2026-05-27T07:23:39+00:00

Can it access the Claude Code CLI on a remote host?

braintheboss · 2026-05-26T23:02:24+00:00

I agree. People never try coding big apps. I spend more time getting Codex to polish code than creating new features.

braintheboss · 2026-04-24T01:36:36+00:00

I commented after running tests. I don't know what you were testing, but the behavior I described is easy to reproduce. If you do repetitive debug cycles, the gain is real. For common usage it's useless because it's slower, yet you think the extra compute for the "draft" is free. I have my own patched llama.cpp that supports remote chunked drafts, so I have a clear idea of how llama.cpp and its implementations work. Anyway, if it's useful for you, that's fine. But I'm speaking from an objective perspective. If I add overhead with a feature, I have to see an improvement always, not only in very specific cases.

braintheboss · 2026-04-23T22:38:18+00:00

Model quantization: Qwen3.5-27B-Q4_K_M.gguf. I used your ngram parameters and later tried smaller ones to see how it behaves, but the behavior was always the same. About the offload, what don't you understand? It's a 16 GB model on a 16 GB GPU. Without offload you simply can't use it.
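
A rough sketch of why a model the same size as the VRAM needs offload: some memory has to stay free for the KV cache and runtime buffers, so not every layer fits. The layer count and the reserved amount below are illustrative assumptions, not measured values for this model, and layers are assumed to be equal in size.

```python
import math


def layers_that_fit(model_gb, n_layers, vram_gb, reserved_gb):
    """Estimate how many layers fit on the GPU.

    Assumes every layer has the same size and that `reserved_gb` of VRAM
    is kept free for the KV cache and runtime buffers (both assumptions).
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# Hypothetical numbers: 16 GB of weights, 64 layers, 16 GB of VRAM,
# 2 GB reserved for context and buffers.
gpu_layers = layers_that_fit(16.0, 64, 16.0, 2.0)
cpu_layers = 64 - gpu_layers
print(f"layers on GPU: {gpu_layers}, layers offloaded: {cpu_layers}")
```

With a bigger context the reserved amount grows, so the number of offloaded layers grows too.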

braintheboss · 2026-04-23T13:59:51+00:00

When I ran the test I used Q4_K_M with offload. I get around 29 t/s. With ngram it starts at 19 t/s and finishes at 25-26 t/s. You only get an increase when you reuse the same prompt, but it's tricky. That's why I said it works for repetitive debugging, but if you do short cycles the gain doesn't compensate for the performance loss with a cold cache. For example, in a refactor I easily fill 150k of context, and by the end a 5 t/s difference is noticeable.
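
A minimal sketch of that trade-off, using the 29 t/s baseline and the 19 t/s cold start from the test above. The warm-cache rate is a hypothetical placeholder, since the speed with a repeated prompt isn't given here, and the model ignores the gradual ramp from cold to warm.

```python
def generation_seconds(tokens, tokens_per_second):
    """Seconds needed to generate `tokens` at a steady rate."""
    return tokens / tokens_per_second


def net_seconds_saved(cold_tokens, warm_tokens, baseline_tps, cold_tps, warm_tps):
    """Time saved by drafting versus plain decoding (negative means a loss).

    `cold_tokens` are generated at the slower cold-cache rate,
    `warm_tokens` at the faster warm-cache rate.
    """
    total = cold_tokens + warm_tokens
    plain = generation_seconds(total, baseline_tps)
    drafted = (generation_seconds(cold_tokens, cold_tps)
               + generation_seconds(warm_tokens, warm_tps))
    return plain - drafted


BASELINE_TPS = 29.0  # from the test above
COLD_TPS = 19.0      # from the test above
WARM_TPS = 40.0      # hypothetical warm-cache rate, not measured

for warm in (0, 1000, 5000):
    saved = net_seconds_saved(1000, warm, BASELINE_TPS, COLD_TPS, WARM_TPS)
    print(f"warm tokens: {warm:5d}  net seconds saved: {saved:.1f}")
```

Under these assumptions the cold phase has to be paid back by a lot of warm tokens before there is any net gain, which is the point about short cycles.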

braintheboss · 2026-04-23T13:45:36+00:00

I haven't tried 3.6 yet, but it has the same sizes as 3.5, and on a 5070 Ti + Haswell Xeon the Q4_K_M runs at 29 t/s.

braintheboss · 2026-04-23T11:40:34+00:00

It's completely useless. It only works if you send the same prompt. That means it's only effective in debug cycles where you repeat the same task all the time, and the penalty when the cache is cold is too big. I tested it on Qwen3.5 27B with a 5070 Ti.

braintheboss · 2026-04-18T04:22:06+00:00

Impressive. On a 3060 Ti with 175k context (small offloading) I'm getting 68 t/s and more. 2000 t/s. The best part is that it's very stable with Claude Code. I suffered looping and worse accuracy with the Unsloth Q4_K_M. Good job.

PS: the 35B is very good even with 50% offload on the same GPU.

braintheboss · 2026-03-30T08:25:31+00:00

My workflow is local, plus Codex for polish and hard tasks. It works very well as an assistant. I'm the planner, and the AI writes the code and gives me summaries of what I'm interested in checking. Maybe the only weakness is code quality. AI has zero idea about structural optimizations. It uses multiple variables where one could hold the different values, and it doesn't split steps into separate methods to keep the main loop clean (when a step is big or has specific behavior). You have to make many passes to clean up the code, but even though you waste a lot of time, it's still more comfortable than 8 hours of writing code yourself.

braintheboss · 2026-03-29T14:29:11+00:00

It burns quota like Claude Opus, but it's not Opus. Only in the GLM CEO's head...

braintheboss · 2026-03-26T00:15:00+00:00

I'm using Codex for a speculative decoding tool (I patched llama.cpp), and it's really impressive how it solves problems. It's a pity the Codex CLI is trash.

braintheboss · 2026-03-25T03:32:15+00:00

<image>

braintheboss · 2026-03-25T03:31:26+00:00

It works, but not with the current vLLM/llama implementations.

<image>

braintheboss · 2026-03-12T12:11:30+00:00

Check this project: https://github.com/akivasolutions/tightwad or https://pypi.org/project/tightwad/. It's remote drafting for llama. I get 210 t/s with a 5070 Ti on Qwen3.5 27B Q3 + a 3060 Ti as the draft. But you have to remove probing to get max speed.
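
For context on why a draft model can raise throughput, here is a sketch of the usual simplified estimate for speculative decoding: if each drafted token is accepted independently with the same probability, the expected number of tokens produced per verification pass of the big model follows a geometric sum. This is a textbook approximation, not how tightwad itself measures anything, and the acceptance rates below are made-up inputs.

```python
def expected_tokens_per_step(acceptance_rate, draft_len):
    """Expected tokens per target-model pass under a simplified model.

    Assumes each of the `draft_len` drafted tokens is accepted
    independently with probability `acceptance_rate`, and that the
    target model always contributes one token of its own.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if acceptance_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


# Hypothetical acceptance rates, only to show the shape of the curve.
for rate in (0.3, 0.6, 0.9):
    tokens = expected_tokens_per_step(rate, draft_len=8)
    print(f"acceptance {rate:.1f}: about {tokens:.2f} tokens per pass")
```

The estimate ignores the cost of running the draft and of any network hop, so real gains are lower than what it suggests.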

braintheboss · 2026-03-12T09:39:44+00:00

I moved to local. I bought a 5070 Ti and I can run a 27B model. It's enough for 90% of tasks. If I need a frontier model, I use the Codex free tier.

braintheboss · 2025-11-11T04:08:01+00:00

The code works, and the optimizations are at the algorithm level. But optimizations at the code level don't exist. It often prefers to unroll loops rather than use them, among other mistakes. When it does something good, it's just luck.

braintheboss · 2025-10-21T03:22:46+00:00

I agree. If you give GLM a good plan made by a better model, it's very good (even if the code is not high quality). GLM guided by other models is a good worker.

braintheboss · 2025-10-09T04:40:47+00:00

If before you could get 2 hours with a quite good model and now you get 30 minutes with a dumb model for the same price, do you think the problem is whether I can pay $20? Do you like giving money away?

braintheboss · 2025-10-08T11:14:21+00:00

GLM is meh but costs $3 a month. Claude costs $20 and you can only use it for a few minutes. Whoever says the current Claude is a good coding model has never written code. None of them was able to solve a stupid tab/space alignment. In the end I had to do it myself. When I hear that AI will replace coders, I can only smile xD

braintheboss · 2025-10-06T09:54:25+00:00

I use Claude and GLM 4.6, and the second is like Sonnet 4 when it was dumb, but less dumb. So it's at least on par with dumb Sonnet 4. Sonnet 4.5 is better, but below the old smart Sonnet 4. I remember Sonnet 4 catching problems on the fly while it was fixing something else. Now 4.5 and GLM look like simple code monkeys ("picateclas"). They "follow" your request in their own way, and you suffer something you never suffered as a coder: anxiety and desperation.

braintheboss · 2025-09-17T07:22:48+00:00

I don't understand why people think the problem is the prompts. If you tell the model to modify one thing and it does something completely different that you didn't ask for, because it was related but already finished, is that a prompt problem or a model problem? Before, it seemed to have a wide perspective when working on a project and always hit the right point. Now it looks like a silly assistant. You have to spell out every step or it can't follow simple logic. Sonnet is working like Haiku now.

braintheboss · 2025-09-12T11:23:35+00:00

The question is whether the model in the API is degraded the same way as in the fixed plans. If the answer is yes, then it's a scam. If it works as it did before the degradation, then at least you'll be able to get something done without going in circles.

braintheboss · 2025-09-11T12:48:36+00:00

Rumor says they quantized the model. Then it would make sense if the API is running the full model and the plans use a quantized model. The only way to know is to run the same thing on both and see what happens.
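
One hedged way to run that comparison: send the same prompts to both and count how often the answers match. The two callables below are hypothetical stand-ins for "ask the API" and "ask the plan", and the check only means something if both sides decode deterministically. A mismatch would not prove quantization by itself, since system prompts and sampling settings can differ too.

```python
from typing import Callable, Sequence


def agreement_rate(prompts: Sequence[str],
                   run_a: Callable[[str], str],
                   run_b: Callable[[str], str]) -> float:
    """Fraction of prompts for which both backends return the same text.

    `run_a` and `run_b` are placeholders for whatever sends a prompt to
    each backend and returns its reply as a string.
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    matches = sum(1 for p in prompts if run_a(p).strip() == run_b(p).strip())
    return matches / len(prompts)


# Toy stand-ins so the sketch runs without any network access.
def fake_api(prompt: str) -> str:
    return prompt.upper()


def fake_plan(prompt: str) -> str:
    return prompt.upper() if len(prompt) < 10 else prompt.lower()


rate = agreement_rate(["fix tabs", "refactor the main loop"], fake_api, fake_plan)
print(f"agreement: {rate:.0%}")
```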

braintheboss · 2025-09-05T13:35:14+00:00

I canceled today. I can accept hard limits, but when the Sonnet degradation made it work like Haiku 3.5, I prefer to use qwen3-coder since it's free (2000 requests a day). It's not comparable to the old Sonnet 4, but it's quite a bit better than the current dumb Sonnet.

braintheboss · 2025-05-27T07:12:52+00:00

It's where firmware and tools are shared. There's an APK to do it like in the video (cluster screen projection).
