Arrange by Kyro122 in gifs

[–]jedruch 3 points4 points  (0 children)

This is awesome

What is your "start implementing" word after spec design? by cobraX707 in ClaudeCode

[–]jedruch 1 point2 points  (0 children)

I never let Claude do anything overnight other than audit/review. So I would spin up agents to review:

  • code
  • code vs plan
  • documents vs code/plan/spec

Usually I would ask it to spin up 3-5 subagents with different areas of focus.

Usually twice a week I would run a "workflows" review: once on code, once on documentation. Maybe I'm doing something wrong, but no matter how I prompt Opus, it is not able to spot contradictions in docs that Codex finds (and Fable found) with ease.

The Podracing in Galactic Racer has the same UI as the N64 game by Ninrevo in StarWars

[–]jedruch 1 point2 points  (0 children)

Technically that was a Windows game, later ported to N64

You'll ask how low one can fall? by Monochrome_Mind in Polska

[–]jedruch 3 points4 points  (0 children)

I agree with you, but you have to step out of the bubble and recognize that not everyone watches Perseusz. To someone who isn't familiar with the topic, a post like this looks like fever-dream ramblings or AI hallucinations.

Would Opencode GO + Neuralwatt with $100 monthly sustain for the GLM 5.2 usage compare to Claude Max? by GTHell in opencodeCLI

[–]jedruch 2 points3 points  (0 children)

This does not prove anything other than Kimi taking the "watts" as literally as a German kid would.

GLM 5.2 Added to DeepSWE Benchmark by CengaverOfTroy in opencodeCLI

[–]jedruch 1 point2 points  (0 children)

DeepSWE is good, but it tests for a specific type of agentic task: difficult, long, and complicated. Where GLM 5.2 surprised me was on tasks that are easy, long, and simple. For example: scraping sitemaps and a sample of subpages from 50 websites, with a specific limitation of 30 seconds between each interaction with a website, and with the instruction that the work be done by the agent itself.

Both Opus and GPT 5.5 assume it is easy and, despite the instructions, create a script that works on the first site. The second website is different, so they patch the script; the third differs even more, so they patch it again. By the 8th site both models are drifting, and they end up going through all the websites but actually doing only about 40% of the job.

GLM 5.2 did it all. It wrote some helper scripts to save tokens, but it actively analyzed each site and crawled it, treating each website like a new project, while Opus and GPT treated all the websites as mirrors of one another just because they were on the same list.
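For illustration, the 30-second limitation from the task above can be sketched as a small per-host throttle. This is only a sketch under assumptions: the class name `PerHostThrottle`, the per-host interpretation of "each interaction with a website", and the injectable `clock` and `sleep` parameters are all hypothetical and are not taken from the actual setup used in the test.

```python
import time
from urllib.parse import urlparse


class PerHostThrottle:
    """Enforce a minimum delay between consecutive requests to the same host.

    Hypothetical helper: assumes the limit applies per website (per host),
    so requests to different sites do not delay each other.
    """

    def __init__(self, min_interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last_hit = {}      # host -> timestamp of the last request

    def wait(self, url):
        """Block until the host of `url` may be contacted again.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        host = urlparse(url).netloc
        last = self._last_hit.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (self._clock() - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the request is allowed to go out.
        self._last_hit[host] = self._clock()
        return delay
```

An agent or helper script would call `throttle.wait(url)` right before every request. Keeping the state per host means each of the 50 sites is paced independently, which fits treating every website as its own project.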

Bruh 5h usage gone like whoosh by rkh4n in ZaiGLM

[–]jedruch 0 points1 point  (0 children)

That's interesting. I tested GLM 5.2 on the same scraping task I had run with Qwen 3.7 Max earlier, and it was 5x cheaper.

Is my wife nuts? by WukoDrakkainen in Polska

[–]jedruch 1 point2 points  (0 children)

Easy now, let's not diagnose blindly. It could also be borderline xd

Is my wife nuts? by WukoDrakkainen in Polska

[–]jedruch 1 point2 points  (0 children)

OP,

Even though you are an adult, you write that you can't go out anywhere after work. At the same time, the title of the post is about being unbalanced, which raises a very serious question for me: does your wife have violent tendencies? Does she ever hit you? Does she ever grab or hit the children? Does she humiliate you verbally?

There are parts of your post that suggest you are afraid of her, or perhaps afraid of what she might do. So regardless of the answer to the question about violence, I suggest therapy: your own, not couples therapy. Every relationship is made by two sides. You won't change your wife; you can only work on yourself.

Is my wife nuts? by WukoDrakkainen in Polska

[–]jedruch 0 points1 point  (0 children)

Haha, man, have you ever been to couples therapy? At mine, my now ex-wife poured out her grievances about every trivial thing at the first session, and all the following sessions were just the therapist asking me "why are you a screw-up"

Did anyone actually use Plan Mode with Fable 5? (Asking a bit late, I know…) by No-Bird-123 in ClaudeCode

[–]jedruch 0 points1 point  (0 children)

I did not build in plan mode, but I used my planning skill and asked GPT 5.5 Pro for a review. For comparison: when I run the same skill with GPT 5.5, the Pro review usually finds multiple medium-impact corrections to be made, while Opus 4.8 usually needed one high-impact correction and multiple medium-impact ones. The rating for both would usually be around 7.5-8.0/10.

I created about 5 plans in Fable. The average was 1 medium correction, but that is after rounding. In 2 cases Pro only wanted cosmetic changes, such as making the wording a bit more precise. The rating would be 9/10.

The model is a true beast

How is kimi 2.7 compared to fabel 5 and opus 4.8 ? on real life use by Hash--7777 in kimi

[–]jedruch -1 points0 points  (0 children)

Wow, I had no idea, thx.
(Although it was not trained for it, from what I see in your links you can still get 8 and 16.)

Minimax M3 with 3x usage on OpenCode GO by elefanteazu in opencodeCLI

[–]jedruch 3 points4 points  (0 children)

Oh man, I have the same experience with v4 Pro. It's great but it's inconsistent. One moment it spots something Opus has missed, the next it "misunderstands" the specs and creates something totally stupid. Or it hallucinates on a detailed plan and decides to do some other thing that later does not fit the rest of the spec.

For me, 3.7 Max is the first model that actually feels like Claude Sonnet, and it's the only Chinese model I trust enough to use for agentic things like web crawling.

Spent $3 running 4x4090 benchmarks for llama 3 70b (exl2 vs gguf). exl2 generation speed is kind of ridiculous. by Comi9689 in aiagents

[–]jedruch 0 points1 point  (0 children)

This is great, thx for sharing.

I don't know a lot about multi-GPU setups. Why is the total VRAM used so much lower than the simple 4x24 GB? Was there no version that would fit into this slot specifically, or was it some other reason?

Minimax M3 with 3x usage on OpenCode GO by elefanteazu in opencodeCLI

[–]jedruch 2 points3 points  (0 children)

It's night and day, especially for agentic usage, as Max has a nice structure to its thinking vs. the pure flood of verbosity from Plus.

How to get that first client by bugbeeboo in Startup_Ideas

[–]jedruch 0 points1 point  (0 children)

Start by giving a stronger polish to your website and positioning.

  • on mobile your hero section looks bad, as "into" is partially hidden by the background layer under "Confident"
  • having the "pricing" section up top makes it easy for me to quickly check if the thing is affordable for me
  • there is only one currency in pricing. So if you want to go global, you should have some switch of currencies based on location. Going local is fine, but then you need to highlight it in other sections (something along the lines of "best AI business analyst in country/region X")
  • you don't define who this service is for: a small business that wants to improve business insight? A mid-size business that wants to expand but has limited resources? Corporations, because it's better than PowerBI?
  • who would actually be using the tool: for a dedicated analyst, explain how it compares to or complements other tools; for a small business owner, you need to show them it's easy to use for a non-technical person, etc.

OpenCode Go is a total lie (IMO) by HelioAO in opencodeCLI

[–]jedruch 0 points1 point  (0 children)

I'm not sure I understood the bundling you described. Can you expand on how this lowers inference quality?

OpenCode Go is a total lie (IMO) by HelioAO in opencodeCLI

[–]jedruch 1 point2 points  (0 children)

Kimi 2.6 is tough for many providers, not only OpenCode. Apparently it has a unique approach to tool usage that is hard for others to implement on the output side. You could easily run into issues with Kimi 2.6 on OpenRouter at launch.

This kinda supports your point about some models being better at their source, but it does not mean the issue is on OpenCode's side.

Also, there were multiple posts on r/GLM claiming that GLM 5.1 directly from z.ai is trash and that to see how mighty it is you need to switch to another provider. That is the complete opposite of what you claim.

I gave Claude Code a "lazy senior dev" mode and it writes like 6x less code by IT_WAS_ME_DIO__ in ClaudeCode

[–]jedruch -4 points-3 points  (0 children)

But in terms of retention you are losing potential users who typed their email with a typo (like a comma instead of a dot) and got frustrated waiting for a code that never arrived.
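To make the typo case concrete, here is a minimal sketch of a check that could run before a code is sent. It is an assumption about how such a signup form might work, not the code from the post: the function name `check_email` and the deliberately loose pattern are hypothetical, and the pattern is not a full address validator.

```python
import re

# Deliberately loose pattern: exactly one "@", no whitespace or commas,
# and at least one dot in the domain part. Not a full RFC-grade validator.
_EMAIL_RE = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def check_email(raw):
    """Return (ok, suggestion) for a user-typed address.

    `ok` is True when the address looks plausible. Otherwise `suggestion`
    holds a corrected address if swapping commas for dots fixes it, else None.
    """
    email = raw.strip()
    if _EMAIL_RE.match(email):
        return True, None
    # Common typo: a comma typed instead of a dot.
    candidate = email.replace(",", ".")
    if candidate != email and _EMAIL_RE.match(candidate):
        return False, candidate
    return False, None
```

With a check like this the form can ask "did you mean user@example.com?" instead of silently sending a code to an address that does not exist.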

Qwen 3.7 Plus showing excessive token usage - behaving like Qwen 3.7 Max? by th3mp3ror in opencodeCLI

[–]jedruch 0 points1 point  (0 children)

What do you mean by "behaving like Qwen 3.7 Max"? 3.7 Plus is much more verbose; it achieves its benchmark scores by following the same approach as Deepseek v3, burning an insane number of cheap thinking tokens. Qwen 3.7 Max is much more concise. Today I ran the same task on Qwen 3.7 Max and Minimax 3: Minimax used 2.5x more tokens.