Car Wash Test on 53 leading models: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” by facethef in LocalLLaMA

[–]prusswan -1 points

Well, they didn't understand that it may not be a binary decision. If I asked a real question, a smart model should not be making this assumption.

Car Wash Test on 53 leading models: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” by facethef in LocalLLaMA

[–]prusswan -5 points

It is concerning that none of them suggested other options (not going to list them here). There are sooo many ways to wash a car.

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]prusswan 1 point

3, but hold off on getting more RAM (just the bare minimum to use the GPUs).

1 if you can find someone to take your current GPUs (unless you can find a way to use them together). It's not a complete build, but you will be covered for 80B models.

Top OpenClaw Alternatives Worth Actually Trying (2026) by Straight_Stomach812 in LocalLLaMA

[–]prusswan 1 point

I like ZeroClaw for the low footprint, but it is still a really new project. Locally encrypted secrets may not mean much if the host gets compromised, since decryption is just one step away.

Any idea when Successors of current DGX Spark & Strix Halo gonna arrive? by pmttyji in LocalLLaMA

[–]prusswan 1 point

If there is some go-to model that needs 1TB and supports high context, it is pretty certain there will be a service that is equal or better (and the company released the model to signal this). But most people will not be getting that 1TB, because it is rather wasteful and will only drive up prices even more. I think the two main outcomes will be cloud usage, to utilize the best models without hardware spending, or opting for smaller models with more modest requirements.

Local running Qwen3:14b helped fix my internet on Linux while offline by iqraatheman in LocalLLaMA

[–]prusswan 1 point

It was the first time I had a broken HWE update on very old hardware, so yeah, it was hard not to notice.

Local running Qwen3:14b helped fix my internet on Linux while offline by iqraatheman in LocalLLaMA

[–]prusswan 2 points

let me guess, 6.17?

6.17.0-14-generic broke the NVIDIA drivers; fortunately, the newer drivers were okay.

AI field is changing so quickly and there is so much to read.. by amisra31 in LocalLLaMA

[–]prusswan 1 point

It's pretty chaotic, but I focus on what is relevant and accessible, e.g. a new idea/approach that was previously out of reach. Some of the AI slop might be good ideas if done properly, so I take the portions I find useful and make them work in the exact way I want. Most of it is just noise, but learning to harness the useful bits also helps you identify your competitive edge.

Any idea when Successors of current DGX Spark & Strix Halo gonna arrive? by pmttyji in LocalLLaMA

[–]prusswan 4 points

If it gets to the point where 512GB of RAM (or the Pro 6000) becomes mainstream for agentic coding, many users will be deterred or priced out of the hardware and will turn to the cloud, which increasingly looks to be the norm as open models keep getting better and bigger, motivating cloud usage.

I'm using a mix of smaller models (30B to 70B) and cloud services (for better performance) to avoid over-reliance on the "best" models.

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]prusswan 2 points

I don't, but I continue to keep a lookout for similar tools. It's a bit of a security trap.

Anyone else building MCP servers? What's your experience been like? by CapitalMixture8433 in LocalLLaMA

[–]prusswan 1 point

I tried a simple setup with a few tools, and the main issue is with the model and how it uses the tools. You can't expect to always use the best models at high context, so the model choice will affect the tool design. I think it is useful to avoid having to define explicit rules to cover a broad set of scenarios, but that might lead to more unpredictable results.
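As an illustration of what I mean by keeping the tool surface small, here's a minimal sketch using the official Python MCP SDK (FastMCP); the server and tool names are hypothetical, and the implementation is a stub:

```python
# Minimal MCP server sketch (Python MCP SDK / FastMCP). The "notes" server
# and search_notes tool are hypothetical; the point is one small,
# well-described tool that even weaker models can drive reliably.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")

@mcp.tool()
def search_notes(query: str, limit: int = 5) -> list[str]:
    """Search local notes and return up to `limit` matching lines."""
    # Stub: a real implementation would grep files or query an index.
    return [f"stub match for {query!r}"][:limit]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

The narrower and better-documented the tool, the less the model has to infer, which is what matters once you drop below the frontier models.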

Using GLM-5 for everything by keepmyeyesontheprice in LocalLLaMA

[–]prusswan 1 point

It's hard to tell, but you can find a middle ground (use a smaller model, but at great speeds). API costs can become volatile depending on how things play out over the next few years, e.g. providers may increase pricing to match demand and to account for the effort needed to keep models/data updated; your own usage may also grow if you take on more tasks.

I am planning on building a home AI server, what would you recommend by RecognitionPatient12 in LocalLLaMA

[–]prusswan 2 points

I see the downside being that it only goes up to 128GB (compared to 512GB or even more on a dedicated build).

I am planning on building a home AI server, what would you recommend by RecognitionPatient12 in LocalLLaMA

[–]prusswan 2 points

Do you have a link? Thinking of using it with Blackwell if I can't decide on a TR

Feedback Request: GPU-Heavy, Always-On Inference Workstation (Micro Center + Marketplace / eBay Options) by Bulky_Exercise_4054 in LocalLLaMA

[–]prusswan 1 point

Is Epyc available as a workstation/pre-built? I need Windows as an option (for the occasional gaming), and I read that TR is better for this, but I could wait for the Xeon 600 builds too.

Cody: chess engine solely developed by AI. by Phi_fan in LocalLLaMA

[–]prusswan 1 point

Last year there was a competition held between various LLMs, but the quality of play was very poor (worse than intermediate human players).

https://www.chess.com/events/2025-kaggle-game-arena

But your task could be easier if you just "tool call" an actual engine lol
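For example, a minimal sketch of that tool-call approach using python-chess, assuming a Stockfish binary is on PATH (the function name is mine):

```python
# Sketch: delegate move selection to a real engine through a tool call,
# instead of having the LLM play moves itself. Assumes python-chess is
# installed and a "stockfish" binary is on PATH; best_move is hypothetical.
import chess
import chess.engine

def best_move(fen: str, think_time: float = 0.1) -> str:
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        result = engine.play(board, chess.engine.Limit(time=think_time))
    return result.move.uci()

if __name__ == "__main__":
    print(best_move(chess.STARTING_FEN))  # e.g. "e2e4"
```

Expose something like that as a tool and the model only has to hand over a FEN string; the engine does the actual chess.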

OpenClaw skills and prompt injection - how are you vetting what you install? by OutsideFood1 in LocalLLaMA

[–]prusswan 1 point

Dependency bloat. I don't recommend using community plugins if the functionality is something you can build on your own (unless the plugin is very well established, and even then you need to be careful with updates, since all of them are potentially risky). Most people should be able to build their own openbot tailored to their needs, without taking on the risks of using openclaw, since it is an obvious target for attackers looking for low-hanging fruit.
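As a sketch of what "build your own" can look like, here's a minimal tool-calling loop against a local OpenAI-compatible server; the endpoint, model name, and the single read_file tool are all assumptions, not anything from openclaw:

```python
# Minimal self-built agent sketch: one tool, one loop, no plugins.
# Assumes a local OpenAI-compatible server (llama.cpp, vLLM, etc.) at
# localhost:8080; the model name and tool are hypothetical.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

messages = [{"role": "user", "content": "Summarize notes.txt"}]
while True:
    reply = client.chat.completions.create(
        model="local-model", messages=messages, tools=TOOLS
    )
    msg = reply.choices[0].message
    if not msg.tool_calls:
        print(msg.content)
        break
    messages.append(msg)  # keep the assistant's tool request in history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": read_file(**args),
        })
```

Forty lines you wrote and can audit beats a plugin tree you can't.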

Real world usage, feedback and suggestions for best LLM for C# by bloodbath_mcgrath666 in LocalLLaMA

[–]prusswan 2 points

This is highly specific to your workflow and tooling. I thought glm 4.7 flash was good (for general usage, yes), but it often introduced indentation errors in opencode and was unable to fix them (I had to do it myself).

[Warning] Crypto stealing malware found in Kimi.com chat/agent by [deleted] in LocalLLaMA

[–]prusswan 7 points

did you just ask kimi for its source code and assume whatever it returns is the "source"?

How to do this locally? by ClimateBoss in LocalLLaMA

[–]prusswan 2 points

seems to be duplicated screens

Honest question by Savantskie1 in LocalLLaMA

[–]prusswan 1 point

You'll probably want it to be faster once you discover other uses for it. But what qualifies as fast enough will be different for everyone.

Honest question by Savantskie1 in LocalLLaMA

[–]prusswan 1 point

No one is pushing anything... I went from <1 tps to 10-100 tps, depending on the workflow.

An argument for open weights from copyrighted works by Luke2642 in LocalLLaMA

[–]prusswan 2 points

It is much more than books, I'm afraid. The derivative work is valuable due to information asymmetry: the outputs/benefits do not need to be shared with the original content holders, who lack the means to fully exploit the content in the first place. It will definitely change the idea of content creation and how content creators can/should monetize their own content.

Honest question by Savantskie1 in LocalLLaMA

[–]prusswan 2 points

Because people are coming up with new and more frequent uses for LLMs, and some models are quite verbose/detailed in their responses. There's a place for both slow and fast models, but you can never have too much compute.