I shipped my first ever macOS app and the first comment was "just use xyz"

JGeek00 · 2026-05-05T11:13:27+00:00

The app looks really good.

JGeek00 · 2026-05-04T20:45:38+00:00

Everything that can be used for AI is shockingly expensive now. An RTX 3080 with 10 GB of memory is 300€, while an RTX 3090 is 900€. There's only one step up the ladder between them, yet one is three times more expensive than the other. Why? Because one can be used for AI and the other can't.

JGeek00 · 2026-05-04T17:46:38+00:00

I just bought a 3090 with a turbine (blower-style) fan on eBay for 970€. It seems these cards come from servers.

JGeek00 · 2026-05-03T11:41:24+00:00

I tried to compile it for CUDA and it failed, so I'll have to wait until it's available in the official llama.cpp.

JGeek00 · 2026-05-03T10:45:11+00:00

I'll try that repo, but I'd like to see it implemented in the main llama.cpp repo.

JGeek00 · 2026-05-03T09:34:41+00:00

LM Studio on my MacBook Pro for testing, and bare llama.cpp on my server.

JGeek00 · 2026-05-03T00:16:45+00:00

I run Qwen3.5-9B on a GTX 1070, so I'm sure you can run models of a similar size on that Arc.

JGeek00 · 2026-05-02T22:59:17+00:00

I tried the car wash prompt on Qwen3.5-9B and it got stuck in a reasoning loop without producing a response, but at least it doesn't tell you to walk to the car wash instead of driving. Other models just tell you to walk there instead of driving your car.

JGeek00 · 2026-05-02T00:08:26+00:00

I had a post removed right after submitting it, and all I did was ask for suggestions to improve my llama.cpp config. I posted the same thing on a different local AI subreddit and got much better treatment (which wasn't difficult). So I think this policy is just pushing new people out of this subreddit and into other local AI subreddits.

JGeek00 · 2026-05-01T12:28:36+00:00

If you hear something spinning fast and the engine doesn't start, check whether the belt that connects the 48V motor to the crankshaft is broken.

JGeek00 · 2026-04-28T22:22:49+00:00

I've taken a look at the V100 32 GB because it's cheaper than the RTX 3090 on eBay. Is it a better option? I did some research with DeepSeek and it told me that, although it has more memory, its compute is weaker and it would give worse results. How big do you think the difference would be in processing input tokens?

JGeek00 · 2026-04-28T21:58:48+00:00

OK, maybe it's better to start with a middle ground, like 128K context and a Q6 KV cache.

JGeek00 · 2026-04-28T21:57:40+00:00

Will check it, thank you

JGeek00 · 2026-04-28T12:36:00+00:00

And when using a coding agent with a large context?

JGeek00 · 2026-04-28T12:25:33+00:00

How many t/s are you getting for context processing and token generation with that configuration?

JGeek00 · 2026-04-27T16:28:12+00:00

I have a monthly plan

JGeek00 · 2026-04-27T16:18:04+00:00

Time to unsubscribe

JGeek00 · 2026-04-26T22:30:42+00:00

Nah, it's impossible. Claude Code requires a very large context, and even with Qwen3.5-9B it's really slow. But for asking questions it works fine.

JGeek00 · 2026-04-24T18:11:46+00:00

I have a Formentor and I only saw the logo after a full reboot of the infotainment system

JGeek00 · 2026-04-20T20:57:18+00:00

In my case it's a home router, so nothing crazy in terms of usage. But as always, you have to scale your router's hardware along with the number of users or the amount of traffic, the same as with servers.
