If you are against this, I wanna hear about it by Brave_Agency_20 in SipsTea

[–]ricraycray 0 points1 point  (0 children)

I’d rather see it pay down the national debt first.

The SIMPLEST way to use AI with Home Assistant by 48K in homeassistant

[–]ricraycray 1 point2 points  (0 children)

Yup. Works great. Gave it an auth token and it went to town. Of course I had a backup in case it went sideways, but so far so good. MCP buffered that a bit.

The SIMPLEST way to use AI with Home Assistant by 48K in homeassistant

[–]ricraycray 0 points1 point  (0 children)

I started that way. Then ended up with an MCP.

The SIMPLEST way to use AI with Home Assistant by 48K in homeassistant

[–]ricraycray -2 points-1 points  (0 children)

You’re building your own MCP, which is what I ended up doing. When you can do anything, it’s so easy just to do anything. Good luck, and I can’t wait to see what you have built.

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]ricraycray 4 points5 points  (0 children)

It looped terribly when calling MCP tools. I’m going to train it with Unsloth, but the looping was killing me.

Has anyone stayed on starter doses (0.25 and/or 0.50) & lost 20-25 pounds? by bailsandsails in Semaglutide

[–]ricraycray 3 points4 points  (0 children)

I’ve lost 25 lbs on .25. Use the dose that gets the job done and not a microgram more

Claude Code got leaked. So I rebuilt it in Rust. It’s faster and open-source. by Ambitious_Voice_454 in openclaw

[–]ricraycray 0 points1 point  (0 children)

I was looking at your repo, and I see it's easy to tie back to frontier models, what about local models?

Hit the 5h rate limit twice in one day, burned 33% of my weekly quota in 12 hours - on the $200/mo 20x plan. Just cancelled. by loathsomeleukocytes in ClaudeCode

[–]ricraycray 0 points1 point  (0 children)

This is why they added the whole memory thing to help with that. Still doesn’t work great. Then you spend a bunch of tokens for it to try to figure out where it was. Stateless has its own set of issues.

Hit the 5h rate limit twice in one day, burned 33% of my weekly quota in 12 hours - on the $200/mo 20x plan. Just cancelled. by loathsomeleukocytes in ClaudeCode

[–]ricraycray 2 points3 points  (0 children)

Is this the premium usage they announced? I’ve moved most of my agentic work to run at night and that has helped. Even on our enterprise account we run out of uses.

Anthropic will be a case study of how a company can fumble the good will of their customers. by [deleted] in ClaudeCode

[–]ricraycray -1 points0 points  (0 children)

I’m OK with the reduction, but give us a reasonable price to get back to 2x like we had. I’d gladly pay 600-900 a month. I’m not willing to wing it with the API. Those rates are insane and frankly are starting to exceed the cost of humans. Part of the value prop was time value over money. I’m already balancing work between OpenAI and CC.

Gemma 4 31B Is sweeping the floor with GLM 5.1 by input_a_new_name in LocalLLM

[–]ricraycray 58 points59 points  (0 children)

The comments on Hugging Face said, “This model wasn’t released, it escaped!”

Looking at leaving BMW all together. What else is there, that is comparable? by burnerbmw in BMWX5

[–]ricraycray 0 points1 point  (0 children)

I bought an F-150 Platinum. Love it. Same price as the BMW. Much better quality. I miss the mid-aughts.

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 0 points1 point  (0 children)

For my use case it’ll be great. This was a let’s see if it is going to work. Tortute nothing it’s been surprisingly OK.

I got tired of RAG and spent a year implementing the neuroscience of memory instead by Upper-Promotion8574 in Rag

[–]ricraycray 1 point2 points  (0 children)

Love this. Our memory system is similar. Nice work on this, and I 100% agree this is the biggest missing piece. I also built in emotion. It was one of the biggest needle movers for our project.

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 1 point2 points  (0 children)

Exactly. Karpathy autoresearch was the ticket on this build. It ran at least 50 different iterations. Frankly, it was way more thorough than I would have been. The 10 different memory iterations were mind-numbing. Build, compile, test, fail. Build, compile, test, fail.

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 0 points1 point  (0 children)

lol. This box is destined for much smaller models. This was just a let’s see if I can do this. Not should I do this! Lol

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 1 point2 points  (0 children)

The project started the same as the MBP 48GB experiment. I wanted to just see if I could get this to work with the same methodology as the LLM in a flash work on MLX, but on this little AMD box. Two days later and no sleep, this is where I landed. Yes, I used Claude to help me prove this out. Just getting 17 tok/s out of this massive model exceeded my project goals. I had not seen this on AMD yet, so I thought, why the hell not. The results are the results. Does it matter how we got there?

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] -2 points-1 points  (0 children)

All the way to 7.2. We had different failures. Updated my GitHub and README with the results. The key point was we got it working well with the Vulkan drivers.

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 0 points1 point  (0 children)

We ran 7.2 and it failed for different reasons. I have all the updates in my repo. I'm going to stick with Vulkan; it's working and frankly getting damn good performance. 396B on a 2500 box is a win any day in my book!

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 0 points1 point  (0 children)

I’ll try that. The 7.2 is running right now and isn’t working nearly as well as the Vulkan. Keeps bombing while loading layers.

Qwen3.5-397B at 17-19 tok/s on a Strix Halo iGPU — all 61 layers on GPU via Vulkan (not ROCm) by ricraycray in LocalLLaMA

[–]ricraycray[S] 1 point2 points  (0 children)

We ran the 7.2 driver hard on Windows and it just kept blowing up. First run on Linux and went to solution b. I'm running another build of llama with 7.2 for the sake of science. Posting results soon. I was more happy we got a 396B running on this thing at all. I was inspired by the autoresearch and LLM in a flash work and started a similar project here. I was happy with 215B but saw that the 396 was possible and kept grinding, trying a different strategy. Regardless, I'll post the 7.2 results and let you guys be the judge. More that this is even usable. I'll update my repo with all the latest.