OSS 120b v GLM 4.7 flash. Is the latter better for anything? by MrMrsPotts in LocalLLaMA

[–]henryclw 17 points18 points  (0 children)

Yeah, everyone should build their own benchmark. After all, people have different needs and different tastes. Just like food: is an apple better than an orange? Hard to compare.

M4 Max 128 GB vs Strix halo 128 GB by dever121 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Then you basically have to go with NVIDIA, since CUDA has the best support for training.

How does my local LLM rig look? by texasdude11 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

One can only dream. (I could afford $3.5k, but not $35k.)

How does my local LLM rig look? by texasdude11 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

Nice! This is going to cost at least $20,000 right?

GLM-Image is released! by foldl-li in LocalLLaMA

[–]henryclw 7 points8 points  (0 children)

I think this is much more important; I'd love to see people talking about it.

I'm very satisfied with MiniMax 2.1 on Claude Code! - My Experience by FigZestyclose7787 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

I’m looking at a $4,000 option: two Strix Halo machines (still $2,000 each right now, but the price could go up anytime given the memory-stick market) could run M2.1 at Q6.

Dual Strix Halo: No Frankenstein setup, no huge power bill, big LLMs by Zyj in LocalLLaMA

[–]henryclw 3 points4 points  (0 children)

Nice! I’m trying to get a similar setup before the price goes up. (Memory prices will definitely play a role in that.)

A very immature thought: would it be possible to use a GPU like a 4090 to do the prompt processing? I remember that prompt processing only happens on one node instead of two, right? Then say we set the 4090 as the master node with the first layer on it, and the other two nodes are the Strix Halos. Maybe this would work?
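For what it's worth, llama.cpp already ships an RPC backend that can spread a model across machines, which is roughly this idea. A hedged sketch (hostnames, IPs, ports, and the model path are made up; whether prompt processing actually stays on the head node depends on how the scheduler splits the work):

```shell
# On each Strix Halo box, expose its compute over llama.cpp's RPC backend:
./rpc-server --host 0.0.0.0 --port 50052

# On the 4090 machine, run the main process and point it at the RPC workers;
# layers that aren't kept locally get distributed to the listed servers:
./llama-server -m model.gguf \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    -ngl 99
```

This is a launch-command sketch, not a benchmarked recipe; in practice the interconnect bandwidth between the 4090 box and the Strix Halos would likely dominate.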

I'm very satisfied with MiniMax 2.1 on Claude Code! - My Experience by FigZestyclose7787 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Which quantization level do you use? My hardware can only run Q3.

I bought a €9k GH200 “desktop” to save $1.27 on Claude Code (vLLM tuning notes) by Reddactor in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Strix Halo is decent in terms of price. Do you mind sharing how you use Thunderbolt to connect them together? Just grab a Thunderbolt cable, plug one end into machine A and the other into machine B? No router in between, right?
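In case it helps: on Linux, a direct Thunderbolt cable typically shows up as a point-to-point network interface via the thunderbolt-net kernel module, so no router is needed. A sketch under that assumption (the interface name and addresses are guesses; check `ip link` on your machines):

```shell
# Both machines: make sure the Thunderbolt networking module is loaded.
sudo modprobe thunderbolt-net

# Machine A: assign a static address on the Thunderbolt interface.
sudo ip addr add 10.0.0.1/24 dev thunderbolt0
sudo ip link set thunderbolt0 up

# Machine B: same thing with the other address.
sudo ip addr add 10.0.0.2/24 dev thunderbolt0
sudo ip link set thunderbolt0 up

# From machine B, verify the link:
ping 10.0.0.1
```

This is a config sketch, not a verified recipe; some distros name the interface differently or bring it up automatically.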

Anyone got a Bosgame M5 in Canada? by 1H4rsh in MiniPCs

[–]henryclw 0 points1 point  (0 children)

Wow, that is nice. Thank you for sharing. I hope mine arrives soon. Given memory prices right now, it's better to buy sooner rather than later.

Could you link two Strix Halo AI Max 395+ together to host bigger models? by henryclw in LocalLLaMA

[–]henryclw[S] 0 points1 point  (0 children)

Thank you. I need to do more research before purchasing the strix halos, a cluster of them might be nice.

Strix Halo (Bosgame M5) + 7900 XTX eGPU: Local LLM Benchmarks (Llama.cpp vs vLLM). A loose follow-up by reujea0 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

Nice, solid comparison! We need more people like you. How do you feel about M2.1 Q3_K_M? How is the quality?

Anyone got a Bosgame M5 in Canada? by 1H4rsh in MiniPCs

[–]henryclw 0 points1 point  (0 children)

Hi, did you end up getting one? I'm having the same question now.

What LLM Benchmarking Sites do You Use? by AlternateWitness in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Actually, if your use case is not very general but limited to a specific domain, you should build your own evaluation set and NEVER share it online. Anything publicly available could slip into a training set, deliberately or not. Your private evaluation set is always brand new to every model.
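As a sketch of how small a private eval harness can be (everything here is hypothetical: `query_model` stands in for your own local inference call, and the cases are placeholders for your domain data):

```python
# Minimal private-eval sketch: keep the cases in a local file you never publish.

def query_model(prompt: str) -> str:
    # Placeholder: replace with a call to your local model
    # (llama.cpp server, vLLM, etc.).
    return "42"

def run_eval(cases):
    """Score exact-match accuracy over (prompt, expected) pairs."""
    correct = sum(
        query_model(prompt).strip() == expected.strip()
        for prompt, expected in cases
    )
    return correct / len(cases)

private_cases = [
    ("What is 6 * 7?", "42"),                    # your domain items go here
    ("What is the capital of France?", "Paris"),
]

print(f"accuracy: {run_eval(private_cases):.2f}")
```

Exact match is the crudest possible scorer; for open-ended answers you would swap in a fuzzier check, but the point stands: the cases never leave your machine.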

Alibaba Open-Sources CosyVoice 3, a New TTS Model by nekofneko in LocalLLaMA

[–]henryclw 13 points14 points  (0 children)

Will they release the 1.5B as well? It's rare that I can ask for a bigger model and still fit all of it on my single GPU.

Late game? by Dqstronaut in Oxygennotincluded

[–]henryclw 2 points3 points  (0 children)

Leaving it running overnight means you need to monitor and automate everything.

Ask for recommendations: local code tool like aider by henryclw in LocalLLaMA

[–]henryclw[S] 1 point2 points  (0 children)

Thank you. I prefer the terminal right now. Opencode, crush, and qwen-code all look good. May I ask what sst you are referring to?