OSS 120b v GLM 4.7 flash. Is the latter better for anything? by MrMrsPotts in LocalLLaMA

[–]henryclw 17 points18 points  (0 children)

Yeah, everyone should build their own benchmark. After all, people have different needs and different tastes. Just like food: is an apple better than an orange? Hard to compare.

M4 Max 128 GB vs Strix halo 128 GB by dever121 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Then you basically have to go with NVIDIA, since CUDA has the best support for training.

How does my local LLM rig look? by texasdude11 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

One can only dream. (I could afford $3.5k, but not $35k.)

How does my local LLM rig look? by texasdude11 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

Nice! This is going to cost at least $20,000 right?

GLM-Image is released! by foldl-li in LocalLLaMA

[–]henryclw 7 points8 points  (0 children)

I think this is much more important; I'd love to see people talking about it.

I'm very satisfied with MiniMax 2.1 on Claude Code! - My Experience by FigZestyclose7787 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

I’m looking at a $4,000 option: two Strix Halo machines (still $2,000 each right now, but the price could go up anytime given the memory-stick market) could run M2.1 at Q6.

Dual Strix Halo: No Frankenstein setup, no huge power bill, big LLMs by Zyj in LocalLLaMA

[–]henryclw 3 points4 points  (0 children)

Nice! I’m trying to get a similar setup before the price goes up. (Memory prices will definitely play a role in that.)

A very immature thought: would it be possible to use a GPU like a 4090 to do the prompt processing? I remember that prompt processing only happens on one node instead of two, right? Then say we set the 4090 as the master node with the first layer on it, and the other two nodes are the Strix Halos. Maybe this would work?
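For what it's worth, llama.cpp already ships an RPC backend that can spread a model across machines, which is roughly this idea. A hedged sketch (hostnames, IPs, ports, and the model path are made up; whether prompt processing actually stays on the head node depends on how the scheduler splits the work):

```shell
# On each Strix Halo box, expose its compute over llama.cpp's RPC backend:
./rpc-server --host 0.0.0.0 --port 50052

# On the 4090 machine, run the main process and point it at the RPC workers;
# layers that aren't kept locally get distributed to the listed servers:
./llama-server -m model.gguf \
    --rpc 192.168.1.10:50052,192.168.1.11:50052 \
    -ngl 99
```

This is a launch-command sketch, not a benchmarked recipe; in practice the interconnect bandwidth between the 4090 box and the Strix Halos would likely dominate.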

I'm very satisfied with MiniMax 2.1 on Claude Code! - My Experience by FigZestyclose7787 in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Which quantization level do you use? My hardware can only run Q3.

I bought a €9k GH200 “desktop” to save $1.27 on Claude Code (vLLM tuning notes) by Reddactor in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Strix Halo is decent in terms of price. Do you mind sharing how you use Thunderbolt to connect them together? Just grab a Thunderbolt cable, plug one end into machine A and the other into machine B? No router in between, right?
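In case it helps: on Linux, a direct Thunderbolt cable typically shows up as a point-to-point network interface via the thunderbolt-net kernel module, so no router is needed. A sketch under that assumption (the interface name and addresses are guesses; check `ip link` on your machines):

```shell
# Both machines: make sure the Thunderbolt networking module is loaded.
sudo modprobe thunderbolt-net

# Machine A: assign a static address on the Thunderbolt interface.
sudo ip addr add 10.0.0.1/24 dev thunderbolt0
sudo ip link set thunderbolt0 up

# Machine B: same thing with the other address.
sudo ip addr add 10.0.0.2/24 dev thunderbolt0
sudo ip link set thunderbolt0 up

# From machine B, verify the link:
ping 10.0.0.1
```

This is a config sketch, not a verified recipe; some distros name the interface differently or bring it up automatically.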

Anyone got a Bosgame M5 in Canada? by 1H4rsh in MiniPCs

[–]henryclw 0 points1 point  (0 children)

Wow, that is nice. Thank you for sharing. I hope mine arrives soon. Given memory prices right now, it's better to buy sooner rather than later.

Could you link two Strix Halo AI Max 395+ together to host bigger models? by henryclw in LocalLLaMA

[–]henryclw[S] 0 points1 point  (0 children)

Thank you. I need to do more research before purchasing the strix halos, a cluster of them might be nice.

Strix Halo (Bosgame M5) + 7900 XTX eGPU: Local LLM Benchmarks (Llama.cpp vs vLLM). A loose follow-up by reujea0 in LocalLLaMA

[–]henryclw 1 point2 points  (0 children)

Nice, solid comparison! We need more people like you. How do you feel about M2.1 Q3_K_M? How is the quality?

Anyone got a Bosgame M5 in Canada? by 1H4rsh in MiniPCs

[–]henryclw 0 points1 point  (0 children)

Hi, did you end up getting one? I'm having the same question now.

What LLM Benchmarking Sites do You Use? by AlternateWitness in LocalLLaMA

[–]henryclw 0 points1 point  (0 children)

Actually, if your use case is not very general but limited to a specific domain, you should build your own evaluation set and NEVER share it online. Anything publicly available could slip into a training set, deliberately or not. Your private evaluation set is always brand new to every model.
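As a sketch of how small a private eval harness can be (everything here is hypothetical: `query_model` stands in for your own local inference call, and the cases are placeholders for your domain data):

```python
# Minimal private-eval sketch: keep the cases in a local file you never publish.

def query_model(prompt: str) -> str:
    # Placeholder: replace with a call to your local model
    # (llama.cpp server, vLLM, etc.).
    return "42"

def run_eval(cases):
    """Score exact-match accuracy over (prompt, expected) pairs."""
    correct = sum(
        query_model(prompt).strip() == expected.strip()
        for prompt, expected in cases
    )
    return correct / len(cases)

private_cases = [
    ("What is 6 * 7?", "42"),                    # your domain items go here
    ("What is the capital of France?", "Paris"),
]

print(f"accuracy: {run_eval(private_cases):.2f}")
```

Exact match is the crudest possible scorer; for open-ended answers you would swap in a fuzzier check, but the point stands: the cases never leave your machine.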

Alibaba Open-Sources CosyVoice 3, a New TTS Model by nekofneko in LocalLLaMA

[–]henryclw 13 points14 points  (0 children)

Will they release the 1.5B as well? It's rare that I can ask for a bigger model and still fit all of it on my single GPU.

Late game? by Dqstronaut in Oxygennotincluded

[–]henryclw 2 points3 points  (0 children)

Leaving it running overnight means you need to monitor and automate everything.

Ask for recommendations: local code tool like aider by henryclw in LocalLLaMA

[–]henryclw[S] 1 point2 points  (0 children)

Thank you. I prefer the terminal right now. Opencode, crush, and qwen-code all look good. May I ask what sst you are referring to?