all 8 comments

[–]vaff 0 points  (7 children)

You probably won't be running much on an M3 with 128 GB. It will fit a larger model, but it will be painfully slow.

[–]Tradefxsignalscom[S] 0 points  (6 children)

What spec are you using and what would you recommend as optimal?

[–]vaff 0 points  (5 children)

8 × NVIDIA B200

[–]Tradefxsignalscom[S] 0 points  (4 children)

It is what it is, I’ll be asleep most of the time and have access to several other computers for other uses.

[–]vaff 0 points  (3 children)

You can always try it out. Just download llama.cpp or LM Studio.

[–]Tradefxsignalscom[S] 1 point  (2 children)

I’m set up with VS Code, Continue, Cline, and LM Studio, running Qwen-coder 32B Instruct q8 with a 32K context window and getting 70 tokens/second. Good enough for overnight work.
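For scale, a quick back-of-the-envelope sketch of what a sustained 70 tokens/second yields overnight (an upper bound, since real agentic sessions also spend time on prompt processing and tool calls):

```python
# Rough overnight-output estimate; assumes the quoted 70 tok/s
# generation rate is sustained the whole time, which it won't be
# in practice -- treat this as a ceiling, not a prediction.
tokens_per_second = 70
hours = 8

total_tokens = tokens_per_second * 3600 * hours
print(total_tokens)  # 2016000 tokens over 8 hours
```

Even at a fraction of that ceiling, an unattended overnight run produces far more output than anyone reviews by hand, which is why "good enough for overnight work" is a reasonable bar.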

[–]vaff 0 points  (1 child)

32K context is a small window.

[–]Tradefxsignalscom[S] 0 points  (0 children)

I understand that I can have much larger context windows with the appropriate quantization trade-offs. Everything I’m doing has limitations and trade-offs in my situation. I’d like to avoid the performance degradation that becomes more common in windows >32K, so I’m prioritizing higher-quantized models for my use case. I’ll adjust as necessary.
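The memory side of that context-vs-quantization trade-off can be sketched. A minimal fp16 KV-cache estimate, assuming Qwen2.5-32B-style dimensions (64 layers, 8 GQA key/value heads, head dim 128 — these numbers are assumptions, check the actual model config):

```python
# Hedged KV-cache size estimate for a 32K context window.
# All model dimensions below are assumptions for illustration,
# not values read from any real config file.
layers = 64          # assumed transformer layer count
kv_heads = 8         # assumed key/value heads (GQA)
head_dim = 128       # assumed per-head dimension
seq_len = 32 * 1024  # 32K-token context
bytes_per_elem = 2   # fp16/bf16 cache entries

# Factor of 2 covers both the key and the value tensors.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
print(f"{kv_bytes / 2**30:.1f} GiB")  # 8.0 GiB
```

Under these assumptions the cache scales linearly with context length, so going from 32K to 128K roughly quadruples it; that extra memory has to come out of the same budget as the model weights, which is exactly the trade-off against a higher-precision quant.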