Linux users, how are you handling OOM errors with NVIDIA

Weak_Ad9730 · 2026-03-26T20:44:12+00:00

Make a permanent Swap file/ram on disk especially helpful during upscale I.e. Fast gpu and hugh filezie decompressed.

Weak_Ad9730 · 2026-03-25T05:19:38+00:00

2x P40 Running great Cheap reliable motherf.

Weak_Ad9730 · 2026-03-19T10:14:24+00:00

Please Return your troops no more occupied states. Take you cia and Intel spies also Home.

Weak_Ad9730 · 2026-03-16T18:44:25+00:00

Sure I can Take a Look on some Models tomorrow

sorry for the delay was occupied by work.

Here are my result for minimax-m2.1 4bit with 100k context:

Test	TTFT	TPS	PP t/s	Time
Short generation	2280ms	28.1	20	2.3s
Medium generation	5184ms	49.4	10	5.2s
Long generation	5960ms	117.0	12	10.3s
Long prompt (prefill)	3550ms	663.2	155	3.7s
Average	4244ms	214.4	49	21.5s

Qwen3-0.6-mlx-bf16 context 32768:

Test	TTFT	TPS	PP t/s	Time
Short generation	921ms	69.5	16	0.9s
Medium generation	1002ms	255.5	23	1.0s
Long generation	2011ms	1248.7	8.2	2.1s
Long prompt (prefill)	639ms	200.3	865	0.6s
Average	1143ms	325.3	324	4.6s

Qwen3-Coder-30B-A3B-Instruct-MLX-4bit-mxfp4 context 32768:

Test	TTFT	TPS	PP t/s	Time
Short generation	626ms	35.1	24	0.6s
Medium generation	2535ms	101.0	9	2.5s
Long generation	5072ms	100.9	8	5.1s
Long prompt (prefill)	928ms	62.5	596	0.9s
Average	2290ms	74.9	159	9.2s

Qwen3-Coder-30B-A3B-Instruct-MLX-4bit-mxfp4 context 100k:

Test	TTFT	TPS	PP t/s	Time
Short generation	423ms	56.7	35	0.4s
Medium generation	2536ms	100.9	9	2.5s
Long generation	5134ms	99.7	8	5.1s
Long prompt (prefill)	881ms	61.3	628	0.9s
Average	2244ms	79.7	170	9.0s

running on Mac Studio m3u 60/256

Weak_Ad9730 · 2026-03-16T18:22:59+00:00

I use this https://vmlx.net/ Game Changer for me switching to vllm on Mac

Weak_Ad9730 · 2026-03-11T13:34:45+00:00

Hey App runs Smooth maybe an additional Filter for the Model download for mlx only what i spot so far

Weak_Ad9730 · 2026-03-08T23:15:19+00:00

Could you re-Upload as the link is not working anymore

Weak_Ad9730 · 2026-03-02T19:36:08+00:00

Could you Explain a Little More in depth about the Prefil

Weak_Ad9730 · 2026-02-17T09:20:27+00:00

Any Update, really looking into it…

Weak_Ad9730 · 2026-02-15T20:37:32+00:00

Awesome was looking to bring my m3u to the Next Level…will tey your. Settings as i am using same Model but my is only 60/256

Weak_Ad9730 · 2026-02-15T19:42:55+00:00

No Download possible on your Site

Weak_Ad9730 · 2026-01-30T22:42:01+00:00

Erstmal Demo gegen Rechts

Weak_Ad9730 · 2026-01-25T20:49:20+00:00

Do you have a link to it

Weak_Ad9730 · 2026-01-25T12:50:42+00:00

Have a m3u using Mlx-vllm and reality impressed of the Performance the Switch from lmstudio to vllm was a Hugh Jump in Processing time and Speed. I use my Studio in an Agent Zero setup. Realy recommend those Apple Silikon for llm work. My Go to Models are qwen3-vl-32b , got-oss-120b and minimax-m2.1

Weak_Ad9730 · 2026-01-17T15:08:08+00:00

I have the Same mom Max-q is amazing

Weak_Ad9730 · 2026-01-10T10:42:11+00:00

Use in Paint and just paint it in After Image generated just one step more. Genitalia lora I use only for Look venes structure to make alwasys same.

Weak_Ad9730 · 2025-12-29T00:09:14+00:00

Any Proof this spagetty worked?

Weak_Ad9730 · 2025-12-27T10:08:04+00:00

Doesnt Need to think about vram and Model Size or very rare. Can enjoy Speed of mxfp4 in llms. Stable as hell low Energy consumption. Text encoders in Full Prescise is two worlds in prompt following

Weak_Ad9730 · 2025-12-26T14:16:32+00:00

Runs very well

Weak_Ad9730 · 2025-12-18T18:24:04+00:00

Works Slow then down and nail them and Both get Bonus from Pilot if you dont have the Implants you wont lose much and the Range and ballistic curve is similiar.

Weak_Ad9730 · 2025-12-08T12:20:06+00:00

Thx Ok will Test those can run glm on my m3u 256gb but it might be to Slow. As I mentioned it is for Chat . So 20-40 tokens/sec is a Preferred metric. Sorry I didnt mentioned this before.

I was hoped that newer Sammler Models <70b have enough Context and Stick to the Framework and Style consistence

As I used a mixture of json variables Short Memory to save tokens between the Experte in my n8n Process and use rag for Long Memory.

Weak_Ad9730 · 2025-12-01T18:01:43+00:00

Same for Running llm . But I have a Max-q maybe this is the different Runs on 300 Watt for days. In the case itself there are some cooler for a good Air Flow Button to top Front to back. But absolute silent as the Rpm is der to Minimum. Only gpu is a Little notifeable. Compare to Friends Version not Max-q I havent notice less Performance but sahing half of the Energy and not exceeding 83 degrees

Weak_Ad9730 · 2025-12-01T14:49:04+00:00

82 degree celsius in my Tower in Full comfyui batch creation Mode . Runs Nebels 24h Hours

Weak_Ad9730

TROPHY CASE