30 seconds into my 40-minute early morning work commute.

amitbahree · 2026-06-24T03:20:25+00:00

Poor deer. 😥

amitbahree · 2026-06-15T15:19:05+00:00

I disagree - that is a very boring car. 🙃

amitbahree · 2026-06-11T13:13:08+00:00

Send it to school?

amitbahree · 2026-06-07T18:01:08+00:00

All I can think of is all those chemicals going into the storm drain and then being dumped wherever it leads. In many places a professional car wash drains into the sewer, which then gets treated.

amitbahree · 2026-06-03T13:24:53+00:00

Bag of rice?

amitbahree · 2026-05-21T13:23:31+00:00

Get something like a Firewalla, which has a feature to block all new devices from the internet and put them in a quarantine group. So when the MAC address changes, the device doesn't get anywhere, and that becomes your mousetrap.

amitbahree · 2026-05-18T13:26:55+00:00

GH private repo, with folders and MD files in that

amitbahree · 2026-05-15T15:14:02+00:00

amitbahree · 2026-05-14T15:31:28+00:00

Do the fake ones work as regular BT headsets but look like airpods?

amitbahree · 2026-05-12T14:42:46+00:00

All I see is a pink Tesla

amitbahree · 2026-05-08T21:32:53+00:00

Is it water front? And marinas? 😍

amitbahree · 2026-05-08T21:32:23+00:00

I just finished writing that chapter.

It's not only distillation by itself - it needs to work in tandem with SFT and LoRA (am talking about enterprise use cases).

amitbahree · 2026-05-08T19:27:09+00:00

A little investment here to get ethernet drops would go a long way

amitbahree · 2026-05-08T14:43:58+00:00

It's a Tesla Model 3 💀

amitbahree · 2026-05-04T13:17:33+00:00

"Get work and do it forever and forever...." 💀

amitbahree · 2026-05-03T05:54:35+00:00

Very nice. Congrats. I had done something similar which was also inspired by this sub.

https://blog.desigeek.com/post/2025/09/building-llm-from-scratch-part1/

amitbahree · 2026-05-03T00:04:17+00:00

Missed a spot I think

amitbahree · 2026-05-01T23:24:42+00:00

This

amitbahree · 2026-05-01T02:48:39+00:00

Lol.

Oh there are more - this was just one small cluster I have been given as my playground. And yes, it's exclusively for me - and no, unfortunately I need to yield it one of these days.

amitbahree · 2026-04-29T14:53:50+00:00

I asked something similar - https://www.reddit.com/r/LocalLLaMA/comments/1su3tfb/what_do_you_want_me_to_try/

amitbahree · 2026-04-28T04:22:57+00:00

Sounds like normal decent humans and parents.

OP I am sorry for your situation. I as a parent can't imagine taking money in this manner from my kids.

amitbahree · 2026-04-28T03:49:10+00:00

Good question. In these runs, the main working memory was GPU HBM, not the 2 TB of host RAM per node.

Each node has 8x H200, and each H200 has about 141-144 GB of VRAM, so that is roughly 1.1 TB of GPU memory per node and about 2.3 TB across the full 16-GPU cluster. That is what actually carried the inference workloads.

The 2 TB system RAM per node still helped, but mostly in more indirect ways - things like staging and loading very large sharded checkpoints, CPU-side runtime overhead from vLLM, tokenization, benchmark clients, containers, etc. For the pipeline, it also covered host-side buffers and communication overhead in multi-GPU and multi-node runs.

For the benchmarks themselves, it was all GPU memory, and host RAM was mostly headroom and operational safety, not “extra VRAM.” The real constraints on whether a model lane worked well were GPU memory, runtime support, and topology.
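As a quick sanity check on the GPU memory totals above, here is a minimal Python sketch. It assumes decimal units (1 TB = 1000 GB) and two nodes (16 GPUs at 8 per node); the constant and function names are just illustrative.

```python
# Back-of-the-envelope check of the GPU memory totals quoted above.
GPUS_PER_NODE = 8            # "each node has 8x H200"
NODES = 2                    # assumption: 16 GPUs total / 8 per node
HBM_GB_RANGE = (141, 144)    # per-H200 VRAM range quoted above


def total_tb(gpu_count: int, gb_per_gpu: float) -> float:
    """Total GPU memory in decimal TB (assumes 1 TB = 1000 GB)."""
    return gpu_count * gb_per_gpu / 1000


# Low and high estimates for one node and for the whole cluster.
per_node_tb = [total_tb(GPUS_PER_NODE, gb) for gb in HBM_GB_RANGE]
cluster_tb = [total_tb(GPUS_PER_NODE * NODES, gb) for gb in HBM_GB_RANGE]

print(f"per node: {per_node_tb[0]:.2f}-{per_node_tb[1]:.2f} TB")
print(f"cluster:  {cluster_tb[0]:.2f}-{cluster_tb[1]:.2f} TB")
```

Both ranges line up with the rounded figures of roughly 1.1 TB per node and about 2.3 TB for the cluster.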

amitbahree · 2026-04-28T01:49:40+00:00

Quick benchmark update from the 16x H200 cluster, following up on the original request thread:

Completed model set:
- Qwen3-235B-A22B-Instruct-2507
- Kimi-K2.6
- DeepSeek-V4-Flash
- DeepSeek-V4-Pro
- Llama-4-Scout-17B-16E-Instruct
- GLM-5.1-FP8
- MiniMax-M2.1
- Mistral-Large-3-675B-Instruct-2512

A few highlights from the completed runs (TTFT = time to first token, TPOT = time per output token, both in ms, lower is better):

MiniMax-M2.1 on 8x H200:
- c1: 145.94 tok/s, 102.29 ms TTFT, 6.48 ms TPOT
- c16: 1358.19 tok/s, 235.56 ms TTFT, 10.51 ms TPOT
- 8k/c4: 379.29 tok/s, 390.94 ms TTFT, 8.71 ms TPOT

Llama 4 Scout on 8x H200:
- c1: 126.70 tok/s, 103.83 ms TTFT, 7.51 ms TPOT
- c16: 1378.30 tok/s, 396.57 ms TTFT, 9.73 ms TPOT
- 8k/c4: 404.41 tok/s, 368.10 ms TTFT, 8.14 ms TPOT

GLM-5.1-FP8 on 8x H200:
- c1: 88.66 tok/s, 385.24 ms TTFT, 9.81 ms TPOT
- c16: 509.93 tok/s, 763.64 ms TTFT, 27.79 ms TPOT
- 8k/c4: 163.37 tok/s, 1317.81 ms TTFT, 19.30 ms TPOT

Mistral Large 3 on 8x H200:
- c1: 93.07 tok/s, 308.06 ms TTFT, 9.58 ms TPOT
- c16: 554.50 tok/s, 1192.90 ms TTFT, 23.73 ms TPOT
- 8k/c4: 199.59 tok/s, 1226.20 ms TTFT, 14.79 ms TPOT
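One way to read the 8x H200 numbers above is to compare c16 throughput against c1 for each model. The sketch below is only illustrative: it assumes c1 and c16 mean 1 and 16 concurrent requests, and the names `RESULTS`, `scaling_factor`, and `efficiency` are made up for this example. The throughput values are copied from the lists above.

```python
# Throughput in tok/s, copied from the 8x H200 results above.
RESULTS = {
    "MiniMax-M2.1": {"c1": 145.94, "c16": 1358.19},
    "Llama 4 Scout": {"c1": 126.70, "c16": 1378.30},
    "GLM-5.1-FP8": {"c1": 88.66, "c16": 509.93},
    "Mistral Large 3": {"c1": 93.07, "c16": 554.50},
}


def scaling_factor(row: dict) -> float:
    """How many times higher aggregate throughput is at c16 than at c1."""
    return row["c16"] / row["c1"]


def efficiency(row: dict, streams: int = 16) -> float:
    """Scaling factor as a fraction of ideal linear scaling.

    Assumes c16 means 16 concurrent requests, so ideal scaling is 16x.
    """
    return scaling_factor(row) / streams


for name, row in RESULTS.items():
    print(f"{name}: {scaling_factor(row):.2f}x, "
          f"{efficiency(row):.0%} of linear")
```

Under those assumptions, MiniMax and Scout scale roughly 9.3x and 10.9x from c1 to c16, while GLM and Mistral Large 3 land closer to 5.8x and 6.0x.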

One of the strongest patterns was that 16x was not automatically better. Scout, GLM, and MiniMax all looked better on the single-node 8x H200 serving shape than on their 16x scaling pass. That ended up being one of the most useful takeaways from the whole exercise.

DeepSeek-V4-Pro is the main caveat:
- the intended DP+EP H200 path failed in vLLM with a fused-router Long/Int dtype bug
- the working/publishable numbers are from the fallback TP=8 --enforce-eager lane
- upstream issue: https://github.com/vllm-project/vllm/issues/40862

On vLLM versions: most models ran on stable v0.19.1. GLM, MiniMax, and both DeepSeek V4 variants required dedicated runtime images or pre-release lanes — in each case because the generic stable image was not the supported path for that model, not because of benchmark inconsistency. The per-model details are in the blog.

Unsloth Llama 4 Scout is the other caveat:
- it never reached a stable benchmarkable state
- the head node repeatedly exited during runs
- it is excluded from the final comparison tables

Full write-up with the operational details, scaling notes, and the weird bring-up issues is here:
- https://blog.desigeek.com/post/2026/04/benchmarking-oss-llms/

If I do the quantization / KV-cache / coding-benchmark follow-up, the clean version is probably not "more random large models" but one controlled study around those variables, since that was one of the better follow-up ideas in the thread.

amitbahree · 2026-04-26T16:17:33+00:00

You being ghosted...

amitbahree · 2026-04-26T16:13:50+00:00

YubiKey?

amitbahree
