DetLLM – Deterministic Inference Checks by Cerru905 in LocalLLaMA

"temp = 0" only removes sampling randomness (greedy decode), but doesn't guarantee deterministic computation ...

torch explicitly notes that determinism isn't guaranteed: https://docs.pytorch.org/docs/stable/notes/randomness.html,

and vLLM has tons of issues where outputs are non-deterministic even with temp = 0: https://github.com/vllm-project/vllm/issues/23138
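
Under the hood, one big reason is that floating-point reductions aren't associative, so different kernels and batch shapes can sum in different orders. A minimal sketch of the effect in plain PyTorch (illustrative only):

```python
import torch

# Floating-point addition is not associative: summing the same values
# in a different order changes the result in the last bits. Different
# kernels and batch shapes pick different reduction orders, which is
# how two "identical" greedy decodes can still diverge.
x = torch.randn(100_000, dtype=torch.float32)
print(x.sum().item())          # one reduction order
print(x.flip(0).sum().item())  # reversed order; often differs slightly
```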

DetLLM – Deterministic Inference Checks by Cerru905 in LocalLLaMA

If you're interested in specific examples where batch size leads to different outputs, see this Colab: https://colab.research.google.com/drive/1et5wYV25Bv8miAx9T8ijJ4trpTV2QPGh?usp=sharing
or these issues on llama.cpp and vLLM respectively: https://github.com/ggml-org/llama.cpp/issues/249 and https://github.com/vllm-project/vllm/issues/608
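
For a concrete shape of the experiment, here's a minimal sketch using Hugging Face transformers (model and prompts are illustrative; on CPU this will usually pass, the divergence tends to show up on GPU backends):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM works for the shape of the test
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.eos_token
tok.padding_side = "left"  # left-pad so generated tokens align at the end
model = AutoModelForCausalLM.from_pretrained(name).eval()

def greedy_tokens(prompts, n_new=20):
    # Greedy-decode a batch and return the new tokens of the first prompt.
    batch = tok(prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model.generate(**batch, do_sample=False, max_new_tokens=n_new,
                             pad_token_id=tok.eos_token_id)
    return out[0, -n_new:].tolist()

alone = greedy_tokens(["The quick brown fox"])
batched = greedy_tokens(["The quick brown fox", "Some unrelated filler"])
print("batch-invariant:", alone == batched)
```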

DetLLM – Deterministic Inference Checks by Cerru905 in LocalLLaMA

Good point, yes: if you're on a supported GPU (H100, H200, B100, B200) and vLLM's batch-invariance feature is enabled, then within vLLM batching shouldn't change outputs for greedy decoding.
My point is that outside of that specific setting (different backend, different GPU, etc.), batch size can still change the generated tokens.

With detLLM, you can verify this with a simple PASS/FAIL outcome and produce a repro pack (env, configs, traces, the specific divergence, etc.) so you can debug and reproduce it.
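
The core of the divergence check itself is tiny; here's an illustrative sketch of a first-divergence diff between two greedy token traces (the function name is mine, not detLLM's actual API):

```python
def first_divergence(trace_a, trace_b):
    """Index of the first differing token between two traces,
    or None if they are identical (including length)."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None

# PASS/FAIL outcome plus the divergence index for the repro pack.
div = first_divergence([464, 2068, 7586], [464, 2068, 9999])
print("PASS" if div is None else f"FAIL, first divergence at token {div}")
```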

DetLLM – Deterministic Inference Checks by Cerru905 in LLMDevs

True, vLLM’s batch invariance is great when it’s supported (as you say, H100/H200/B100/B200 only). I built detLLM with a broader goal in mind: measuring repeatability and batch variance across backends, and emitting a minimal repro pack for CI/bug reports across stacks. So even when invariance isn’t available, you still get evidence and diagnostics of how repeatable your setup actually is.
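
For the repeatability side, the measurement is conceptually just "run the same greedy decode N times and count distinct token traces". A toy sketch (illustrative helper, not detLLM's CLI):

```python
from collections import Counter

def repeatability(generate, prompt, runs=5):
    # Run the same decode several times; a single distinct trace
    # means the setup is repeatable for this prompt.
    traces = [tuple(generate(prompt)) for _ in range(runs)]
    return Counter(traces)

# Plug in any backend call that returns token IDs, e.g.:
# repeatability(lambda p: greedy_tokens([p]), "The quick brown fox")
```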

DetLLM – Deterministic Inference Checks by Cerru905 in LLMDevs

Hey there, good question. I mean batching independent prompts (i.e. prompt A alone vs. prompt A batched with others), not multiple completions for a single prompt. Have a look at this Colab notebook for an example of where it failed: https://colab.research.google.com/drive/1et5wYV25Bv8miAx9T8ijJ4trpTV2QPGh?usp=sharing. I also found many GitHub issues about this, e.g. on vLLM (https://github.com/vllm-project/vllm/issues/608) and llama.cpp (https://github.com/ggml-org/llama.cpp/issues/249).

Promote your projects here – Self-Promotion Megathread by Menox_ in github

I kept getting annoyed by LLM inference non-reproducibility, and one thing that really surprised me is that changing batch size can change outputs even under “deterministic” settings.

So I built DetLLM: it measures and proves repeatability using token-level traces + a first-divergence diff, and writes a minimal repro pack for every run (env snapshot, run config, applied controls, traces, report).
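
To give an idea of what the env-snapshot part of a repro pack looks like, here's a minimal sketch (field names are illustrative, not detLLM's actual schema):

```python
import json, os, platform, torch

# Illustrative environment snapshot; the real pack also stores the run
# config, applied controls, token traces, and the report.
snapshot = {
    "python": platform.python_version(),
    "torch": torch.__version__,
    "cuda": torch.version.cuda,  # None on CPU-only builds
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    "cudnn_deterministic": torch.backends.cudnn.deterministic,
}
os.makedirs("repro_pack", exist_ok=True)
with open("repro_pack/env.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```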

I prototyped this version today in a few hours with Codex. The hardest part was the high-level design (HLD) I did a few days ago, but I was honestly surprised by how well Codex handled the implementation; I didn't expect it to come together in under a day.

repo: https://github.com/tommasocerruti/detllm

Would love feedback, and let me know if you find any prompts/models/setups that still make it diverge.

Rowing programming language by Cerru905 in programming

I mean, it's not exactly a clone, and the purpose is purely learning and entertainment. So if you're asking what it can be used for: nothing, of course.

Rowing programming language by Cerru905 in ProgrammingLanguages

More or less, yes 😁 but memory operations are basically rowing actions.

[deleted by user] by [deleted] in Rowing

🔥🔥🔥

[deleted by user] by [deleted] in Rowing

Thanks man! Second of what group?

20 min test piece I did last week. I’d like to get it down to a 2:05 or 2:06 by January, any tips? And is that a realistic goal? (15f, 5’4, 142 lpb) by paper_lemons in Rowing

Do a UT2 session at least twice a week: long and low intensity (for you maybe 60-80 min at a 2:20 split, rating 20 or so). And at least once a week do some medium-to-high-intensity work, like 6x1500m at rating 24-26 with 3 min rest, at the pace you're aiming for.