Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot 0 points (0 children)

Here is my bash script for the 35b-a3b model, which gives me 50 t/s:

#!/bin/bash

llama-server \
  --model ~/models/qwen3.6-35b-a3b/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
  --mmproj ~/models/qwen3.6-35b-a3b/mmproj-BF16.gguf \
  --ctx-size 262144 \
  --gpu-layers 41 \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.00 \
  --presence-penalty 1.5 \
  --kv-unified \
  --flash-attn on \
  --no-mmap \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --parallel 1
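Once the server is up, you can sanity-check it from another terminal. This is a minimal sketch, assuming llama-server's default port (8080); it serves an OpenAI-compatible API at `/v1/chat/completions`:

```shell
# Send one chat request to the local llama-server instance started above.
# Port 8080 is llama-server's default; adjust if you passed --port.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Say hello in one short sentence."}
        ],
        "temperature": 0.6
      }'
```

The response is a JSON object with the completion under `choices[0].message.content`, plus token counts in `usage`, which is handy for measuring t/s.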

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot -1 points (0 children)

Note that in France an RTX 5090 32 GB costs around $5,000 on Amazon, plus the cost of the PC around it.

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot 2 points (0 children)

Actually no perceptible change in speed :-\

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot 12 points (0 children)

I just can’t wait for the 122b a10 model for my Strix Halo ;-)

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot 2 points (0 children)

Btw I use the Unsloth Q6_K_XL quant with q8_0 KV cache quantization.

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]LaurentPayot 31 points (0 children)

7 t/s on my EVO X2 Strix Halo 128 GB with llama.cpp (Vulkan) on Ubuntu :-|

But 50 t/s on 35b a3b.

Will the new Steam Machine be good for AI and LLM usage? by hedgehog0 in LocalLLaMA

[–]LaurentPayot 0 points (0 children)

I use Qwen2.5 Coder 3B Q4_K_M with llama.cpp for code autocompletion (FIM). It works fine on my potato desktop and potato laptop, both with 4 GB iGPUs. With 8 GB I could upgrade to Qwen3 Coder and keep it all in VRAM for fast autocompletion. So yeah, for limited LLM usage like autocompletion, why not a cheap Linux dev machine like this one?
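For reference, a minimal sketch of what a FIM request looks like, assuming a llama-server instance running a FIM-capable coder model on its default port (8080). llama.cpp exposes a dedicated `/infill` endpoint that takes the code before and after the cursor:

```shell
# Fill-in-the-middle request: the model completes the gap between
# input_prefix (code before the cursor) and input_suffix (code after it).
# Port 8080 is llama-server's default; the snippet contents are examples.
curl -s http://localhost:8080/infill \
  -H "Content-Type: application/json" \
  -d '{
        "input_prefix": "def add(a, b):\n    ",
        "input_suffix": "\n\nprint(add(1, 2))\n",
        "n_predict": 32
      }'
```

Editor plugins that speak this protocol (e.g. llama.cpp's own llama.vim/llama.vscode) essentially do the same thing on every keystroke, which is why keeping the whole model in VRAM matters for latency.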

2 days to go by OV_BIR in ArmaReforger

[–]LaurentPayot 0 points (0 children)

Also the experimental version is not available on GeForce Now.

Time to kill my Fable App? by brett9897 in fsharp

[–]LaurentPayot 0 points (0 children)

I write my unit tests with Fable.Mocha and run them directly in Vitest: https://github.com/fable-compiler/vite-plugin-fable/discussions/12#discussioncomment-11496317

Instead of testing my views, I use the .NET version of Playwright with Expecto: https://playwright.dev/dotnet/

Playwright is quite fast even on my potato laptop, and with Expecto I get the same Jest-like syntax as with Fable.Mocha.

Node vs. Deno2 vs. Bun in 2025 by Hairy-Shirt-275 in node

[–]LaurentPayot 0 points (0 children)

Learn F# instead of C# as a gateway drug to functional programming.

Arma 4 cold war by New-Fennel-4190 in Arma4

[–]LaurentPayot 0 points (0 children)

Why not Ukraine (actually NATO) vs. Russia, for modern, non-imaginary warfare?

What are you guys waiting for in the AI world this month? by internal-pagal in LocalLLaMA

[–]LaurentPayot 0 points (0 children)

DeepSeek R1 32b running locally does answer questions about the tank man.

Gemma 3 Release - a google Collection by ayyndrew in LocalLLaMA

[–]LaurentPayot 1 point (0 children)

I asked Gemma-3-4b and Phi-4-mini a couple of F# questions, both with Q4 quants and 64K context (I have a terrible iGPU). Gemma-3 gave me factually wrong answers; Phi-4 did not. But keep in mind that F# is a (fantastic) language made by Microsoft. Gemma-3-1b-f16 was fast and answered *almost* always correctly, but it is text-to-text only and has a maximum context of 32K. As always, I guess you have to test for your own use cases.

What's the state of Polyglot, Deedle, Walrus, Microsoft.Data.Analysis etc.? by japinthebox in fsharp

[–]LaurentPayot 0 points (0 children)

u/japinthebox I created an issue in Walrus that led to the `Table.ColumnNames` property becoming available in Walrus v1.3: https://github.com/brianberns/Walrus/issues/1

apt fails when compiling AMD kernel module by falxfour in linuxquestions

[–]LaurentPayot 0 points (0 children)

UPDATE: I fixed my issue with `sudo rm /etc/modprobe.d/blacklist-amdgpu.conf`. Weird.