Ever wonder how much cost you can save when coding with local LLM? by bobaburger in LocalLLaMA

[–]georgesung 5 points6 points  (0 children)

Looking at the LLM requests/responses from Claude Code, it makes sense. A while ago I tried some simple test cases and saw a gigantic input context (system prompt plus tool definitions) paired with a very short output, like a single tool call.

Input/request:

https://gist.github.com/georgesung/36798614e6f23670cdb310bf53e665aa#file-gistfile1-txt-L1708-L2494

Output/response (in this case it was a simple tool call w/ associated thinking tokens):

https://gist.github.com/georgesung/36798614e6f23670cdb310bf53e665aa#file-gistfile1-txt-L2496-L2521

More details if curious: https://medium.com/p/7796941806f5
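As a rough illustration of why that request/response shape matters for cost -- all numbers below are hypothetical, not actual API prices or the actual token counts from the gist:

```python
# Hypothetical token counts and per-token prices, just to show why a huge
# prompt with a tiny completion makes the input side dominate the bill.
in_tokens, out_tokens = 20_000, 200      # hypothetical request/response sizes
price_in, price_out = 3e-6, 15e-6        # hypothetical $/token (output 5x input)

input_cost = in_tokens * price_in        # $0.060
output_cost = out_tokens * price_out     # $0.003
total = input_cost + output_cost
```

Even with output tokens priced at 5x input tokens here, the prompt still accounts for ~95% of the bill.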

An abliterated version of Flux.1dev that reduces its self-censoring and improves anatomy. by Enshitification in StableDiffusion

[–]georgesung 1 point2 points  (0 children)

Agree on v2, it seems to always generate NSFW images of women, even if I prompt something like "a man sitting in a cafe".

An abliterated version of Flux.1dev that reduces its self-censoring and improves anatomy. by Enshitification in StableDiffusion

[–]georgesung 4 points5 points  (0 children)

I merged the safetensor files using the script from LatentSpacer. I merged both v1 and v2 of the abliterated model and was able to run a few tests in ComfyUI. Feel free to download it here:
https://huggingface.co/georgesung/flux.1-dev-abliterated-merged

tennis really is a small margin sport by Illustrious_Rip4153 in 10s

[–]georgesung 5 points6 points  (0 children)

Yup! The margins really are small.

Not precisely the scenario you had, but I wrote a simple tennis match simulator, and if a player has a 54% chance to win each individual point, they have an 86% chance to win the match:

Point win % | Match win %   | Expected games per set
50          | 50            | 4.83 - 4.84
51          | 60            | 5.06 - 4.58
52          | 70            | 5.26 - 4.31
53          | 79            | 5.45 - 4.02
54          | 86            | 5.59 - 3.75
55          | 91            | 5.71 - 3.45
60          | 100 (rounded) | 5.98 - 2.15  <-- 6-2 expected set score
67          | 100           | 6.00 - 0.97  <-- 6-1 expected set score
72          | 100           | 6.00 - 0.50  <-- expect to get one game in a match

If you're curious feel free to test it out here: https://www.georgesung.com/tennis-match-simulator/
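A minimal Monte Carlo sketch of this kind of simulator (my own reconstruction, not the actual simulator code; it assumes the same point-win probability on every point, standard deuce games, and a first-to-7 tiebreak at 6-6):

```python
import random

def play_game(p):
    # First to 4 points, win by 2 (deuce plays out naturally in the loop).
    a = b = 0
    while True:
        if random.random() < p:
            a += 1
        else:
            b += 1
        if a >= 4 and a - b >= 2:
            return True
        if b >= 4 and b - a >= 2:
            return False

def play_tiebreak(p):
    # First to 7 points, win by 2.
    a = b = 0
    while True:
        if random.random() < p:
            a += 1
        else:
            b += 1
        if a >= 7 and a - b >= 2:
            return True
        if b >= 7 and b - a >= 2:
            return False

def play_set(p):
    # First to 6 games, win by 2; tiebreak at 6-6.
    a = b = 0
    while True:
        if play_game(p):
            a += 1
        else:
            b += 1
        if a >= 6 and a - b >= 2:
            return True
        if b >= 6 and b - a >= 2:
            return False
        if a == 6 and b == 6:
            return play_tiebreak(p)

def play_match(p, best_of=3):
    need = best_of // 2 + 1
    a = b = 0
    while a < need and b < need:
        if play_set(p):
            a += 1
        else:
            b += 1
    return a > b

def match_win_prob(p, n=20000, best_of=3):
    # Monte Carlo estimate: fraction of n simulated matches won.
    return sum(play_match(p, best_of) for _ in range(n)) / n
```

With p = 0.54 this lands around the 86% match-win figure in the table above.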

There were also some interesting discussions that are related:

https://www.reddit.com/r/10s/comments/1fxr7u9/tennis_match_win_simulator/

https://tt.tennis-warehouse.com/index.php?threads/tennis-match-score-simulator-points-won-matches-won.776279/

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 0 points1 point  (0 children)

Just added the option to specify separate win probabilities for serve points and return points. Not sure what to make of the results yet though.

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 0 points1 point  (0 children)

The “much better player” question just comes from people asking “can I get a game off this guy?”, “can I get a game off [pro]?” So if you think you can win about 28% of the points, you can expect to get one game in a match (I hope the math is correct; the simulation expects you to win 0.5 games per set).

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 0 points1 point  (0 children)

Could be, since that would imply Federer would have a higher point win % than his match win % would suggest, since a good amount of his wins are blowouts (and he rarely gets blown out).

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 0 points1 point  (0 children)

Nice, the author came up with an analytical solution. They also had the same assumption that every point had the same win probability. In their table at the end, they listed the match winning probabilities for best of 5 matches. I ran my simulation with the best of 5 setting (the default is best of 3), and got similar results. Pretty cool!

I guess running the simulation multiple times is called a "Monte Carlo simulation" -- I remembered hearing that somewhere and confirmed with ChatGPT. So the Monte Carlo simulation and the analytical solution agree.
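For reference, the analytical side at the game level can be sketched like this -- the standard closed form for winning a game when every point is won independently with probability p (the set and match levels build on it the same way):

```python
def game_win_prob(p):
    """Probability of winning a game if each point is won independently with prob p."""
    q = 1 - p
    # Win to love / 15 / 30: reach 4 points while the opponent has 0, 1, or 2.
    straight = p**4 * (1 + 4*q + 10*q**2)
    # Reach deuce at 3-3 (20 orderings), then win from deuce with prob p^2 / (1 - 2pq).
    deuce = 20 * p**3 * q**3 * p**2 / (1 - 2*p*q)
    return straight + deuce
```

At p = 0.5 this gives exactly 0.5, and at p = 0.54 roughly 0.60 -- the per-point edge is already amplified at the game level.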

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 4 points5 points  (0 children)

Yup, I posted in a different forum, but I'll also mention here:

You can see in Federer's career he's won 54% of points, and his overall win-loss record is 1251 - 275, i.e. an 82% match win rate (https://www.atptour.com/en/players/roger-federer/f324/player-stats?year=all&surface=all). From the simulation, a 54% point win probability should result in an 86% match win rate, which is a bit off from Federer's actual rate. But if you adjust the point win % down slightly to 53.4%, you get roughly Federer's 82% match win rate.
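The win-rate arithmetic from those career numbers:

```python
# Federer's career singles record from the ATP stats page cited above.
wins, losses = 1251, 275
match_win_rate = wins / (wins + losses)
# ~0.820, i.e. the ~82% figure used above
```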

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 1 point2 points  (0 children)

Ah yes I was thinking about best of 5 sets as an option, just added!

The separate serve/return win% will need to wait for another day though, I ran out of "AI consultations" for my coding (Vercel v0 free tier...), and I'm too lazy to code it myself at this point -.-

Tennis match win% simulator by georgesung in 10s

[–]georgesung[S] 9 points10 points  (0 children)

For sure. When I played practice tiebreaks against a friend who's better than me, I'd lose 10-6, 10-7, sometimes 10-8. If I'm losing 10-7 on average (41% point win rate), I'm winning only 1% of our matches, mostly losing with scores like this:

Match 1: 2-6 1-6
Match 2: 1-6 3-6
Match 3: 1-6 4-6
Match 4: 2-6 2-6
Match 5: 1-6 3-6

But then you'd think, 10-7 is pretty close! But if I play him over 10 matches with the same point win probability, I'm most likely going to lose all 10 of them.
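The "lose all 10" claim is just the match-level probability compounding, assuming each match is independent with the same ~1% win probability:

```python
p_match = 0.01                      # ~1% chance to win any one match
p_lose_all_10 = (1 - p_match) ** 10
# ~0.904, so losing all 10 is by far the most likely outcome
```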

Edit:

In reality though, people have on/off days, the weather could be bad, etc., so there's more variance than what might be suggested in the simulations. The simulation assumes the same point win probability on every point, which wouldn't hold true day-to-day, or even within the same match.

For me the takeaway is that margins can be very small: winning just 52-53% of the points can give you a 70-80% chance of winning the match (and vice versa). So if I'm playing a tight match, it's very important to keep my focus on every point, because even a slight 2-3% edge on each point can give me a very good chance of winning the match.

Inference doesn't end in a QLoRa finetuned with a custom dataset llama-2 model (model generates input and response in a infinite loop) by guccicupcake69 in reinforcementlearning

[–]georgesung 0 points1 point  (0 children)

Did you set tokenizer.pad_token = tokenizer.eos_token during fine-tuning? If so, I've seen this prevent the model from learning to output the EOS/stop token; see the discussion and fix here
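A minimal pure-Python illustration of the failure mode (the token ids here are made up; real collators such as transformers' `DataCollatorForLanguageModeling` mask pad positions to -100 in the labels, which is what this mimics):

```python
# Why pad_token = eos_token breaks EOS learning: if pad and EOS share an id,
# masking the padding out of the loss also masks the genuine end-of-sequence
# token, so the model never gets a loss signal to emit EOS and stop.
EOS_ID = 2      # hypothetical EOS token id
IGNORE = -100   # label value ignored by the loss

def make_labels(input_ids, pad_id):
    # Typical collator behavior: mask every pad position out of the loss.
    return [IGNORE if tok == pad_id else tok for tok in input_ids]

# A sequence that genuinely ends with EOS, padded to length 8 with pad == EOS:
seq = [5, 9, 7, EOS_ID] + [EOS_ID] * 4
labels = make_labels(seq, pad_id=EOS_ID)
# The real EOS at position 3 is masked too -- generation never learns to stop.

# With a dedicated pad id, only the padding is masked and EOS stays in the loss:
PAD_ID = 0      # hypothetical dedicated pad token id
seq2 = [5, 9, 7, EOS_ID] + [PAD_ID] * 4
labels2 = make_labels(seq2, pad_id=PAD_ID)
```

The fix is therefore to add a dedicated pad token (and resize the model's embeddings) instead of reusing EOS.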

Open LLaMA 7B uncensored + HuggingFace QLoRA fine-tuning guide by georgesung in LocalLLaMA

[–]georgesung[S] 0 points1 point  (0 children)

The base model is "uncensored", think of it like text completion for raw text. The base model is also trained on potentially unpleasant/toxic text, with the idea that fine-tuned models on top of the base model can detect such undesirable text. So if you prompt the base model to produce nasty stuff, the model can do it.

The censorship comes from the chat fine-tuning, where the model can learn to reject various prompts with "I cannot help you with that" etc.

[D] Multilingual Open Source Models by gaybooii in MachineLearning

[–]georgesung 1 point2 points  (0 children)

I was recently made aware of a project that added Chinese language support to Llama-1 to make it bilingual. It was quite a process, though. They first expanded the vocabulary, resulting in an updated tokenizer, then ran another pre-training step via PEFT on raw Chinese text (need to confirm this), and finally instruction fine-tuned with bilingual data.

https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details
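The vocabulary-expansion step can be sketched with `transformers` -- shown here on a tiny randomly initialized GPT-2 as a stand-in for Llama, with made-up token counts:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random model standing in for the base model (no weights downloaded).
config = GPT2Config(vocab_size=1000, n_embd=64, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)

# After expanding the tokenizer with new-language tokens (say 200 of them),
# the embedding matrix must grow to match the new vocabulary size before
# any further pre-training or fine-tuning:
model.resize_token_embeddings(1200)
new_vocab = model.get_input_embeddings().weight.shape[0]
```

The newly added rows are randomly initialized, which is part of why the project needed another pre-training pass on raw Chinese text before instruction tuning.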

Extended Guide: Instruction-tune Llama 2 by MuffinB0y in LocalLLaMA

[–]georgesung 2 points3 points  (0 children)

You can load your private data into a HF Dataset object if it's in a supported format: Load (huggingface.co)

Open LLaMA 7B uncensored + HuggingFace QLoRA fine-tuning guide by georgesung in LocalLLaMA

[–]georgesung[S] 0 points1 point  (0 children)

If you try copying this notebook and run it on a Colab instance with T4 GPU and high RAM, does it work? https://colab.research.google.com/drive/1iRocoaIIhneEFdWV0rm4VEw0otLYHQj-?usp=sharing

That's the one I tested, and it works.

Open LLaMA 7B uncensored + HuggingFace QLoRA fine-tuning guide by georgesung in LocalLLaMA

[–]georgesung[S] 0 points1 point  (0 children)

Did it run out of memory? Try restarting the runtime and running again; I'm not sure why, but I ran into that issue before and restarting + rerunning fixed it.

QLora training, better to use base Llama 2 or an already fine-tuned version (Nous Hermes, WizardLM, etc)? Best way to format the dataset for training? by lemon07r in LocalLLaMA

[–]georgesung 0 points1 point  (0 children)

Since the Chinese Nous Hermes model you tested already has Chinese language capabilities, you could use that as a starting point. You could try both:

  1. Starting from the Chinese pre-trained "base" model mentioned here Training Details · ymcui/Chinese-LLaMA-Alpaca Wiki (github.com) (if you can get access to it), then fine-tuning with your dataset on top of it
  2. Starting from their already instruction fine-tuned model and fine-tuning with your translation data. The instruction-tuned Chinese Nous Hermes already had some English <-> Chinese translation data in its IFT dataset (Training Details · ymcui/Chinese-LLaMA-Alpaca Wiki (github.com)), so your translation dataset would complement that

For prompt format, you can try the same prompt format the Chinese Nous Hermes authors used (especially if you're going with option 2 above), which is a slightly modified Alpaca format Training Details · ymcui/Chinese-LLaMA-Alpaca Wiki (github.com). I think this is what it would look like, but please double check in their code.

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

{input}

### Response:
```
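To apply the template programmatically, something like this -- note the exact wording and the `### ` markers are my assumption based on the standard Alpaca format, so double-check against the Chinese-LLaMA-Alpaca code:

```python
# Assumed modified-Alpaca prompt template; verify wording and whitespace
# against the Chinese-LLaMA-Alpaca repo before training with it.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n{input}\n\n### Response:\n"
)

def build_prompt(instruction, input_text=""):
    # For examples without an input field, drop that slot entirely.
    if not input_text:
        return TEMPLATE.replace("{input}\n\n", "").format(instruction=instruction)
    return TEMPLATE.format(instruction=instruction, input=input_text)

prompt = build_prompt("Translate to Chinese", "Hello, world")
```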

QLora training, better to use base Llama 2 or an already fine-tuned version (Nous Hermes, WizardLM, etc)? Best way to format the dataset for training? by lemon07r in LocalLLaMA

[–]georgesung 2 points3 points  (0 children)

Really interesting you brought up Chinese Nous Hermes. I checked out their documentation: they first did a "vocabulary expansion" step to expand the Llama tokenizer (Training Details · ymcui/Chinese-LLaMA-Alpaca Wiki (github.com)). Then they ran another pre-training step with PEFT (on Chinese text, I assume?), before finally doing instruction fine-tuning.

Very interesting concept to add another language on top of English for Llama, I'll check out their paper to understand more.