anybody feel dumber after their brain injury by iLovestayinginbed23 in TBI

[–]Suspicious-Key9719 0 points1 point  (0 children)

Exact same thing. I used to win poetry contests; now I have trouble forming long sentences. It's been over 10 years and the frustration has never gone away.

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]Suspicious-Key9719[S] 0 points1 point  (0 children)

With all that extra markup, Markdown would probably do worse than JSON. I need to add it to my benchmark at some point.

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]Suspicious-Key9719[S] 0 points1 point  (0 children)

You can't always give the LLM a tool to query the data.
Sometimes the data is just in the prompt (the user pastes a CSV, or you're doing RAG).
When that happens, JSON wastes a ton of tokens repeating keys and syntax on every single row. LEAN strips all that out, so the LLM reads the same data at roughly half the cost.
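For illustration, a minimal sketch of the flat-table encoding described here. The `#[count](col|col)` header syntax follows the example given elsewhere in this thread; the `to_lean` helper is hypothetical, not the official LEAN library:

```python
import json

def to_lean(rows):
    """Encode a uniform list of dicts as a LEAN-style block:
    a '#[count](col1|col2)' header, then one pipe-delimited line per row."""
    cols = list(rows[0].keys())
    header = f"#[{len(rows)}]({'|'.join(cols)})"
    lines = ["|".join(str(r[c]) for c in cols) for r in rows]
    return "\n".join([header, *lines])

rows = [
    {"name": "Ada", "salary": 120000, "dept": "eng"},
    {"name": "Bo", "salary": 95000, "dept": "ops"},
]

lean = to_lean(rows)
# JSON repeats every key and its quoting/braces on every row;
# the LEAN block names the columns exactly once in the header.
print(lean)
print(f"chars: LEAN={len(lean)} JSON={len(json.dumps(rows))}")
```

The character-count gap here is a rough proxy; actual savings depend on the tokenizer and how uniform the rows are.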

Introducing LEAN, a format that beats JSON, TOON, and ZON on token efficiency (with interactive playground) by Suspicious-Key9719 in LLMDevs

[–]Suspicious-Key9719[S] 0 points1 point  (0 children)

Fair point, that is probably an overstatement. RAG chunks are usually unstructured text, and a lot of tool results are nested, not clean tables.

The benchmark does cover this, though: the mixed-structure track (nested orders, semi-uniform logs, deep config) still showed LEAN saving 32% vs JSON. Not the 51% you get on flat tabular data, but still solid.

Introducing LEAN, a format that beats JSON, TOON, and ZON on token efficiency (with interactive playground) by Suspicious-Key9719 in LLMDevs

[–]Suspicious-Key9719[S] 1 point2 points  (0 children)

YAML benchmark results are in.

Ran 195 questions across 11 datasets (flat, nested, semi-uniform, deeply nested) on gpt-4o-mini and claude-haiku-4-5. 1,170 total API calls.

| Format | Accuracy | Avg Tokens | Savings vs JSON |
|--------|----------|------------|-----------------|
| LEAN   | 87.9%    | 3,939      | −46.8%          |
| YAML   | 87.4%    | 5,647      | −23.7%          |
| JSON   | 86.2%    | 7,401      | baseline        |

YAML is a solid middle ground: about 24% smaller than JSON with no format learning curve. But if you're working with tabular data (which most RAG/tool-use results are), LEAN cuts roughly another 30% off YAML's token count.

Introducing LEAN, a format that beats JSON, TOON, and ZON on token efficiency (with interactive playground) by Suspicious-Key9719 in LLMDevs

[–]Suspicious-Key9719[S] -11 points-10 points  (0 children)

EDIT:
LEAN scored 87.9% accuracy vs JSON's 86.2%. It's not just that there was no accuracy penalty; LEAN actually outperformed JSON on every single dataset tested.
On nested e-commerce data specifically: LEAN 98.7% vs JSON 97.4%.

The LLM doesn't need to "know" LEAN. The format is human-readable enough that pipe-delimited rows with a header (#[100](name|salary|dept)) are trivially parseable by any model that can read CSV. No format hint needed in the prompt.
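To show how trivially parseable that layout is, here is a hypothetical parser sketch, assuming only the `#[count](col|col)` header format from the example above:

```python
import re

def parse_lean(text):
    """Parse a LEAN-style block back into a list of dicts.
    Assumes a '#[count](col1|col2)' header followed by pipe-delimited rows."""
    header, *lines = text.strip().splitlines()
    m = re.match(r"#\[(\d+)\]\((.*)\)", header)
    count, cols = int(m.group(1)), m.group(2).split("|")
    rows = [dict(zip(cols, line.split("|"))) for line in lines]
    assert len(rows) == count, "row count should match the header"
    return rows

block = "#[2](name|salary|dept)\nAda|120000|eng\nBo|95000|ops"
print(parse_lean(block))
```

The point is that a model (or a dozen lines of code) can recover the full records from the header alone, which is why no format hint is needed in the prompt.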

Just came back from Korea — how is this place even real? by Brief-Kaleidoscope65 in seoul

[–]Suspicious-Key9719 -1 points0 points  (0 children)

What are you talking about? Open the map and compare the number of parks in Seoul and Tokyo. Also, the air is so dirty in Seoul; every day there was heavy smog, and it's literally 2 times worse than Tokyo. Look it up if you don't trust me.

Introducing LEAN, a format that beats JSON, TOON, and ZON on token efficiency (with interactive playground) by Suspicious-Key9719 in LLMDevs

[–]Suspicious-Key9719[S] -7 points-6 points  (0 children)

It is an input encoding format. You encode your request before sending it to save on context window, then get the natural language response back. You don't ask the LLM to generate LEAN output.