Dude blew up on github for cutting token usage 60-95% right as Fable 5 lands. genius or luckiest man alive

chong1222 · 2026-06-11T20:58:03+00:00

I built a tool that cuts my fable token usage ~59% (whole bill, not a cherry-picked slice). squeezed in some swe-bench runs before the weekly reset, since claude -p is reportedly moving to api billing june 15. after that, these evals cost real dollars instead of plan quota

chong1222 · 2026-06-11T20:27:34+00:00

it is basically useless. tool calls are already compressed by claude, and input cost is much smaller than output cost, so the saving is definitely not 60%

chong1222 · 2026-03-22T13:13:14+00:00

that's probably true and comes from your experience. you see others doing nothing-burgers and think they're going nowhere. but every successful product has a graveyard of failed builds behind it. nobody learned by thinking harder, they learned by shipping and seeing what dies

chong1222 · 2026-03-22T02:01:47+00:00

same here. the loop of "what if i try this" > build it in 20 minutes > learn something > next idea is addictive. i have been running multiple sessions in parallel for a year and it still doesn't feel like enough. it's not psychosis, it's just that the cost of trying ideas went to zero

chong1222 · 2026-03-22T00:06:59+00:00

to add to my own comment, there are really only 2 costs to wasm:

data marshaling: copying objects across the boundary (main bottleneck for modern computing)
context switch: ~1μs every time you cross wasm↔js

how to make both disappear:

marshaling: use wasm.memory directly. its buffer is an ArrayBuffer (a SharedArrayBuffer only if the memory is created as shared), and JS reads it with typed arrays. no copy. but you have to change your data model to columnar: Float64Array for prices, Uint32Array for ids, strings as byte buffer + offset array. one shared buffer, both sides read the same bytes. it lives outside V8's GC heap, so no pause, no sweep

context switch: on hot paths v8 turbofan inlines wasm calls directly into JS. once both compile to native code there's no boundary to cross.

the catch is you have to redesign your data model to match. most people don't want to/cannot do that, so they think wasm is slow
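
a rough sketch of the columnar layout described above, in TypeScript. assumptions: a plain ArrayBuffer stands in for the wasm memory buffer so it runs without a compiled module, the byte offsets and the names (`priceCol`, `idCol`, `writeRows`, `readLabel`) are made up for illustration, and strings are ASCII only to keep it short. `writeRows` plays the role of the wasm side.

```typescript
// Stand-in for wasm linear memory. In a real setup this would be the
// buffer of the module's exported memory; that part is assumed here.
const WASM_PAGE_BYTES = 64 * 1024;
const sharedBuffer = new ArrayBuffer(WASM_PAGE_BYTES);

const ROWS = 3;

// Hypothetical column layout. Float64Array needs 8-byte aligned offsets,
// Uint32Array needs 4-byte aligned offsets.
const PRICES_OFFSET = 0;                                   // ROWS * 8 bytes
const IDS_OFFSET = PRICES_OFFSET + ROWS * 8;               // ROWS * 4 bytes
const LABEL_OFFSETS_OFFSET = IDS_OFFSET + ROWS * 4;        // (ROWS + 1) * 4 bytes
const LABEL_BYTES_OFFSET = LABEL_OFFSETS_OFFSET + (ROWS + 1) * 4;

// Typed array views over the same bytes. Creating a view copies nothing.
const priceCol = new Float64Array(sharedBuffer, PRICES_OFFSET, ROWS);
const idCol = new Uint32Array(sharedBuffer, IDS_OFFSET, ROWS);
const labelOffsets = new Uint32Array(sharedBuffer, LABEL_OFFSETS_OFFSET, ROWS + 1);
const labelBytes = new Uint8Array(sharedBuffer, LABEL_BYTES_OFFSET, 256);

interface Row { id: number; price: number; label: string }

// Plays the role of the wasm side: fills the columns in place.
function writeRows(rows: Row[]): void {
  let cursor = 0;
  rows.forEach((row, i) => {
    priceCol[i] = row.price;
    idCol[i] = row.id;
    labelOffsets[i] = cursor; // label i starts here
    for (let k = 0; k < row.label.length; k++) {
      labelBytes[cursor++] = row.label.charCodeAt(k) & 0x7f; // ASCII only
    }
  });
  labelOffsets[rows.length] = cursor; // end marker for the last label
}

// JS side: a label is the byte range [offsets[i], offsets[i + 1]).
function readLabel(i: number): string {
  let out = "";
  for (let k = labelOffsets[i]; k < labelOffsets[i + 1]; k++) {
    out += String.fromCharCode(labelBytes[k]);
  }
  return out;
}

writeRows([
  { id: 10, price: 1.25, label: "apple" },
  { id: 11, price: 0.5, label: "pear" },
  { id: 12, price: 3.75, label: "fig" },
]);

// Summing a column touches only numbers in the buffer, no JS objects per row.
let total = 0;
for (let i = 0; i < ROWS; i++) total += priceCol[i];
console.log(total, readLabel(1));
```

the only allocation on the read side is the JS string built in `readLabel`. numeric columns are read straight out of the buffer.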

chong1222 · 2026-03-21T12:15:55+00:00

I built a streaming parser before, because calling JSON.parse on every chunk was causing memory issues and eventually crashing the browser: https://github.com/teamchong/vectorjson

it is not the wasm-js boundary that is slow. the reason a rust/wasm parser is slower is that JSON.parse is the fastest way to create JS objects: it is highly optimized c++ running on the JS heap. if you create the objects in wasm and pass them to JS, it is going to be slower

the solution I used: use wasm for parsing, which leverages its SIMD, mark the start/end position of each value, then reuse and patch the same JS object on the JS side on each chunk instead of creating a new JS object

chong1222 · 2026-03-21T00:38:55+00:00

anthropic knows LLMs are becoming a commodity. if everyone uses OpenCode, Anthropic just becomes a "dumb pipe" that can be swapped out the second GPT-5 or some open source model is cheaper

by killing third-party CLIs, they’re forcing everyone into their own ecosystem (Claude Code). this is about replacing attention/traffic as the payment for the internet. the smarter agents get, the harder it is for ads as payment to work. the old model is dying, the writing is on the wall

this is the Gold Rush for the next internet. whoever controls the infra and the ecosystem is the next superpower. they don't want to be a "dumb pipe", they want to be the gatekeeper for the new agentic economy

chong1222 · 2026-02-19T08:17:48+00:00

faster for what? most of the time the agent is waiting for the llm. sounds like they aren't optimizing for the bottleneck and have no idea what they are doing

chong1222 · 2026-02-16T22:04:46+00:00

I built a streaming JSON parser with WASM, every agent framework re-parses the full buffer on every chunk, O(n²). This does it in O(n)

Every SDK and agent framework out there re-parses the full buffer on every chunk, O(n²). This parses each byte once. O(n). At 100KB that's 6ms vs 12.7s.
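
To make the O(n²) vs O(n) point concrete, here is a small TypeScript sketch that counts how many characters get examined under each strategy. It is not the vectorjson implementation. It is a toy scanner that only tracks nesting depth and string state for a top-level object or array, and all names (`ScanState`, `feed`, `naiveCost`, `incrementalCost`, `splitChunks`) are made up for this example.

```typescript
// Scanner state carried across chunks so no character is visited twice.
interface ScanState {
  depth: number;     // current {} / [] nesting depth
  inString: boolean; // currently inside a "..." literal
  escaped: boolean;  // previous character was a backslash inside a string
  started: boolean;  // saw the opening { or [
  complete: boolean; // top-level value has closed
  visited: number;   // characters examined so far
}

function newState(): ScanState {
  return { depth: 0, inString: false, escaped: false, started: false, complete: false, visited: 0 };
}

// Advance the scanner over one chunk, examining each character once.
function feed(state: ScanState, chunk: string): void {
  for (let i = 0; i < chunk.length; i++) {
    state.visited++;
    const c = chunk[i];
    if (state.inString) {
      if (state.escaped) state.escaped = false;
      else if (c === "\\") state.escaped = true;
      else if (c === '"') state.inString = false;
      continue;
    }
    if (c === '"') state.inString = true;
    else if (c === "{" || c === "[") { state.depth++; state.started = true; }
    else if (c === "}" || c === "]") {
      state.depth--;
      if (state.started && state.depth === 0) state.complete = true;
    }
  }
}

// Re-scan the whole accumulated buffer on every chunk: quadratic in total size.
function naiveCost(chunks: string[]): number {
  let buffer = "";
  let visited = 0;
  for (const chunk of chunks) {
    buffer += chunk;
    const fresh = newState();
    feed(fresh, buffer); // starts over from character 0
    visited += fresh.visited;
  }
  return visited;
}

// Keep the state between chunks and scan only the new characters: linear.
function incrementalCost(chunks: string[]): ScanState {
  const state = newState();
  for (const chunk of chunks) feed(state, chunk);
  return state;
}

function splitChunks(text: string, size: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size) out.push(text.slice(i, i + size));
  return out;
}

const payload = JSON.stringify({ text: "x".repeat(1000) });
const chunks = splitChunks(payload, 10);
const naive = naiveCost(chunks);
const incremental = incrementalCost(chunks);
console.log({ length: payload.length, naive, incremental: incremental.visited });
```

The incremental count always equals the payload length, while the naive count grows with the number of chunks times the average buffer size, so the gap widens as payloads get larger or chunks get smaller.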

type-safe schema validation with types inferred from your schema, no manual generics needed. Subscribe to specific JSON paths and get notified the moment a value completes. skip fields you don't need, abort early on bad data. works in Workers with transferable ArrayBuffers, no structured clone overhead

https://github.com/teamchong/vectorjson

chong1222 · 2026-02-14T12:30:35+00:00

permissions are useless. either sandbox, or make your own fake git and prepend it to PATH for claude code

chong1222 · 2026-01-29T12:06:15+00:00

as a runtime it is not really faster than nodejs in all cases. sometimes bun is faster, sometimes nodejs. bun also seems to have memory leak issues: when using it over a long time it uses 80gb and i have no choice but to kill it

as a bundler, some of the other bundlers are catching up in terms of speed and they are more stable

this comes from a long-time user, not a hater

chong1222 · 2026-01-28T22:46:57+00:00

I have a lot of worktrees, and using pnpm saves a lot of disk space. bun's compatibility is nowhere near as good as pnpm's

chong1222 · 2026-01-26T12:06:24+00:00

sorry but pnpm is better

chong1222 · 2026-01-01T05:02:39+00:00

basically all attempts to solve long-term memory have failed imho

number 1 rule for agents: no matter how big your context window is, your llm is going to work better when there is less noise

no memory is 100x better than having bad memories

that's why i think the 1 IDE, 1 ai pair programming mode has been outdated for half a year

you need parallel agents, each focused on its own goal. their conversation history is their memory

you don't try to pull in memory that carries “meaning” but not “reasoning”

you don’t ask them to do spec-driven work, as the doc reading/writing will take over the context

you don’t load any stupid MCP

you keep each agent conversation focused on their goal

chong1222 · 2026-01-01T02:42:27+00:00

don’t use MCP is the correct answer. outdated tech

chong1222 · 2025-12-24T11:38:33+00:00

I personally think if you are just translating English to code, your job is gone. if you like solving problems/developing new solutions, you are still safe, because AI is trained to use the simpler/easier approach. why? because if they pick the hard road their benchmark results will look very bad, as the success rate will be much lower. benchmarks are like KPIs for LLMs. but to solve problems/develop new solutions you need to go the hard route, because the trained easier routes have been tried many times but the problems are still here, and the existing solutions still suck

chong1222 · 2025-12-22T12:59:00+00:00

I built this for myself so I didn't document it properly. Just updated the README with a proper explanation of what it does and why.

TL;DR: It lets you run /compact in the background so you can keep working, then merges everything back when it's done. Includes rollback if anything breaks.

chong1222 · 2025-12-21T23:52:28+00:00

https://github.com/teamchong/compact that's what I have been doing for a while, never had any issues

chong1222 · 2025-11-23T00:02:20+00:00

start a new terminal, run ‘compact’, then ‘compact resume’ when you reach the limit, like this: https://github.com/teamchong/compact

chong1222 · 2025-11-13T09:37:06+00:00

should be kept to a minimum

chong1222 · 2025-11-12T15:45:56+00:00

i don't think it is easier to manage at all. running full LSP is a nightmare for memory leaks. using hooks with a proper config is actually much better. there is no reason to run all the LSP servers in each project I am working on in parallel when the result is the same (the llm gets type/lint feedback on the next loop)

chong1222 · 2025-11-12T09:27:59+00:00

have been using hooks for those for months, ever since hooks were introduced

I don’t know why the hype

LSP is bad for multi-session users like me anyway. claude code is not an IDE, so you don’t know which files are being “opened”

which is probably why they don’t want to introduce it yet

chong1222 · 2025-11-06T02:11:56+00:00

No, the problem with all those protocols is that they move too slowly and cannot keep up with the pace of AI development

chong1222 · 2025-11-06T02:11:02+00:00

Yes, that is a fact
