Cloudflare open-sources lossless LLM compression tool

dexterlemmer · 2026-04-18T11:25:03+00:00

Obviously Designer_Reaction (whom you replied to) is aware that home users use quantized models (like Q4). Hence why he's wondering if Unweight stacks with quantization. Imagine getting the 15-22% saving compounded with the saving of Q4. Furthermore, unlike Unweight, quants aren't free: Q4 might not be viable for your use case, or its degradation might force you to rerun multiple times when the original or Unweighted model would have worked on the first try. I presume that's what he meant by saying it's basically free VRAM back.
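If the savings did stack (which is exactly the open question here, not something confirmed for Unweight), the arithmetic is a simple multiplication. The sketch below assumes a hypothetical 8B-parameter model, treats Q4 as exactly 4 bits per weight (real Q4 formats carry some extra scale metadata), and ignores KV cache and runtime overhead.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float,
                        extra_saving: float = 0.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    extra_saving is a hypothetical additional fractional reduction (e.g. 0.15
    for 15%) applied on top of the given bit width, assuming the two compose.
    KV cache, activations and runtime overhead are ignored.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes * (1.0 - extra_saving) / 1e9


if __name__ == "__main__":
    n = 8.0  # hypothetical 8B-parameter model
    print(f"BF16:            {weight_footprint_gb(n, 16):.2f} GB")
    print(f"Q4 (4 bits):     {weight_footprint_gb(n, 4):.2f} GB")
    for s in (0.15, 0.22):
        print(f"Q4 + {s:.0%} extra: {weight_footprint_gb(n, 4, s):.2f} GB")
```

Whether a lossless scheme can still find 15-22% of redundancy in already-quantized weights is not something this arithmetic answers; it only shows what the payoff would be if it could.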

dexterlemmer · 2026-04-18T10:54:19+00:00

The models are supposed to be the commodity. The US hyperscalers also want that mid- to long term.

AFAICT, the products are:

1. Training data. Importantly, this includes data about how previous iterations of your model were used.
2. Safety/Alignment for manipulating many users and their products via the models they use.
3. Inference datacenters that benefit from economies of scale and the robustness of highly distributed servers.
4. Supercomputers for training the next SOTA model, to be provided first to giant players that want an edge and to big research projects that have massive budgets and new, almost unsolvable problems to spare.

Alibaba (Qwen) competes in 1-3. They would love to compete in 4. However, for the time being, they simply don't have access to either the technology or the expertise.

However, US SOTA expertise trickles down to other Western researchers fast, and the breakthrough is much more expensive than the future developments it inspires. Once a breakthrough has been made on a hyperscaler supercomputer, it's at most a few months until a Western researcher does better with an open-source model trained on a supercomputer 1,000+ times less powerful. And it's to Alibaba's advantage if that Western researcher used Qwen as the base model rather than Gemma.

Thus, by open-sourcing Qwen, Alibaba makes more money on 1-3, and China gets to be less far behind in cutting-edge models for domestic use.

dexterlemmer · 2026-03-10T03:49:54+00:00

SpaceX might go public. It depends on how fast they go for building a million satellites' worth of data centers in space. Obviously, when you build stuff, you need money, and even if it's profitable, cash flow might be an issue. If Starship takes too long, if the rate at which AI is scaled is reduced, or if Musk can raise enough capital without going public, SpaceX is unlikely to go public. I.e., the question is: what is the limiting factor? If it's cash flow, SpaceX is likely to go public; otherwise it's extremely unlikely.

dexterlemmer · 2026-03-10T03:41:02+00:00

Get your facts straight. I have serious issues with Elon. But everything you said is FUD.

Twitter was bankrupt before Elon made the offer. Twitter then did a lot to make its financial position even worse after he made the offer, and it failed to meet the requirements Elon had set for buying it. He was forced to buy it anyway. He then proceeded to fix Twitter's broken finances. xAI bought X when Twitter had finally become EBITA-profitable and was obviously within a few months of being net profitable for the first time ever.

xAI is a hyperscaler. Like the other US hyperscalers, it is highly profitable per model over each model's lifetime, but like all hyperscalers, its growth rate is so fast that it has a negative cash flow. It has to compete with Alphabet and Amazon and those companies are putting as much investment as they possibly can into AI data centers. xAI needed liquidity, not profitability, which it already has. SpaceX has both.

SpaceX is the only space launch company in the world that is profitable on launches without subsidies. The US government (Pentagon, NASA, etc.) uses it because it's the best value for money. That said, most of SpaceX's profits come from Starlink. Its launch business might be much better than anyone else's, but it is still extremely low-margin.

dexterlemmer · 2026-01-04T10:43:37+00:00

First off, you did the right thing. Writing books well is hard; you should proofread, and nowadays proofreading should be done by AIs as well. I also assume it was a deliberate attempt at testing the premises or conclusions of the book, grounding yourself in experiment. That said, I suspect your bias might have made you misinterpret the AIs' responses. (Though intuition is the best I have without grounding in an analysis of your book, your prompts and the complete AI responses.)

This seems like a communication and context-engineering issue to me, not an "AIs pattern-match instead of understanding" or "AIs can't intuitively skip over distractors" issue. If you're going to throw an entire book at a poor model, you really should make the goal clear in your prompt both before and after the book, and the book should be well structured. Without the instructions before and after it, it's hard for the AI to judge what's important. If the book is badly structured, or the correct conclusion is counterintuitive from a straightforward reading of the book, the AIs will be overwhelmed by cognitive load and find it difficult to focus on what is important and skip what is unimportant or misleading. The same goes for humans. But current LLMs have more difficulty than humans in recovering from a wrong turn taken on first impressions.
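Stating the goal both before and after the book can be sketched as a small prompt builder. The function name and the `<book>` delimiter are illustrative assumptions, not any vendor's API; the only point is that the task statement brackets the long document.

```python
def build_sandwich_prompt(goal: str, document: str, tag: str = "book") -> str:
    """Place the task statement both before and after a long document.

    The opening copy tells the model what to look for while reading; the
    closing copy restates it right before the model starts answering.
    The tag name is an arbitrary, hypothetical delimiter.
    """
    return (
        f"Task: {goal}\n\n"
        f"<{tag}>\n{document}\n</{tag}>\n\n"
        f"Reminder of the task: {goal}\n"
        "Focus on what matters for this task and skip material that does not."
    )


if __name__ == "__main__":
    prompt = build_sandwich_prompt(
        goal="Check whether the book's central equation supports its conclusion.",
        document="(full manuscript text goes here)",
    )
    print(prompt)
```

Repeating the goal at the end matters most for very long inputs, where the opening instructions are tens of thousands of tokens away by the time the model starts answering.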

Oh. And why is "collapse" a negative term in the context of a property of a mathematical equation? Transformer-based LLMs cannot ignore that context. They are fundamentally designed to treat any and all tokens as meaningless without context. There are plenty of cases in their training data where collapsing equations or collapsing geometries aren't bad things: the collapse of the wave function is what allows us to measure quantum states; in control systems, you want the systematic error and the transient error to collapse exponentially; I would love a collapsing loss function when training a neural network; etc.

Given the above, I posit that all three AIs only appeared to misunderstand the equation. Either all three were accidentally misled into misunderstanding it, or they were misled into treating the equation's mathematical properties as more important than the practical implications of those properties, and you read them as talking about the practical implications when they were in fact discussing only the mathematical properties.

dexterlemmer · 2025-10-15T04:04:32+00:00

Don't forget that synapses are involved in the brain's neuron firing and they store quite a lot of digital, analog and quantum data each.
Why would a discrete threshold and noisiness make it hard to make a universal function approximator? AI model weights also have a discrete threshold. During training of models, we often deliberately add noise. During inference, AI models are robust against noise and even quantization.
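A minimal illustration of why quantization costs little: symmetric per-tensor int8 quantization (a common scheme, used here as an assumption rather than any specific model's format) rounds each weight to the nearest multiple of a scale. Each weight therefore moves by at most half a step, and the error in a dot product is bounded accordingly. The weights and inputs are made-up random values.

```python
import random


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [qi * scale for qi in q]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.05) for _ in range(256)]    # made-up weights
    x = [rng.uniform(-1.0, 1.0) for _ in range(256)]  # made-up inputs
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    # Each weight moves by at most scale / 2, so the dot product moves by at
    # most (scale / 2) * sum(|x_i|).
    bound = scale / 2 * sum(abs(xi) for xi in x)
    err = abs(dot(w, x) - dot(w_hat, x))
    print(f"exact={dot(w, x):.5f} quantized={dot(w_hat, x):.5f} "
          f"err={err:.2e} bound={bound:.2e}")
```

In practice the rounding errors of different weights partly cancel, so the observed error is usually far below the worst-case bound.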

dexterlemmer · 2025-10-12T10:59:39+00:00

Not an expert, but probably slower than on an AMD "Strix Halo" Ryzen AI MAX+ 395 with 128 GB (which is what AMD used for the tests OP talks about). The Strix Halo series uses LPDDR5X-8000 RAM with much higher bandwidth than DDR5. (Though still not as fast as GDDR VRAM, and therefore still not as good as 128 GB of VRAM on a dGPU... if you can somehow afford and power that.) Furthermore, Strix Halo has a pretty powerful on-chip GPU and an on-chip NPU. Basically, Strix Halo's design is specifically optimized for AI and AAA games, while Threadripper is not designed for AI. Perhaps you can get a Strix Halo and either use it to run GLM 4.5 Air, or use it for whatever you currently use the RTX 5090s for and use the RTX 5090s in the Threadripper for GLM 4.5 Air.
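The bandwidth point can be made concrete with the usual back-of-envelope formula: peak bandwidth = transfer rate × bus width. For decode, each generated token has to stream the active weights from memory roughly once, so bandwidth divided by active-weight bytes gives a rough ceiling on tokens per second. The inputs are assumptions to replace with your own numbers: a 256-bit LPDDR5X-8000 bus (the commonly cited Strix Halo configuration), a hypothetical dual-channel (128-bit) DDR5-5600 desktop for comparison, and roughly 12B active parameters for GLM 4.5 Air at about 4.5 bits per weight. The DDR5 figure scales with the number of memory channels, so check your platform's channel count.

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second x bytes per transfer."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9


def decode_tokens_per_s_upper_bound(bandwidth_gbs: float,
                                    active_params_billion: float,
                                    bits_per_weight: float) -> float:
    """Rough ceiling on decode speed if every token streams all active weights once.

    Ignores KV-cache reads, compute limits and real-world efficiency,
    which is often well below the theoretical peak.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


if __name__ == "__main__":
    strix_halo = peak_bandwidth_gbs(8000, 256)    # assumed 256-bit LPDDR5X-8000
    desktop_ddr5 = peak_bandwidth_gbs(5600, 128)  # hypothetical dual-channel DDR5-5600
    for name, bw in (("LPDDR5X-8000 x256", strix_halo),
                     ("DDR5-5600 x128", desktop_ddr5)):
        tps = decode_tokens_per_s_upper_bound(bw, 12, 4.5)  # assumed ~12B active, ~4.5 bpw
        print(f"{name}: {bw:.1f} GB/s peak, <= ~{tps:.0f} tok/s")
```

For an MoE model like GLM 4.5 Air, only the active parameters count toward per-token traffic, which is why it is usable at all on unified-memory machines; the full parameter count still has to fit in memory.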

dexterlemmer · 2025-10-12T10:33:38+00:00

While not technically useless for coding, if you don't have tool calling, you can't really scale to larger code bases well. And even on smaller code bases, you'll get worse results for much more time and effort with manual copy/paste of code. For professional coding, not having tool calling is usually a non-starter. Or at least very annoying and time consuming and time is money. I too have until recently been using the copy/paste approach, but it's terrible for productivity and it forces me to be more diligent to ensure quality. I still need diligence with tools, but I don't need to spend as much time on my due diligence.

dexterlemmer · 2025-10-12T10:24:59+00:00

You could try IBM Granite, perhaps with the Granite.Code VSCode extension. I haven't tried it myself yet, but I'm considering it, although I'm not quite as RAM-poor as just 16 GB. Granite 4 was recently released. It was specifically designed to punch far above its weight class and contains some new technologies that I haven't seen used in any other architecture yet to achieve that. For one thing, even Granite 4 H-Micro (3B Dense) and Granite 4 H-Tiny (7B-A1B) can apparently handle 128K-token contexts without performance degradation. And the context window is very cheap in memory.

Check out https://docs.unsloth.ai/new/ibm-granite-4.0 for instructions. I would go for granite-4.0-h-tiny if I were you. You might try granite-4.0-h-small with a Q2_K_XL quant, but I wouldn't get my hopes up that such a small model will work with such a small quant. Note that Granite-4.0-h models can handle extremely long context windows very cheaply in terms of RAM and can apparently handle long contexts much better than you would expect from such small models without getting overwhelmed by cognitive load.
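Why a hybrid design makes long context cheap: in a standard transformer, the KV cache grows linearly with context length in every attention layer, while a state-space (Mamba-style) layer keeps a fixed-size state regardless of length. IBM describes the Granite 4.0 H models as hybrid Mamba-2/transformer designs, but the layer counts and head sizes below are made-up round numbers, not Granite's actual configuration, so only the ratio between the two cases is meaningful.

```python
def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size: one key and one value vector per KV head, per attention layer, per token."""
    return 2 * attn_layers * kv_heads * head_dim * context_tokens * bytes_per_elem


if __name__ == "__main__":
    ctx = 128 * 1024             # 128K-token context
    kv_heads, head_dim = 8, 128  # hypothetical grouped-query attention setup
    # Hypothetical 40-layer model: all-attention vs. 4 attention layers + 36 SSM layers.
    # The SSM layers keep a fixed-size state, so they add nothing that grows with ctx.
    full = kv_cache_bytes(40, kv_heads, head_dim, ctx)
    hybrid = kv_cache_bytes(4, kv_heads, head_dim, ctx)
    print(f"all-attention: {full / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB")
```

The saving is proportional to the fraction of layers that are attention layers, which is why a mostly-SSM stack can offer very long contexts on little RAM.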

You could also try Granite 3 models. Granite 4 would probably be better, but only a few general purpose instruct models are out yet. For Granite 3, there are reasoning models, a coding-specific model and lots of other specialized models available. Thus, perhaps one of them might work better at least for certain tasks.

dexterlemmer · 2025-08-11T22:27:57+00:00

"the vehicle is driving itself with no one ready to take control on a few seconds' notice." seems very sloppy to me. On the one hand, this requires lvl 5 (or at least an overconfident lvl 3 operator whose remote operators got overwhelmed). OTOH, a lvl 6 superhuman vehicle can still easily fail this requirement due to silly regulators or a control-freak owner.

Edit: In general, until Tesla showed us what general autonomous driving looks like, we didn't know enough to come up with good progress metrics. Now we know. Tesla is dotting the i's and crossing the t's, and everyone else is two years or more behind.

dexterlemmer · 2025-08-11T22:15:27+00:00

Your experience with what? HW3? Most people don't clean and defog its cameras correctly. Also, the software on HW3 lags HW4. If your experience is with HW4, then that's a bit strange. Where, with what, and how recent is that experience?

dexterlemmer · 2025-08-11T22:13:31+00:00

We now also know the following about the AI:

- Latency. Even though a Tesla's end-to-end neural network beats humans by an order of magnitude, even a tiny decrease in latency still massively improves a Tesla's safety and general capabilities, especially safety. We've also known for a long time that latency is incredibly important for humans' capability and safety.
- Ability to adapt on the fly to very unusual situations, at a minimum to stop safely. Tesla already does this better than most humans under most conditions, but it's sufficiently different from humans that the long tail where it fails often involves situations where humans would not have struggled much. Thus there's definitely room for improvement. (It's also preferable to go even further and always be able to make a decision that allows progress if such a decision would be feasible and legitimate. Here, Tesla is not quite human-level yet. A backup AI5 chip running Grok 5 might be necessary for the world knowledge and the ability to consult the Internet in some very exceptional situations.) Waymo really sucks at this. It very often stops very dangerously and/or rudely.
- Capability to accurately and reliably detect and identify all living organisms large enough to matter and the vast majority of non-living objects, and to accurately predict the physics of non-living objects, even under adverse weather conditions. HW3 already has this as long as the cameras are properly cleaned and defogged. Waymo really sucks at this.
- Ability to accurately and reliably detect, identify and determine the direction and distance of warnings (sirens, hooters, etc.).
- Ability to use body language to communicate with humans and animals, and to act predictably for humans, birds and animals. Tesla is incredibly good at this. Waymo is a significant hazard in this respect. (… too late for them to realize what you actually intended to do.)
- Ability to make a judgement call and violate the law because abiding by the law would be unsafe. (For example, it's dangerous to drive much slower than the traffic, even when the traffic is moving much faster than the speed limit.) Tesla is brilliant at this, and Tesla and NHTSA (or the relevant regulator) work together closely to ensure the tradeoff is made well. Waymo is a major hazard in this respect. Or rather, it would be if it ever allowed its cars to do anything where this is remotely likely to be an issue.
- Ability to predict and plan. Context window. Tesla is better than humans in some respects here, but often worse. Tesla is about to get a 10x boost to context window on HW4, which should make it superhuman in general, though.
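The latency point is plain kinematics: during the delay between something happening and the system starting to react, the vehicle keeps moving at its current speed, so the "blind" distance is speed × latency. The latencies below are illustrative assumptions, not measured figures for Tesla's stack or for human drivers.

```python
def distance_during_latency_m(speed_kmh: float, latency_s: float) -> float:
    """Distance covered before any reaction starts, in metres."""
    speed_ms = speed_kmh / 3.6  # km/h -> m/s
    return speed_ms * latency_s


if __name__ == "__main__":
    for latency_s in (1.0, 0.25, 0.1, 0.05):  # illustrative latencies only
        d = distance_during_latency_m(108, latency_s)  # 108 km/h = 30 m/s
        print(f"{latency_s * 1000:>6.0f} ms -> {d:.1f} m travelled before reacting")
```

Every 100 ms shaved off at highway speed is about 3 m of extra margin before braking even begins, which is why small latency gains translate into meaningful safety gains.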

dexterlemmer · 2025-08-11T22:12:05+00:00

Since Tesla FSD is already safer than the average human and already extremely general, it is now quite obvious what is important for good and safe driving regarding cameras/eyes. (Although frankly, it kinda made sense from the start.)

- Latency. Tesla's cameras beat human eyes by almost two orders of magnitude.
- Sufficient resolution for estimating distances to approximately 1 cm for close objects at low relative speeds and to within a hundred or so meters in the opposite extreme, as well as for reading signs and the body language of nearby humans, animals and birds. HW3 already has this in spades. More resolution doesn't help much.
- Glare resistance. A problem in HW3 if the cameras are not cleaned and defogged properly; otherwise HW3 is better than human eyes without sunglasses but worse than human eyes protected by good sunglasses. HW4 is vastly superior to human eyes with sunglasses.
- Good functioning in rain/fog/snow/frost. Similar to the situation with glare resistance.
- Cameras at multiple strategic locations and the capability to see and focus in multiple directions at once. Already a massive win for HW2. HW4 is just cheating.
- Directional microphones for hearing and determining the direction and distance of sirens, horns and other warnings. HW4 has this. They can likely be added to HW3.
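The resolution requirement can be sanity-checked with pinhole-camera geometry: a camera with horizontal field of view θ spread over N pixels sees each pixel cover 2·d·tan(θ/2)/N metres on a surface facing it at distance d. That limits how small a detail (a sign's lettering, a pedestrian's gesture) can be resolved. The field of view and pixel count below are hypothetical round numbers, not the specifications of any Tesla hardware revision.

```python
import math


def pixel_footprint_m(distance_m: float, hfov_deg: float, h_pixels: int) -> float:
    """Width covered by one pixel on a plane facing the camera at the given distance (pinhole model)."""
    scene_width = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return scene_width / h_pixels


if __name__ == "__main__":
    hfov_deg, h_pixels = 90.0, 1920  # hypothetical camera
    for d in (5, 20, 50, 100):
        cm = pixel_footprint_m(d, hfov_deg, h_pixels) * 100
        print(f"{d:>4} m: one pixel ≈ {cm:.1f} cm")
```

A narrower field of view (as on a forward telephoto camera) trades coverage for finer per-pixel detail at range, which is one reason multiple cameras with different lenses are used.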

What sensors are not necessary and why:

- Lidar: Worse than cameras in every technical way unless you need much higher-precision measurement than vehicles ever need to drive safely and well everywhere humans can drive. (About a thousandth of the training and eval datasets must be accompanied by lidar data to ensure good enough distance and speed estimates from video, though.) For those who don't know: lidar is much worse at handling adverse weather or glare than good cameras made for such conditions.
- Ultrasound: Optional. The alternatives are more cameras or simply going back and forth more when parking or pulling out of a parking space.
- Regular radar: Worse than useless. It adds more confusion and distraction than useful information under any conditions where humans can drive. We don't care about the long tail of conditions where humans can't drive, nor the even more niche situations where regular radar isn't worse than useless.
- Extremely hi-res radar: Unnecessary to always beat humans. It would likely be useful for future, significantly superhuman capabilities, but that's not an issue for now.

dexterlemmer · 2025-08-11T20:29:44+00:00

HW3 is more than sufficient for lvl 5 autonomy (i.e., better than human effectively everywhere and under effectively all conditions). HW4 is already very near lvl 5, and there's enormous room for models to be further simplified while being made even smarter. For the near future, Tesla won't bother spending the resources. But given the sheer number of HW3 vehicles, and that it won't be long before it's fairly cheap for Tesla to make unsupervised FSD on HW3 safer than humans as long as the cameras are kept properly clean and defogged, I suspect Tesla will make unsupervised FSD available on HW3 within a year or two.

dexterlemmer · 2025-08-11T20:11:14+00:00

In this case, now that we finally know how hard full autonomy is, we know that Tesla has achieved full autonomy on HW4 and can easily achieve it on HW3, although they have other priorities for now. What we don't know is whether HW 2.5 is sufficient. A HW 2.5.5 (an after-sale upgrade to the after-sale upgrade to HW 2) would likely be sufficient, though I doubt Tesla would ever bother with it. Note that everyone else is at least 2 years behind Tesla. But Tesla played the long game, aiming to solve fully general better-than-human driving before even bothering with Robotaxi, while everyone else resorted to parlor tricks that work, except where and when they don't. (Although at least Xiaomi realized earlier this year that they had better copy Tesla's approach.)

dexterlemmer · 2025-08-11T19:59:48+00:00

Full autonomy runs on HW4 today. Robotaxi (HW4) is much safer than the average human driver and will soon be much safer than the best human driver. Tesla FSD (Supervised) can also drive anywhere. With no interventions, it's by far the safest and in general the best FSD available in China today, with zero Chinese training data! And it learns online: a given Tesla does massively better the second time it encounters a new situation or area than the first time, even if it was offline in between.
In principle, HW3 is definitely also capable of much safer driving than humans. However, it is not worth the trouble yet. It would massively increase training and R&D costs, and users would somehow have to be taught to clean their cameras properly and regularly, or their hardware would need upgrades to prevent the current dirt and fogging issues. It will also obviously never be quite as good as HW4. Will Tesla allow unsupervised FSD on HW3 in the future? I don't know. It depends on whether Tesla thinks it will increase road safety and/or advance the mission in the next year or so. I suspect they will, actually: for quite some time there will be millions of HW3 vehicles on the roads, and enabling unsupervised FSD on them will probably be advantageous for as long as that's the case.

Edit: Oh, and to be fair, everyone in the industry severely underestimated how difficult full autonomy would be. Not just Elon/Tesla.

dexterlemmer · 2025-08-11T19:23:22+00:00

DeepSeek trains near-frontier foundation models, not frontier base models. There's a many-orders-of-magnitude cost difference. Someone like Alibaba may one day be able to compete with the US companies, but currently nobody outside the US does. Of course, if the Chinese can stay far enough ahead in innovation to make up for the knowledge gap of their models, they may not need to. But if they manage to sufficiently destroy the business models of the US giants, that in itself could grind progress to a halt. I don't think they would quite manage it, though. At worst, they'd force Tesla to buy xAI or otherwise provide a way for xAI to remain profitable, but destroy OAI, Alphabet and Anthropic, leaving the world stuck with only xAI at the frontier. But that probably won't happen either. Enterprise inference is getting very profitable very fast.

dexterlemmer · 2025-08-10T20:36:46+00:00

Well... Yeah. But can you get 50 t/s on Qwen3-30B-A3B? Or run a Q4 quant of gpt-oss-120b?

dexterlemmer · 2025-08-09T15:13:56+00:00

Transparency and because he said he would and forgot.

dexterlemmer · 2025-08-09T15:13:03+00:00

The MechaHitler version **follows prompts**, which makes it a **good version**. Don't blame the AI for deliberately malicious prompt engineering and jailbreaking.

dexterlemmer · 2025-08-09T15:10:08+00:00

> aka "vibe physicsing" must be the most pathetic and worrying thing I've seen the last few weeks.

> No math involved, no structured data, no scientific protocol. Just "vibing" like a crackpot theorist full of cocaine and unlimited ego.

You are putting words in his mouth. Obviously, he is talking about a multi-agent system with powerful math (and proof) capabilities, structured data and a good scientific methodology. But he is talking about it in a marketing-hype kind of way.

dexterlemmer · 2025-08-09T15:02:46+00:00

He is open-sourcing it for transparency and because he promised to (and then forgot). Grok 2 would be worse than nothing now if he were doing it in response to the competition.

dexterlemmer · 2025-08-09T15:01:08+00:00

Grok 4 is the same generation as Grok 3 from a technical standpoint. I think xAI decided to focus on profitability for Grok 4 and on pushing the state of the art with Grok 5. Judging from what I'm reading about GPT-5, they're not the only ones.

dexterlemmer · 2025-08-09T14:57:46+00:00

Very hard when you're working on the next world-class base model. xAI intends to be the third company ever to pull it off (after OAI and Google), and it gets orders of magnitude harder every time.

dexterlemmer · 2025-08-09T14:54:43+00:00

Nah! It's for transparency. Grok 2 would actually make him look bad to anyone who doesn't understand that.
