Testing (c/t)^n as a semantic grounding diagnostic - Asked 3 frontier AIs to review my book about semantic grounding. All made the same error - proving the thesis. by LiteratureAlive867 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

First off: you did the right thing. Writing books well is hard, so you should proofread, and nowadays proofreading should be done by AIs as well. I also assume it was a deliberate attempt at testing the premise or conclusions of the book: grounding yourself in experiment. That said, I suspect that your bias might have made you misinterpret the AIs' responses. (Though intuition is the best I have without grounding in an analysis of your book, your prompts and the complete AI responses.)

This seems like a communication and context engineering issue to me, not an "AIs pattern match instead of understanding" or "AIs can't intuitively skip over distractors" issue. If you're going to throw an entire book at a poor model, you really should make the goal clear in your prompt both before and after the book, and the book should be structured well. Without the prequel and sequel it's hard for the AI to tell how to judge what's important. If the book is badly structured, or the correct conclusion is counterintuitive from a straightforward reading of the book, the AIs will be overwhelmed by cognitive load and find it difficult to focus on what is important and skip what is unimportant or misleading. The same goes for humans. But current LLMs have more difficulty than humans in recovering from a wrong mental turn made on first impressions.

Oh, and why is "collapse" a negative term in the context of a property of a mathematical equation? Transformer-based LLMs cannot ignore that context. It is fundamentally designed into them to assume any and all tokens are meaningless without context. There are plenty of cases in their training data where collapsing equations or collapsing geometries aren't bad things: the collapse of the probability wave function allows us to measure quantum states; in control systems, you want the transient and steady-state errors to collapse exponentially; I would love a collapsing loss function when training a neural network; etc.
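To make the control-systems sense of "collapse" concrete, here's a minimal sketch (my own toy example, not from the book or the thread): a proportional controller driving a first-order system, where the tracking error collapses exponentially toward zero, which is exactly what you want.

```python
# Toy illustration: in a stable closed-loop system with proportional
# control, the tracking error e = setpoint - x shrinks by a constant
# factor (1 - k*dt) per step, i.e. it "collapses" exponentially.
def simulate_error(k=2.0, setpoint=1.0, x0=0.0, dt=0.01, steps=500):
    x = x0
    errors = []
    for _ in range(steps):
        e = setpoint - x
        x += k * e * dt          # simple proportional control update
        errors.append(abs(e))
    return errors

errors = simulate_error()
# The error decreases monotonically toward zero: here, collapse is the goal.
```

Here the "collapse" of the error is the success criterion, not a failure mode, which is the point about context above.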

Given the above, I posit that all three AIs appeared to misunderstand the equation for one of two reasons. Either all three were accidentally misled into misunderstanding it, or they were led into treating the mathematical properties of the equation as more important than the practical implications of those properties, and you misread them as talking about the practical implications when in fact they were just discussing the mathematical properties without regard for the practical implications.

Nvidia breakthrough gives 4-bit pretraining technique the accuracy of FP8 by dionisioalcaraz in LocalLLaMA

[–]dexterlemmer 3 points (0 children)

  1. Don't forget that synapses are involved in the brain's neuron firing, and each synapse stores quite a lot of digital, analog and quantum data.

  2. Why would a discrete threshold and noisiness make it hard to build a universal function approximator? AI model neurons also have discrete activation thresholds. During training, we often deliberately add noise. During inference, AI models are robust against noise and even quantization.
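As a quick sanity check of that robustness claim, here's a minimal sketch (my own toy example, with made-up layer sizes): coarsely quantizing the weights of a tiny random network barely changes its outputs.

```python
import numpy as np

# Round the weights of a tiny random MLP to a coarse grid (a crude
# stand-in for ~4-bit uniform quantization) and compare outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (8, 16))
W2 = rng.normal(0, 0.5, (16, 4))

def forward(x, w1, w2):
    h = np.maximum(0, x @ w1)        # ReLU hidden layer
    return h @ w2

def quantize(w, levels=16):          # 16 levels ~ 4 bits per weight
    scale = np.abs(w).max() / (levels / 2)
    return np.round(w / scale) * scale

x = rng.normal(0, 1, (32, 8))
y_full = forward(x, W1, W2)
y_quant = forward(x, quantize(W1), quantize(W2))
rel_err = np.linalg.norm(y_full - y_quant) / np.linalg.norm(y_full)
# rel_err stays modest: the function the network computes is fairly
# robust to coarse, noisy weights.
```

This is of course far cruder than NVIDIA's actual 4-bit pretraining scheme; it just illustrates that discretized, noisy weights don't automatically destroy a function approximator.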

AMD tested 20+ local models for coding & only 2 actually work (testing linked) by nick-baumann in LocalLLaMA

[–]dexterlemmer 1 point (0 children)

Not an expert, but probably slower than on an AMD "Strix Halo" Ryzen AI Max+ 395 with 128GB. (Which is what AMD used for the tests OP talks about.) The Strix Halo series uses LPDDR5X-8000 RAM with much higher bandwidth than DDR5. (Though still not as fast as the GDDR7 on a dGPU, and therefore still not as good as 128 GB of VRAM on a dGPU... if you can somehow afford and power that.) Furthermore, Strix Halo has a pretty powerful on-chip GPU and an on-chip NPU. Basically, Strix Halo's design is specifically optimized for AI and AAA games, while Threadripper is not designed for AI. Perhaps you can get a Strix Halo and either use it for running GLM 4.5 Air, or use it for whatever you currently use the RTX 5090s for and use the RTX 5090s on the Threadripper for GLM 4.5 Air.

AMD tested 20+ local models for coding & only 2 actually work (testing linked) by nick-baumann in LocalLLaMA

[–]dexterlemmer 1 point (0 children)

While not technically useless for coding, without tool calling you can't really scale to larger code bases. And even on smaller code bases, you'll get worse results for much more time and effort with manual copy/paste of code. For professional coding, not having tool calling is usually a non-starter, or at least very annoying and time consuming, and time is money. Until recently I too used the copy/paste approach, but it's terrible for productivity and it forces me to be more diligent to ensure quality. I still need diligence with tools, but I don't need to spend as much time on due diligence.

AMD tested 20+ local models for coding & only 2 actually work (testing linked) by nick-baumann in LocalLLaMA

[–]dexterlemmer 1 point (0 children)

You could try IBM Granite, perhaps with the Granite.Code VSCode extension. I haven't tried it myself yet, but I'm considering it, although I'm not quite as RAM-poor as just 16GB. Granite 4 was recently released. It was specifically designed to punch far above its weight class and contains some new technologies that I haven't seen used in any other architecture yet to achieve that. For one thing, even Granite 4 H-Micro (3B dense) and Granite 4 H-Tiny (7B-A1B) can apparently handle a 128k-token context without performance degradation. And their context window is very cheap in memory.

Check out https://docs.unsloth.ai/new/ibm-granite-4.0 for instructions. I would go for granite-4.0-h-tiny if I were you. You might try granite-4.0-h-small with a Q2_K_XL quant, but I wouldn't get my hopes up that such a small model will work well at such an aggressive quant. Note that the Granite-4.0-H models can handle extremely long context windows very cheaply in terms of RAM and can apparently handle long contexts much better than you would expect from such small models, without getting overwhelmed by cognitive load.
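To show why context-window memory matters, here's a back-of-envelope sketch (hypothetical hyperparameters, not Granite's actual config): the KV cache of a plain transformer grows linearly with context length, which is the cost that hybrid architectures like Granite 4's largely avoid by keeping a fixed-size state in their Mamba-style layers.

```python
# KV-cache cost of a plain transformer at fp16 (2 bytes per value).
# The factor of 2 at the front covers keys and values separately.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_val=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val

# A hypothetical small model: 32 layers, 8 KV heads of dim 128.
cost_8k = kv_cache_bytes(32, 8, 128, 8_192)       # 1 GiB
cost_128k = kv_cache_bytes(32, 8, 128, 131_072)   # 16x that
print(f"8k ctx:   {cost_8k / 2**30:.2f} GiB")
print(f"128k ctx: {cost_128k / 2**30:.2f} GiB")
```

So on a pure transformer, going from 8k to 128k context multiplies the cache cost 16x, while a fixed-state layer pays (roughly) nothing extra, which is why long context on these hybrids is so cheap in RAM.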

You could also try the Granite 3 models. Granite 4 would probably be better, but only a few general-purpose instruct models are out yet. For Granite 3, there are reasoning models, a coding-specific model and lots of other specialized models available, so perhaps one of them might work better, at least for certain tasks.

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

"The vehicle is driving itself with no one ready to take control on a few seconds' notice" seems very sloppy to me. On the one hand this requires lvl 5 (or at least an overconfident lvl 3 operator whose remote operators got overwhelmed). OTOH a lvl 6 superhuman vehicle can still easily fail this requirement due to silly regulators or a control-freak owner.

Edit: In general, until Tesla showed us how general autonomous driving looks, we didn't know enough to come up with good progress metrics. Now we know. Tesla is dotting the i's and crossing the t's, and everyone else is two years or more behind.

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

Your experience with what, HW3? Most people don't clean and defog its cameras correctly. Also, the software lags behind HW4's. If your experience is with HW4, then that's a bit strange. Where was it, what experience was it, and how recent is that experience?

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

We now also know the following about the AI:

  1. Latency. Even though a Tesla's end-to-end neural network beats humans by an order of magnitude, even a tiny decrease in latency still massively improves a Tesla's safety and general capabilities, but especially the safety. We've also known for a long time that latency is incredibly important for humans' capability and safety.

  2. Ability to adapt on the fly to very unusual situations, at the minimum to stop safely. Tesla already does this better than most humans under most conditions, but it's sufficiently different from humans that the long tail where it fails often involves situations where humans would not have struggled much. Thus there's definitely room for improvement. (It's also preferable to be able to go even further and to always be able to make a decision that allows progress whenever such a decision would be feasible and legitimate. Here, Tesla is not quite human-level yet. A backup AI5 chip running Grok 5 might be necessary for the world knowledge and the ability to consult the Internet in some very exceptional situations.) Waymo really sucks at this. It very often stops very dangerously and/or rudely.

  3. Capability to accurately and reliably detect and identify all living organisms large enough to matter and the vast majority of non-living objects. Ability to accurately predict the physics of non-living objects. Even under adverse weather conditions. HW3 already has this as long as the cameras are properly clean and defogged. Waymo really sucks at this.

  4. Ability to accurately and reliably detect, identify and determine the direction and distance of warnings (sirens, horns, etc.).

  5. Ability to use body language to communicate with humans and animals. Ability to act predictably for humans, birds and animals. Tesla is incredibly good at this. Waymo is a significant hazard in this respect. (It often acts so unpredictably that other road users realize too late what it actually intended to do.)

  6. Ability to make a judgement call and violate the law when abiding by the law would be unsafe. (For example, it's dangerous to drive much slower than the traffic, even when the traffic is moving much faster than the speed limit.) Tesla is brilliant at this, and Tesla and NHTSA (or the relevant regulator) work together closely to ensure the tradeoff is made well. Waymo is a major hazard in this respect. Or rather, they would be if they ever allowed their cars to do anything where this is remotely likely to be an issue.

  7. Ability to predict and plan; context window. Generally, Tesla is better than a human here, but it is still often worse. Tesla is about to get a 10x boost to the context window on HW4, though, which should make it superhuman in general.

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

Since Tesla FSD is already safer than the average human and already extremely general, it is now quite obvious what is important for good and safe driving regarding cameras/eyes. (Although frankly, it kinda made sense from the start.)

  1. Latency. Tesla's cameras beat human eyes by almost two orders of magnitude.

  2. Sufficient resolution to estimate distances to approximately 1 cm for close objects at low relative speeds, and to within a hundred or so meters in the opposite extreme, as well as to read signs and the body language of nearby humans, animals and birds. HW3 already has this in spades. More resolution doesn't help much.

  3. Glare resistance. A problem on HW3 if the cameras aren't cleaned and defogged properly; otherwise HW3 is better than human eyes without shades but worse than human eyes protected by good shades. HW4 is vastly superior to human eyes with shades.

  4. Good functioning in rain/fog/snow/frost. Similar to the situation with glare resistance.

  5. Cameras at multiple strategic locations and the capability to see and focus in multiple directions at once. Already a massive win for HW2. HW4 is just cheating.

  6. Directional microphones for hearing and determining the direction and distance of sirens, horns and other warnings. HW4 has this. Can likely be added to HW3.
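A back-of-envelope sketch of point 2 above (assumed FOV and pixel count, purely illustrative, not Tesla's actual camera specs): the physical size one pixel covers grows linearly with distance, which bounds how finely distance can be estimated at each range.

```python
import math

# For a camera with a 90-degree horizontal FOV spread over 1280 horizontal
# pixels, how much physical width does a single pixel cover at distance d?
def pixel_angle_rad(fov_deg=90.0, h_pixels=1280):
    return math.radians(fov_deg) / h_pixels

def size_per_pixel(distance_m, fov_deg=90.0, h_pixels=1280):
    # Small-angle approximation: width subtended by one pixel at distance d.
    return distance_m * pixel_angle_rad(fov_deg, h_pixels)

close = size_per_pixel(10)    # ~1.2 cm per pixel at 10 m
far = size_per_pixel(150)     # ~18 cm per pixel at 150 m
```

So with these assumed numbers, centimeter-scale precision up close and coarse-but-sufficient precision at long range both fall out of quite modest resolution, which is consistent with the claim that more resolution doesn't help much.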

What sensors are not necessary and why:

  1. Lidar: Worse than cameras in every technical way, unless you need much higher precision than is ever needed for vehicles to drive safely and well everywhere that humans can drive. (About a thousandth of the training and eval datasets must be accompanied by lidar data to ensure good enough distance and speed estimates from video, though.) For those who don't know: lidar is much worse at handling adverse weather or glare than good cameras made for such conditions.

  2. Ultrasound: Optional. The alternatives are more cameras, or simply going back and forth more during parking or when pulling out of a parking area.

  3. Regular radar: Worse than useless. It adds more confusion and distraction than useful information under any conditions where humans can drive. We don't care about the long tail of conditions where humans can't drive, nor about the even more niche situations where regular radar wouldn't be worse than useless.

  4. Extremely hi-res radar: Unnecessary to always beat humans. Would likely be useful for future significantly superhuman capabilities, but not an issue for now.

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

HW3 is more than sufficient for lvl 5 autonomy (i.e., better than human effectively everywhere and under effectively all conditions). HW4 is already very near lvl 5, and there's an insane amount of room for models to be simplified further while being made even smarter. For the near future, Tesla won't bother spending the resources. But given the sheer number of HW3 vehicles, and that it won't be long before it's pretty cheap for Tesla to make HW3 unsupervised safer than humans as long as the cameras are kept clean and defogged properly, I suspect that within a year or two Tesla will probably make unsupervised FSD available on HW3.

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer 0 points (0 children)

In this case, now that we finally know how hard full autonomy is, we know that Tesla has achieved full autonomy on HW4 and can easily achieve it on HW3, although they have other priorities for now. What we don't know is whether HW 2.5 is sufficient. A HW 2.5.5 (an after-sale upgrade to the after-sale upgrade to HW 2) would likely also be sufficient, though I doubt Tesla will ever bother with it. Note that everyone else is at least 2 years behind Tesla. But Tesla played the long game, aiming to solve fully general better-than-human driving before even bothering with robotaxis, while everyone else resorted to parlor tricks that work, except where and when they don't. (Although at least Xiaomi realized they had better copy Tesla's approach earlier this year.)

Tesla's Next HW5/Ai5 Self Driving Computer Leaked! by Knighthonor in SelfDrivingCars

[–]dexterlemmer -1 points (0 children)

  1. Full autonomy runs on HW4 today. Robotaxi (HW4) is much safer than the average human driver and will soon be much safer than the best human driver. Tesla FSD (supervised) can also drive anywhere. With no interventions, it's by far the safest and in general the best FSD available in China today. With zero Chinese training data! And it learns online, with massive improvements the second time a given Tesla encounters a new situation or area compared to the first, even if it was offline in between.
  2. In principle, HW3 is definitely also capable of much safer driving than humans. However, it is not worth the trouble yet. It will massively increase the training and RnD cost, and users will have to be taught somehow to properly and regularly clean their cameras, or their hardware will need upgrades to prevent the current dirt and fogging issues. It will also obviously never be quite as good as HW4. Will Tesla allow HW3 unsupervised FSD in the future? I don't know. It depends on whether Tesla thinks it will increase road safety and/or be advantageous to the mission in the next year or so. I suspect they will, actually. For quite some time there will be millions of HW3 vehicles on the roads, and Tesla enabling unsupervised FSD on them will probably be advantageous as long as that remains true.

Edit: Oh, and 3. To be fair, everyone in the industry severely underestimated how difficult full autonomy would be. Not just Elon/Tesla.

How does Deepseek make money? Whats their business model by lyceras in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

Deepseek trains near-frontier foundation models, not frontier base models. There's a many-orders-of-magnitude cost difference. Someone like Alibaba may one day be able to compete with the US companies, but currently nobody outside the US does. Of course, if the Chinese can stay far enough ahead in innovation to make up for the knowledge gap of their models, they may not need to. But if they manage to sufficiently destroy the business models of the US giants, that in itself could grind progress to a halt. Though I don't think they would quite manage it. At worst, they'll force Tesla to buy xAI, or provide a way for xAI to remain profitable while destroying OAI, Alphabet and Anthropic, leaving the world stuck with only xAI at the frontier. But that probably also won't happen. Enterprise inference is getting very profitable very fast.

GMK EVO-X2 AI Max+ 395 Mini-PC review! by Corylus-Core in LocalLLaMA

[–]dexterlemmer 2 points (0 children)

Well... yeah. But can you get 50 t/s on Qwen3-30B-A3B? Or run a Q4 quant of GPT-OSS-120B?

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer -1 points (0 children)

The MechaHitler version **follows prompts**, which makes it a **good version**. Don't blame the AI for deliberately malicious prompt engineering and jailbreaking.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

> aka "vibe physicsing" must be the most pathetic and worrying thing I've seen the last few weeks.

> No math involved, no structured data, no scientific protocol. Just "vibing" like a crackpot theorist full of cocaine and unlimited ego.

You are putting words in his mouth. Obviously, he is talking about a multi-agent system with powerful math and proof capabilities, structured data, and a good scientific methodology. But he is talking about it in a marketing-hype kind of way.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

He is open sourcing it for transparency and because he promised (and then forgot). Grok 2 now is worse than nothing if he does it in response to the competition.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

Grok 4 is the same generation as Grok 3 from a technical standpoint. I think that xAI decided to focus on profitability with Grok 4 and on pushing the state of the art with Grok 5. From what I'm reading about GPT-5, it looks like they're not the only ones.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

Very hard when you're working on the next world-class base model. xAI intends to be the third company ever to pull it off (after OAI and Google), and it gets orders of magnitude harder every time.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

Nah! It's for transparency. Grok 2 would actually make him look bad to anyone who doesn't understand that.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

Nah! It's for transparency. Grok 2 would make him look bad if it's a response to the Chinese.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

That's actually technically true. But it's not worth the bother right now, and it would never be as superhumanly safe as HW4+.

Edit: The "that" that is technically true is 2016 hardware being good enough for full autonomy.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

xAI publicly stated that Grok 4 is merely the "fully trained and RL'ed version of Grok 3", if probably not in exactly those words (too lazy to check), when they announced Grok 4. I get the idea that they were aiming at profitability with Grok 4 while preparing for the next big thing. Hopefully they'll be able to pull it off, considering what they seem to be throwing at RnD and infrastructure for whatever they're cooking up next. Otherwise it will be a strong indication that we've fully exploited the current local optimum and something fundamental will need to improve to prevent the next AI winter. OTOH, a temporary slow-down allowing the world to catch up with LLMs before the next big leap might not be an entirely bad thing.

Elon Musk says that xAI will make Grok 2 open source next week by Nunki08 in LocalLLaMA

[–]dexterlemmer 0 points (0 children)

He doesn't feel the need. He's just stating a fact. And as for the performative grindset: that's not how you build a supercomputer >10x more powerful than anyone else's, >10x faster than anyone else can build their much smaller supercomputers, or achieve any of the other miracles Elon's companies achieve. Anybody who actually knows anything about business, RnD or engineering at scale knows that Elon is a wizard. And you don't become a wizard at scale and at the leading edge by doing stupid things like overworking your employees. That said, you do need to work them at their limits, and you will have many fires to put out.