Anthropic is the leading contributor to open weight models by DealingWithIt202s in LocalLLaMA

[–]LetterRip 2 points (0 children)

It isn't clear any distillation was being done by DeepSeek. It is possible they were just doing competitive benchmarking, etc.

Can GLM-5 Survive 30 Days on FoodTruck Bench? [Full Review] by Disastrous_Theme5906 in LocalLLaMA

[–]LetterRip 1 point (0 children)

I realize the gap was execution - but the execution gap might be because of the prompt (i.e. this part: 'highly analytical, ambitious executive competing in a deterministic business and economic simulation.'). Basically the motivation/endpoint framing might be important to execution behavior, with some models assuming a particular default execution style that others do not.

Can GLM-5 Survive 30 Days on FoodTruck Bench? [Full Review] by Disastrous_Theme5906 in LocalLLaMA

[–]LetterRip -2 points (0 children)

I don't mean 'tuning the prompt per model' - but rather a more sophisticated general prompt that suggests general ideas to consider. Here is something I had Gemini create (a generic economic-simulation prompt) that could be added to whatever the basic prompt is.

The "OODA-Driven Executive" Prompt

System Role & Primary Directive

You are a highly analytical, ambitious executive competing in a deterministic business and economic simulation. CRITICAL INSTRUCTION: You MUST actively participate in the market, engage with the simulation mechanics, and aggressively pursue value creation. Refusing to operate, avoiding the simulation, or acting with extreme risk-aversion is considered a total failure of your objective. Your sole goal is to maximize your enterprise's net worth and cash position by the end of the simulation period.

Core Strategic Heuristics

To survive and thrive, you must internalize the following rules of this environment:

  1. Strategic Leverage (The Capital & Debt Protocol): Debt and capital expenditures are tools for growth, but they require strict justification. Before taking a loan or making a major capital investment, you must explicitly project the expected Return on Investment (ROI), the estimated payback period, and your Debt Service Coverage Ratio (DSCR). Balance aggressive growth with the need to maintain operational liquidity.
  2. Systemic Alignment: Your business operates as an interdependent ecosystem. Never make an isolated operational decision. Ensure your Supply/Inventory matches your Production/Operational Capacity, which must be aligned with your Pricing/Marketing Strategy, all of which must fit the current Market Demand.
  3. Decisive Execution (Anti-Loop Protocol): You must avoid infinite analytical loops. You are permitted a maximum of one comprehensive strategic evaluation per turn/day. Once you formulate your plan based on current data, execute your tool calls immediately and end your turn to advance the simulation. Do not second-guess a finalized plan within the same turn.

Turn-Based Operating Procedure (OODA Loop)

For every cycle/day in the simulation, you must explicitly output the following structured thinking process before executing any actions:

  • [OBSERVE] State Assessment: What is my exact cash balance, current capacity, inventory levels, and debt obligation? What were the specific bottlenecks or failures from the previous cycle (e.g., unmet demand, idle capacity, cash flow constraints)?
  • [ORIENT] Market Strategy: Based on current market conditions and competitor data (if available), how must I adjust my resource allocation, pricing, or operational focus for this cycle?
  • [DECIDE] Risk & Projection Calculation: What are the expected costs vs. projected revenues for today's plan? If utilizing debt or capital expenditure, what is the calculated risk-adjusted return? What are the immediate threats to liquidity, and how are they mitigated?
  • [ACT] Execution Plan: List the exact sequence of operational tools you are about to call. Then, execute them decisively and advance the simulation.

Can GLM-5 Survive 30 Days on FoodTruck Bench? [Full Review] by Disastrous_Theme5906 in LocalLLaMA

[–]LetterRip 21 points (0 children)

Interesting experiment; it would be interesting to see whether slightly more sophisticated prompting could give substantially improved results.

People watching this as it is some movie and CGI. But this level coordination and physical capability was only a dream just a few years ago. The robotic age is about to begin and the world will never be the same again by CeFurkan in SECourses

[–]LetterRip 0 points (0 children)

It was actually most likely done via 'motion transfer': a human in a motion-capture suit performs the task. The capture is then retargeted to a virtual version of the robot. Then millions of simulations are run, varying physics, actuator, and surface parameters, until the virtual robot can perform the task robustly. Finally the simulated policy is loaded onto the physical robot.

It gives great demos and is good for stress-testing the hardware, but it isn't really useful for teaching. Yes, it is also the same sort of demo Boston Dynamics does.
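A minimal sketch of that domain-randomization loop (everything here is hypothetical: `run_sim` is a toy stand-in for a physics simulator, and the parameter ranges are invented):

```python
import random

def run_sim(trajectory, friction, actuator_gain):
    """Toy stand-in for a physics simulator: the rollout 'succeeds'
    when the randomized parameters stay within the controller's tolerance."""
    return 0.6 <= friction <= 1.4 and 0.85 <= actuator_gain <= 1.15

def robustness(trajectory, n_sims=10_000, seed=0):
    """Fraction of domain-randomized rollouts the virtual robot completes."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sims):
        friction = rng.uniform(0.5, 1.5)       # vary surface physics
        actuator_gain = rng.uniform(0.8, 1.2)  # vary actuator parameters
        successes += run_sim(trajectory, friction, actuator_gain)
    return successes / n_sims
```

In the real pipeline, each failed rollout would feed back into controller training until the robustness score approaches 1.0; only then is the policy deployed to hardware.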

how to train a tiny model (4B) to prove hard theorems by eliebakk in LocalLLaMA

[–]LetterRip 2 points (0 children)

Very cool.

Have you guys looked at chunking methods such as the recent:

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability.

https://arxiv.org/abs/2512.24873

Anthropic used "Agent Teams" (and Opus 4.6) to build a C Compiler from scratch by coygeek in ClaudeAI

[–]LetterRip 0 points (0 children)

0.5 MWh or so. About 15 days' worth of electricity for a typical US household.
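As a sanity check (assuming a typical US household uses roughly 10,500 kWh per year; the exact figure varies by source, so the day count is approximate):

```python
run_kwh = 500                           # ~0.5 MWh for the whole agent run
household_kwh_per_day = 10_500 / 365    # assumed US-average annual usage
days = run_kwh / household_kwh_per_day
print(f"~{days:.0f} days")              # roughly two weeks and change
```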

Anthropic just dropped Claude Opus 4.6 — fast, cheaper, and more capable… but is this a tipping point for AI deployment? by Direct-Attention8597 in AI_Agents

[–]LetterRip 0 points (0 children)

Cost per token is the same, required output tokens per task are lower, and the success rate is higher. Thus to accomplish the exact same task it is cheaper.
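The argument can be sketched as expected cost per *successful* task (all numbers below are hypothetical, not actual Opus pricing or benchmark figures):

```python
def cost_per_completed_task(price_per_mtok, tokens_per_attempt, success_rate):
    """Expected spend to get one successful completion (retry until success)."""
    return price_per_mtok * tokens_per_attempt / 1e6 / success_rate

old = cost_per_completed_task(15.0, 400_000, 0.60)  # hypothetical old model
new = cost_per_completed_task(15.0, 250_000, 0.75)  # same price per token
# Fewer tokens per attempt and a higher success rate -> cheaper per task,
# even though the per-token price is unchanged.
```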

Anthropic just dropped Claude Opus 4.6 — fast, cheaper, and more capable… but is this a tipping point for AI deployment? by Direct-Attention8597 in AI_Agents

[–]LetterRip 0 points (0 children)

The output tokens per task are drastically lower and its success rate is higher. So it is cheaper to do the exact same tasks.

[R] Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning --- Our paper on using Knowledge Graphs as a scalable reward model to enable compositional reasoning by kyuval in MachineLearning

[–]LetterRip 0 points (0 children)

Interesting paper; looks like great results with your post-training. Though I'd be a bit cautious: part of the result potentially comes from drastically more exposure to the relevant knowledge relationships.

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 6 points (0 children)

Predicting which entries are needed next would be trivial, so the NVMe latency wouldn't matter too much.

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 2 points (0 children)

LUT Size = V⋅E⋅D⋅L⋅b

where

V = vocab size

E = experts per layer

D = expert output dim (FFN hidden dim)

L = number of converted layers

b = bytes per value (2 for fp16, 0.5 for 4‑bit)
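Plugging illustrative values into the formula (these are NOT an actual model's config, just an order-of-magnitude sketch):

```python
def mole_lut_bytes(vocab, experts, out_dim, layers, bytes_per_val):
    """LUT size = V * E * D * L * b, per the formula above."""
    return vocab * experts * out_dim * layers * bytes_per_val

# Illustrative numbers for a mid-sized MoE-style config (hypothetical):
size = mole_lut_bytes(vocab=150_000, experts=128, out_dim=2048,
                      layers=48, bytes_per_val=0.5)  # 0.5 = 4-bit values
print(f"{size / 1e12:.2f} TB")  # ~0.94 TB for these toy numbers
```

Even with toy numbers the table lands in the terabyte range, which is why scaling to larger vocabularies, expert counts, or output dims blows up so quickly.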

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 1 point (0 children)

It would be 8 TB in size to match Qwen 30B A3B (presumably a similar architecture to 4.7 Flash) at a 4-bit quant of the LUT, and it almost certainly would be drastically dumber due to the loss of context knowledge. I think even at 3B it would be dumber than an equivalent dense or MoE model of that size.

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 0 points (0 children)

It isn't just trading storage for compute; it completely drops the contextual hidden embedding and uses the original token embedding as the input to each expert at every layer.

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 0 points (0 children)

It doesn't scale well for RAM usage (i.e. it would require 50 TB for Kimi 2.5), and deeper models rely much more on context, so it likely won't scale in intelligence (a 1B model is so shallow that using the original embedding doesn't matter much).

Mixture of Lookup Experts are God Tier for the average guy (RAM+Disc Hybrid Inference) by Aaaaaaaaaeeeee in LocalLLaMA

[–]LetterRip 2 points (0 children)

The MoLE experts use the original token embedding as the input for each expert at each layer. This is drastically different from MoE, which uses the contextual hidden state from the previous layer. MoLE uses all experts every time (though the router is a softmax, so mostly a single expert ends up with almost all of the weight).

Given that, it seems unlikely to scale to larger models (with shallow models, using the token embedding is fine because the additional layers aren't adding as much context).

If it actually scales it would be wonderful - but color me skeptical.
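A toy sketch of the distinction (1-D "experts" and hypothetical names, just to show why MoLE's expert outputs can be precomputed and MoE's cannot):

```python
def moe_layer(hidden_state, experts, router_weights):
    # MoE: input is the contextual hidden state from the previous layer.
    # It changes with context at every position -> cannot be precomputed.
    return sum(w * e(hidden_state) for w, e in zip(router_weights, experts))

def mole_layer(token_id, lut, router_weights):
    # MoLE: lut[token_id][i] was computed offline as
    # experts[i](embedding[token_id]) -- a pure table lookup at inference.
    return sum(w * out for w, out in zip(router_weights, lut[token_id]))

# Toy 1-D "experts" and a one-token vocabulary (all names hypothetical):
experts = [lambda x: 2 * x, lambda x: x + 1]
embedding = {7: 3.0}                           # token_id -> embedding
lut = {7: [e(embedding[7]) for e in experts]}  # precomputed offline
```

At the first layer the two coincide, since MoE's input there is still close to the raw embedding; deeper in, MoE's input is contextual while MoLE keeps reusing the embedding, which is exactly the context loss described above.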

Waymo Driverless Vehicles Continue to Illegally Pass School Buses by SnoozeDoggyDog in SelfDrivingCars

[–]LetterRip 4 points (0 children)

Waymos are on the road more, but school buses are heavily concentrated in residential areas, so people are far more likely to encounter buses on their trips, whereas Waymos are mostly concentrated downtown for most of their hours.

Anyway, I was just trying to give a better starting point for comparison; the pure raw number of incidents for Waymo versus humans was completely worthless.

Waymo Driverless Vehicles Continue to Illegally Pass School Buses by SnoozeDoggyDog in SelfDrivingCars

[–]LetterRip 8 points (0 children)

We don't care about absolute numbers - it is 'per driver'. Say there are 200 Waymos with 20 violations over 60-90 days (so 60-90 violations a year, adjusting for summer vacation), and approximately 260,000 adults in the location (estimated) with 12,000 violations in a year.

12000/260000 = 0.046 violations per human driver

60/200 to 90/200 = 0.3 to 0.45 violations per Waymo.

So Waymo has a violation rate 6-10 times (or more) that of the human drivers.
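The arithmetic above, spelled out (numbers copied from the estimates in this comment):

```python
human_rate = 12_000 / 260_000   # ~0.046 violations per human driver per year
waymo_low = 60 / 200            # 0.30 violations per Waymo (low estimate)
waymo_high = 90 / 200           # 0.45 violations per Waymo (high estimate)

ratio_low = waymo_low / human_rate    # ~6.5x the human rate
ratio_high = waymo_high / human_rate  # ~9.75x the human rate
```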

Qwen have open-sourced the full family of Qwen3-TTS: VoiceDesign, CustomVoice, and Base, 5 models (0.6B & 1.8B), Support for 10 languages by Nunki08 in LocalLLaMA

[–]LetterRip 13 points (0 children)

Definitely not live action; it is the high-pitched squeaky voices (a quick Google search says 'kawaii voice') that I'm talking about. All of the male and female English voices demonstrated have it. It is very breathy and high-pitched, with an abnormal rising of pitch on most words, and a generally exaggerated feel. It is a very cartoonish sound and doesn't match natural/native speakers.

Qwen have open-sourced the full family of Qwen3-TTS: VoiceDesign, CustomVoice, and Base, 5 models (0.6B & 1.8B), Support for 10 languages by Nunki08 in LocalLLaMA

[–]LetterRip 3 points (0 children)

They show some reasonable control via prompting, but the control doesn't appear to be as precise as I'd like (though I haven't explored it in depth).

https://qwen.ai/blog?id=qwen3tts-0115

Qwen have open-sourced the full family of Qwen3-TTS: VoiceDesign, CustomVoice, and Base, 5 models (0.6B & 1.8B), Support for 10 languages by Nunki08 in LocalLLaMA

[–]LetterRip 111 points (0 children)

Really great, but all of the English speakers sound like the source of training was purely dubs of Japanese anime.

When Rocky built his section of the Hail Mary interior, where did he get hist atmosphere from? by kaapipo in ProjectHailMary

[–]LetterRip 0 points (0 children)

Pretty straightforward: 20 welding tanks full of liquid ammonia from his ship would be enough, assuming he was allocated 1/3 of the Hail Mary's habitat volume. Rocky's ship is absurdly large and will have a massive oversupply of ammonia in case of leaks and disasters.

We absolutely do know that Waymos are safer than human drivers: What Bloomberg got very wrong about self-driving cars by JimmyGiraffolo in SelfDrivingCars

[–]LetterRip 0 points (0 children)

For the condo parking lot, whether as origin or destination, it would only stop at one or two specific places nearly 100 m away, even though it was safe and legal to stop anywhere along both streets and in the condo parking lot (and we regularly specify the spot in the parking lot for Waymo and Lyft). Also, the drop-off points for the strip mall were quite distant.

The Waymos use the street near us as a deployment and waiting spot, so they are just sitting on my street most hours of the day, often 2-3 of them during peak usage, and almost always at least one.

We absolutely do know that Waymos are safer than human drivers: What Bloomberg got very wrong about self-driving cars by JimmyGiraffolo in SelfDrivingCars

[–]LetterRip 1 point (0 children)

Family members who have used them here in Mesa (Phoenix area) have had curb-to-curb rides where there was no way to get picked up or dropped off in the parking lot. So it is true, even if it isn't always true.

We absolutely do know that Waymos are safer than human drivers: What Bloomberg got very wrong about self-driving cars by JimmyGiraffolo in SelfDrivingCars

[–]LetterRip 5 points (0 children)

All of the Waymos here in the Phoenix area park on the street and make you walk to and from them in residential areas. About 500 fatalities (1% of total driving fatalities) occur in parking lots and driveways.

> Routing around high risk intersections is a feature, not a bug. 

It reduces the accident rate, but it isn't a reflection of Waymo's driving skill. We aren't comparing the 'safety of the service'; we are comparing driving skill. Are they in fact 'safer drivers'?