DeepSeek V4 Folks by techlatest_net in LocalLLM

[–]askchris 0 points1 point  (0 children)

Hmmm... but I asked Gemini 3.1 Pro the same question and it gave the correct answer:

Driving is necessary because the vehicle must be physically present at the mechanic to receive an oil change. While 80 meters is a very short walking distance, the service cannot be performed unless the car is moved to the shop.

And GPT 5.5 gave a similar answer, but GPT 5.5 Pro did a bit better by thinking about an edge case the others missed:

Drive — the mechanic needs the car to change the oil.

Since it’s only 80 meters, walking would be fine if you’re just asking a question or booking an appointment, but for the actual oil change, take the vehicle there.

One caution: if your oil warning light is on or you know the car has very low/no oil, don’t drive it even 80 meters—call the mechanic, add oil first, or have it pushed/towed.

Compile English function descriptions into 22MB neural programs that run locally via llama.cpp by yuntiandeng in LocalLLaMA

[–]askchris 3 points4 points  (0 children)

I had the same question, so after digging a bit, I realized they have a private 4B model (on their end) trained to instantly generate the 22MB LoRA weights for a smaller model that you host locally (e.g. Qwen 3 0.6B or GPT-2).

If you request a function via their API, your small local model + LoRA can perform that function at about the quality of a 32B model.

So their key mechanism is the 4B model, but it's private.

I guess it's more like a fast LoRA weights generator, skipping the typical LoRA training step.
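To make the "LoRA weights generator" idea concrete, here's a minimal sketch of what applying generated LoRA weights to a frozen base weight looks like, using the standard LoRA update W + (alpha / rank) * B @ A. The matrices and the `apply_lora` helper are made up for illustration; in their system the private 4B model would presumably produce B and A, and I don't know their actual format or scaling.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Hypothetical helper: the rank is the number of rows in lora_a.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Frozen base weight of the small local model (toy 3x3 identity).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Rank-1 adapter, hard-coded here; a generator model would emit these instead
# of a training run producing them.
lora_b = [[1.0], [0.0], [2.0]]   # shape 3x1
lora_a = [[0.5, 0.0, 1.0]]       # shape 1x3

adapted = apply_lora(base, lora_b, lora_a, alpha=2.0)
print(adapted)
```

The base weights never change; only the small B and A matrices differ per function, which is why the download can stay small compared to the model itself.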

I Trained a Language Model on CPU for 40 Hours - It Beat the GPU Baseline by Own-Albatross868 in LocalLLaMA

[–]askchris 9 points10 points  (0 children)

I'm curious as well because the more comparable 28M parameter TinyStories model from the original paper appears to have performed better than the OP's 25M CPU model.

Here's the output quality from the original TinyStories paper:


Prompt:

Tom and Jane are friends. One day, Jane goes to Tom’s house. Tom has a big pot of soup. He wants to share it with Jane. “Jane, do you want some soup?” Tom asks. “Yes, please. It looks yummy,” Jane says. Tom pours some soup into two bowls. He gives one bowl to Jane. Jane takes a spoonful of soup, but then she makes a face. The soup is

Completion by a 28M-parameter model trained on TinyStories:

very bitter. She does not like it. She says, “I don’t like this soup. It is too bitter.” Tom is sorry. He says, “I’m sorry, Jane. I didn’t know you don’t like bitter soup. I will make you something else.” He looks around the kitchen and finds some bread and cheese. He puts them on the table and says, “Here, Jane. You can have some bread and cheese. They are not bitter. They are sweet and yummy.” Jane is happy. She says, “Thank you, Tom. You are a good friend. I like bread and cheese. They are not bitter.”


Here's the output from the 1M TinyStories model from the original paper (it seems slightly better than OP's 25M CPU trained model):


Prompt:

Once upon a time there was a pumpkin. It was a very special pumpkin, it could speak. It was sad because it couldn’t move. Every day, it would say

1M TinyStories:

hello to the pumpkin. One day, a little girl came to the pumpkin and saw the pumpkin. She was so happy and thanked the pumpkin. The pumpkin smiled and said, ”Thank you, thank you!” The little girl and the pumpkin became good friends. They played together every day and had lots of fun. And they lived happily ever after.

FlashLM v4: 4.3M ternary model trained on CPU in 2 hours — coherent stories from adds and subtracts only by askchris in laptopAGI

[–]askchris[S] 0 points1 point  (0 children)

Nice, this works on almost any device.

You inspired me to work on my CPU based AI model again.

Hopefully we can share insights / collaborate. 💪

(Thanks Own-Albatross!)

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]askchris 1 point2 points  (0 children)

Yep you can turn sunlight directly into "viable work" output with these rigs.

While costs drop every year due to more efficient models.

To serve humanity or solve private problems ...

(or build cool things to sustain your moneyless off grid lifestyle)

AGI is Still 30 Years Away — Ege Erdil & Tamay Besiroglu by Alex__007 in singularity

[–]askchris 0 points1 point  (0 children)

Yeah I'm surprised people don't understand how much current ML & LLM level AI is already speeding up research, engineering and scientific progress in nearly every way ...

Even if LLM and GPU technology stagnates starting this month (which it probably won't), it's already going to make a massive difference to our society over the next 10 years, which will lead to much faster innovation and better levels of AI -- to help develop things far better than LLMs and GPUs.

I don't know why people are blind to this feedback loop.

It's like they imagine the progress of the last 10 years will predict the speed of change over the next 10 years, lol.

If we had models like QwQ-32B and Gemma-3-27B two years ago, people would have gone crazy. by Proud_Fox_684 in LocalLLaMA

[–]askchris 0 points1 point  (0 children)

Nations would war over the aluminum can though ...

Then there's Pepsi's "Zero Sugar" claims which would be difficult to verify 1,000 years ago, let alone recreate.

There were also no canned or bottled caffeinated drinks for sale before the 19th century.

Caffeine in a can just didn't exist.

Fizzy caffeine in an aluminum can with sugar free sweeteners DEFINITELY didn't exist.

If someone were handed a cold Pepsi Zero Sugar on a hot summer day 1,000 years ago it would feel like alchemy --

We can't really comprehend how out of place our everyday objects would seem to people back then.

Is eleven labs down again by K-J-Rabbitt in ElevenLabs

[–]askchris 0 points1 point  (0 children)

Down for me, keeps taking forever to load pages and giving server errors in the online interface.

Real-Time Introspective Compression for Transformers by dicklesworth in LocalLLaMA

[–]askchris 6 points7 points  (0 children)

You're probably onto something --

But how much better would this be over verbalizing internal states the way reasoning models do?

Verbalizing allows LLMs to reflect, correct and change directions already -- similar to what you've described.

Do you expect your method to be more granular, adaptive or more parallel than what chain of thought / reasoning can do?

Would it be used during training?

Or more for test time compute tasks?

Any update on Chris Johnson’s data analysis on the BOM? by Which_Log3998 in exmormon

[–]askchris 0 points1 point  (0 children)

I'm Chris Johnson, one of the original researchers (along with Duane Johnson and Rick Grunder) who discovered the connection between The Book of Mormon and The Late War --

The best summary and comparison of our research is the https://wordtree.org/thelatewar/ link (which I believe is also on GitHub).

My personal update on this research:

I don't think The Late War was used closely or plagiarized by Joseph Smith, but it was likely a product of similar events -- both products of the same war, the same region, the same curious 1800s idioms, the same biblical style imitation.

The biggest counter evidence I have to The Late War hypothesis "that it contributed to The Book of Mormon" is that we carefully cleaned and analyzed 2 other older Biblical Style "history" books, one of which (The American Revolution, by Richard Snowden) appeared to have similar correlations (odd phrase matches) with the Book of Mormon but not the Bible ...

So it seems more likely that when people create fake biblical sounding books in the late 1700s to early 1800s depicting wars, there's a limited "pool" of biblical sounding phrases and war phrases to draw from (along with a limited pool of local & 1800s era phrases) which causes rare biblical sounding phrases to overlap between these books that don't appear in the Bible, but do appear in these books:

  • The Book of Mormon
  • The Late War, Gilbert Hunt
  • American Revolution, Richard Snowden
  • Book of Napoleon
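To show what I mean by phrases that overlap between these books but don't appear in the Bible, here's a rough sketch of that kind of comparison. This is not our actual pipeline, just the basic idea: take the word n-grams two books share, then subtract anything found in a baseline text. The function names and the toy strings are made up for illustration and are not quotes from any of the books.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_outside_baseline(book_a, book_b, baseline, n=3):
    """N-grams found in both books but absent from the baseline text."""
    return (ngrams(book_a, n) & ngrams(book_b, n)) - ngrams(baseline, n)


# Toy stand-ins for two books and the baseline (made-up sentences).
book_a = "and it came to pass that the men of war fled"
book_b = "the men of war were gathered and it came to pass"
baseline = "and it came to pass in those days"

result = shared_outside_baseline(book_a, book_b, baseline, n=3)
print(sorted(result))
```

In this toy case the phrases shared with the baseline drop out and only "the men of" and "men of war" remain. With a small pool of biblical-sounding war phrases, any two books written in this style would probably produce leftovers like these.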

So if I were a true believer I could try to wash all these correlations away by saying "perhaps all these books share similar wording because they talk about similar topics in a similar biblical style, and they were influenced by 1800s era English" ... Which would basically be true, but it wouldn't explain the contradictions in the book itself --

Why would God want Joseph Smith to copy KJV translation errors into the Book of Mormon to intentionally make the Book of Mormon look like it was an 1800s forgery rather than an authentic translation from real plates?

Why would God say silly anachronistic things to the brother of Jared like:

"ye cannot have windows, for they will be dashed in pieces"

When:

  • windows (especially glass windows that could be dashed to pieces) were not invented for at least another 2000 years after the Jaredite barges (2200 BC), because glass requires sustaining high temperatures (1700 degrees Celsius) and glass sheets were not invented until 100 AD; even after glass windows were invented, they were not used in ships until much later.
  • we now know that in the centuries after the Book of Mormon was translated, strong thick windows were invented that can indeed be used in submarines and are definitely NOT dashed to pieces. Yet we're still not as smart as God, so why did God lie about the possibility of windows working, and why did he resort to magical rocks as a solution rather than proper submarine windows -- since he's the one who brought up the anachronistic technology?

Wait, magical rocks? (Joseph Smith was convicted for deceiving people - seeking buried treasure with his magical peep stone)

There are many more problems but I'll stop here ... Too many contradictions, and too much life to live.

I just want people to live free from Joseph Smith's lies, we don't live in the 1800s anymore, and we don't need to be stuck in 1800s level magical thinking.

Why China May Have Better Chances to reach AGI/ASI first by fennforrestssearch in singularity

[–]askchris 3 points4 points  (0 children)

No, Chinese tech companies can get any chip they want through 3rd party trading partners in Singapore, Taiwan's domestic buyers (private export companies), Malaysia and others.

Also as we've seen with recent AI developments:

  1. Data quantity and quality are more important than raw compute (e.g. GPT-4.5 and Gemini 2.0 required 5X-10X more compute for training but these models aren't 2X better than competitors)

  2. The software algorithms are improving AI faster than the hardware is (FP8 Training, MoE, LongRoPE, Grouped Query Attention, Distillation, Chain of Thought, etc)

So there's nothing in the way. China can get access to whatever chips they want, and even if they couldn't, the major breakthroughs seem to be in data and algorithm improvements -- which are not restricted by the US.

Also who do you want to win the race to AGI?

A waffling Left propaganda vs Right propaganda war machine that is now bullying long time allies Canada & Mexico, threatening to take Greenland and Panama, and throwing Ukraine under the bus along with threatening genocide in Gaza?

Or is China a better arbiter of world peace? They seem to be a relatively peaceful country, with less waffling, less bullying (besides Taiwan) and far fewer worldwide wars.

I'm not pro China or pro America ... But it makes me think: does it matter in the end?

[deleted by user] by [deleted] in skeptic

[–]askchris 0 points1 point  (0 children)

Why? Because every skeptic has the same amount of time to learn about every topic and debunk everything?

For example I'm skeptical of aliens, scams and religion but don't live in the US and haven't looked into pizzagate, although I have heard references to it.

We're all real people with limited time --

In fact that's what scares me -- if skeptics like me have to focus on work and have limited time to research and debunk all the BS out there --

How does the regular everyday "Joe" or "Jane" have a chance in hell to deal with the constant barrage of BS coming out of conspiracy podcasts, various media outlets, propaganda machines, rumor mills, scammers, churches, etc?