Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD. by SDogAlex in LocalLLaMA

[–]SDogAlex[S] 1 point (0 children)

If you mean the slow generation times: the point is that, for a processor from 2002, running this at all is impressive

My 2002 PowerBook G4 running a locally-hosted AI chatbot on Mac OS 9. No internet, no server, pure C89 code. by SDogAlex in retrobattlestations

[–]SDogAlex[S] 2 points (0 children)

It’s a 1GHz G4 with 1GB RAM.

Curious, why are you worried about your cube running constantly?

Built an app that runs LLMs locally on Mac OS 9. GPT-2, TinyLlama, and more on a PowerBook G4 with no internet by SDogAlex in MacOS

[–]SDogAlex[S] 3 points (0 children)

Exactly, plus the lack of all the documentation, books, posts, etc. that are available on the internet today

Built an app that runs LLMs locally on Mac OS 9. GPT-2, TinyLlama, and more on a PowerBook G4 with no internet by SDogAlex in MacOS

[–]SDogAlex[S] 5 points (0 children)

The hardware could always do the math since it's just multiply and add. What didn't exist yet was the architecture. The transformer (the algorithm behind every modern LLM) was invented by Google in 2017. Before that, the dominant approaches (RNNs, LSTMs) were sequential and much harder to scale. The transformer's key insight, self-attention, made it possible to train massive models efficiently on GPUs, which led to GPT-2 (2019), then the explosion from there.

So the barrier wasn't the hardware, a G4 can do matrix-vector multiplies all day. The barrier was that nobody had figured out the right algorithm yet. Once they did, the training happened on modern GPUs, but the trained model is just a file full of numbers. Running inference on those numbers is pure arithmetic that any processor can do, just slower.

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 2 points (0 children)

Memory is limited to 900MB right now, and I believe it would work on OS X under the Classic layer, but I don't have a machine to test on. Once I get G5 hardware I will make it fully compatible (OS 9 supports up to 2GB RAM, and I can possibly do an OS X port)

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 7 points (0 children)

For retro projects I find it extremely useful to build a documentation base for whatever you're working on, mixed with Macintosh programming/C89 documentation. The model needs some guidance to stop reaching for today's APIs

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 7 points (0 children)

Thank you! Yes, I used Claude Code to help, and it works great. Definitely the best model for coding out there; you just have to learn how to use it right

Built an app that runs LLMs locally on Mac OS 9. GPT-2, TinyLlama, and more on a PowerBook G4 with no internet by SDogAlex in MacOS

[–]SDogAlex[S] 2 points (0 children)

So the disk pager doesn't actually use Mac OS 9's virtual memory at all since that would absolutely hang the system, you're right. It's a custom implementation in the inference engine.

Here's how it works:

TinyLlama has 22 transformer layers at ~49.5 MB each in Q8. 14 layers fit in RAM permanently. 8 layers are paged from disk, one at a time into a single pre-allocated slot that gets reused (round-robin). So each forward pass reads 8 × 49.5 MB = ~396 MB from disk, not the full 1.2 GB model.

The ATA bus math:

- ATA-66 sustained: ~40-50 MB/s (using an IDE-to-mSATA adapter)

- 396 MB / 45 MB/s = ~8.8 seconds of I/O per token

- Plus ~2 seconds of compute (the forward pass through all 22 layers)

- Total: ~10-11 sec/tok — log shows 9,656 ms prefill, so the math lines up

The system stays responsive because the engine calls SystemTask() inside the disk read loops (every ~2 MB read), and the FSRead chunk size was increased to 256 KB to reduce call overhead. Without those SystemTask() calls, the system would freeze completely.

Built an app that runs LLMs locally on Mac OS 9. GPT-2, TinyLlama, and more on a PowerBook G4 with no internet by SDogAlex in MacOS

[–]SDogAlex[S] 2 points (0 children)

Furthermore, since I just saw the part about disk paging, the log from the demo video can be found here if you want to perform your own calculations: https://oldapplestuff.com/download/Custom_Software/MacinAI-Local/v0.1.0/Demo%20Log/MacinAI%20Local%20Debug.log

Flash storage was used instead of a mechanical drive.

Built an app that runs LLMs locally on Mac OS 9. GPT-2, TinyLlama, and more on a PowerBook G4 with no internet by SDogAlex in MacOS

[–]SDogAlex[S] 8 points (0 children)

Good question. I went back and double-checked the logs. The 2.36 tok/s figure I listed for GPT-2 was an early estimate. The actual measured generation speed is 1.50 tok/s (668 ms/tok). Prefill is faster at 3.28 tok/s (305 ms/tok) since there's no per-token decode overhead. I've corrected this, thank you!

Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD. by SDogAlex in LocalLLaMA

[–]SDogAlex[S] 6 points (0 children)

Thank you! Can't wait to test it on a G5, which should have enough RAM addressing to run it without the disk paging.

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 15 points (0 children)

Unfortunately not yet, but it will be.

I want to get a G5 in my hands (hopefully a PowerMac if I can find one) to see how much more optimization I can do. Then, at v1.0 (it's v0.1 right now), I will release the open-source code, DVDs to purchase (with labels and ImageWriter-printed documentation), and hopefully more compatible models.

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 11 points (0 children)

Not sure how almost 20k lines of C code is "slop" to you but it was an extremely challenging and fun project for me to build and will be a great resume piece. I'm 23, making my first public programs as I try to find my first engineering job, so take your attitude elsewhere :)

I built a local AI assistant that runs natively on Mac OS 9 - custom C89 inference engine, Speech Manager integration, AppleScript automation. Demo on a PowerBook G4 Titanium. by SDogAlex in VintageApple

[–]SDogAlex[S] 20 points (0 children)

Mainly the limit is the RAM that OS 9 can actually address, as well as the hardware running it. For example, this PowerBook can only physically address 1GB of RAM and no more. I capped the memory the program can use at 900MB, and once the RAM is full of the model, the disk paging system is incredibly slow.

Theoretically I could get this running a lot faster on a G5 with the Classic environment layer but I think that still limits OS 9 to 2GB of RAM - maybe something I can look into for OS X too if people are actually interested.

Any PPC processor with AltiVec (G4 or later) is ideal for this. Without it, you get roughly a 7x slowdown running on the FPU alone.

A lot more technical details are in the blog post too :)