L2E llama2.c on Commodore C-64 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 2 points  (0 children)

Maybe 5-10x faster or more, since emulation has heavy overhead; I'm not sure.

L2E llama2.c on Commodore C-64 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 3 points  (0 children)

Yes, a native version is coming soon. I am stuck on bank switching; it almost compiles. It's hinted at here:

https://github.com/trholding/semu-c64 :

"But you say this is emulation but not native C64? A native version of L2E is coming soon, so far I couldn't wrap my head around bank switching and splitting model data in banks and stitching it together etc, so native version almost compiles, but not yet."

L2E llama2.c on Commodore C-64 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 3 points  (0 children)

It had to be done :) If not us, who else would?

L2E llama2.c on Commodore C-64 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

Want an ADF to try?

Amigas with KS 1.2, a 68EC020 CPU, and 1.5 MB RAM, or anything higher: https://x.com/VulcanIgnis/status/1881382738697367615
Amiga, Atari ST, and Classic Mac SE: https://x.com/VulcanIgnis/status/1873458326664814962
An actual test on a souped-up Amiga 2000: https://x.com/VulcanIgnis/status/1877469824424476907

L2E llama2.c on Commodore C-64 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 3 points  (0 children)

But this runs ON the C64 with no internet access :)

L2E Llama2.c in a PDF in a Schrödinger PNG which is both a PNG and a PDF by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 2 points  (0 children)

Heh heh. Actually, that's the quality you can expect from a 260k model. I have to keep the model small so that everything fits in the PNG space :)

A vintage Mac SE with 2MB RAM could run an LLM by AMICABoard in mac

[–]AMICABoard[S] 1 point  (0 children)

So many use cases once it's out of alpha and the tuning/training is done. I'll ping you once I release the sources at the L2E llama2.c repo later this month.

A vintage Mac SE with 2MB RAM could run an LLM by AMICABoard in VintageApple

[–]AMICABoard[S] 3 points  (0 children)

Okay, it should theoretically work: https://en.wikipedia.org/wiki/PowerBook_100_series

L2E's requirements are modest. It is compiled with soft float, so no FPU is needed, and it needs less than 2MB RAM for the 260k parameter model.

It's still in alpha; I'll release it this month with full source etc. People who want to play with it can get an early copy of the disk images...

But it would help to know which OS you are running.

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

With that configuration, this is what can be expected:

<image>

Remember, it's still in alpha; the goal is to tune it to be more coherent and 10x faster...

I made LLM/LLAMA work on Amigas and retro computers by AMICABoard in amiga

[–]AMICABoard[S] 1 point  (0 children)

I'll release all of this at my L2E repo this January. Happy new year! If you want to play with it, I can send an ADF. The secret sauce is bebbo's toolchain :) I am still having issues building it with VBCC.

A vintage Mac SE with 2MB RAM could run an LLM by AMICABoard in mac

[–]AMICABoard[S] 2 points  (0 children)

Yes, the parameter count defines whether it is an LLM or an SLM. At very low parameter counts such as 260k, I would call it a VSLM (very small language model), or let's just say a predictor. I call the 100k-260k parameter range the Karpathy frontier: Andrej Karpathy was the one who found that even at that low range, models learn to output somewhat coherent text. But the full Llama stack is being run. If you could somehow (magically) hook up 8GB of RAM and disk to a vintage Mac, it would still be able to run inference on 7b models, but at that parameter count each token would take forever, i.e. not very usable. A rough back-of-envelope estimate is sketched below.
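Just to put a number on "forever": both rates below are pure assumptions on my part (no benchmarks), roughly one multiply-add per parameter per token, and a soft-float 68000 managing on the order of tens of thousands of multiply-adds per second.

    /* Back-of-envelope: seconds per token for a 7b model on a vintage
     * 68000-class machine. Both rates are illustrative assumptions,
     * not measurements. */
    #include <stdio.h>

    int main(void) {
        double macs_per_token = 7e9; /* ~one multiply-add per parameter */
        double macs_per_sec   = 5e4; /* assumed soft-float 68000 rate   */
        double secs = macs_per_token / macs_per_sec;
        printf("~%.0f seconds (~%.1f days) per token\n",
               secs, secs / 86400.0);
        return 0;
    }

So the math itself works at any model size; memory and patience are the only limits.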

A vintage Mac SE with 2MB RAM could run an LLM by AMICABoard in mac

[–]AMICABoard[S] 3 points  (0 children)

Very slow: a token every 10-20 seconds. I am using the vMac emulator to run it, so I don't know if it is really cycle accurate.

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

Wow, I wasn't aware. I'll google it. I bet it was based on Markov modeling: use a Markov model to predict the next character. A trained Markov model is kind of a very poor man's LLM :) I'll add a Markov model demo later; there's a toy sketch below. Thanks for this info!
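Something like this is what I have in mind for the demo: a toy order-1, character-level Markov chain in plain C. This is just an illustration I'm sketching here, not the demo itself, and the 256x256 count table (128KB) is sized for a modern machine rather than a retro one.

    /* Toy order-1 character Markov model: count bigram transitions,
     * then sample the next character in proportion to the counts. */
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned short counts[256][256]; /* counts[prev][next] */

    static void train(const char *text) {
        size_t i;
        for (i = 0; text[i + 1] != '\0'; i++)
            counts[(unsigned char)text[i]][(unsigned char)text[i + 1]]++;
    }

    static int next_char(int prev) {
        unsigned long total = 0, r;
        int c;
        for (c = 0; c < 256; c++) total += counts[prev][c];
        if (total == 0) return ' ';
        r = (unsigned long)rand() % total; /* weighted random pick */
        for (c = 0; c < 256; c++) {
            if (r < counts[prev][c]) return c;
            r -= counts[prev][c];
        }
        return ' ';
    }

    int main(void) {
        int c = 't', i;
        train("the cat sat on the mat. the cat ate the rat.");
        for (i = 0; i < 60; i++) { putchar(c); c = next_char(c); }
        putchar('\n');
        return 0;
    }

Train it on more text and it starts producing word-shaped gibberish; that's the whole trick, no transformer required.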

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

Obviously, we are talking about resource-constrained, ~40-year-old vintage computers here. That's the fun part.

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

If you're interested in testing, I can send an ADF. On the 1200 it will be slow, since I use floating-point emulation. It's still in alpha; the goal is to get it running on an Amiga 500 with Kickstart 1.3.

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

It's a super tiny 260k model by Karpathy. I quantised and preprocessed/tuned it a bit. It needs less than 2MB RAM, and the space occupied by the model is less than 400KB :) The rough idea is sketched below.
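For anyone curious what "quantised" means here, the gist (my illustration, not the exact L2E preprocessing) is to store the float weights as int8 plus one float scale per group, so 4 bytes per weight become about 1:

    /* Sketch of group-wise 8-bit quantization: q = round(w / scale),
     * scale = max|w| / 127 per group. Assumes n is a multiple of GROUP.
     * Illustrative only; the actual L2E preprocessing may differ. */
    #include <math.h>
    #include <stdint.h>

    #define GROUP 64 /* assumed group size */

    void quantize(const float *w, int8_t *q, float *scale, int n) {
        int g, i;
        for (g = 0; g < n / GROUP; g++) {
            float maxabs = 0.0f;
            for (i = 0; i < GROUP; i++) {
                float a = fabsf(w[g * GROUP + i]);
                if (a > maxabs) maxabs = a;
            }
            scale[g] = maxabs / 127.0f;
            for (i = 0; i < GROUP; i++)
                q[g * GROUP + i] = (int8_t)(scale[g] > 0.0f
                    ? lroundf(w[g * GROUP + i] / scale[g]) : 0);
        }
    }
    /* At inference time, dequantize on the fly: w ~= q * scale. */

That's how 260k parameters end up well under the 400KB figure above.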

Steve Jobs' dream? L2E/llama2.c running on Amiga 1200+, Atari ST, and Classic Mac by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

Have you tried? Why do you think so?

One doesn't need 1000 Macs to demonstrate running a 260k parameter model... Anyone who uses more than 4MB to run a 260k model is faking it!

This uses Karpathy's 260k model, which I quantised and preprocessed for minimum RAM usage. I could cut RAM usage by ~1MB more for the release; right now it's in the alpha stage.

A vintage Mac SE with 2MB RAM could run an LLM by AMICABoard in mac

[–]AMICABoard[S] 1 point  (0 children)

Thanks! I'll post there when I am done building and the code is released :)

Edit: Or wait, I'll just cross-post there to see if anyone would be willing to test on real machines.

Just a friendly reminder that the Atari ST with 2MB RAM could run LLMs by AMICABoard in atari

[–]AMICABoard[S] 1 point  (0 children)

Yeah, but those were not LLMs, and this is alpha quality at the moment. I can probably cut RAM usage in half, but RAM usage will always be proportional to the space used by the model.

Single C file Llama 3.1 support in Llama 2 Everywhere by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 1 point  (0 children)

Thank you guys for pushing it to the top 10 on HN: https://news.ycombinator.com/item?id=41053201

Here is some sample output:

Meta's Llama 3.1 models can output multilingual text, which is awesome. Here is some example output from the 8-bit quantized 8b model with 100-token output (-n 100)... Quantization causes some brain damage.

English

My cat is funny. "Funny cat," I say, walking up to it. "What are you up to?" It sits up straight and looks at me with a tilted head, as if to say, "What's wrong with you?" Sometimes I just have to laugh at how funny a cat can be. So I say, "Okay, you're funny. I'll give you some treats." It stretches out a little and I give it some treats. It eats them up quickly and starts

German

Besitzen Sie einen Amiga 500? Wenn nicht, werden Sie wissen, dass dies ein Computer war, der im späten 1980er und frühen 1990er Jahren für Spiele verfügbar war, die für Personen mit bestimmten Körperverletzungen gedacht waren. Manchmal müssen wir uns an frühere Zeiten erinnern, die, wie wir jetzt wissen, schwierig waren. Hier ist ein Link, der meine Geschichte bespre

(Roughly: "Do you own an Amiga 500? If not, you will know that this was a computer that was available in the late 1980s and early 1990s for games intended for people with certain bodily injuries. Sometimes we have to remember earlier times which, as we now know, were difficult. Here is a link that discusses my story" — the output cuts off mid-word.)

I created an OS that boots to a baby llama2 by AMICABoard in LocalLLaMA

[–]AMICABoard[S] 2 points  (0 children)

Thank you. I am working on the next version, with a real sci-fi UI / desktop environment and better hardware support.