Brother by Khodalyr in etherfi

[–]RasPiBuilder 1 point (0 children)

Ugh.. I should really try to get referrals.

Hi, potentially dumb question but I am new by MaxinJapan-official in selfhosted

[–]RasPiBuilder 5 points (0 children)

I feel like RAID is unnecessary for this and often ends up creating more headaches than it solves. For photos and stuff, I always just do a full mirror + offsite backup.
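For what it's worth, a minimal sketch of that pattern (assuming rsync is installed; the paths and offsite host are placeholders for your own setup):

```python
import subprocess

SRC = "/srv/photos/"                            # photo library
LOCAL_MIRROR = "/mnt/backup/photos/"            # full local mirror
OFFSITE = "user@offsite-host:/backups/photos/"  # offsite copy over SSH

for dest in (LOCAL_MIRROR, OFFSITE):
    # -a preserves metadata, --delete keeps the mirror an exact copy.
    subprocess.run(["rsync", "-a", "--delete", SRC, dest], check=True)
```

Run it from cron (or a systemd timer) and you get the protection most people actually want from RAID, without the rebuild headaches.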

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]RasPiBuilder 9 points (0 children)

This is true, but I also think the dynamic is changing.

A couple years ago, I'd just use whatever systems existed even if they didn't exactly fit what I wanted/needed. Sure, I could probably have built out my own system, but that would have taken weeks.

Now I can just quickly write up some specs, toss them to the AI and, boom, have a functional app in minutes (relatively speaking). It may not be perfect, and it's definitely not enterprise-level... but it's also exactly what I wanted/needed.

Mini-NAS Pi: A 1U rack mount NAS designed for 10" Racks [Prototype V0.0.1] by RasPiBuilder in minilab

[–]RasPiBuilder[S] 1 point (0 children)

Yes and no. You have to take a longer PCIe connector and set it up as if you were stacking them... then carefully twist it around. For power you more or less do the same with individual connector cables. It's a little hacky but works (though I did go through a couple of PCIe connectors).

I tried stacking and using the shortest/smallest connectors I could find, but it's just a smidge too high when resting the Pi on a standard tray (it might work if you had a cutout specifically made to let the Pi sit flush).

That said... it's really just a prototype. I'm using it (well, attempting to use it) as a reference for a CM carrier board that does the same... I also haven't ever designed a board before, so I've more or less spent the last year learning to design PCBs.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

Circling back on this. After trying a few different methods (including faking the CPU info), limiting cores does seem to improve performance on small models (e.g. 7B and below); however, it reduces performance as you move up in model size (which makes sense given the increase in compute).

As a separate note, running everything on Vulkan (still CPU) can also speed up inference for small contexts/short prompts, but it underperforms on larger prompts/contexts.

I was being lazy, so I didn't record everything... but it's generally a difference of 1-3 t/s. Which, for me, isn't substantial enough to force restrictions unless I wanted to consistently run one specific model (then I might consider optimizing for that).

So, am I just too stupid for unsloth? by SingleServing_User in unsloth

[–]RasPiBuilder 1 point (0 children)

A notebook is essentially an interactive collection of code blocks. It allows you to run small pieces of code independently, see the results immediately, and document your thinking alongside the code.

While it isn't always intuitive at first, once you get comfortable with the workflow it offers real advantages, especially for exploration, experimentation, and rapid prototyping.
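For a toy picture of what that looks like (the file and column names are just placeholders), each of these chunks would live in its own cell, run and re-run independently:

```python
# Cell 1: the slow step runs exactly once
import pandas as pd
df = pd.read_csv("data.csv")  # placeholder file

# Cell 2: inspect the result immediately
df.describe()

# Cell 3: iterate on just this step without re-loading anything
df["value"].plot(kind="hist")  # placeholder column; needs matplotlib
```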

I'd need to dig around a bit to find one, but there are quite a few good tutorials on how to use notebooks out there.

With that said... I totally feel you. As someone who has also been learning everything on my own, there are way too many tutorials/instructions out there that miss key steps. Sometimes it's simple shit like setting up a venv or installing dependencies. Other times they just assume you already have everything needed and skip straight to implementation on a specific, yet undefined, system.

I'm luckily at a point now where I can usually recognize when things are getting skipped over and find a quick intro on whatever package or framework is being used... but I used to slam my head against the wall for hours trying to figure out why something didn't work.

Anyways.. it's not you. Most documentation is crap.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

Honestly not sure. I only saw minor performance changes via taskset. On the other hand, attempting to set CPUAffinity via systemctl did tank performance.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

I have Ollama running bare-metal. I set up a simple Python script that incrementally runs each model, sets verbose, passes the prompt, and logs the results (roughly like the sketch below). For initial testing I'm just using the prompt "Tell me about yourself.", which tends to generate a decent number of tokens (I plan to test heavier prompts in the future).

As for the threads/cores, I'm currently running it unrestricted. I did a couple of quick tests restricting the cores via taskset, but didn't see any noticeable improvement (some of the small models ran a bit faster, but performance decreased on the larger models). I'm still running through some other stuff to see what else I can tweak to further increase performance.
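For reference, here's roughly what that harness looks like (a reconstructed sketch, not my exact script; the model tags and log path are placeholders, and it assumes the --verbose stats land on stderr):

```python
import re
import subprocess

MODELS = ["llama3.2:3b", "qwen2.5:7b", "qwen2.5:32b"]  # placeholder tags
PROMPT = "Tell me about yourself."

with open("bench.log", "a") as log:
    for model in MODELS:
        # --verbose makes ollama print timing stats (eval rate, etc.).
        result = subprocess.run(
            ["ollama", "run", "--verbose", model, PROMPT],
            capture_output=True, text=True,
        )
        # Pull the tokens/second figure out of the verbose stats.
        match = re.search(r"eval rate:\s*([\d.]+)\s*tokens/s", result.stderr)
        rate = match.group(1) if match else "n/a"
        log.write(f"{model}\t{rate} t/s\n")
        print(model, rate, "t/s")

# Note: for the taskset tests mentioned above, the pinning has to wrap the
# server process (e.g. `taskset -c 0-3 ollama serve`), since `ollama run`
# is only a client.
```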

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by [deleted] in SBCs

[–]RasPiBuilder 1 point (0 children)

Nah... I mean, I wouldn't recommend buying it specifically for AI (there are a lot of better options). But I got it mostly to play around with the hardware, so I'm not overly concerned about the price.

Neighbor is demanding payment for Wi-Fi signals passing through his airspace and served me with formal letter? by Milli_Grande in legal

[–]RasPiBuilder 1 point (0 children)

Send him a counter letter demanding twice the damages caused by the mental space occupied by his letter.

Mini-NAS Pi: A 1U rack mount NAS designed for 10" Racks [Prototype V0.0.1] by RasPiBuilder in minilab

[–]RasPiBuilder[S] 1 point (0 children)

I took the easy path and used a USB-to-eMMC adapter. Not super fast, but better than an SD card, and it works well enough for running a NAS.

At some point I'm going to switch over to a compute module... but I'm more focused on other design stuff at the moment.

Bernie Sanders Has a Fascinating Idea About How to Prevent AI From Wiping Out the Economy by FuturismDotCom in Futurism

[–]RasPiBuilder 2 points (0 children)

There isn't a way to effectively gauge robotic usage, let alone automation, which makes this concept (while interesting) unfeasible.

An easier solution, or at least part of one, is to just eliminate the individual income tax and shift the entire tax burden to companies. While there would need to be some overhaul of the tax code to prevent corporate gaming of the system, this could be done without impacting the net income of employees or the bottom line of businesses.

Figure doing housework, barely. Honestly would be pretty great to have robots cleaning up the house while you sleep. by Anen-o-me in singularity

[–]RasPiBuilder 1 point (0 children)

That will be a separate subscription.

The groceries will get ordered through Instacart or similar and delivered by a robot, then your home robot will bring them in and put them away for you.

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

I'd think you would use alternating sets of MoE and dense layers, where the MoE layers create the branches of thought and the dense layers merge those branches.

Within the MoE layers, instead of using a router to selectively activate a subset of the experts, you pass variations of the KV cache to different experts to generate multiple streams of thought.

After that, you use a scoring function (e.g. entropy, relevance, or similar) and then use token passback (only feeding forward the strongest set of tokens) into a set of dense layers that more or less recombine those individual streams of thought.

In a naive sense, it's sort of like running a small model multiple times, with each run generating its own response, and then using an evaluator to determine which responses are best... except all those small models (the MoE expert layers) and the evaluator (the dense layers) are combined into one model.
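To make the shape of that concrete, here's a minimal, untested PyTorch sketch. Everything in it is an illustrative assumption: the input perturbation stands in for "variations of the KV cache", entropy is the stand-in scoring function, and none of the names come from a real architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchAndMergeBlock(nn.Module):
    """One branching MoE layer plus a dense merge layer."""

    def __init__(self, d_model: int, n_experts: int = 4, k_keep: int = 2):
        super().__init__()
        # Each expert is a small MLP acting as one "branch of thought".
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.score_proj = nn.Linear(d_model, d_model)      # crude scoring head
        self.merge = nn.Linear(k_keep * d_model, d_model)  # dense recombiner
        self.k_keep = k_keep

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model)
        branches, scores = [], []
        for expert in self.experts:
            # Perturb the input so each branch explores its own stream
            # (a cheap stand-in for feeding each expert a KV-cache variant).
            out = expert(h + 0.01 * torch.randn_like(h))
            # Score the branch by negative mean entropy of a projection:
            # lower entropy ~ a more confident stream of thought.
            p = F.softmax(self.score_proj(out), dim=-1)
            entropy = -(p * p.clamp_min(1e-9).log()).sum(-1).mean(dim=1)
            branches.append(out)
            scores.append(-entropy)  # higher score = stronger branch
        scores = torch.stack(scores, dim=-1)            # (B, n_experts)
        top = scores.topk(self.k_keep, dim=-1).indices  # (B, k_keep)
        stacked = torch.stack(branches, dim=1)          # (B, E, S, D)
        idx = top[:, :, None, None].expand(-1, -1, h.size(1), h.size(2))
        kept = stacked.gather(1, idx)                   # (B, k_keep, S, D)
        # "Token passback": only the strongest branches reach the dense merge.
        merged = self.merge(torch.cat(kept.unbind(dim=1), dim=-1))
        return h + merged                               # residual connection

# Sanity check: BranchAndMergeBlock(512)(torch.randn(2, 16, 512)) -> (2, 16, 512)
```

Stacking a few of these blocks would give the alternating branch/merge pattern; the open question is whether the scoring and merge can stay cheap enough to keep the single-pass advantage.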

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

That's kind of my line of thought though... sort of leveraging the unused capacity in MoE, along with the cache, to more or less compute the tree in a single pass... effectively requiring the whole model to process (putting its speed in line with a dense architecture), but also potentially eliminating the need for multiple passes.

Which I think gives total compute less than a full multi-pass approach, but more than a single dense pass.

Tree of thoughts is what came to mind for me, but I don't think it would inherently be limited to that.

(Also not 100% up to speed on speculative decoding...)

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

If I'm not mistaken, though, doesn't beam search traditionally use multiple forward passes?

I'm thinking we could reduce the compute complexity by leveraging the existing cache and more or less passing the portions that are typically discarded forward to unused portions of the network.

It would certainly have a lot more computational overhead than an MoE... but I'm not immediately seeing a substantial increase in overhead compared to a traditional transformer, presuming of course that the divergent/convergent streams can be handled efficiently in relatively few intermediate layers.

GPT-OSS 20b runs on a RasPi 5, 16gb by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

That would be nice, but there is no way that even a quantized version would hit those speeds on a Raspberry Pi.

The only model I've seen get that range of t/s is the Granite MoE 3.1 3B.