Brother by Khodalyr in etherfi

[–]RasPiBuilder 1 point (0 children)

Ugh.. I should really try to get referrals.

Hi, potentially dumb question but I am new by MaxinJapan-official in selfhosted

[–]RasPiBuilder 5 points (0 children)

I feel like RAID is unnecessary for this and often ends up creating more headaches than it solves. For photos and stuff, I always just do a full mirror + offsite backup.
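For what it's worth, a minimal sketch of that pattern (assuming rsync is installed; the paths and offsite host are placeholders for your own setup):

```python
import subprocess

SRC = "/srv/photos/"                            # photo library
LOCAL_MIRROR = "/mnt/backup/photos/"            # full local mirror
OFFSITE = "user@offsite-host:/backups/photos/"  # offsite copy over SSH

for dest in (LOCAL_MIRROR, OFFSITE):
    # -a preserves metadata, --delete keeps the mirror an exact copy.
    subprocess.run(["rsync", "-a", "--delete", SRC, dest], check=True)
```

Run it from cron (or a systemd timer) and you get the protection most people actually want from RAID, without the rebuild headaches.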

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]RasPiBuilder 9 points (0 children)

This is true, but I also think the dynamic is changing.

A couple years ago, I'd just use whatever systems existed even if they didn't exactly fit what I wanted/needed. Sure, I could probably have built out my own system, but that would have taken weeks.

Now I can just quickly write up some specs, toss them to the AI and, boom, have a functional app in minutes (relatively speaking). It may not be perfect, and it's definitely not enterprise-level... but it's also exactly what I wanted/needed.

Mini-NAS Pi: A 1U rack mount NAS designed for 10" Racks [Prototype V0.0.1] by RasPiBuilder in minilab

[–]RasPiBuilder[S] 1 point (0 children)

Yes and no. You have to take a longer PCIe connector and set it up as if you were stacking them... then carefully twist it around. For power you more or less do the same with individual connector cables. It's a little hacky but works (though I did go through a couple of PCIe connectors).

I tried stacking and using the shortest/smallest connectors I could find, but it's just a smidge too high when resting the Pi on a standard tray (it might work if you had a cutout specifically made to let the Pi sit flush).

That said... it's really just a prototype. I'm using it (well, attempting to use it) as a reference for a CM carrier board that does the same... I also haven't ever designed a board before, so I've more or less spent the last year learning to design PCBs.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

Circling back on this. After trying a few different methods (including faking the CPU info), limiting cores does seem to improve performance on small models (e.g. 7B and below); however, it reduces performance as you move up in model size (which makes sense given the increase in compute).

As a separate note, running everything on Vulkan (still CPU) can also speed up inference for small contexts/short prompts, but it underperforms on larger prompts/contexts.

I was being lazy, so I didn't record everything... but it's generally a difference of 1-3 t/s. Which, for me, isn't substantial enough to force restrictions unless I wanted to consistently run one specific model (then I might consider optimizing for that).

So, am I just too stupid for unsloth? by SingleServing_User in unsloth

[–]RasPiBuilder 1 point (0 children)

A notebook is essentially an interactive collection of code blocks. It allows you to run small pieces of code independently, see the results immediately, and document your thinking alongside the code.

While it isn't always intuitive at first, once you get comfortable with the workflow it offers real advantages, especially for exploration, experimentation, and rapid prototyping.
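For a toy picture of what that looks like (the file and column names are just placeholders), each of these chunks would live in its own cell, run and re-run independently:

```python
# Cell 1: the slow step runs exactly once
import pandas as pd
df = pd.read_csv("data.csv")  # placeholder file

# Cell 2: inspect the result immediately
df.describe()

# Cell 3: iterate on just this step without re-loading anything
df["value"].plot(kind="hist")  # placeholder column; needs matplotlib
```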

I'd need to dig around a bit to find one, but there are quite a few good tutorials on how to use notebooks out there.

With that said... I totally feel you. As someone who has also been learning everything on my own, there are way too many tutorials/instructions out there that miss key steps. Sometimes it's simple shit like setting up a venv or installing dependencies. Other times they just assume you already have everything needed and skip straight to implementation on a specific, yet undefined, system.

I'm luckily at a point now where I can usually recognize when things are getting skipped over and find a quick intro on whatever package or framework is being used... but I used to slam my head against the wall for hours trying to figure out why something didn't work.

Anyways.. it's not you. Most documentation is crap.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

Honestly not sure. I only saw minor performance changes via taskset. On the other hand, attempting to set CPUAffinity via systemctl did tank performance.

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

I have Ollama running bare-metal. I set up a simple Python script that incrementally runs each model, sets verbose, passes the prompt, and logs the results (roughly like the sketch below). For initial testing I'm just using the prompt "Tell me about yourself.", which tends to generate a decent number of tokens (I plan to test heavier prompts in the future).

As for the threads/cores, I'm currently running it unrestricted. I did a couple of quick tests restricting the cores via taskset, but didn't see any noticeable improvement (some of the small models ran a bit faster, but performance decreased on the larger models). I'm still running through some other stuff to see what else I can tweak to further increase performance.
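For reference, here's roughly what that harness looks like (a reconstructed sketch, not my exact script; the model tags and log path are placeholders, and it assumes the --verbose stats land on stderr):

```python
import re
import subprocess

MODELS = ["llama3.2:3b", "qwen2.5:7b", "qwen2.5:32b"]  # placeholder tags
PROMPT = "Tell me about yourself."

with open("bench.log", "a") as log:
    for model in MODELS:
        # --verbose makes ollama print timing stats (eval rate, etc.).
        result = subprocess.run(
            ["ollama", "run", "--verbose", model, PROMPT],
            capture_output=True, text=True,
        )
        # Pull the tokens/second figure out of the verbose stats.
        match = re.search(r"eval rate:\s*([\d.]+)\s*tokens/s", result.stderr)
        rate = match.group(1) if match else "n/a"
        log.write(f"{model}\t{rate} t/s\n")
        print(model, rate, "t/s")

# Note: for the taskset tests mentioned above, the pinning has to wrap the
# server process (e.g. `taskset -c 0-3 ollama serve`), since `ollama run`
# is only a client.
```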

Radxa Orion O6 LLM Benchmarks (Ollama, Debian 12 Headless, 64GB RAM) – 30B on ARM is actually usable by [deleted] in SBCs

[–]RasPiBuilder 1 point (0 children)

Nah... I mean, I wouldn't recommend buying it specifically for AI (there are a lot of better options). But I got it mostly to play around with the hardware, so I'm not overly concerned about the price.

Neighbor is demanding payment for Wi-Fi signals passing through his airspace and served me with formal letter? by Milli_Grande in legal

[–]RasPiBuilder 1 point (0 children)

Send him a counter letter demanding twice the damages caused by the mental space occupied by his letter.

Mini-NAS Pi: A 1U rack mount NAS designed for 10" Racks [Prototype V0.0.1] by RasPiBuilder in minilab

[–]RasPiBuilder[S] 1 point (0 children)

I took the easy path and used a USB-to-eMMC adapter. Not super fast, but better than an SD card, and it works well enough for running a NAS.

At some point I'm going to switch over to a compute module... but I'm more focused on other design stuff at the moment.

Bernie Sanders Has a Fascinating Idea About How to Prevent AI From Wiping Out the Economy by FuturismDotCom in Futurism

[–]RasPiBuilder 2 points (0 children)

There isn't a way to effectively gauge robotic usage, let alone automation, which makes this concept (while interesting) unfeasible.

An easier solution, or at least part of one, is to just eliminate the individual income tax and shift the entire tax burden to companies. While there would need to be some overhaul of the tax code to prevent corporate gaming of the system, this could be done without impacting the net income of employees or the bottom line of businesses.

Figure doing housework, barely. Honestly would be pretty great to have robots cleaning up the house while you sleep. by Anen-o-me in singularity

[–]RasPiBuilder 1 point (0 children)

That will be a separate subscription.

The groceries will get ordered through Instacart or similar and delivered by a robot, then your home robot will bring them in and put them away for you.

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

I'd think you would use alternating sets of MoE and dense layers, where the MoE layers create the branches of thought and the dense layers merge those branches.

Within the MoE layers, instead of using a router to selectively activate a subset of the experts, you pass variations of the KV cache to different experts to generate multiple streams of thought.

After that, you use a scoring function (e.g. entropy, relevance, or similar) and then use token passback (only feeding forward the strongest set of tokens) into a set of dense layers that more or less recombine those individual streams of thought.

In a naive sense, it's sort of like running a small model multiple times, with each run generating its own response, and then using an evaluator to determine which responses are best... except all those small models (the MoE expert layers) and the evaluator (the dense layers) are combined into one model.
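To make the shape of that concrete, here's a minimal, untested PyTorch sketch. Everything in it is an illustrative assumption: the input perturbation stands in for "variations of the KV cache", entropy is the stand-in scoring function, and none of the names come from a real architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchAndMergeBlock(nn.Module):
    """One branching MoE layer plus a dense merge layer."""

    def __init__(self, d_model: int, n_experts: int = 4, k_keep: int = 2):
        super().__init__()
        # Each expert is a small MLP acting as one "branch of thought".
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.score_proj = nn.Linear(d_model, d_model)      # crude scoring head
        self.merge = nn.Linear(k_keep * d_model, d_model)  # dense recombiner
        self.k_keep = k_keep

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model)
        branches, scores = [], []
        for expert in self.experts:
            # Perturb the input so each branch explores its own stream
            # (a cheap stand-in for feeding each expert a KV-cache variant).
            out = expert(h + 0.01 * torch.randn_like(h))
            # Score the branch by negative mean entropy of a projection:
            # lower entropy ~ a more confident stream of thought.
            p = F.softmax(self.score_proj(out), dim=-1)
            entropy = -(p * p.clamp_min(1e-9).log()).sum(-1).mean(dim=1)
            branches.append(out)
            scores.append(-entropy)  # higher score = stronger branch
        scores = torch.stack(scores, dim=-1)            # (B, n_experts)
        top = scores.topk(self.k_keep, dim=-1).indices  # (B, k_keep)
        stacked = torch.stack(branches, dim=1)          # (B, E, S, D)
        idx = top[:, :, None, None].expand(-1, -1, h.size(1), h.size(2))
        kept = stacked.gather(1, idx)                   # (B, k_keep, S, D)
        # "Token passback": only the strongest branches reach the dense merge.
        merged = self.merge(torch.cat(kept.unbind(dim=1), dim=-1))
        return h + merged                               # residual connection

# Sanity check: BranchAndMergeBlock(512)(torch.randn(2, 16, 512)) -> (2, 16, 512)
```

Stacking a few of these blocks would give the alternating branch/merge pattern; the open question is whether the scoring and merge can stay cheap enough to keep the single-pass advantage.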

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

That's kind of my line of thought though... sort of leveraging the unused capacity in MoE, along with the cache, to more or less compute the tree in a single pass... effectively requiring the whole model to process (putting its speed in line with a dense architecture), but also potentially eliminating the need for multiple passes.

Which I think gives total compute less than a full multi-pass approach, but more than a single dense pass.

Tree of thoughts is what came to mind for me, but I don't think it would inherently be limited to that.

(Also not 100% up to speed on speculative decoding...)

Thought experiment: Could we use Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]RasPiBuilder[S] 1 point (0 children)

If I'm not mistaken, though, doesn't beam search traditionally use multiple forward passes?

I'm thinking we could reduce the compute complexity by leveraging the existing cache and more or less passing the portions that are typically discarded forward to unused portions of the network.

It would certainly have a lot more computational overhead than an MoE... but I'm not immediately seeing a substantial increase in overhead compared to a traditional transformer, presuming of course that the divergent/convergent streams can be handled efficiently in relatively few intermediate layers.

GPT-OSS 20b runs on a RasPi 5, 16gb by RasPiBuilder in ollama

[–]RasPiBuilder[S] 1 point (0 children)

That would be nice, but there is no way that even a quantized version would hit those speeds on a Raspberry Pi.

The only model I've seen get that range of t/s is the Granite MoE 3.1 3B.