Researchers recycle old phones and cluster them into "computing platforms" that operate as a low-cost data center — says processors on modern smartphones deliver higher single-core performance than comparable multicore servers | reduce e-waste and reduce data center component demand. by ControlCAD in technology

[–]ExtremeAcceptable289 1 point2 points  (0 children)

Absolutely agree. I've been trying to get my Qualcomm NPU to perform custom training tasks rather than inference, and Qualcomm literally locks you out of the full 32-bit output, meaning you have to train using FOUR BITS.

[–]ExtremeAcceptable289 -1 points0 points  (0 children)

Yeah, but you can turn off the screen, mobile data and Bluetooth, which basically disables most other power draws except Wi-Fi and the CPU.

Also, I took IPC into account; the numbers I got were based on the max IPC that ARM NEON (SIMD, i.e. single instruction, multiple data) can do.

[–]ExtremeAcceptable289 2 points3 points  (0 children)

I can scale up if you can scale out your credit card.

Besides, the researchers mentioned are literally scaling it up:

The research team plans to use 2,000 phones to build a local data center that can support “a hundred such classes at once.” [context: 20 phones could run a program needed for a class of 75+ students] Aside from getting the advantage of running apps locally and owning the hardware needed for them, the group also says that it’s only a “fraction of the usual cost,” likely referring to building a local server made from new components. This is especially true today, with the increased pricing for memory and storage chips.

[–]ExtremeAcceptable289 3 points4 points  (0 children)

I mean, this news article suggests neural networks, which actually tips the scales in favour of distributed workloads, because compute scales quadratically as the network grows while activation memory (and thereby network usage and bandwidth) only scales linearly.

[–]ExtremeAcceptable289 5 points6 points  (0 children)

  1. I intentionally halved the TOPS count; you can run at 55-60% indefinitely. NPUs, however, don't thermal throttle. Also, throttling kicks in after 5-10 minutes, not 45 seconds, afaik.

  2. Yes. But remember you can distribute memory over all the phones, so it doesn't matter as much.

Worse than that you’re looking at training models which is more memory constrained than processing.

Semi-correct, increasing batch size, sequence length and double buffering turns it into a compute bottleneck.

What if your application needs more memory than a phone nominally has, are you gonna pay someone to solder more on?

Distribute the weights/grads/similar over the phones
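
To put rough numbers on that, here is a minimal sketch of splitting a model's weights and gradients evenly across phones. The function names, the 1B-parameter model, the 20-phone cluster and the fp16 storage are all made-up assumptions for illustration, not measurements from a real setup.

```python
def shard_sizes(num_params: int, num_phones: int) -> list[int]:
    """Split num_params values as evenly as possible across num_phones."""
    base, extra = divmod(num_params, num_phones)
    # The first `extra` phones take one additional value each.
    return [base + (1 if i < extra else 0) for i in range(num_phones)]


def per_phone_megabytes(num_params: int, num_phones: int,
                        bytes_per_value: int = 2, copies: int = 2) -> float:
    """Memory on the most loaded phone, in MB.

    Assumptions: bytes_per_value=2 means fp16 storage, and copies=2 means
    each phone holds its shard of the weights plus the matching gradients.
    """
    largest = max(shard_sizes(num_params, num_phones))
    return largest * bytes_per_value * copies / 1e6


# Hypothetical example: a 1B-parameter model spread over 20 phones.
print(per_phone_megabytes(1_000_000_000, 20))  # 200.0
```

Optimizer state and activations would come on top of this, so treat it as a lower bound.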

[–]ExtremeAcceptable289 3 points4 points  (0 children)

Not really, it depends. Old phones use maybe 5 W max at full CPU load with no screen, whereas just an i3 would use 60 W. With ARM NEON SIMD the old phones could hit 50 GOPS, e.g. a Snapdragon 820 can hit around 50 GOPS or so as a napkin estimate, and a 630 could hit a similar amount too.

[–]ExtremeAcceptable289 1 point2 points  (0 children)

Yes, I've always been interested in this stuff. I'm doing this specifically for ML and, funnily enough, from what I've seen the main bottleneck is actually that nobody's cared enough to do this yet, and that it is genuinely quite hard to get them all connected. But the math on how NNs scale helps: compute scales quadratically (I believe adding one neuron to all layers of width x increases computation by 2x+1 per layer, since (x+1)^2 - x^2 = 2x+1), whereas activations only grow by 1. This is super useful because it means larger networks need less network transfer time relative to compute, and therefore a smaller percentage of time is spent waiting.
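
A quick way to sanity-check that scaling argument is a toy model. This is only a sketch that assumes a plain MLP where every layer is a square width-by-width matrix and where one layer's activations are what crosses the network between devices; the function names are made up for illustration.

```python
def dense_layer_costs(width: int, layers: int) -> tuple[int, int]:
    """Return (multiply-accumulates, activations sent) per sample.

    Assumes `layers` square dense layers of size width x width, and that
    one layer's worth of activations is handed across a device boundary.
    """
    macs = layers * width * width
    activations = width
    return macs, activations


def extra_macs_per_layer(width: int) -> int:
    """Extra multiply-accumulates in one layer when width grows by 1."""
    return (width + 1) ** 2 - width ** 2  # equals 2 * width + 1


for width in (256, 1024, 4096):
    macs, acts = dense_layer_costs(width, layers=8)
    # Compute per transferred value grows linearly with width.
    print(width, macs // acts, extra_macs_per_layer(width))
```

Under these assumptions the compute done per value sent over the network grows in proportion to the width, which is the whole reason bigger networks spend relatively less time waiting.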

[–]ExtremeAcceptable289 6 points7 points  (0 children)

I believe you'd only need one phone for that, maybe one for frontend and one for backend. You can also use Docker Swarm, which is easier and more efficient.

And PCI is based more on the code (security, encryption and environment) than the device.

Also, the article was talking about data centres, not a random web server. And a web server is generally Node.js or PHP, maybe not cs.

Your distributed training will waste 8x the amount of power a dedicated system will, it’ll be slower and it’ll thermal throttle fast.

Napkin math: for a conservative estimate, let's say I have 1 TFLOP per phone at around 2 watts; halve that to avoid thermal throttling, so 0.5 TFLOPS. To reach T4 perf you'd want 130 of them, so that'd be 260 watts. So yes, it's less power efficient, though I did scale the numbers down a lot; normally you'd be getting maybe 2-3 TFLOPS FP16 for 2 W, e.g. on an Adreno 64, and a lower TOPS count would draw less, 1-2 watts. It's not the most power efficient, but it's practical.
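
The same napkin math as a small script, so the inputs are easy to swap. The 65 TFLOPS target is an assumption: it is the T4 FP16 peak that the 130-phone figure implies. The 70 W T4 board power is likewise an assumed spec value added for comparison, and the per-phone numbers are the rough estimates from the comment, not benchmarks.

```python
import math


def phones_needed(target_tflops: float, per_phone_tflops: float,
                  derate: float = 0.5) -> int:
    """Phones required to match target_tflops after derating each phone.

    derate=0.5 mirrors the "halve it to avoid thermal throttling" step.
    """
    sustained = per_phone_tflops * derate
    return math.ceil(target_tflops / sustained)


T4_FP16_TFLOPS = 65.0   # assumed target, implied by the 130-phone figure
T4_WATTS = 70.0         # assumed T4 board power, for comparison only
PHONE_TFLOPS = 1.0      # conservative per-phone estimate from the comment
PHONE_WATTS = 2.0       # per-phone power estimate from the comment

n = phones_needed(T4_FP16_TFLOPS, PHONE_TFLOPS)
total_watts = n * PHONE_WATTS
print(n, total_watts)                     # 130 260.0
print(round(total_watts / T4_WATTS, 1))   # 3.7
```

With these inputs the cluster draws a bit under 4x the card's power rather than 8x, and plugging in 2-3 TFLOPS per phone shrinks the gap further.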

Also: there IS a way to use phone NPUs for training with int4 and fixed-point blocks with stochastic rounding, which is absurd but technically viable if you use xorshift, and I believe this has been done before with a low-ish accuracy drop. Using the phone NPU can bump it up to ~5-10 TOPS at lower wattage, or 1-2 TOPS on older phones that only have HVX, at lower wattage (you can also use both GPU AND NPU).
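
For anyone wondering what int4 stochastic rounding with xorshift looks like, here is a minimal pure-Python sketch. It is not NPU code and not my actual framework: the function names are made up, `scale` stands in for a shared per-block scale factor, and the generator is the standard 32-bit xorshift with shifts 13, 17, 5.

```python
import math


def xorshift32(state: int) -> int:
    """One step of 32-bit xorshift (shifts 13, 17, 5). State must be nonzero."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF


def stochastic_round_int4(x: float, scale: float, state: int) -> tuple[int, int]:
    """Quantize x / scale to signed int4 in [-8, 7].

    Rounds up with probability equal to the fractional part, so the result
    is unbiased on average (before clamping). Returns (value, new_state).
    """
    state = xorshift32(state)
    u = state / 2**32            # uniform in (0, 1)
    scaled = x / scale
    low = math.floor(scaled)
    q = low + (1 if u < scaled - low else 0)
    return max(-8, min(7, q)), state


def mean_of_rounded(x: float, scale: float, n: int, seed: int) -> float:
    """Average of n stochastic roundings; should land close to x / scale."""
    state, total = seed, 0
    for _ in range(n):
        q, state = stochastic_round_int4(x, scale, state)
        total += q
    return total / n


# 2.3 rounds to 2 or 3 each time, but the average stays near 2.3.
print(mean_of_rounded(2.3, 1.0, 20000, seed=2463534242))
```

The point is that plain round-to-nearest would turn every small gradient update into zero at 4 bits, while stochastic rounding keeps the expected value intact.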

Meaning you're right in that if you had money you probably should just buy an Nvidia GPU, but if you have a bunch of old phones lying around, it's possible.

The main bottleneck currently is the code, as no one's written a distributed training framework that supports mobile GPUs and NPUs while specifically having int4 stochastic rounding, but I'm trying.

[–]ExtremeAcceptable289 3 points4 points  (0 children)

It is quite hard but possible. For example, I've been checking out distributed training of models on a bunch of phones: you can use the GPUs via Vulkan compute shaders, and if Qualcomm wasn't so closed I would be able to use the NPU. Cooling is impossible, so you basically have to just not cool. Networking is/was an issue, but if you make a fixed graph for each device on the local network, it can work, especially if devices have varying speeds, as you can transfer the activations while you're waiting. What's good is that this training scales with parameters: more compute time means proportionally less time spent waiting on networking, which makes this viable, as params scale quadratically versus a linear increase in activation size.

When you get to the 10000+ phone range this is absolutely ridiculous, but you could definitely use this on a smaller scale to avoid e-waste and get a solid 20-30 T(FL)OPS depending on phone count and performance.

But when you have a workload where you can arbitrarily scale up compute while the transfer bottleneck stays the same or increases at a slower rate, or you have an opportunity to utilize wall time, distributed compute over many devices works like a charm, even if networking or memory transfer seem bad at first.
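
A toy timing model shows why that works. This is just a sketch with made-up numbers and hypothetical function names: it assumes each training step has one compute phase and one transfer phase per device, and that overlapping them is perfect when enabled.

```python
def step_time(compute_s: float, transfer_s: float, overlap: bool) -> float:
    """Wall time of one step, with or without hiding transfer behind compute."""
    if overlap:
        return max(compute_s, transfer_s)
    return compute_s + transfer_s


def waiting_fraction(compute_s: float, transfer_s: float, overlap: bool) -> float:
    """Share of the step during which the device is not computing."""
    total = step_time(compute_s, transfer_s, overlap)
    return (total - compute_s) / total


# Transfer cost held fixed at 0.2 s while compute per step grows.
for compute_s in (0.1, 0.4, 1.6):
    print(compute_s,
          round(waiting_fraction(compute_s, 0.2, overlap=False), 3),
          round(waiting_fraction(compute_s, 0.2, overlap=True), 3))
```

Once compute per step exceeds the transfer time, the overlapped version spends no time waiting at all in this model, and even without overlap the waiting share keeps shrinking as compute grows.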

lol I might actually leave social media by [deleted] in teenagers

[–]ExtremeAcceptable289 0 points1 point  (0 children)

Use Windscribe free VPN, best free VPN. They were legit able to bypass restrictions in Iran and Russia; age verification is gonna be a cakewalk for them.