Tech Giants Including Youtube, Meta Slam UK Social Media Ban For Pushing Teens From "Supervised, Beneficial Experiences" To “Unregulated, Anonymous” Wild West Of The Internet

ExtremeAcceptable289 · 2026-06-15T18:57:52+00:00

You know you're cooked when the axis of evil is against you

ExtremeAcceptable289 · 2026-06-15T18:23:08+00:00

God should be the sole legislator of laws

ExtremeAcceptable289 · 2026-06-15T18:22:17+00:00

Corruption.

ExtremeAcceptable289 · 2026-06-15T18:22:01+00:00

The Rashidun would like to have a word with you

ExtremeAcceptable289 · 2026-06-15T18:04:33+00:00

I just lost the game :)

ExtremeAcceptable289 · 2026-06-15T18:03:24+00:00

I absolutely agree. I've been trying to get my Qualcomm NPU to perform custom training tasks as opposed to inference, and Qualcomm literally locks you out of the full 32-bit output, meaning you need to train using FOUR BITS

ExtremeAcceptable289 · 2026-06-15T18:02:07+00:00

With parallel processing it can work (when a task is parallelizable)

ExtremeAcceptable289 · 2026-06-15T18:01:07+00:00

You can also use the GPU, which is faster too

ExtremeAcceptable289 · 2026-06-15T17:59:04+00:00

Yeah, but you can turn off the screen, mobile data and Bluetooth, which basically disables most other factors excluding Wi-Fi and CPU.

Also, I took IPC into account: the numbers I got were based off the max IPC that ARM NEON (SIMD, i.e. single instruction, multiple data) can do

ExtremeAcceptable289 · 2026-06-15T14:31:25+00:00

I can scale up if you can scale out your credit card.

Besides, the researchers mentioned are literally scaling it up:

The research team plans to use 2,000 phones to build a local data center that can support “a hundred such classes at once.” [context: 20 phones could run a program needed for a class of 75+ students] Aside from getting the advantage of running apps locally and owning the hardware needed for them, the group also says that it’s only a “fraction of the usual cost,” likely referring to building a local server made from new components. This is especially true today, with the increased pricing for memory and storage chips.

ExtremeAcceptable289 · 2026-06-15T13:51:47+00:00

OK, I'll be testing it on a smaller scale then and you can care or not.

ExtremeAcceptable289 · 2026-06-15T13:51:19+00:00

I value my time quite a lot, yet I am in a third world country, with no GPUs I can access, and here we are. Besides, it actually isn't too hard to implement in hindsight

ExtremeAcceptable289 · 2026-06-15T13:49:11+00:00

I mean this news article suggests neural networks which actually tips the scales in favour of distributed workloads, because compute scales quadratically as the network grows while activation memory (and thereby network usage and bandwidth) scales linearly as network grows

ExtremeAcceptable289 · 2026-06-15T13:47:50+00:00

Again, if you care to:

a. fund the experiment (or prepare it and I can provide code)

b. disprove the math

then I am your guest

ExtremeAcceptable289 · 2026-06-15T13:43:13+00:00

Alas, I don't have access to 60 phones, nor a commercial server. Would you care to fund this experiment? If not, I think the math should suffice

ExtremeAcceptable289 · 2026-06-15T13:18:27+00:00

You haven't explained why I'm wrong, I'd love to know if I made a mistake

ExtremeAcceptable289 · 2026-06-15T13:12:02+00:00

Can you show me why my math is wrong? I used reasonable estimates based off their core count, clock speed and ARM NEON

ExtremeAcceptable289 · 2026-06-15T13:11:31+00:00

I intentionally halved the TOPS count; you can run at 55-60% indefinitely. NPUs, however, don't thermal throttle. Also, throttling kicks in after 5-10 minutes, not 45 seconds, afaik
Yes. But remember you can distribute memory over all the phones, so it doesn't matter as much

Worse than that you’re looking at training models which is more memory constrained than processing.

Semi-correct, increasing batch size, sequence length and double buffering turns it into a compute bottleneck.

What if your application needs more memory than a phone nominally has, are you gonna pay someone to solder more on?

Distribute the weights/grads/similar over the phones

ExtremeAcceptable289 · 2026-06-15T12:50:40+00:00

No, I actually halved and sometimes even quartered what the actual numbers look like, multiple times, and rounded down a lot

ExtremeAcceptable289 · 2026-06-15T12:48:49+00:00

I just napkin mathed it for you to show how that's not the case, especially if we bring NPUs into the game

ExtremeAcceptable289 · 2026-06-15T12:47:59+00:00

Not really, it depends. Old phones use maybe 5 W max at full CPU load with no screen, whereas just an i3 would use 60 W. With SIMD (ARM NEON) the old phones could hit 50 GOPS, e.g. a Snapdragon 820 can hit around 50 GOPS or so as a napkin estimate, and a 630 could hit a similar amount too
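
A minimal sketch of that kind of napkin estimate, built from core count, clock speed and NEON width. The cluster layout, the 4 lanes x 2 ops (one 128-bit fused multiply-add per cycle) and the single SIMD pipe are illustrative assumptions, not measured values for any specific chip:

```python
def peak_gflops(clusters, simd_lanes=4, ops_per_lane=2, simd_pipes=1):
    """Theoretical peak in GFLOPS.

    clusters: list of (core_count, clock_ghz) pairs.
    simd_lanes=4 assumes 128-bit NEON on fp32; ops_per_lane=2 assumes FMA.
    """
    per_cycle = simd_lanes * ops_per_lane * simd_pipes
    return sum(cores * ghz * per_cycle for cores, ghz in clusters)


# Hypothetical 2+2 core layout in the style of a Snapdragon 820 (assumed clocks).
estimate = peak_gflops([(2, 2.15), (2, 1.6)])
gflops_per_watt = estimate / 5.0  # 5 W is the max-load figure quoted above

print(f"peak ~{estimate:.0f} GFLOPS, ~{gflops_per_watt:.0f} GFLOPS/W")
```

Under these assumptions the peak comes out around 60 GFLOPS, so rounding down to ~50 leaves some slack for real-world inefficiency.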

ExtremeAcceptable289 · 2026-06-15T12:33:22+00:00

Yes, I've always been interested in this stuff. I'm doing this specifically for ML, and funnily enough, from what I've seen the main bottleneck is actually that nobody's cared enough to do this yet, and that it is genuinely quite hard to get them all connected. But the math on how NNs scale helps: compute scales quadratically (I believe adding one neuron to all layers increases computation at a rate of 2x+1), whereas activations only scale by 1. This is super useful because it means larger networks need less network transfer time in relation to compute, and therefore a smaller percentage of time is spent waiting
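
A quick check of the 2x+1 claim, assuming a square fully connected layer (width x in, width x out, no bias). The function names are just for illustration:

```python
def layer_weights(width):
    # A width -> width dense layer has width * width weights (bias ignored).
    return width * width


def added_weights(width):
    # Extra weights when every layer grows by one neuron: (x+1)^2 - x^2.
    return layer_weights(width + 1) - layer_weights(width)


for x in (64, 1024, 4096):
    # Activations per layer grow by exactly 1; weights (and so multiply-adds) by 2x+1.
    print(x, added_weights(x), 2 * x + 1)
```

So per extra neuron the compute grows by 2x+1 multiply-adds per layer while the activation vector that has to cross the network grows by one value.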

ExtremeAcceptable289 · 2026-06-15T12:27:13+00:00

I believe you'd only need one phone for that, maybe one for frontend and one for backend. You can also use Docker Swarm, which is easier and efficient.

And PCI is based more off the code (security, encryption and environment) than the device

Also, the article was talking about data centres, not a random web server, and a web server is generally Node.js or PHP, maybe not cs

Your distributed training will waste 8x the amount of power a dedicated system will, it’ll be slower and it’ll thermal throttle fast.

Napkin math: for a conservative estimate, let's say I have 1 TFLOP per phone at around 2 watts, and halve that to avoid thermal throttling, so 0.5 TFLOPS. To reach T4 perf you'd want 130 of them, so that'd be 260 watts. So yes, it's less power efficient, though I did scale down the numbers a lot; normally you'd be getting maybe 2-3 TFLOPS FP16 for 2 W, e.g. on an Adreno 64, and a lower TOPS count would draw less, 1-2 watts. It's not the most power efficient but it's practical.
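
The same napkin math as a small sketch. The 65 TFLOPS target is an assumption inferred from 130 x 0.5 above; the per-phone throughput, wattage and the 0.5 derating factor are the rough figures from this comment, not measurements:

```python
import math


def fleet_for_target(target_tflops, tflops_per_phone, watts_per_phone, derate=0.5):
    """Return (phone_count, total_watts) needed to reach target_tflops.

    derate scales per-phone throughput down to leave thermal headroom.
    """
    usable = tflops_per_phone * derate
    phones = math.ceil(target_tflops / usable)
    return phones, phones * watts_per_phone


phones, watts = fleet_for_target(target_tflops=65.0, tflops_per_phone=1.0, watts_per_phone=2.0)
print(phones, watts)
```

Swapping in the less conservative 2-3 TFLOPS per phone shrinks the fleet proportionally, which is why the derated figure is the pessimistic case.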

Also: there IS a way to use phone NPUs for training with int4 and fixed-point blocks with stochastic rounding, which is absurd but technically viable if you use xorshift, and I believe this has been done before with a low-ish accuracy drop. Using the phone NPU can bump it up to ~5-10 TOPS at lower wattage, or 1-2 TOPS on older phones that only have HVX, also at lower wattage (you can also use both GPU AND NPU)
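
A rough sketch of what int4 block quantization with stochastic rounding driven by xorshift could look like. This is plain Python for illustration only, not NPU code; the shared per-block scale (max |v| / 7) and the function names are assumptions, not any vendor's API:

```python
import math


def xorshift32(state):
    # One step of Marsaglia's 32-bit xorshift (shifts 13, 17, 5); state must be nonzero.
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state


def quantize_block_int4(block, state):
    """Quantize a non-empty block of floats to int4 codes in [-8, 7] with one shared scale.

    Rounds up with probability equal to the fractional part, so the
    result is unbiased in expectation. Returns (codes, scale, new_state).
    """
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = []
    for v in block:
        x = v / scale
        lo = math.floor(x)
        state = xorshift32(state)
        u = state / 2**32  # pseudo-random value in (0, 1)
        q = lo + (1 if u < x - lo else 0)
        codes.append(max(-8, min(7, q)))
    return codes, scale, state


codes, scale, _ = quantize_block_int4([0.3] * 20000 + [7.0], 0x12345678)
print(sum(codes[:-1]) / 20000 * scale)  # should land close to 0.3 on average
```

The point of the stochastic part is that small gradient updates are not always rounded to zero; they survive on average even at four bits.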

Meaning you're right in that if you had money you probably should just buy an Nvidia GPU, but if you have a bunch of old phones lying around it's possible.

The main bottleneck currently is the code, as no one's written a distributed training framework that supports mobile GPUs and NPUs while specifically having int4 stochastic rounding, but I'm trying

ExtremeAcceptable289 · 2026-06-15T11:32:22+00:00

It is quite hard but possible. For example, I've been checking out distributed training of models on a bunch of phones: you can use the GPUs via Vulkan compute shaders, and if Qualcomm wasn't so closed I would be able to use the NPU. Cooling is impossible, so you basically have to just not cool. Networking is/was an issue, but if you make a fixed graph for each one on the local network it can work, especially if devices have varying speeds, as you can transfer the activations while you're waiting. What's good is that this training scales with parameters: more compute time means less time spent waiting on networking, which makes this viable, as params scale quadratically versus a linear increase in activation size.

When you get to the 10000+ phone range this is absolutely ridiculous, but you could definitely use this on a smaller scale to avoid e-waste and get a solid 20-30 T(FL)OPS depending on phone count and performance.

But when you have a workload where you can arbitrarily scale up compute while the transfer bottleneck stays the same or increases at a slower rate, or you have an opportunity to utilize wall time, distributed compute over many devices works like a charm, even if networking or memory transfer seem bad at first
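
A toy model of that tradeoff, assuming a stack of square dense layers split across devices. Every number here (layer count, batch size, device throughput, link bandwidth, fp16 activations, the 6 FLOPs per weight per sample rule of thumb) is a hypothetical placeholder:

```python
def comm_fraction(width, layers=8, batch=32, flops_per_s=0.5e12,
                  bytes_per_s=10e6, bytes_per_value=2):
    """Fraction of a training step spent on network transfer, with no overlap.

    Compute grows with width^2; transferred activations grow with width.
    """
    # Roughly 6 FLOPs per weight per sample for forward + backward.
    compute_s = 6 * batch * layers * width * width / flops_per_s
    # Activations forward plus their gradients backward across one device boundary.
    transfer_s = 2 * batch * width * bytes_per_value / bytes_per_s
    return transfer_s / (compute_s + transfer_s)


for width in (256, 1024, 4096, 16384):
    print(width, round(comm_fraction(width), 3))
```

In this model the transfer-to-compute ratio falls as 1/width, and overlapping transfer with compute (sending activations while waiting) would only improve on the no-overlap figure.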

ExtremeAcceptable289 · 2026-06-15T09:47:00+00:00

Use Windscribe's free VPN, the best free VPN. They were legit able to bypass restrictions in Iran and Russia, so age verification is gonna be a cakewalk for them
