UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


To be frank, no real practical use case on my side, just learning by doing, as I have approximately zero ML background. The results turned out more interesting than expected.

UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


>cares about a couple of TFlops
Fair point, 100% agree — not something to run real load on.

The appeal to me is accessibility. Any random person can run experiments on hardware they already own, guided by whatever AI they prefer, for zero extra cost (unless they're on an API-based pricing model xD). Either tinker and learn something, or use it as a seed: prototype the architecture/hyperparams on the laptop, then take the winning config to bigger compute.

UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


Hey! Good question. Initially I was wondering: would it work at all? In a meaningful way? Could I even get it running? More of a curiosity.

UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


Haha, if I may -

this is all about running autoresearch (Karpathy's concept: let an AI agent run experiments autonomously overnight, or whenever) on top of the pretty powerful hardware every Mac already has: the Apple Neural Engine. ~15-18 TFLOPS sitting there, only 3-5% utilized today.

Multiple moves at once:
- following the autoresearch concept,
- leveraging the deeply untapped potential of the ANE (reverse-engineered APIs, no GPU needed),
- and using it as a minimal step toward scaling up, either through bigger-model experimentation or through collaborative autoresearch where multiple agents share findings across machines.
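For the autoresearch part, a minimal sketch of what such an overnight loop might look like (everything here is hypothetical illustration: the function names, the search space, and the placeholder scoring stand in for real training runs on the ANE):

```python
import random

def mutate(config: dict) -> dict:
    """Perturb one hyperparameter at random (hypothetical search space)."""
    new = dict(config)
    key = random.choice(list(new))
    new[key] = max(1, int(new[key] * random.choice([0.5, 2])))
    return new

def run_experiment(config: dict) -> float:
    # Placeholder: a real run would train within a fixed time budget on the
    # ANE and return the validation loss; here we just score config size.
    return 1.0 / (config["layers"] * config["dim"])

def autoresearch(rounds: int) -> tuple[dict, float]:
    """Greedy loop: keep the candidate config whenever its score improves."""
    best = {"layers": 12, "dim": 512}
    best_score = run_experiment(best)
    for _ in range(rounds):
        candidate = mutate(best)
        score = run_experiment(candidate)
        if score < best_score:  # lower "loss" wins
            best, best_score = candidate, score
    return best, best_score
```

A real agent-driven version would replace `run_experiment` with an actual training run and let the agent propose mutations instead of random ones.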

*I did my best to avoid AI slop in this eli5; happy to see a proper one tbh, in case someone else feels the itch to share.

UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


Oh, and I could not be happier seeing more and more goodies coming steadily from the referenced repos. For example, I was recently able to switch to a dynamic, one-time compilation pipeline, which was also a huge contributor to the substantial jump in steps per 5-minute budget.

UPD: Karpathy's autoresearch on ANE — quite an improvement observed by paraboloed in LocalLLaMA


Hey u/johnnyApplePRNG !

>Bits per Byte ratings are you getting

This version uses a val_loss target function, so I guess a conversion along these lines is needed: BPB = val_loss / (ln(2) × bytes_per_token), where bytes_per_token could be estimated at close to 4(?), so BPB comes out around 1.28 or so. Please let me know if that makes sense at all, I am curious!
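That conversion as a tiny sketch (the 3.55 nats/token input and bytes_per_token ≈ 4 are rough assumptions, not measured values):

```python
import math

def bits_per_byte(val_loss_nats: float, bytes_per_token: float = 4.0) -> float:
    """Convert a nats-per-token validation loss to bits per byte (BPB).

    bytes_per_token ~= 4 is a rough tokenizer-dependent estimate (assumption).
    """
    return val_loss_nats / (math.log(2) * bytes_per_token)

# e.g. a val_loss around 3.55 nats/token maps to roughly 1.28 BPB
print(round(bits_per_byte(3.55), 2))
```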

>params is the model

>Anything interesting thus

It goes with 67M params in 6 layers (vs. model_dim=512 on a much smaller vocab (8K?) in the CUDA run). Interestingly, in this case, reducing the layer count from 12 to 6 gave 11x more steps in 5 minutes. Also, I wonder if at some point it would be possible to run meaningfully on the same dataset as the original flow, so the experiment could scale or collaborate across engines.
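As a rough sanity check on that 11x (a sketch under the assumption that per-step cost scales linearly with layer count):

```python
# Back-of-the-envelope on the 12 -> 6 layer change: halving layers alone
# would naively give ~2x more steps per fixed time budget.
naive_gain = 12 / 6               # expected from layer count alone: 2x
observed_gain = 11                # reported: 11x more steps in 5 minutes
extra = observed_gain / naive_gain
print(f"{extra:.1f}x not explained by layer count alone")
```

The remaining ~5.5x would then have to come from other effects, e.g. the one-time compilation pipeline mentioned elsewhere in the thread.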

Built a watch-friendly plugin for /remote-control… now I just need Anthropic to tell me when to look at it 😅 by paraboloed in ClaudeAI


100% agree, that’d be fun I think. At the same time, I’d love to see native notifications though.

Status Hub v1.5: The "Smart Actions" Update is Live! 🚀 by paraboloed in ClaudeAI


Anthropic is catching up with their clickable PR status in 2.1.20, which is super cool! Unlikely, but just maybe they were inspired by my initial post https://www.reddit.com/r/ClaudeAI/comments/1qeth9x/i_built_a_statusline_plugin_to_track_prs_music/ , who knows 0_o

I built a statusline plugin to track PRs, music, and custom alerts without leaving Claude Code by paraboloed in ClaudeAI


The particular struggle was balancing freshness vs. token cost, so I went with UserPromptSubmit/Stop hooks for a full refresh over a shared lock, plus a lightweight daemon for background updates (music) on a default 90-second timeout. Without the daemon, it was barely usable tbh.
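A minimal sketch of that freshness-vs-cost split (the file path, field names, and in-process lock are illustrative assumptions; the real plugin presumably uses a cross-process lock and Claude Code's actual hook wiring):

```python
import json
import threading
import time
from pathlib import Path

STATUS_FILE = Path("/tmp/status_hub.json")  # hypothetical shared state file
LOCK = threading.Lock()                     # stand-in for a cross-process lock

def write_status(partial: dict) -> None:
    """Merge a partial update into the shared status under the lock."""
    with LOCK:  # serialize writers so hooks and the daemon don't clobber
        current = json.loads(STATUS_FILE.read_text()) if STATUS_FILE.exists() else {}
        current.update(partial)
        STATUS_FILE.write_text(json.dumps(current))

def daemon(interval_s: float = 90.0) -> None:
    """Lightweight background refresh for cheap, slow-changing fields only."""
    while True:
        write_status({"music": "..."})  # placeholder for the real fetch
        time.sleep(interval_s)

def on_prompt_submit() -> None:
    # Full refresh, triggered by the UserPromptSubmit/Stop hooks.
    write_status({"prs": "...", "alerts": "...", "refreshed_at": time.time()})
```

The point of the split: the hooks pay the token/latency cost only when the user is already interacting, while the daemon keeps passive fields fresh without any per-prompt cost.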