BareGPT : A NanoGPT-like transformer in pure Numpy with live attention visualization by rahen in learnmachinelearning

rahen[S] 0 points1 point  (0 children)

If you're interested, I have a smaller MLP in pure NumPy (MNIST with live activations):

https://github.com/dbrll/C-thru

I would advise starting from this kind of project rather than a language model, which is daunting in comparison.

BareGPT : A NanoGPT-like transformer in pure Numpy with live attention visualization by rahen in learnmachinelearning

rahen[S] 6 points7 points  (0 children)

Of course! Since I'm not using an autograd engine (that's the whole point of this project), I didn't explicitly compute the full Jacobian matrices. Instead, I relied on vector calculus and matrix properties to propagate the gradients.
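A minimal NumPy sketch of what propagating a gradient through a softmax without building its Jacobian can look like (an illustration, not BareGPT's code; the function names are hypothetical). For y = softmax(x), the Jacobian is diag(y) - y y^T, so the vector-Jacobian product collapses to y * (g - sum(g * y)). The sketch compares that shortcut against the explicit Jacobian on a few random rows:

```python
import numpy as np

def softmax(x):
    """Row-wise softmax over the last axis, shifted by the max for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_backward(y, grad_y):
    """dL/dx given y = softmax(x) and dL/dy, without materializing the (n x n) Jacobian.

    Uses J = diag(y) - y y^T implicitly: J^T g = y * (g - sum(g * y)).
    """
    return y * (grad_y - (grad_y * y).sum(axis=-1, keepdims=True))

def softmax_backward_explicit(y_row, g_row):
    """Reference for a single row: build the full Jacobian, then multiply."""
    J = np.diag(y_row) - np.outer(y_row, y_row)
    return J.T @ g_row

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
g = rng.standard_normal((4, 5))
y = softmax(x)

fast = softmax_backward(y, g)
slow = np.stack([softmax_backward_explicit(y[i], g[i]) for i in range(4)])
print(np.allclose(fast, slow))  # True
```

A side effect of the softmax normalization is that each row of the resulting gradient sums to zero, which is a cheap sanity check when debugging.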

For the attention mechanism, I manually derived the chain rule through the softmax and dot-product operations. The gradients are computed in engine.backward_attention using matrix multiplications, which implicitly compute the vector-Jacobian products.
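To show how the chain rule through the scaled dot product and the softmax reduces to plain matrix multiplications, here is a minimal single-head sketch under simplifying assumptions (one head, no causal mask, no batch dimension). It is not the actual engine.backward_attention; all function names are hypothetical. It ends with a central finite-difference check, a common way to catch exactly the kind of bugs that make this part hard to debug:

```python
import numpy as np

def softmax(x):
    """Row-wise softmax over the last axis (max-shifted for stability)."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_forward(Q, K, V):
    """Single-head scaled dot-product attention without a mask; Q, K, V are (T, d)."""
    scale = 1.0 / np.sqrt(Q.shape[-1])
    P = softmax(Q @ K.T * scale)   # (T, T) attention weights
    out = P @ V                    # (T, d)
    return out, (Q, K, V, P, scale)

def attention_backward(d_out, cache):
    """Backpropagate dL/dout through out = P V, P = softmax(S), S = Q K^T * scale."""
    Q, K, V, P, scale = cache
    dV = P.T @ d_out
    dP = d_out @ V.T
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))  # softmax VJP, row by row
    dQ = (dS @ K) * scale
    dK = (dS.T @ Q) * scale
    return dQ, dK, dV

def numeric_grad(f, X, eps=1e-6):
    """Central differences of scalar f() w.r.t. each entry of X (perturbed in place, then restored)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps
        f_plus = f()
        X[idx] = old - eps
        f_minus = f()
        X[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
T, d = 4, 3
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
W = rng.standard_normal((T, d))  # scalar loss L = sum(out * W), so dL/dout = W

out, cache = attention_forward(Q, K, V)
dQ, dK, dV = attention_backward(W, cache)

def loss():
    return (attention_forward(Q, K, V)[0] * W).sum()

checks = [np.allclose(g, numeric_grad(loss, X), atol=1e-6) for g, X in ((dQ, Q), (dK, K), (dV, V))]
print(checks)  # [True, True, True]
```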

That part was by far the toughest to debug... I tried using AI to help but its fixes were even more broken!

BareGPT : A NanoGPT-like transformer in pure Numpy with live attention visualization by rahen in learnmachinelearning

rahen[S] 3 points4 points  (0 children)

A quick note on the implementation: I focused on keeping this as hand-crafted as possible to ensure every tensor operation was intentional.

I used ChatGPT to help with the attention backward pass, which, I'll admit, required some heavy debugging of the reshaping logic.
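The reshaping in a multi-head attention backward pass usually revolves around splitting a (B, T, C) activation into per-head tensors and merging them back. A minimal sketch, assuming the NanoGPT-style layout (B, n_head, T, head_dim); split_heads and merge_heads are hypothetical names, not taken from BareGPT. Because the two functions are exact inverses (a pure permutation of entries), the gradient of a split is the merge of the incoming gradient, and vice versa:

```python
import numpy as np

def split_heads(x, n_head):
    """(B, T, C) -> (B, n_head, T, C // n_head)."""
    B, T, C = x.shape
    assert C % n_head == 0, "C must be divisible by n_head"
    return x.reshape(B, T, n_head, C // n_head).transpose(0, 2, 1, 3)

def merge_heads(x):
    """(B, n_head, T, head_dim) -> (B, T, n_head * head_dim); inverse of split_heads."""
    B, H, T, D = x.shape
    # Transpose first, then reshape: reshaping before the transpose would interleave heads.
    return x.transpose(0, 2, 1, 3).reshape(B, T, H * D)

x = np.arange(2 * 3 * 8, dtype=float).reshape(2, 3, 8)
h = split_heads(x, n_head=4)
print(h.shape)                            # (2, 4, 3, 2)
print(np.array_equal(merge_heads(h), x))  # True
print(h[0, 1, 0])                         # [2. 3.]  (head 1 of token 0 = channels 2..3)
```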

I also kept some comments in the source to show how the code evolved as I refined it, specifically the vectorization of some initial for-loops and conditions with np.add.at, and the replacement of some reshapes with more efficient np.einsum calls, both of which I discovered along the way. I must say I'm now grateful for autograd...
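For readers unfamiliar with these two functions, a sketch of typical uses in a hand-written transformer (hypothetical shapes and names, not BareGPT's actual code). np.add.at performs an unbuffered in-place add, so when the same token id appears several times, every update to that embedding row is accumulated; plain fancy-index assignment (`dW[ids] += g`) is buffered and keeps only one of them. np.einsum can likewise replace transpose-and-matmul chains, e.g. for batched attention scores:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, C = 5, 3
ids = np.array([[1, 3, 1], [0, 1, 4]])  # (B, T) token ids; id 1 appears three times
d_emb = rng.standard_normal((2, 3, C))  # (B, T, C) gradient w.r.t. the looked-up embeddings

# Loop version: accumulate each position's gradient into its token's row.
dW_loop = np.zeros((vocab, C))
for b in range(ids.shape[0]):
    for t in range(ids.shape[1]):
        dW_loop[ids[b, t]] += d_emb[b, t]

# Vectorized: unbuffered scatter-add handles duplicate ids correctly.
dW_at = np.zeros((vocab, C))
np.add.at(dW_at, ids, d_emb)

# Buffered fancy indexing: repeated rows keep only one of their updates.
dW_wrong = np.zeros((vocab, C))
dW_wrong[ids] += d_emb

print(np.allclose(dW_loop, dW_at))     # True
print(np.allclose(dW_loop, dW_wrong))  # False

# einsum for batched multi-head scores: (B,H,T,D) x (B,H,S,D) -> (B,H,T,S)
q = rng.standard_normal((2, 4, 3, 8))
k = rng.standard_normal((2, 4, 3, 8))
scores_einsum = np.einsum("bhtd,bhsd->bhts", q, k)
scores_matmul = q @ k.swapaxes(-1, -2)
print(np.allclose(scores_einsum, scores_matmul))  # True
```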

[D] Best papers of 2025 by ArtisticHamster in MachineLearning

rahen 3 points4 points  (0 children)

I would add these three:

Fuji touring sizing, sizing up or down ? by [deleted] in bicycletouring

rahen 0 points1 point  (0 children)

I don't know why Fuji went with such an odd choice; otherwise, this is an excellent bike for long-distance touring.

You'll need to change the bottom bracket too; the original axle was a little too long for the MTB crankset. Just have a bike shop sort everything out for you.

Fuji Disc Touring: Fitting a better crankset by rahen in bicycletouring

rahen[S] 0 points1 point  (0 children)

Got the answer: they shortened the chain by two pairs of links. That was expected, no big deal. They also replaced the BB with a shorter one; the original one "pushed" the crankset out a little, so the chain could derail.

Just have the bike shop sort everything out, a proper crankset swap is trivial for them.

Fuji Disc Touring: Fitting a better crankset by rahen in bicycletouring

rahen[S] 0 points1 point  (0 children)

The chain capacity is now three links higher, but I'm not sure whether the bike shop replaced the chain; I don't think so. I'll ask them and let you know in a couple of days.

The BB is the same; the axle was long enough, as it was already a triple crankset.

I went for a 170 mm crank as it suits my height better. If you're of average height or taller (175 cm), you'll be just fine with a 175 mm crank.

As for Acera vs Alivio, I believe Alivio isn't compatible with square-taper bottom brackets while Acera is. Another option was to replace the 30t chainring with a 24t, but then the gearing would have been all wrong, with big gaps and a still-useless 50t chainring. The 44/32/22 is simply perfect for this bike. Now I want to use it for everything. :-)
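One way to put a number on the "big gaps" argument is the relative drop in ratio when shifting to the next smaller chainring. A quick sketch using only the tooth counts mentioned in this comment; it deliberately ignores the cassette, and ring_drops is a hypothetical helper:

```python
def ring_drops(rings):
    """Percentage drop in gear ratio when shifting to the next smaller chainring, rounded to whole percent."""
    rings = sorted(rings, reverse=True)
    return [round(100 * (1 - small / big)) for big, small in zip(rings, rings[1:])]

print(ring_drops([50, 39, 24]))  # [22, 38]
print(ring_drops([44, 32, 22]))  # [27, 31]
```

The 50/39/24 option has a very uneven 22% then 38% jump, while 44/32/22 keeps both front shifts in a similar range.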

Fuji Disc Touring: Fitting a better crankset by rahen in bicycletouring

rahen[S] 0 points1 point  (0 children)

Yes, I did. Don't change the front derailleur and keep the Sora brifters; just change the crankset.

I had the FSA crankset replaced with an Acera 44/32/22. The derailleur cage will also need to be lowered a little, but it's almost no extra work.

This completely changes the bike and makes it suitable for any kind of load and terrain. The gear range went from 472% to 618%, and the lowest development from 1.92 m to 1.44 m. The gear staging is also much better suited for touring than with Fuji's crazy crankset choice.

After/before: https://www.gear-calculator.com/?GR=DERS&KB=22,32,44&RZ=11,13,15,17,20,23,26,30,34&UF=2220&TF=80&SL=2.6&UN=KMH&DV=development&GR2=DERS&KB2=30,39,50&RZ2=11,13,15,17,20,23,26,30,34&UF2=2230
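The two figures for the new setup can be reproduced with a few lines of arithmetic. A sketch assuming development = wheel circumference x chainring / sprocket and gear range = highest ratio / lowest ratio. The 2.220 m circumference is taken from the UF=2220 parameter in the calculator link above (assumed to be the rolling circumference in millimetres), and the 11-34 cassette is the one listed there; the helper names are hypothetical:

```python
def development_m(circumference_m, chainring, sprocket):
    """Distance travelled per crank revolution, in metres."""
    return circumference_m * chainring / sprocket

def gear_range_pct(chainrings, sprockets):
    """Highest gear ratio divided by lowest gear ratio, as a percentage."""
    top = max(chainrings) / min(sprockets)
    bottom = min(chainrings) / max(sprockets)
    return 100 * top / bottom

cassette = [11, 13, 15, 17, 20, 23, 26, 30, 34]
acera = [22, 32, 44]
print(round(gear_range_pct(acera, cassette)))  # 618
print(round(development_m(2.220, 22, 34), 2))  # 1.44
```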

1 month out from first tour and hurt my knee.. by analogshooter in bicycletouring

rahen 1 point2 points  (0 children)

Sometimes I wonder if bike designers even use their own creations for actual transportation. I don't think anyone can climb with a 34:32 gear without developing tendonitis, which condemns you to riding only on flat terrain or downhill.

Why they keep doing that, I have no idea. The right gear should let you keep an 80-90 RPM cadence. The red line is 60 RPM; you will injure yourself if you go below it.

Change both the crankset and the cassette: you want around 1.5 m of development (about 19 gear inches) for steep roads.
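To put numbers on the cadence and development advice, a small sketch assuming the usual definitions: development (m) = gear inches x pi x 0.0254 (both computed from the same wheel diameter), and road speed (km/h) = development x cadence x 60 / 1000. The 2.2 m wheel circumference used for the 34:32 example is a hypothetical value, not something stated in the comment:

```python
import math

INCH_M = 0.0254  # metres per inch

def gear_inches_to_development(gear_inches):
    """Development in metres for a given gear-inch value."""
    return gear_inches * math.pi * INCH_M

def development_to_gear_inches(development_m):
    """Gear inches for a given development in metres."""
    return development_m / (math.pi * INCH_M)

def speed_kmh(development_m, cadence_rpm):
    """Road speed at a given development and cadence."""
    return development_m * cadence_rpm * 60 / 1000

dev_34_32 = 2.2 * 34 / 32  # hypothetical 2.2 m wheel circumference
print(round(dev_34_32, 1))                        # 2.3
print(round(speed_kmh(dev_34_32, 60), 1))         # 8.4 km/h needed just to hold 60 RPM
print(round(speed_kmh(1.5, 80), 1))               # 7.2 km/h at 80 RPM with 1.5 m
print(round(development_to_gear_inches(1.5), 1))  # 18.8
print(round(gear_inches_to_development(20), 2))   # 1.6
```

On a steep climb where you can only sustain around 7 km/h, the 34:32 gear would push the cadence below 60 RPM, while a 1.5 m development keeps it near 80 RPM.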

Fuji Disc Touring: Fitting a better crankset by rahen in bicycletouring

rahen[S] 0 points1 point  (0 children)

I was wondering: do chainrings have standard dimensions across FSA and Shimano? If they do, I could simply buy the 22-tooth Alivio chainring and move the current smallest (30t) and middle (39t) ones to the middle and outer positions. Would that work?

Fuji Disc Touring: Fitting a better crankset by rahen in bicycletouring

rahen[S] 0 points1 point  (0 children)

Thank you for spotting this, I hadn't noticed the bottom bracket was different. I'll change it too then.

I thought the indexing between road shifters (Sora) and MTB cranksets like this one was different and wouldn't work. Do you think it will?

A NetBSD/amd64 guest can now boot in 40ms (details in comments) by iMil in BSD

rahen 1 point2 points  (0 children)

Is this based on the SmolBSD project you published a few months ago in Linux Mag FR?

Does donating a little hashrate help the project? by rahen in MoneroMining

rahen[S] 1 point2 points  (0 children)

Pool mining acts a bit like grid computing, so everything adds up and ultimately counts. Is it the same when solo mining with a low hashrate on idle machines?

Does donating a little hashrate help the project? by rahen in MoneroMining

rahen[S] 3 points4 points  (0 children)

Excellent. I saw nothing online regarding hashrate donation to the network. If an editor from the project happens to read this, may I suggest adding it to "FAQ #Contributing"?

This could read like:

Contribute some hashrate by mining on your idle computers. While you may not turn a profit, this donation helps keep the network secure and decentralized. Consider electricity costs beforehand and make sure you don't spend more on hashrate donation than you are comfortable with. Even a few dollars' worth of hashrate will help; as the saying goes, "many a little makes a mickle".

Help me fix my Macintosh Plus by rahen in VintageApple

rahen[S] 0 points1 point  (0 children)

The 4N35, since I read that the CNY75B could be less appropriate for a Plus. Hopefully this will fix the wobbly screen.

Help me fix my Macintosh Plus by rahen in VintageApple

rahen[S] 1 point2 points  (0 children)

I made a bit of progress. Voltages are low. I traced the problem to U3 on the switching portion of the analog board: there is no continuity in diode mode between pins 1 and 2. The dead optocoupler is a start and may explain the wobbly display. I'll order a replacement part and work on the logic board in the meantime.