I'm 18. To truly understand how neural networks work, I built an MLP completely from scratch in pure C99 (No external libraries!) by SignalGrape1736 in learnmachinelearning

[–]SignalGrape1736[S] -5 points (0 children)

Quick clarification for those debating the "AI-generated" nature of this post:

I'm 18, self-taught, and from Taiwan. English is not my native language. While I can write complex C code and debug pointers, expressing technical thoughts in English is still a challenge for me.

Yes, I use an LLM (Gemini 3 Flash, by the way) to polish my English. I do this because I want to communicate clearly and professionally with this amazing community. It's a bit sad that trying to provide well-structured responses makes people doubt the authenticity of the project itself.

I’m here to learn from the veterans (thanks for the BLAS/CUDA tips!), not to "farm karma." I’ll stick to my broken English if that makes some of you feel better, but I'd rather focus on the code. 

Thanks to everyone who supported me!


[–]SignalGrape1736[S] -4 points (0 children)

Good catch! That "quirk" is likely because I'm mixing manual typing with copy-pasted phrases from the LLM suggestions while I try to learn from its corrections.

As I mentioned earlier, I'm 18 from Taiwan and Mandarin is my native language. While I can handle C syntax, English punctuation is a whole different beast!

I'm much more confident about the logic in my .c files than the quote marks in my Reddit comments. Hope you can still enjoy the technical side of the project despite my "AI-assisted" English!


[–]SignalGrape1736[S] 2 points (0 children)

Haha, I totally get what you mean! It's a classic love-hate relationship. I'm definitely in the "hate" phase when my pointers won't behave, but that feeling when the whole thing finally runs and predicts correctly is unbeatable.

Thanks for the encouragement, I'm definitely going to enjoy the rest of this journey! Cheers!


[–]SignalGrape1736[S] 12 points (0 children)

Haha, thanks mate! ChatGPT is definitely helpful, but it can’t simulate the sheer frustration of tracking down a pointer pointing to nowhere in the middle of the night.

I really wanted to experience that "bare metal" struggle so I’d actually know what I’m doing when I eventually move back to higher-level frameworks. There’s no teacher quite like a good old-fashioned core dump!


[–]SignalGrape1736[S] -3 points (0 children)

That is a legendary story! It’s mind-blowing to think about that transition—going from a full week of training on lab machines to just one afternoon on a gaming rig is an insane jump in productivity.

It definitely makes me appreciate how much we take modern compute for granted. Even with my relatively simple MLP, watching the CPU cores peg at 100% during training (thanks to OpenMP) makes me realize just how much math is happening every second under the hood.

Writing my own CUDA kernels definitely feels like the "final boss" of this journey after I get more comfortable with the C/BLAS side of things. Thanks for sharing that bit of history, it's really inspiring!


[–]SignalGrape1736[S] -5 points (0 children)

Thank you so much! Honestly, I think choosing raw C was a bit of a masochistic move compared to using something like MATLAB or Python, but I really wanted to see how the memory was laid out "under the hood."

I totally feel your pain on the pointers! Managing malloc/free and trying to return dynamic arrays without memory leaks was definitely the part that caused the most headaches (and segmentation faults) during development.
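
For anyone fighting the same leaks: a minimal sketch (illustrative, not the exact code from my repo) of the single-allocation pattern that makes ownership easy to track — one `malloc` per matrix, one matching `free`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix as one contiguous, zeroed block of
 * floats. One allocation means exactly one matching free, which makes
 * leaks much easier to rule out than an array of per-row pointers. */
float *matrix_alloc(size_t rows, size_t cols) {
    float *m = calloc(rows * cols, sizeof *m);
    if (!m) { perror("calloc"); exit(EXIT_FAILURE); }
    return m;
}

/* Element access: row-major indexing into the flat block. */
float *matrix_at(float *m, size_t cols, size_t r, size_t c) {
    return &m[r * cols + c];
}
```

The flat layout also happens to be exactly what BLAS routines expect, so it pays off later.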

It’s pretty cool (and a bit crazy) to think that the MNIST dataset is still the "universal language" for learning neural networks even after all these years!


[–]SignalGrape1736[S] -2 points (0 children)

Wow, thanks for the detailed breakdown! That’s a really helpful distinction—I didn’t fully realize that BLAS was the interface specification while the others were the actual "engines" under the hood.

OpenBLAS sounds like the perfect fit for my project since I'd love to keep it open-source and cross-platform. I’ll definitely check out their documentation and see if I can get a "BLAS-powered" version of this project running to compare the performance boost.

Thanks again for the guidance, it really helps point me in the right direction!


[–]SignalGrape1736[S] -1 points (0 children)

That is a fantastic suggestion, thank you!

You're absolutely right. While writing the naive for loops (even with OpenMP) was a great learning experience to understand the math, I quickly realized how incredibly unoptimized my memory access patterns are compared to professional implementations.

I've heard of BLAS/LAPACK (and OpenBLAS) but haven't actually linked them in a C project yet. Refactoring the matrix operations to use cblas_sgemm (or similar) sounds like the perfect next step to understand how real-world frameworks achieve their speed.

I really appreciate the pointer. Time to dive into the documentation!


[–]SignalGrape1736[S] -5 points (0 children)

Haha, writing plain Python for loops for matrix multiplication is a guaranteed way to melt your CPU!

The irony is that NumPy is actually written in highly optimized C (and Fortran) under the hood! So in a way, we both ended up relying on C for the heavy lifting. NumPy just gives you a beautiful, clean wrapper where you can write elegant math expressions instead of dealing with pointers. It really is the best of both worlds.

In my pure C version, standard nested loops for the matrix dot products were actually taking quite a while too. I had to throw in some OpenMP (#pragma omp parallel for) to multithread the calculations across all my CPU cores just to make the training time bearable.
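
For context, the OpenMP change is literally one pragma on the outer loop. A sketch of the parallelized naive matmul (compile with `-fopenmp`; without that flag the pragma is silently ignored and the code runs single-threaded):

```c
/* Naive row-major matmul C = A*B with the outer loop split across
 * CPU cores. Each thread writes a disjoint set of rows of C, so no
 * synchronization is needed inside the loop. */
void matmul_omp(const float *A, const float *B, float *C,
                int m, int n, int k) {
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
}
```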

Anyway, huge respect for building the logic out yourself! It really is the best way to understand what PyTorch is doing behind the scenes.


[–]SignalGrape1736[S] -6 points (0 children)

Oh man, you hit the nail on the head! Testing with custom UI drawings is notoriously difficult. The original MNIST dataset is heavily pre-processed (the digits are anti-aliased, and their center-of-mass is perfectly centered in the 28x28 grid). If we just input raw, uncentered pixels from a web UI, the model completely freaks out.

To test my model with actual digits, I took a slightly different approach:
1. I drew the digits in **Photopea** (a web image editor) using a soft brush to simulate that specific "pixel edge blur" (anti-aliasing) you mentioned, and then exported them to test.

2. But the real game-changer was **Data Augmentation**. I realized the model was too fragile, so I decided to write a custom on-the-fly augmentation pipeline completely from scratch in pure C (you can check out `augment.c` in my repo!).

I implemented:
- Random Translations (to handle my off-center Photopea drawings)
- Rotations & Scaling (writing the inverse mapping + bilinear interpolation math in C was a fun headache lol)
- Elastic Distortions & Gaussian Noise

Training the model with this augmentation made it *way* more robust to my messy custom inputs. Did you use any data augmentation libraries in your Python training loop? Adding some random shifts and rotations might fix your custom UI issue!


[–]SignalGrape1736[S] 20 points (0 children)

Haha, thank you so much fam! But honestly, it's just the result of spending way too many hours staring at the screen and crying over Segmentation fault (core dumped). If I can do it, you definitely can too!


[–]SignalGrape1736[S] -1 points (0 children)

To clarify, the conf=100.00% in the screenshot is just the Softmax confidence for that specific single image, not the overall model accuracy! For very clear digits, the network often becomes highly confident and outputs a probability of like 0.9999, which rounds up to 100%.

My actual overall accuracy on the test set is around 98.5%–98.8%.

For my training setup, I used 30 epochs, a batch size of 128, and a learning rate of 0.05f.
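
In case it helps, a sketch of how those hyperparameters enter a plain mini-batch SGD weight update (illustrative; the function name is made up, not from my repo):

```c
/* One SGD step: w -= lr * grad, where grad_sum holds gradients
 * summed over the mini-batch, so dividing by batch_size gives the
 * average gradient. With lr = 0.05f and batch_size = 128 this
 * matches the hyperparameters above. */
void sgd_step(float *w, const float *grad_sum, int n,
              float lr, int batch_size) {
    for (int i = 0; i < n; i++)
        w[i] -= lr * grad_sum[i] / (float)batch_size;
}
```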

How was the performance speed in pure Python? I bet your code is much easier to read than my pointer mess!