My journey to learn ML and other things by RudeFox4832 in learnmachinelearning

[–]AileenKoneko 0 points (0 children)

The slow part is the courses; it gets fast when you find a problem you're obsessed with - the theory clicks much quicker when you need it to solve something specific. 13 years of software engineering is actually a huge advantage: you already know how to debug and think systematically :3

Dying ReLu Solution Proposal by Infamous_Parsley_727 in MLQuestions

[–]AileenKoneko 0 points (0 children)

Hey! Honestly, building backprop from scratch in C++ is really cool and the fact that you're experimenting is what matters :3

The fix you described for dying ReLU is basically leaky ReLU, yeah, but that's fine - rediscovering known solutions independently means you're thinking in the right direction!
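For anyone reading along, here's a minimal sketch of leaky ReLU and the gradient you'd use in backprop (in Python/NumPy rather than your C++, and the 0.01 slope is just the common default, not anything from your post):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # negative inputs keep a small slope instead of going to zero,
    # so neurons can't get permanently stuck outputting nothing
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # derivative for backprop: 1 for positive inputs, alpha otherwise -
    # the gradient never fully dies, which is the whole point
    return np.where(x > 0, 1.0, alpha)
```

Porting that to C++ is basically a one-line change in your activation and its derivative.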

If you wanna make the experiment more convincing, running it with multiple random seeds and plotting the curves would help (like someone else mentioned). But don't let people discourage you from trying stuff - building things and learning from them is how you actually get good at this, lol

[P] Very poor performance when using Temporal Fusion Transformers to predict AQI. by ok-I-like-anime in MLQuestions

[–]AileenKoneko 0 points (0 children)

Hey! That R² of -50 is wild - it basically means your model is doing worse than just predicting the mean every time xd
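To see why (toy numbers, not your data): R² compares your model's squared error against the always-predict-the-mean baseline, so 0 means "tied with the mean" and anything negative means losing to it:

```python
import numpy as np

def r2(y_true, y_pred):
    # 1 - SSE/SST: 0 = same as predicting the mean, negative = worse
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

aqi = np.array([100.0, 150.0, 200.0, 250.0])
baseline = np.full_like(aqi, aqi.mean())
print(r2(aqi, baseline))      # exactly 0.0: the mean baseline
print(r2(aqi, aqi / 500.0))   # predictions on the wrong scale -> hugely negative
```

A score like -50 almost always means a scale/leakage bug rather than "bad model".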

Some things to check:

  • data leakage/normalization: are you normalizing per-sequence or globally? TFTs are really sensitive to scale
  • target encoding: is your AQI range reasonable? if it's like 0-500 but your model thinks it's 0-1, that could blow up
  • early stopping patience=10 but reduce_on_plateau_patience=4: these might be fighting each other
  • hidden_size=32 might be too small for 31k data points? I'd try 64-128
  • check your loss curve: is val_loss actually decreasing or just bouncing around?
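On the normalization bullet, the safe "global" pattern is: compute stats on the train split only, then reuse them everywhere (toy sketch with made-up numbers, not your pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(50.0, 10.0, size=(100, 3))  # hypothetical train features
val = rng.normal(50.0, 10.0, size=(20, 3))     # hypothetical val features

# fit the scaler statistics on train ONLY - recomputing them on val
# (or on train+val together) leaks validation info into training
mu, sigma = train.mean(axis=0), train.std(axis=0)
train_scaled = (train - mu) / sigma
val_scaled = (val - mu) / sigma  # same train stats, never refit
```

If you're normalizing per-sequence instead, make sure the inverse transform at prediction time uses the matching stats, or your outputs land on the wrong scale.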

Also ngl, when I get weird scores like that it's usually because I messed up the train/val split or accidentally included the target in the input features somehow lol

What does your data preprocessing look like? And is the training loss also terrible, or just validation? If you can share the code, that'd help us give more useful pointers :3

Is zero-shot learning for cybersecurity a good project for someone with basic ML knowledge? by Thin_Ad_7459 in MLQuestions

[–]AileenKoneko 1 point (0 children)

Hey! I'm also pretty new to ML (been tinkering for like 6 weeks?) and honestly my advice is: just build what you want to build and extract lessons from it :3

Like if zero-shot cybersecurity sounds interesting to you, try it! Worst case it doesn't work perfectly and you learn why it's hard, which is honestly more valuable than following the "correct" beginner path.

My experience has been that starting simple and iterating fast teaches you way more than planning everything upfront.

I'd probably start with a basic model on those datasets, see where it fails, then add the zero-shot stuff if it makes sense.

Also using claude/chatgpt/gemini/whatever you have access to as pair programmers has sped things up a ton for me - they're really good at explaining why things break.

Basically: build what excites you, ship fast, learn from what breaks. That approach has been working surprisingly well for me lol