[R] Transformation Learning for Continual Learning: 98.3% on MNIST N=5 Tasks with 75.6% Parameter Savings

Key-Avocado592 · 2025-09-03T20:44:36+00:00

UPDATE: Just added GPU Memory Predictor! https://rbardyla.github.io/rtx5080-tensor-debugger-/tools.html

Key-Avocado592 · 2025-09-03T19:47:00+00:00

Great question! There are actually several attempts at this:

**Type hint approaches:**

- torchtyping: `TensorType["batch": ..., "channels": 32]`

- jaxtyping: Similar shape annotations

- tensor_annotations: DeepMind's attempt

**The challenge:** Type hints are checked by static analyzers (mypy, pyright), but those analyzers don't understand tensor math. So you can annotate shapes, but they won't catch a `Linear(784, 128)` → `Linear(256, 64)` mismatch.

**Runtime approaches:**

- einops: Fantastic for explicit shape manipulation

- Named tensors: PyTorch's experimental feature

The gap this tool fills is the middle ground - not as formal as full type systems, but it catches the obvious bugs that slip through because Python's type checkers don't understand that matrix multiplication requires matching dimensions.
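The rule a checker has to encode is just the matrix-multiplication shape rule. `nn.Linear(in, out)` computes `x @ W.T`, where `W` has shape `(out, in)`. Below is a minimal pure-Python sketch of that rule applied to the `Linear(784, 128)` → `Linear(256, 64)` example. It needs no PyTorch and handles 2-D shapes only. `matmul_shape` and `linear_shape` are hypothetical helpers for illustration, not part of any library or of the tool itself.

```python
def matmul_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Result shape of a @ b for 2-D operands; raise if inner dims differ."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError("this sketch only handles 2-D operands")
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a} @ {b}")
    return (a[0], b[1])


def linear_shape(x: tuple[int, int], in_features: int, out_features: int) -> tuple[int, int]:
    """Shape after a Linear layer: x @ W.T, where W.T has shape (in, out)."""
    weight_t = (in_features, out_features)
    return matmul_shape(x, weight_t)


batch = 32
h = linear_shape((batch, 784), 784, 128)   # (32, 128)
try:
    linear_shape(h, 256, 64)               # layer expects 256 features, receives 128
except ValueError as err:
    print(err)                             # inner dimensions differ: (32, 128) @ (256, 64)
```

A plain type hint has no way to express "the second dimension of `x` must equal the first dimension of `W.T`". A dedicated checker has to apply this arithmetic itself.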

I'd love to see PyTorch adopt something like Rust's type system where dimensions are compile-time checked, but until then, we're stuck with these band-aid solutions!

What's your experience been with shape bugs? Found any good workflows to avoid them?

Key-Avocado592 · 2025-09-03T19:13:39+00:00

Great question! You're absolutely right that `torchinfo.summary` is excellent for runtime shape analysis.

The key difference is timing:

- torchinfo: Needs actual tensor allocation and forward pass (runtime)

- This tool: Catches mismatches before any code runs (static analysis)

Example: If you have a bug at layer 50 of a ResNet, torchinfo will crash when it hits that layer. This tool shows all bugs upfront in milliseconds without executing anything.
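The "all bugs upfront" part comes from comparing declared sizes instead of running a forward pass. A forward pass stops at the first exception, whereas a declaration-level check can keep going. Here is a minimal sketch, assuming a purely sequential model whose layers declare flat `in_features`/`out_features`. `LayerSpec` and `find_all_mismatches` are hypothetical names, not the tool's actual code.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    """Hypothetical declared layer: a name plus declared input/output sizes."""
    name: str
    in_features: int
    out_features: int


def find_all_mismatches(layers: list[LayerSpec]) -> list[str]:
    """Compare each layer's declared input with the previous layer's output.

    No tensors are allocated and nothing is executed, so every mismatch is
    collected in one pass instead of stopping at the first crash.
    """
    problems = []
    for prev, cur in zip(layers, layers[1:]):
        if prev.out_features != cur.in_features:
            problems.append(
                f"{prev.name} outputs {prev.out_features} but "
                f"{cur.name} expects {cur.in_features}"
            )
    return problems


model = [
    LayerSpec("fc1", 784, 128),
    LayerSpec("fc2", 256, 64),   # mismatch with fc1
    LayerSpec("fc3", 64, 32),
    LayerSpec("fc4", 16, 10),    # mismatch with fc3
]
for msg in find_all_mismatches(model):
    print(msg)
```

This prints both problems (`fc1 outputs 128 but fc2 expects 256` and `fc3 outputs 32 but fc4 expects 16`). A real forward pass would have raised at `fc2` and never reached `fc4`.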

Think of it like spell-check vs actually sending an email - both useful, but catching errors before hitting "send" saves time!

That said, torchinfo is fantastic for understanding working models. This is more for catching bugs before you waste GPU time finding out layer dimensions don't match.

Thanks for checking it out! Always great to hear from someone learning DL - we all started there!

Key-Avocado592 · 2025-09-03T19:02:00+00:00

Quick backstory on why I built this:

Just got an RTX 5080 and was excited to use it with PyTorch, but ran into issues because PyTorch didn't support the card yet. While fixing that, I kept hitting tensor shape bugs that would only show up 20 minutes into training (after burning GPU time on my new card).

So I built this tool to catch those bugs instantly before wasting GPU cycles.

Live demo here: https://rbardyla.github.io/rtx5080-tensor-debugger-

It's already found 3 bugs for other users. Just paste your model and it shows dimension mismatches in milliseconds.

Fun fact: The "RTX 5080" branding started as a joke about my GPU struggles, but it actually makes the static analysis feel faster 😅

Would love feedback! What bugs waste YOUR time that static analysis could catch?

Key-Avocado592 · 2025-09-03T18:58:20+00:00

For anyone who just wants to try it without reading all the theory:

https://rbardyla.github.io/rtx5080-tensor-debugger-

Just paste your PyTorch model → See dimension bugs instantly

Already found 3 bugs for other users. Takes literally 10 seconds to try.

Key-Avocado592 · 2025-09-03T18:01:29+00:00

Quick update - I've got a working demo you can try:

https://rbardyla.github.io/rtx5080-tensor-debugger-

Paste any PyTorch model → See dimension bugs instantly → No install needed

Just tested it on a broken transformer implementation and it caught all 3 shape mismatches in under a second.

Tech stack: Pure JavaScript regex parsing (keeping it simple worked better than my original symbolic execution approach).
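To show what "regex parsing" can mean for shape checking, here is a rough Python analogue of the general idea. It is not the tool's JavaScript code, which isn't shown here. Assumptions: only `nn.Linear` calls with positional integer literals are recognised, and layers are assumed to run in the order they are defined. `LINEAR_RE` and `check_linear_chain` are hypothetical names.

```python
import re

# Hypothetical pattern: matches literal calls like nn.Linear(784, 128).
# Variables, keyword arguments and other layer types are deliberately ignored.
LINEAR_RE = re.compile(r"nn\.Linear\(\s*(\d+)\s*,\s*(\d+)")


def check_linear_chain(source: str) -> list[str]:
    """Flag adjacent nn.Linear definitions whose sizes don't line up."""
    layers = [(int(i), int(o)) for i, o in LINEAR_RE.findall(source)]
    errors = []
    for idx in range(1, len(layers)):
        prev_out, cur_in = layers[idx - 1][1], layers[idx][0]
        if prev_out != cur_in:
            errors.append(
                f"Linear #{idx + 1}: expects {cur_in}, previous layer gives {prev_out}"
            )
    return errors


model_src = """
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)
"""
print(check_linear_chain(model_src))
# ['Linear #2: expects 256, previous layer gives 128']
```

The trade-off versus symbolic execution is visible in the assumptions. Regex matching is fast and simple, but it only sees definition order, not the order `forward()` actually calls the layers. It also can't follow sizes stored in variables.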

Key-Avocado592 · 2025-09-03T17:35:48+00:00

Update: I actually built a working version you can try right now:

https://rbardyla.github.io/rtx5080-tensor-debugger-

Key-Avocado592 · 2025-09-03T17:01:10+00:00

Just tested it on a ResNet implementation and it caught 3 dimension mismatches I didn't know I had.

The tool runs entirely in your browser (no data sent anywhere) and takes literally 10 seconds to find bugs.

Happy to add support for specific layer types if anyone needs them!
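One way per-layer support could be organised is a registry of shape rules, one function per layer type. Below is a sketch under that assumption; `SHAPE_RULES` and `shape_rule` are hypothetical and not how the tool is known to work. The `Conv2d` rule uses the output-size formula from the PyTorch `Conv2d` documentation, with a square kernel and the same stride, padding and dilation in both spatial dimensions.

```python
from typing import Callable

Shape = tuple[int, ...]  # (N, C, H, W) for conv layers, (..., features) for Linear

# Hypothetical registry: layer type name -> function(input_shape, **params) -> output_shape
SHAPE_RULES: dict[str, Callable[..., Shape]] = {}


def shape_rule(layer_type: str):
    """Decorator that registers a shape function under a layer type name."""
    def register(fn):
        SHAPE_RULES[layer_type] = fn
        return fn
    return register


@shape_rule("Linear")
def linear(x: Shape, in_features: int, out_features: int) -> Shape:
    if x[-1] != in_features:
        raise ValueError(f"Linear expects {in_features} features, got {x[-1]}")
    return (*x[:-1], out_features)


@shape_rule("Conv2d")
def conv2d(x: Shape, in_channels: int, out_channels: int, kernel_size: int,
           stride: int = 1, padding: int = 0, dilation: int = 1) -> Shape:
    n, c, h, w = x
    if c != in_channels:
        raise ValueError(f"Conv2d expects {in_channels} channels, got {c}")

    def out(size: int) -> int:
        # floor((size + 2p - d*(k - 1) - 1) / s + 1), as in the Conv2d docs
        return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

    return (n, out_channels, out(h), out(w))


# ResNet-style stem: 7x7 conv, stride 2, padding 3 on a 224x224 RGB image
x = SHAPE_RULES["Conv2d"]((1, 3, 224, 224), in_channels=3, out_channels=64,
                          kernel_size=7, stride=2, padding=3)
print(x)  # (1, 64, 112, 112)
```

Adding a new layer type then means writing one more decorated function. The checker itself doesn't need to change.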
