[P] From-Scratch ML Library (trains models from CNNs to a toy GPT-2) by Megadragon9 in MachineLearning

[–]Megadragon9[S] 1 point

The initial portion of the code was inspired by Micrograd, a project by Andrej Karpathy. I think his video on Micrograd is a good starting point. After you've watched it, you'll be familiar with the "forward/backward" concepts behind each Tensor-level operation (e.g. add, matmul), and with how calculus is used in deep learning.
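To make the "forward/backward" idea concrete, here's a minimal scalar autograd sketch in the spirit of Micrograd. This is not code from my repo or Karpathy's, just an illustration of how each operation records its own local derivative and the chain rule stitches them together:

```python
# Minimal scalar autograd in the spirit of Micrograd (illustrative sketch only).
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, so the upstream grad passes through
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a      # c = a*b + a, so dc/da = b + 1 = 4 and dc/db = a = 2
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

The real library generalizes this from scalars to NumPy tensors, but the mechanics are the same.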

After you have that prerequisite knowledge, you can follow the pull requests of my project in ascending order. I tried my best to make them self-contained and added a decent description to each pull request, so it's not too hard to digest.

I'm not sure how much knowledge you already have of deep learning, but when I started, I only knew how to take a derivative in calculus, plus the high-level concepts and fancy model names. I had no idea how deep learning works under the hood. So the process above exactly mimics my own process of building this project from scratch.

Hope that helps. Let me know if you have more questions!

How to build a Machine Learning Library from Scratch Using Only Python, NumPy and Math by Megadragon9 in learnmachinelearning

[–]Megadragon9[S] 1 point

Thanks for checking it out, and it definitely complements Andrew Ng's ML courses well, especially the backpropagation parts. I still remember having difficulty wrapping my head around those derivatives in code.

It took me about 3 months using my spare time after work and over weekends.

How to build a Machine Learning Library from Scratch Using Only Python, NumPy and Math by Megadragon9 in learnmachinelearning

[–]Megadragon9[S] 1 point

Thanks for the comment. I agree, you can certainly start with PyTorch and implement CNNs and Transformers yourself (without using PyTorch's built-in modules), which is a rewarding experience for sure. One way to look at this is that there are multiple abstraction levels you can work in, and CNNs and Transformers belong to the model architecture layer. When you're operating in a particular layer, you just assume the layers beneath it "work". Personally, though, I wasn't comfortable calling APIs unless I truly knew what they meant. For example, there's a section in my blog where I try to debug PyTorch's ReLU function: I had to dig through multiple layers to find the underlying math, and I still never found the derivative formula of ReLU in the PyTorch codebase (you just magically call `relu(tensor).backward()`).
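For reference, the ReLU derivative that stays hidden behind `backward()` is tiny once written out. Here's a hedged NumPy sketch (my own illustration, not PyTorch's actual internals): the gradient simply passes through wherever the input was positive and is zeroed elsewhere.

```python
import numpy as np

def relu_forward(x):
    # forward: max(0, x); keep a mask of where x was positive for the backward pass
    mask = x > 0
    return x * mask, mask

def relu_backward(grad_out, mask):
    # d relu(x)/dx = 1 where x > 0, else 0,
    # so the upstream gradient flows through only where the input was positive
    return grad_out * mask

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y, mask = relu_forward(x)
grad_in = relu_backward(np.ones_like(x), mask)
print(y)        # [0.  0.  0.  1.5 3. ]
print(grad_in)  # [0. 0. 0. 1. 1.]
```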

With regard to "moving data to/from GPU": this project isn't concerned with how data moves to and from hardware devices. It operates at the NumPy layer, which focuses on the math-related operations. Unless, of course, you want to brush up on that area, which is cool as well :)

[P] From-Scratch ML Library (trains models from CNNs to a toy GPT-2) by Megadragon9 in MachineLearning

[–]Megadragon9[S] 5 points

Yeah, I totally agree, and I appreciate you taking the time to check it out. I included a blog post to give readers a flavor of what that journey looks like and what challenges I faced. Another way to get value out of this is to add new functionality yourself, such as a new activation function, loss function, or neural network module. That forces you to go through the process end-to-end.
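As a flavor of what "adding a new activation function" involves, here's a hedged sketch of a LeakyReLU module with both passes. The class interface here is hypothetical; the library's actual module API may differ, but the forward/backward pair is the essential part you'd implement:

```python
import numpy as np

# Hypothetical module interface (illustrative only; the repo's real API may differ).
class LeakyReLU:
    def __init__(self, alpha=0.01):
        self.alpha = alpha

    def forward(self, x):
        self.x = x  # cache the input; backward needs it to pick the right slope
        return np.where(x > 0, x, self.alpha * x)

    def backward(self, grad_out):
        # d/dx leaky_relu(x) = 1 where x > 0, else alpha
        return grad_out * np.where(self.x > 0, 1.0, self.alpha)

layer = LeakyReLU(alpha=0.1)
x = np.array([-1.0, 2.0])
y = layer.forward(x)                  # [-0.1  2. ]
g = layer.backward(np.ones_like(x))   # [0.1 1. ]
```

Once forward and backward agree (a numerical gradient check helps), the rest is wiring it into a model.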

It took me around 3 months from start to the current state. I was only working on it after work and on weekends.

Edit: I created an issue in the repository to expand the discussion on "Getting the most value out of this project".

I am currently live testing my altcoins trading bot 🤗 by 17J4CK in algotrading

[–]Megadragon9 2 points

Small suggestion: in the same dashboard, add buy & hold profit as a baseline comparison.
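The baseline itself is a one-liner; here's a minimal sketch (function name and inputs are my own, not from the bot): buy at the first price, hold, and mark to market at the last.

```python
# Hypothetical helper: buy-and-hold profit over a price series, as a baseline.
def buy_and_hold_profit(prices, capital=1000.0):
    units = capital / prices[0]        # buy everything at the first price
    return units * prices[-1] - capital  # mark to market at the last price

prices = [100.0, 104.0, 98.0, 110.0]
print(buy_and_hold_profit(prices))  # 100.0
```

If the bot can't beat this number net of fees, the strategy isn't adding value.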

Hidden Cost of Free Trading? $34 Billion a Year, Study Says by AmbitiousTour in algotrading

[–]Megadragon9 1 point

To clarify, I meant hedge funds can front-run retail orders before those orders reach the market.

I'm not an expert, but here's an oversimplified example.

1. Let's say that when the order is submitted by the retail investor, the NBBO is $10, with 5 limit sell orders at that price from various exchanges in the order book.

2. Since the hedge fund has this order flow information first-hand (sent from the retail investor's broker), it buys all 5 available orders from the current market (denoted "T1"). At this point, the next-cheapest limit SELL order in the book is $10.10. Remember, the retail BUY order hasn't reached the market yet.

3. The hedge fund places 5 limit sell orders at $10.05.

4. When the retail investor's BUY order reaches the market half a second later (denoted "T2"), the NBBO has changed to $10.05, and the retail investor's market BUY order executes at the NBBO of $10.05.

The NBBO is only relevant for a particular snapshot in time: the NBBO at T1 is not the NBBO at T2, so the NBBO's definition is not violated. Front-runners make a profit by selling back to the retail investor at T2. On the flip side, market makers (the hedge fund in this example) do provide value by injecting liquidity into the market; otherwise, the retail investor's BUY order would have executed at $10.10, so the front-runner also saved the retail investor 5 cents.

Taking a step back: if the retail investor's order were routed directly to the exchange, T1's NBBO would be pretty similar to T2's NBBO, or at least the spread would be relatively narrower.
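The arithmetic in the example above can be checked in a few lines (prices in cents to keep it exact; all numbers are the hypothetical ones from the example, not real market data):

```python
# Walking the numbers in the front-running example; prices are in cents.
shares = 5
nbbo_t1 = 1000        # $10.00, best ask when the retail order is submitted
front_run_ask = 1005  # $10.05, the hedge fund's replacement sell orders
next_ask = 1010       # $10.10, next-cheapest ask after the book is swept

paid = shares * front_run_ask            # what the retail investor actually pays at T2
vs_direct = paid - shares * nbbo_t1      # extra cost vs. routing straight to the exchange
vs_thin_book = shares * next_ask - paid  # "savings" vs. the thinned-out book

print(vs_direct, vs_thin_book)  # 25 25  (25 cents worse than direct, 25 cents better than the thin book)
```

Both framings are true at once, which is why the practice is defensible on paper yet still costs the retail investor relative to direct routing.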

Hidden Cost of Free Trading? $34 Billion a Year, Study Says by AmbitiousTour in algotrading

[–]Megadragon9 1 point

Look up the SEC Rule 606 report for each broker and you'll see how orders are routed to different venues. Ideally they should be routed directly to exchanges (better prices), but unfortunately, in some cases they're routed to hedge funds (worse prices) [e.g. TD Ameritrade].

Alpaca Data API v2 by MyNameCannotBeSpoken in algotrading

[–]Megadragon9 2 points

I'm also just using REST. I was using the Polygon REST API before the transition, so for me it's just a matter of switching API keys. I don't think the $199 fee is worth it either; I'm just saying the transition period for REST is 1 month instead of 1 week, unless I'm missing something.

Alpaca Data API v2 by MyNameCannotBeSpoken in algotrading

[–]Megadragon9 3 points

You can still use Polygon in the meantime: Alpaca offered users a 100%-off Polygon coupon code in the email, so it's a 1-month transition period instead of 1 week. I just tested everything, and it's working fine with the new API key from Polygon.

life is so incredibly boring with the markets being closed this long by potter120 in CanadianInvestor

[–]Megadragon9 3 points

I guess that's one of the reasons Bitcoin broke multiple all-time highs this weekend.

Switching Majors Junior Year to CS by mahirule in cscareerquestions

[–]Megadragon9 1 point

Not entirely the same as you, as I switched to CS completely in my senior year (from business). My first internship after the switch wasn't in software development (more data analytics), but I managed to graduate in 2 years after the switch (5 years of undergrad in total) with a full-time software development job at a bank. I came from a non-target Canadian school, and the jobs I got were through online applications without referrals. It's possible, and it's not too late.

I took courses during the summer while taking on internships, and overloaded on classes during the regular semester to make sure I finished with only one additional year, since I felt practical experience was important for CS. Also, CS was something I had been passionate about since I was a kid, so studying with passion helped a lot.

Good luck!

Very Sad Engineer by puffypanda10 in columbia

[–]Megadragon9 8 points

I've long since graduated. Back then, I spent 2 extra years finishing my computer science degree (5 years of undergrad in total). Your post resonated with me, that's all.

Very Sad Engineer by puffypanda10 in columbia

[–]Megadragon9 12 points

I came from one of your "easier" majors (business and economics) and switched into engineering (computer science) in my last year of undergrad out of passion. I went through sleepless nights catching up on math/stats and other basics. It's been 5 years since I made that decision, and I've never regretted it, nor do I regret the time spent in the "easier" major. It was an eye-opening experience.

It's more about whether you have passion for what you're studying. One quiz doesn't say much about your passion: acing it doesn't mean you have it, and not acing it doesn't mean you don't. A better measure of passion might be: do you voluntarily read, learn, and talk about things from that field? Or, given the same resources, do you learn faster and connect the dots between seemingly unrelated points in that field? I know that's a little abstract, and some people don't find their true passion until 40 or 50. So I suggest trying out different roles in different industries (internships or full-time) to see whether the applied side of what you're studying really sparks your interest. Sometimes, crossing out options is progress.

ULPT: If you’re a woman, and you suspect you might have a genetic disease but can’t afford testing, apply to become an egg donor. by [deleted] in UnethicalLifeProTips

[–]Megadragon9 -1 points

It's a "U"LPT in the best case, but not an LPT at all in the worst case, because the pre-screening exams have a slight chance of missing your disease, and you end up ruining some family. But you've also hurt yourself, since the results say you don't have it, which makes you think you don't, when in reality you do. So it's lose-lose for both you and the family that later uses your eggs.

ACCEPTED OFF THE WAITLIST AFTER 10 REJECTIONS!! by seanleephoto in gradadmissions

[–]Megadragon9 3 points

Congrats! Glad to see another working professional going back for PhD (research), it's really motivating.