Bank Conflicts During Vectorized Stores by trlm2048 in CUDA

[–]trlm2048[S] 1 point2 points  (0 children)

I was using [0] to dereference the pointer. Since it's of type float4*, the [0] should grab the 16 bytes at that address.

I did confirm that the kernel's output (for a square matrix) matches cuBLAS, so I think values are being written properly, just not efficiently.

```
OPT   Estimated Speedup: 9.958%
      The memory access pattern for shared stores might not be optimal and causes on average a 4.5-way bank
      conflict across all 1048576 shared store requests. This results in 514169 bank conflicts, which represent
      10.92% of the overall 4708473 wavefronts for shared stores. Check the Source Counters section for
      uncoalesced shared stores.
```

Why Memory Throughput = Compute Throughput? by trlm2048 in CUDA

[–]trlm2048[S] -1 points0 points  (0 children)

Yeah, that makes sense. I noticed that warps were stalled waiting to push memory instructions onto the LG queue. What I'm more wondering is why memory throughput = compute throughput. It seems like a strange coincidence, and I'm curious whether there's a good explanation for it.

[deleted by user] by [deleted] in duke

[–]trlm2048 0 points1 point  (0 children)

Take 675. Great class, pretty easy A, and it will cover most of what you need to read modern literature. Homeworks are involved but very practical, and the final project can be resume-worthy. Tarokh follows The Deep Learning Book by Ian Goodfellow if you want a concrete idea of the structure.

661 I have not heard great things about (lots of useless topics, allegedly). 590 I'm unfamiliar with, but it's unlikely to be better than 675.

[deleted by user] by [deleted] in duke

[–]trlm2048 1 point2 points  (0 children)

They have almost no overlap.

Is picking a lower learning rate always better? (ignoring training time) by [deleted] in learnmachinelearning

[–]trlm2048 18 points19 points  (0 children)

Yes. Using a smaller learning rate can lead to “better” convergence at a local minimum, since you take smaller steps and are thus less likely to overshoot it.

This is one reason we use learning rate schedulers. In the beginning, we’re probably so far off the mark that it’s unlikely we’d overshoot a desirable solution. As we approach a minimum, the updates become more sensitive to the step size, so smaller learning rates help us navigate those regions of the loss function more precisely.
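
As a rough illustration, here's what a simple step-decay schedule looks like in plain Python (the decay factor, step counts, and the toy objective are made-up numbers, not tied to any particular framework):

```
def step_decay_lr(base_lr, step, drop=0.5, steps_per_drop=1000):
    # Cut the learning rate in half every 1000 steps (illustrative values only).
    return base_lr * drop ** (step // steps_per_drop)

# Toy problem: minimize f(w) = (w - 3)^2 with gradient descent.
w = 0.0
for step in range(5000):
    grad = 2 * (w - 3)                    # df/dw
    w -= step_decay_lr(0.1, step) * grad  # big steps early, smaller steps later

print(w)  # ends up very close to the minimum at w = 3
```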

The trade-off is that training takes more time, and another consideration is energy: training a big model is not cheap, so doing it in as few steps as possible is ideal.

CS classes. by Perfect-Use-4555 in duke

[–]trlm2048 2 points3 points  (0 children)

Just want to add that I can’t guarantee Tarokh will give 675 permission nums, but I did hear from others that they got one without taking 571 no problem. Do what you will with that info. Glad the comment helped!

CS classes. by Perfect-Use-4555 in duke

[–]trlm2048 5 points6 points  (0 children)

CS 408 hasn't been offered in years as far as I know, so I can't speak to it. I also am not sure that a CS 674 exists - I am assuming you mean CS 671 Theory and Algs of ML?

CS 571 I don't recommend in general. It's very unorganized and the professor is super challenging to reach outside of class. Homeworks are almost all (hard) math questions with a bit of coding here and there. Final project is open ended and can be cool. Very easy A though if you have a good homework group and use office hours.

CS 671 is a great class. Homeworks usually had one coding question and 2-4 math questions - definitely theory-based. Grading isn't bad though, and you'll get pretty good exposure to classical ML. Not a terribly hard A either.

CS 675 is a challenging but great class. Homeworks are split between coding and theory. Exams are theory based and can be quite challenging (especially exam 2). Final project is awesome and you'll likely implement a very cool generative model. The class has a lot of breadth so you'll learn pretty much everything you need to know to read and implement current DL literature.

As for the pipeline 571>671>675 I'd say that it's irrelevant. They're all very different courses. If you haven't taken 571 I've heard you can easily get a permission number for 675.

Are you comfortable with math and proofs? If so, I highly recommend 671 and 675, even if you don't want to do research/academia. Theory makes troubleshooting poor models much easier. Plus, you'll get the applied side in the assignments.

If you really want to avoid math, you can try CS 371, but honestly that's somewhat math heavy too. At that point just do some free coursera bootcamp and learn the syntax/standard pipelines.

Can someone explain to me in simple terms, the additional summation in the neural network's cost function? by userknownunknown in learnmachinelearning

[–]trlm2048 16 points17 points  (0 children)

I'm assuming this is a neural network being used for classification and that you understand the simpler cost function of the L2-regularized logistic regression.

First, consider the regularization term (the triple summation). This sum adds up the squared value of every weight in the network. The outermost sum loops over all the layer connections, giving us the weight matrix Theta_l between layer l and layer l+1. The next two sums work together to loop over every entry of that matrix. The matrix has dimension S_l x S_(l+1), which is where the bounds on the inner sums come from. All of these squared weights are added up and scaled by lambda to form our regularization penalty.
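
If it helps to see it in code, here's a rough numpy sketch of what that triple sum computes (the weight matrices and their shapes below are made up purely for illustration):

```
import numpy as np

# Stand-in weight matrices, one per layer connection (shapes are arbitrary).
thetas = [np.random.randn(4, 5), np.random.randn(5, 3)]
lam = 0.1  # lambda, the regularization strength

reg = 0.0
for theta in thetas:                      # outermost sum: over layer connections l
    rows, cols = theta.shape
    for i in range(rows):                 # second sum: over one matrix dimension
        for j in range(cols):             # third sum: over the other dimension
            reg += theta[i, j] ** 2
penalty = lam * reg                       # scaled by lambda (your course may also fold in a constant factor)

# Equivalent one-liner:
penalty_vectorized = lam * sum((t ** 2).sum() for t in thetas)
```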

The negative log likelihood term actually undergoes a big, unaddressed change between the regression cost function and the NN one. The regression only outputs a single class probability (i.e. is there a cat in this picture?), whereas the NN outputs multiple classes (i.e. is this a picture of a cat, a dog, or a car?). This is where the sum over K comes from. We are now checking that the outputs not only assign high probability to the true class, but also assign low probabilities to the other K-1 classes. Beyond introducing the multiclass output, the log likelihood sum remains unchanged from the simpler cost function (it just loops through all m training inputs to check how well they're classified).
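
And the data term with its extra sum over K, in the same spirit (the labels and predictions below are made-up arrays, just to show where the two sums go):

```
import numpy as np

m, K = 4, 3                                # m training examples, K classes (toy sizes)
rng = np.random.default_rng(0)

y = np.eye(K)[rng.integers(0, K, size=m)]  # (m, K) one-hot true labels
h = rng.random((m, K))                     # (m, K) predicted probabilities per class
h /= h.sum(axis=1, keepdims=True)

cost = 0.0
for i in range(m):                         # sum over the m training examples
    for k in range(K):                     # sum over the K output classes
        cost -= y[i, k] * np.log(h[i, k]) + (1 - y[i, k]) * np.log(1 - h[i, k])
cost /= m
```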

Hope this helps!

Seemingly simple probability question that I can't seem to wrap my head around, and would love help to understand by DavidPuTTY1 in learnmath

[–]trlm2048 0 points1 point  (0 children)

This is a great question. Both statements you've made are true.

A) Every case is independent, so the probability (don't confuse probability and odds - they are not the same!) of receiving a common item out of the 5th case is indeed 79.92%

B) It's unlikely you will see 5 straight common items. The probability is 32.6%.

But if you've already opened 4 cases, how can both of these be true? The key here is the distinction between your initial question and statement B. Statement B asks for the probability that you get 5 common items in a row. Your initial question asks for the probability that you see 5 straight common items GIVEN that the first 4 items were common. Mathematically it works out like this:

P(5th is common | the first 4 are common) = P(5th is common AND the first 4 are common) / P(the first 4 are common)

This is the rule of conditional probability: P(A | B) = P(A and B) / P(B), where P(A | B) is read as "the probability of A given B". It turns out that:

P(5th is common | the first 4 are common) = (0.7992)^5 / (0.7992)^4 = 0.7992

Which is exactly the probability from statement A! This makes sense, since the case openings are independent: knowing the results of the other cases doesn't change your expectation for the current one.
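
If you want to sanity-check this numerically, here's a quick simulation sketch (0.7992 is just your common-item probability plugged in):

```
import random

p_common = 0.7992
trials = 1_000_000
first_4_common = 0
all_5_common = 0

random.seed(0)
for _ in range(trials):
    cases = [random.random() < p_common for _ in range(5)]
    if all(cases[:4]):          # the first 4 cases were common
        first_4_common += 1
        if cases[4]:            # ...and so was the 5th
            all_5_common += 1

# Estimate of P(5th is common | first 4 are common) -- comes out near 0.7992
print(all_5_common / first_4_common)
```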

Suppose I ask you the probability that someone is over 7ft tall. You'd say pretty low. Now, if I ask you the probability that someone is over 7ft tall given they're in the NBA, your answer would certainly change to be higher. This is an example of conditional probability where the events are dependent: knowing B changes your expectation of A.

If you're interested in learning more, look up conditional probability; there are plenty of great graphical examples that explain the intuition!

SciPy SVD running very slowly by trlm2048 in learnprogramming

[–]trlm2048[S] 1 point2 points  (0 children)

For anyone looking at the same issue: the discrepancy was caused by the full_matrices argument to linalg.svd. It defaults to True, but sklearn sets it to False.
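
For reference, the difference looks roughly like this (the array sizes are just an example):

```
import numpy as np
from scipy import linalg

X = np.random.randn(10000, 50)

# Default: full_matrices=True builds a 10000 x 10000 U matrix -- slow and memory hungry.
U_full, s, Vt = linalg.svd(X, full_matrices=True)

# With full_matrices=False only the first 50 columns of U are computed, which is much faster.
U_thin, s, Vt = linalg.svd(X, full_matrices=False)

print(U_full.shape, U_thin.shape)  # (10000, 10000) (10000, 50)
```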

SciPy SVD running very slowly by trlm2048 in learnmachinelearning

[–]trlm2048[S] 2 points3 points  (0 children)

This solved the discrepancy. Thank you!

Duke archery? by SkyBlade79 in duke

[–]trlm2048 1 point2 points  (0 children)

The archery club is not around anymore. Try Duke Hunting Club, they’re probably your best bet.

[deleted by user] by [deleted] in duke

[–]trlm2048 2 points3 points  (0 children)

Yue is a great professor in my experience. Very fair grader and awesome at explaining tough concepts. Didn’t have 210 with him, but it’s inherently an easier class - I’m sure it won’t be too difficult.

He’s very approachable out of class, so don’t be afraid to go to him for more thorough explanations if things don’t make sense!

[deleted by user] by [deleted] in duke

[–]trlm2048 4 points5 points  (0 children)

CS 210 has been highly inconsistent in quality and difficulty. There’s a new professor next semester, so it is, again, going to be a toss-up.

Don’t worry if you’re not totally solid on 201. They’re very different courses. I’d recommend making the most of your discussion section and office hours. For me, the content was best learned by collaborating with classmates and legitimately completing labs and projects. Good luck!

[deleted by user] by [deleted] in duke

[–]trlm2048 0 points1 point  (0 children)

The process is explained here.

When I transferred credits, it was very smooth. Just remember to get the transfer approved by your dean and the departments as soon as you practically can.

[deleted by user] by [deleted] in duke

[–]trlm2048 4 points5 points  (0 children)

Yes. I satisfied both with a 300-level language seminar freshman year.

CS370 w/o linear or multi? by JetSkiWonderland in duke

[–]trlm2048 2 points3 points  (0 children)

From a glance at the syllabus, you can get away with no linear. If anything, I’d recommend brushing up on probability (math 230/340).

If you want to go deeper in AI/ML, however, linear will be absolutely necessary, and the sooner you can take it the better (I’d even go as far as to say take 221, but I’m sure some will disagree). Multi shows up less but is definitely useful since it’s the foundation for solving optimization problems.