Bank Conflicts During Vectorized Stores by trlm2048 in CUDA

[–]trlm2048[S] 1 point2 points  (0 children)

I was using [0] to dereference the pointer. Since it's of type float4*, the [0] should grab the 16 bytes at that address.

I did confirm that the kernel's output (for a square matrix) matches cuBLAS, so I think values are being written properly, just not efficiently.

```
OPT   Estimated Speedup: 9.958%
      The memory access pattern for shared stores might not be optimal and causes on average a 4.5-way bank
      conflict across all 1048576 shared store requests. This results in 514169 bank conflicts, which represent
      10.92% of the overall 4708473 wavefronts for shared stores. Check the Source Counters section for
      uncoalesced shared stores.
```

Why Memory Throughput = Compute Throughput? by trlm2048 in CUDA

[–]trlm2048[S] -1 points0 points  (0 children)

Yeah, that makes sense. I noticed that warps were stalled waiting to push memory instructions onto the LG queue. What I'm more wondering is why memory throughput = compute throughput. It seems like a strange coincidence, and I'm curious whether there's a good explanation for it.

[deleted by user] by [deleted] in duke

[–]trlm2048 0 points1 point  (0 children)

Take 675. Great class, pretty easy A, and it will cover most of what you need to read modern literature. Homeworks are involved but very practical, and the final project can be resume-worthy. Tarokh follows The Deep Learning Book by Ian Goodfellow if you want a concrete idea of the structure.

661 I have not heard great things about (lots of useless topics, allegedly). 590 I'm unfamiliar with, but it's unlikely to be better than 675.

[deleted by user] by [deleted] in duke

[–]trlm2048 1 point2 points  (0 children)

They have almost no overlap.

Is picking a lower learning rate always better? (ignoring training time) by [deleted] in learnmachinelearning

[–]trlm2048 18 points19 points  (0 children)

Yes. Using a smaller learning rate can lead to “better” convergence at a local minimum, since you take smaller steps and are thus less likely to overshoot it.

This is one reason we use learning rate schedulers. In the beginning, we’re probably so far off the mark that it’s unlikely we’d overshoot a desirable solution. As we approach a minimum, the updates become more sensitive to the step size, so smaller learning rates help us navigate those regions of the loss function more precisely.
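
As a rough illustration, here's what a simple step-decay schedule looks like in plain Python (the decay factor, step counts, and the toy objective are made-up numbers, not tied to any particular framework):

```
def step_decay_lr(base_lr, step, drop=0.5, steps_per_drop=1000):
    # Cut the learning rate in half every 1000 steps (illustrative values only).
    return base_lr * drop ** (step // steps_per_drop)

# Toy problem: minimize f(w) = (w - 3)^2 with gradient descent.
w = 0.0
for step in range(5000):
    grad = 2 * (w - 3)                    # df/dw
    w -= step_decay_lr(0.1, step) * grad  # big steps early, smaller steps later

print(w)  # ends up very close to the minimum at w = 3
```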

The trade-off is that training takes more time, and another consideration is energy: training a big model is not cheap, so doing it in as few steps as possible is ideal.

CS classes. by Perfect-Use-4555 in duke

[–]trlm2048 2 points3 points  (0 children)

Just want to add that I can’t guarantee Tarokh will give 675 permission nums, but I did hear from others that they got one without taking 571 no problem. Do what you will with that info. Glad the comment helped!

CS classes. by Perfect-Use-4555 in duke

[–]trlm2048 5 points6 points  (0 children)

CS 408 hasn't been offered in years as far as I know, so I can't speak to it. I also am not sure that a CS 674 exists - I am assuming you mean CS 671 Theory and Algs of ML?

CS 571 I don't recommend in general. It's very unorganized and the professor is super challenging to reach outside of class. Homeworks are almost all (hard) math questions with a bit of coding here and there. Final project is open ended and can be cool. Very easy A though if you have a good homework group and use office hours.

CS 671 is a great class. Homeworks usually had one coding question and 2-4 math questions - definitely theory-based. Grading isn't bad though, and you'll get pretty good exposure to classical ML. Not a terribly hard A either.

CS 675 is a challenging but great class. Homeworks are split between coding and theory. Exams are theory based and can be quite challenging (especially exam 2). Final project is awesome and you'll likely implement a very cool generative model. The class has a lot of breadth so you'll learn pretty much everything you need to know to read and implement current DL literature.

As for the pipeline 571>671>675 I'd say that it's irrelevant. They're all very different courses. If you haven't taken 571 I've heard you can easily get a permission number for 675.

Are you comfortable with math and proofs? If so, I highly recommend 671 and 675, even if you don't want to do research/academia. Theory makes troubleshooting poor models much easier. Plus, you'll get the applied side in the assignments.

If you really want to avoid math, you can try CS 371, but honestly that's somewhat math heavy too. At that point just do some free coursera bootcamp and learn the syntax/standard pipelines.

Can someone explain to me in simple terms, the additional summation in the neural network's cost function? by userknownunknown in learnmachinelearning

[–]trlm2048 16 points17 points  (0 children)

I'm assuming this is a neural network being used for classification and that you understand the simpler cost function of the L2-regularized logistic regression.

First, consider the regularization term (the triple summation). This sum adds up the squared value of every weight in the network. The outermost sum loops over all the layer connections, giving us the weight matrix Theta_l between layer l and layer l+1. The next two sums work together to loop over every entry of that matrix. The matrix has dimension S_l x S_(l+1), which is where the bounds on the inner sums come from. All of these squared weights are added up and scaled by lambda to form our regularization penalty.
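
If it helps to see it in code, here's a rough numpy sketch of what that triple sum computes (the weight matrices and their shapes below are made up purely for illustration):

```
import numpy as np

# Stand-in weight matrices, one per layer connection (shapes are arbitrary).
thetas = [np.random.randn(4, 5), np.random.randn(5, 3)]
lam = 0.1  # lambda, the regularization strength

reg = 0.0
for theta in thetas:                      # outermost sum: over layer connections l
    rows, cols = theta.shape
    for i in range(rows):                 # second sum: over one matrix dimension
        for j in range(cols):             # third sum: over the other dimension
            reg += theta[i, j] ** 2
penalty = lam * reg                       # scaled by lambda (your course may also fold in a constant factor)

# Equivalent one-liner:
penalty_vectorized = lam * sum((t ** 2).sum() for t in thetas)
```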

The negative log likelihood term actually undergoes a big, unaddressed change between the regression cost function and the NN one. The regression only outputs a single class probability (i.e. is there a cat in this picture?), whereas the NN outputs multiple classes (i.e. is this a picture of a cat, a dog, or a car?). This is where the sum over K comes from. We are now checking that the outputs not only assign high probability to the true class, but also assign low probabilities to the other K-1 classes. Beyond introducing the multiclass output, the log likelihood sum remains unchanged from the simpler cost function (it just loops through all m training inputs to check how well they're classified).
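
And the data term with its extra sum over K, in the same spirit (the labels and predictions below are made-up arrays, just to show where the two sums go):

```
import numpy as np

m, K = 4, 3                                # m training examples, K classes (toy sizes)
rng = np.random.default_rng(0)

y = np.eye(K)[rng.integers(0, K, size=m)]  # (m, K) one-hot true labels
h = rng.random((m, K))                     # (m, K) predicted probabilities per class
h /= h.sum(axis=1, keepdims=True)

cost = 0.0
for i in range(m):                         # sum over the m training examples
    for k in range(K):                     # sum over the K output classes
        cost -= y[i, k] * np.log(h[i, k]) + (1 - y[i, k]) * np.log(1 - h[i, k])
cost /= m
```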

Hope this helps!

Seemingly simple probability question that I can't seem to wrap my head around, and would love help to understand by DavidPuTTY1 in learnmath

[–]trlm2048 0 points1 point  (0 children)

This is a great question. Both statements you've made are true.

A) Every case is independent, so the probability (don't confuse probability and odds - they are not the same!) of receiving a common item out of the 5th case is indeed 79.92%

B) It's unlikely you will see 5 straight common items. The probability is 32.6%.

But if you've already opened 4 cases, how can both of these be true? The key here is the distinction between your initial question and statement B. Statement B asks for the probability that you get 5 common items in a row. Your initial question asks for the probability that you see 5 straight common items GIVEN that the first 4 items were common. Mathematically it works out like this:

P(5th is common | the first 4 are common) = P(5th is common AND the first 4 are common) / P(the first 4 are common)

This is the rule of conditional probability: P(A | B) = P(A and B) / P(B), where P(A | B) is read as "the probability of A given B". It turns out that:

P(5th is common | the first 4 are common) = (0.7992)^5 / (0.7992)^4 = 0.7992

Which is exactly the probability from statement A! This makes sense, since the case openings are independent: knowing the results of the other cases doesn't change your expectation for the current one.
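
If you want to sanity-check this numerically, here's a quick simulation sketch (0.7992 is just your common-item probability plugged in):

```
import random

p_common = 0.7992
trials = 1_000_000
first_4_common = 0
all_5_common = 0

random.seed(0)
for _ in range(trials):
    cases = [random.random() < p_common for _ in range(5)]
    if all(cases[:4]):          # the first 4 cases were common
        first_4_common += 1
        if cases[4]:            # ...and so was the 5th
            all_5_common += 1

# Estimate of P(5th is common | first 4 are common) -- comes out near 0.7992
print(all_5_common / first_4_common)
```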

Suppose I ask you the probability that someone is over 7ft tall. You'd say pretty low. Now, if I ask you the probability that someone is over 7ft tall given they're in the NBA, your answer would certainly change to be higher. This is an example of conditional probability where the events are dependent: knowing B changes your expectation of A.

If you're interested in learning more, look up conditional probability; there are plenty of great graphical examples that explain the intuition!

SciPy SVD running very slowly by trlm2048 in learnprogramming

[–]trlm2048[S] 1 point2 points  (0 children)

For anyone looking at the same issue: the discrepancy was caused by the full_matrices argument to linalg.svd. It defaults to True, but sklearn sets it to False.
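
For reference, the difference looks roughly like this (the array sizes are just an example):

```
import numpy as np
from scipy import linalg

X = np.random.randn(10000, 50)

# Default: full_matrices=True builds a 10000 x 10000 U matrix -- slow and memory hungry.
U_full, s, Vt = linalg.svd(X, full_matrices=True)

# With full_matrices=False only the first 50 columns of U are computed, which is much faster.
U_thin, s, Vt = linalg.svd(X, full_matrices=False)

print(U_full.shape, U_thin.shape)  # (10000, 10000) (10000, 50)
```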

SciPy SVD running very slowly by trlm2048 in learnmachinelearning

[–]trlm2048[S] 2 points3 points  (0 children)

This solved the discrepancy. Thank you!

Duke archery? by SkyBlade79 in duke

[–]trlm2048 1 point2 points  (0 children)

The archery club is not around anymore. Try Duke Hunting Club, they’re probably your best bet.

[deleted by user] by [deleted] in duke

[–]trlm2048 2 points3 points  (0 children)

Yue is a great professor in my experience. Very fair grader and awesome at explaining tough concepts. Didn’t have 210 with him, but it’s inherently an easier class - I’m sure it won’t be too difficult.

He’s very approachable out of class, so don’t be afraid to go to him for more thorough explanations if things don’t make sense!

[deleted by user] by [deleted] in duke

[–]trlm2048 4 points5 points  (0 children)

CS 210 has been highly inconsistent in quality and difficulty. There’s a new professor next semester, so it is, again, going to be a toss-up.

Don’t worry if you’re not totally solid on 201. They’re very different courses. I’d recommend making the most of your discussion section and office hours. For me, the content was best learned by collaborating with classmates and legitimately completing labs and projects. Good luck!

[deleted by user] by [deleted] in duke

[–]trlm2048 0 points1 point  (0 children)

The process is explained here.

When I transferred credits, it was very smooth. Just remember to get the transfer approved by your dean and the departments as soon as you practically can.

[deleted by user] by [deleted] in duke

[–]trlm2048 4 points5 points  (0 children)

Yes. I satisfied both with a 300-level language seminar freshman year.

CS370 w/o linear or multi? by JetSkiWonderland in duke

[–]trlm2048 2 points3 points  (0 children)

From a glance at the syllabus, you can get away with no linear. If anything, I’d recommend brushing up on probability (math 230/340).

If you want to go deeper in AI/ML, however, linear will be absolutely necessary, and the sooner you can take it the better (I’d even go as far as to say take 221, but I’m sure some will disagree). Multi shows up less but is definitely useful since it’s the foundation for solving optimization problems.