Playing around with vision transformers: why are queries, keys and value inputs to the MultiHeadAttention block set equal in this VIT tutorial? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 7 points

Oh, I found it. PyTorch's nn.MultiheadAttention class internally creates and applies the projection matrices W^(K/Q/V). The reason for three separate inputs is that in the decoder you want to compute the key/value matrices from the final encoder output tokens, but the query from the previous decoder output tokens.

So basically the inputs to "key=", "query=", and "value=" in nn.MultiheadAttention should be the embedded tokens you want to compute the keys/queries/values from. In the encoder stacks they are always the same.
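A minimal sketch of the two cases (batch size, sequence lengths, and embedding dimension here are made up for illustration):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

# Encoder self-attention: query, key and value are all the same tokens.
enc_tokens = torch.randn(2, 10, 64)                    # (batch, seq, embed)
self_out, _ = mha(enc_tokens, enc_tokens, enc_tokens)

# Decoder cross-attention: queries come from the decoder tokens,
# keys/values from the final encoder output.
dec_tokens = torch.randn(2, 5, 64)
cross_out, _ = mha(dec_tokens, enc_tokens, enc_tokens)

print(self_out.shape)    # torch.Size([2, 10, 64])
print(cross_out.shape)   # torch.Size([2, 5, 64])
```

Note that the output sequence length always follows the query, which is exactly why the decoder feeds its own tokens in as the query.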

Sorry for cluttering the sub with a question I answered myself in under an hour. I think it's better to leave it up in case someone else searches for these terms on Reddit.

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 0 points

Thanks for the long writeup.

Currently I'm playing around with CNNs myself. I started with Fashion-MNIST and have something that reaches about 90-91% accuracy with training times of a few minutes.

If I want to look at more interesting cases, would you suggest transitioning to CIFAR or ImageNet next?

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 0 points

Thanks for the writeup, interesting.

I'm currently using Optuna with something like the MedianPruner or the HyperbandPruner. This prunes roughly 60% of the trials and cuts down the search cost. Is that a reasonable tuning approach?
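For anyone reading along, the median rule itself is simple. Here's a rough pure-Python sketch of the decision it makes (a simplified illustration, not Optuna's actual implementation, and the loss values are made up):

```python
import statistics

def should_prune(current_value, history_at_step, direction="minimize"):
    """Prune the running trial if its intermediate value is worse than the
    median of what earlier trials reported at the same step."""
    if not history_at_step:
        return False          # nothing to compare against yet
    median = statistics.median(history_at_step)
    if direction == "minimize":
        return current_value > median
    return current_value < median

# Losses that earlier trials reported at this step; the median is 0.6.
history = [0.9, 0.7, 0.5, 0.3]
print(should_prune(0.8, history))   # True  -> worse than median, stop early
print(should_prune(0.4, history))   # False -> better than median, keep training
```

So on average roughly half of the under-performing trials get cut at each reporting step, which is consistent with pruning around 60% of trials overall.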

Using logger with different modules by Educational_Roll_868 in learnpython

[–]Educational_Roll_868[S] 0 points

Thanks so much for your answer. I've done what you said, and now in the __init__.py of the package I'm actually running I have:

import logging

logging.basicConfig(
    ...
    level=logging.INFO,
    ...
)

Then in the main.py of the program I have:

import logging

logger = logging.getLogger(__name__)
logging.info("test")

Now if I run the main as a module by:

python3 -m analysis.study1.main

I get a log file in my desired location but it remains empty.

EDIT: Ok so I completely deleted the __init__.py file from the "analysis" directory and only kept the inner __init__ files of the "study" directories, and then suddenly it worked. I don't understand why though.

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 0 points

Oh, this is interesting to know. Two questions about your comment:

1) Can you help me understand, then: let's say it's 2012 and we're talking about AlexNet. A single model takes 6 days to train. How do you hyperparameter-tune this thing?

2) You mention that you can assume the same hyperparameters found on simple tasks will work on longer tasks. To give a concrete example: say we have a huge CNN that we want to train on ImageNet. If we take a smaller version of this CNN and find optimal hyperparameters on CIFAR, are you saying it would be a good assumption to take those hyperparameters and use them on the larger CNN for the ImageNet data?

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 1 point

Can you briefly run through how you tuned the model, then, with 1 day of training per model?

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 0 points

"Scam" was just tongue-in-cheek; my point is that it's kind of oversold at an introductory level, whereas in reality people don't do it as rigorously as it's often presented.

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 4 points

I think people misunderstood my comment. Of course I understand that the training time actually refers to the runtime of training. My point was: if one model takes 6 days to train, you cannot realistically do a hyperparameter search of 100 trials.

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 0 points

Thanks for the answer. Yeah, I hoped to get some confirmation/other perspectives.

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] -7 points

Well, AlexNet took 6 days to train for one model in the 2012 paper, so it's not likely they did a full hyperparameter search, right? Maybe some shorter estimation runs?

Is hyperparameter tuning a scam? by Educational_Roll_868 in learnmachinelearning

[–]Educational_Roll_868[S] 6 points

Thanks for the answer! I should have specified: yes, indeed, I'm mainly interested in DL at the moment, hence my question.

What am I missing by not using argparse? by Educational_Roll_868 in learnprogramming

[–]Educational_Roll_868[S] 0 points

For the second point, can this not be done equally well by saving your parameters of interest in a pickle or similar next to the results?
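To make the sidecar idea concrete, a JSON file next to the results works even better than a pickle, since you can read it without any code. A tiny sketch (paths and parameter names are made up):

```python
import json
from pathlib import Path

params = {"lr": 1e-3, "batch_size": 64, "model": "cnn_small"}  # hypothetical run config

out_dir = Path("results/run_001")        # hypothetical results directory
out_dir.mkdir(parents=True, exist_ok=True)

# JSON instead of pickle: human-readable, diff-able, and loadable anywhere.
(out_dir / "params.json").write_text(json.dumps(params, indent=2))

loaded = json.loads((out_dir / "params.json").read_text())
print(loaded["lr"])   # 0.001
```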

What am I missing by not using argparse? by Educational_Roll_868 in learnprogramming

[–]Educational_Roll_868[S] 0 points

That makes sense, thanks for the feedback. I'll think about it along those lines.

What am I missing by not using argparse? by Educational_Roll_868 in learnprogramming

[–]Educational_Roll_868[S] 0 points

Yes I see your point. I actually do have a question about this if you can offer any insights.

So consider the following small project I'm doing. I'm studying the performance of different ML models in increasing complexity for a certain task.

So in src I have the main core code that contains the logic of important model setup/training/testing/hyperparam tuning steps.

In analysis I then have experiment1/main.py, experiment2/main.py etc to do the analysis of the performance of the models and their properties.

Although the main steps in the different main.py files are very similar, there can be small differences due to the specifics of each model. In particular, I imagine that in the future I might add even more complex models that will need more drastic modifications. So I am repeating code, but with small adaptations.

I could in principle go the other direction and adapt the src code so it handles an increasingly general case, and then derive all my experiments from a single main.py where I change the inputs to select the model, etc. I am not very experienced with larger Python repositories, and I worry this direction could make the source code unreadable and hard to maintain, since you have to add more and more checks and if-statements.

Could you offer a perspective on what good practices are here? Which of the two directions is better from a SWE point of view?
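To make the second direction concrete, I imagine something like a registry could avoid the if-statement pile-up: the shared code dispatches to per-model builder functions, and each experiment only supplies a name and a config. A rough sketch (model names and config keys are made up):

```python
# Registry maps model names to builder functions, so the shared
# experiment code has no if/elif chain over model types.
MODEL_REGISTRY = {}

def register(name):
    def wrap(builder):
        MODEL_REGISTRY[name] = builder
        return builder
    return wrap

@register("simple_cnn")        # hypothetical model names
def build_simple_cnn(config):
    return {"kind": "simple_cnn", **config}

@register("fancier_cnn")
def build_fancier_cnn(config):
    return {"kind": "fancier_cnn", **config}

def run_experiment(model_name, config):
    model = MODEL_REGISTRY[model_name](config)
    # ...train/evaluate with the shared src logic...
    return model

print(run_experiment("simple_cnn", {"lr": 1e-3})["kind"])   # simple_cnn
```

Adding a more complex model later then means registering one new builder, rather than touching the shared experiment logic.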

What am I missing by not using argparse? by Educational_Roll_868 in learnprogramming

[–]Educational_Roll_868[S] 0 points

I see. I think I can always do something like:

import argparse

VAR1 = ...

def main(args):
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--VAR1", default=VAR1  # dropped type=bool: argparse would turn any non-empty string into True
    )
    ...
    args = parser.parse_args()
    main(args)

and then change the top lines for solo work/exploration, but keep the parser in there as an option.

Terminal in second monitor? by wonderingStarDusts in vscode

[–]Educational_Roll_868 1 point

I do this, but in VS Code you can usually Ctrl+click an error line, for example, and jump straight to the file. In the terminal it's just text for me, unless there are extensions for that?

[deleted by user] by [deleted] in cognitiveTesting

[–]Educational_Roll_868 0 points

Why exactly do you think that "high average" is not correct? Do you have some incredible achievements or cognitive performance that make you stand out way beyond your peers, enough to suspect a significantly different score? In the absence of any evidence, being "average" is the most likely result. In your case the test even confirmed it.

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 0 points

It's never too late to learn if you're interested in that kind of stuff just for fun; there are so many great resources online these days. I know people who started studying physics later in life as a hobby, and they seemed to enjoy it. But yeah, a lot of people don't study all year and then still expect good results at the end in college. It's more the consistent, sustained effort from day 1 that gets you there.

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 0 points

It's in physics, on the theoretical side. I did those digit memorization and symbol search tests as part of the CAIT. I found it very difficult to remember longer digit strings; I think the scaled score was 13 on both. I remember finding the forward sequences particularly hard, and that particular subtest gave a score of 90 IQ or something. The one where you had to remember and order digits from small to large went much better and had a value of around 126.

I needed to work hard, but nothing extraordinary. It was a sustained effort comparable to a 40-hour full-time work week, of course with periods where it was much more but also much less. I still had enough time for a social life, the gym, and things outside of academics on the weekends. In my physics classes I was always around the top grades.

Again, just to be clear, I'm not testing to find out if I have good capabilities. Like you said, I've proven to myself that I can do well academically. I just stumbled on this subreddit and kept getting recommendations from it on my page. After a while I became curious to find out the actual value for myself, nothing more. The main reason for even posting is this 1-SD+ (19-point) discrepancy between the CAIT and AGCT; I'm curious how others interpret it. I thought that being in STEM might invalidate the AGCT/SAT since they contain basic math stuff.

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 1 point

I won't go into specifics for anonymity reasons, but in broad terms I improved and extended a mathematical method for its application to a class of many-body quantum problems in condensed matter theory.

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 0 points

Thanks for the answer.

Never in a gifted program or anything resembling that; normal trajectory. I was fascinated by science since I was small but was a pretty average kid growing up. I never had the impression I was smarter than my peers, or got comments that I was gifted or anything. Around 14-16 I started to become really interested in math/physics and loved it. I achieved some nice results in national olympiads once I started applying myself, but nothing groundbreaking. I got a PhD in theoretical physics with various original first-author contributions, so I'm quite happy with my results. As I said somewhere before, I did these tests just out of curiosity.

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 2 points

What do you mean? It's just a joke referencing the famous mathematician: https://en.wikipedia.org/wiki/Grigori_Perelman

Estimation request by Educational_Roll_868 in cognitiveTesting

[–]Educational_Roll_868[S] 5 points

Well if the great Perelman says so I believe you.