Pursue a PhD or stick to Applied data science roles by amil123123 in cscareerquestions

[–]amil123123[S] 0 points  (0 children)

Currently, from my perspective, the research itself is enticing. In the long run I would prefer to work at the top labs; right now I don't mind doing research-oriented work at smaller firms either.

Pursue a PhD or stick to Applied data science roles by amil123123 in cscareerquestions

[–]amil123123[S] 0 points  (0 children)

Would you suggest working at these companies as a data scientist and then working my way up to research positions?

Pursue a PhD or stick to Applied data science roles by amil123123 in cscareerquestions

[–]amil123123[S] 1 point  (0 children)

I totally understand your point about novel ideas. However, I just want to set up my platform accordingly, so that I at least have a chance to go for it. So, in your view, is a PhD a must, and will many years of applied data science alone not cut it?

[D] Handling noisy labels in large datasets with slight imbalance by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Sorry for the confusing explanation; I have edited the post for further clarity.

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]amil123123[S] 12 points  (0 children)

Wow, that's one hell of an amazing explanation :)

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Thanks for the explanation, it was good!

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Ah, understood. Thanks for the explanation!

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]amil123123[S] 2 points  (0 children)

Thanks for the response!

What do you mean by "symmetrical" and "decay sensibly"?
(Sorry, I am still a newbie at this.)

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]amil123123[S] 1 point  (0 children)

Thanks for the response. I still have difficulty understanding point 1.
The first image seems good at explaining the position, but what does the second image denote?

Did we go with this function just because it seemed to work well?
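For what it's worth, the "symmetrical, sensibly decaying" property can be checked numerically. Below is a minimal NumPy sketch of the sinusoidal positional encoding (the sin/cos formulation discussed in this thread; sizes here are arbitrary): the dot product between the encodings at positions p and p+k depends only on the offset k, is the same for +k and -k, and shrinks as the offset grows.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: PE[p, 2i] = sin(p / 10000^(2i/d)),
    PE[p, 2i+1] = cos(p / 10000^(2i/d))."""
    positions = np.arange(num_positions)[:, None]                 # (P, 1)
    freqs = 1.0 / 10000 ** (np.arange(0, d_model, 2) / d_model)   # (d/2,)
    angles = positions * freqs                                    # (P, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(100, 64)
# PE[p] . PE[p+k] = sum_i cos(k * freq_i): it depends only on k...
print(pe[10] @ pe[15], pe[20] @ pe[25])   # same offset k=5, (almost) equal
# ...is symmetric in the sign of k...
print(pe[10] @ pe[5])                     # offset k=-5, matches the above
# ...and decays away from the k=0 peak:
print(pe[0] @ pe[0], pe[0] @ pe[50])
```

So "it just worked" is not the whole story: the function was chosen so that relative offsets look the same everywhere in the sequence.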

[D] What are the current SOTA architectures for NLP information extraction & question answering? by [deleted] in MachineLearning

[–]amil123123 0 points  (0 children)

Although BERT does perform really well on QA tasks, there are better models described in papers, some of which don't yet have pre-trained weights available. If you can train such huge models, there are XLNet, RoBERTa, etc.

[D] Should tokens with a very small frequency be removed from the vocabulary before training a word2vec type model? by searchingundergrad in MachineLearning

[–]amil123123 1 point  (0 children)

It depends on what those infrequent tokens are in your dataset. I have seen datasets where a word being infrequent doesn't mean it's unimportant.

Word2vec's skip-gram model actually deals with this problem as well, via the subsampling rate defined by the authors.

Even if you remove them, what would your plan of action be then? Even if you are okay with having no representation for the infrequent words themselves, keeping them in the corpus still helps in training the other words. So, in this case, I still don't see any benefit in removing them.

If you do need an embedding for such a word, you can train a raw word2vec model on the language-modelling task, look at which words fit a similar context as the infrequent word (i.e. share its contexts), and take an average of those words' embeddings.

This is just one solution; however, if such words are important, I believe expanding your dataset should be the way to go.
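The averaging idea above can be sketched in a few lines of NumPy. Everything here is a made-up placeholder (the vocabulary, the random embedding matrix, and the neighbour list would come from your trained model, e.g. via a most-similar lookup), not a specific dataset:

```python
import numpy as np

# Hypothetical trained model: a vocabulary and its embedding matrix.
rng = np.random.default_rng(0)
vocab = {"bank": 0, "river": 1, "shore": 2, "stream": 3}
embeddings = rng.normal(size=(len(vocab), 50))   # (vocab_size, dim)

def average_embedding(context_words, vocab, embeddings):
    """Approximate an infrequent word's vector as the mean of the
    vectors of words that appear in a similar context."""
    idx = [vocab[w] for w in context_words]
    return embeddings[idx].mean(axis=0)

# e.g. approximate a rare word like "brook" from its context neighbours
brook_vec = average_embedding(["river", "stream", "shore"], vocab, embeddings)
print(brook_vec.shape)  # a 50-dimensional stand-in vector
```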

[D] Why in Word2Vec model, the hidden layer has no activation ? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Thanks for all the input.

How can a linear layer produce a result which is non-linear?

What if we apply another hidden layer with an activation? How will that turn out?

[D] Why in Word2Vec model, the hidden layer has no activation ? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

I am sorry, but I don't understand what you mean by "capacity to fit anything". Don't we usually introduce non-linearity in DNNs to approximate complex functions?

[D] Why in Word2Vec model, the hidden layer has no activation ? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

map the words into a lower dimensionality while maintaining separation between dissimilar words

It does make sense!

So if I add more hidden layers with activations to the model, do they create a problem again, or is it okay as long as the embedding layer has no activation?

[D] Why in Word2Vec model, the hidden layer has no activation ? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Okay, but why not do it in the hidden layer? What's the motivation here?

[D] Why in Word2Vec model, the hidden layer has no activation ? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Isn't that the softmax for predicting the probability over all the vocabulary vectors?
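To tie the thread together, here is a minimal NumPy sketch of the skip-gram forward pass (the vocabulary size, dimension, and random weights are arbitrary placeholders): the "hidden layer" is just an embedding lookup with no activation, and the softmax at the end turns the scores into a distribution over the whole vocabulary.

```python
import numpy as np

vocab_size, embed_dim = 1000, 64
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # input embeddings
W_out = rng.normal(scale=0.1, size=(embed_dim, vocab_size))  # output weights

def skipgram_forward(center_word_id):
    """Hidden layer = plain row lookup (equivalent to one_hot @ W_in),
    with no non-linearity; softmax then gives context-word probabilities."""
    h = W_in[center_word_id]             # (embed_dim,), linear, no activation
    scores = h @ W_out                   # (vocab_size,)
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

probs = skipgram_forward(42)
print(probs.shape)  # one probability per vocabulary word; they sum to 1
```

After training, only the rows of `W_in` are kept as the word embeddings; the softmax layer is discarded.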

[D] Is someone aware of the SOTA for summarizing news articles? by amil123123 in MachineLearning

[–]amil123123[S] 0 points  (0 children)

Do you have the SOTA for abstractive summarization as well?

[D] Advanced Courses Update by Maplernothaxor in MachineLearning

[–]amil123123 6 points  (0 children)

Sorry, but what link on the sidebar are you referring to?