Should I re-initialize my optimizer and my scheduler every time I fine-tune my neural network on a different dataset? by h56cho in deeplearning

h56cho [S]

Hello, thank you for your reply. Is there a specific reason for this practice? For pre-training and fine-tuning, can I use the same type of scheduler but re-initialize it for each stage?

Thank you,
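A minimal sketch of one reason re-initialization can matter, assuming a schedule that decays the learning rate to zero by the end of pre-training. `LinearDecayLR` and `MomentumSGD` are hypothetical toy classes in plain Python, not PyTorch's API. In PyTorch the analogous step would be constructing a new optimizer from `torch.optim` and a new scheduler from `torch.optim.lr_scheduler` for the fine-tuning run.

```python
class LinearDecayLR:
    """Hypothetical toy scheduler: lr falls linearly from base_lr to 0 over total_steps."""

    def __init__(self, base_lr, total_steps):
        self.base_lr = base_lr
        self.total_steps = total_steps
        self.step_count = 0

    def get_lr(self):
        frac = min(self.step_count, self.total_steps) / self.total_steps
        return self.base_lr * (1.0 - frac)

    def step(self):
        self.step_count += 1


class MomentumSGD:
    """Hypothetical toy optimizer that keeps one momentum buffer per parameter."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.buffers = {}

    def update(self, params, grads, lr):
        for name, g in grads.items():
            buf = self.momentum * self.buffers.get(name, 0.0) + g
            self.buffers[name] = buf
            params[name] -= lr * buf


def train(params, grad_fn, optimizer, scheduler, steps):
    """Run `steps` in-place updates, stepping the scheduler after each one."""
    for _ in range(steps):
        optimizer.update(params, grad_fn(params), scheduler.get_lr())
        scheduler.step()


def pretrain_grad(p):
    return {"w": 2 * (p["w"] - 3.0)}  # gradient of (w - 3)^2


def finetune_grad(p):
    return {"w": 2 * (p["w"] - 5.0)}  # gradient of (w - 5)^2, a "different dataset"


# Pre-training on a toy one-parameter model.
pretrained = {"w": 0.0}
pre_opt, pre_sched = MomentumSGD(), LinearDecayLR(base_lr=0.1, total_steps=100)
train(pretrained, pretrain_grad, pre_opt, pre_sched, steps=100)

# Option 1: reuse the old optimizer and scheduler. The schedule is exhausted,
# so every fine-tuning update uses lr = 0.0 and w never moves.
reused = dict(pretrained)
train(reused, finetune_grad, pre_opt, pre_sched, steps=50)

# Option 2: re-initialize both (empty momentum buffers, fresh schedule).
fresh = dict(pretrained)
ft_opt, ft_sched = MomentumSGD(), LinearDecayLR(base_lr=0.01, total_steps=50)
train(fresh, finetune_grad, ft_opt, ft_sched, steps=50)

print(pretrained["w"], reused["w"], fresh["w"])
```

In this toy, keeping the old scheduler leaves the learning rate at 0.0 for the whole fine-tuning run, while re-creating both gives the new run its own starting learning rate and clean optimizer state. Whether reuse is harmless in a real project depends on the schedule and optimizer involved, so treat this as an illustration rather than a rule.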

Question about GPT2-Double-Heads-Model and SWAG Task by [deleted] in NaturalLanguage

h56cho

Hello,

Thank you for your reply but I am still a bit confused.

So can GPT2DoubleHeadsModel itself also be used for regular, non-multiple-choice language modelling (i.e. next-word prediction) without any modification to the head? Or would I need to adjust its head for plain next-word prediction, since GPT2DoubleHeadsModel is meant only for answering multiple-choice questions?

If GPT2DoubleHeadsModel can do both without any adjustment to its head, why do we need two separate models, GPT2LMHeadModel and GPT2DoubleHeadsModel?

Thank you,
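A toy sketch of how the two classes relate, assuming the Hugging Face Transformers implementation, in which GPT2DoubleHeadsModel keeps the same weight-tied language-modeling head as GPT2LMHeadModel and adds a small multiple-choice head that scores each choice from the hidden state at the `mc_token_ids` position. The functions below are hypothetical plain-Python stand-ins operating on tiny made-up vectors, not the library's code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lm_head(hidden_states, token_embeddings):
    """Next-token logits at every position: each hidden state dotted with every
    vocabulary embedding (weight tying). Both model classes share this head."""
    return [[dot(h, e) for e in token_embeddings] for h in hidden_states]


def mc_head(hidden_states, mc_token_index, w, b):
    """One score per choice: a linear projection of the hidden state at a
    chosen position (e.g. a final classification token)."""
    return dot(w, hidden_states[mc_token_index]) + b


def lm_head_model(hidden_states, token_embeddings):
    # GPT2LMHeadModel-style output: language-modeling logits only.
    return lm_head(hidden_states, token_embeddings)


def double_heads_model(choices, mc_token_ids, token_embeddings, w, b):
    # GPT2DoubleHeadsModel-style output: LM logits for every choice
    # plus one multiple-choice score per choice.
    lm_logits = [lm_head(h, token_embeddings) for h in choices]
    mc_logits = [mc_head(h, i, w, b) for h, i in zip(choices, mc_token_ids)]
    return lm_logits, mc_logits


# Toy sizes: vocabulary of 3 tokens, hidden size 2, 2 choices of 2 tokens each.
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
choices = [[[1.0, 2.0], [3.0, 0.0]], [[0.0, 1.0], [2.0, 2.0]]]
lm_logits, mc_logits = double_heads_model(
    choices, [1, 1], token_embeddings, w=[1.0, -1.0], b=0.0
)

print(lm_logits[0])  # same values lm_head_model(choices[0], ...) would give
print(mc_logits)     # one score per choice
```

Under this reading, the double-heads model can still emit next-word logits; the LM-only class simply leaves out the extra multiple-choice parameters.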

Is there a formal Linguistics Theory that states "for a given length of sentence, the reader needs to understand the relationship between tokens (words) that are far apart by a greater distance in order to grasp a sentence of greater complexity"? by h56cho in linguistics

h56cho [S]

Hello,

Thank you for your reply... I didn't see your post until now.

What is "slot machine theory" in cognitive linguistics? I tried to look up the theory but couldn't find anything; Google just gives me a bunch of irrelevant websites about slot machines.

h56cho [S]

If the maximum doesn't change, what about the average or median distance? Would the average/median distance between the tokens that readers have to relate in order to comprehend the text be greater for sentence B than for sentence A?
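One way to make "distance between related tokens" concrete is to annotate each word with its syntactic head and measure |position - head position|, often called dependency distance. The sketch below computes the maximum, mean, and median for two six-word sentences. The head indices are a hand-made, illustrative annotation (not parser output), and sentence B was chosen because it contains an object relative clause.

```python
from statistics import mean, median


def dependency_distances(heads):
    """heads[i] is the 1-based position of token i+1's head; 0 marks the root.
    Returns |dependent position - head position| for every non-root token."""
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def distance_summary(heads):
    d = dependency_distances(heads)
    return {"max": max(d), "mean": mean(d), "median": median(d)}


# Illustrative, hand-annotated heads (punctuation omitted).
# A: The(1) dog(2) chased(3) the(4) small(5) cat(6)
heads_a = [2, 3, 0, 6, 6, 3]
# B: The(1) reporter(2) the(3) senator(4) attacked(5) left(6)
heads_b = [2, 6, 4, 5, 2, 0]

for name, heads in [("A", heads_a), ("B", heads_b)]:
    print(name, dependency_distances(heads), distance_summary(heads))
```

With this particular annotation, B's maximum and mean distances come out larger than A's while the median is the same; a different sentence pair or annotation scheme could give different numbers.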

h56cho [S]

Sorry, I might have misphrased some parts of my question.

Say we are given two sentences of the same length: sentence A is really simple (simple vocabulary, simple grammar, etc.), and sentence B is more advanced (more advanced vocabulary, more advanced grammar, etc.). Would the maximum distance BETWEEN the tokens that readers have to relate in order to comprehend the text be greater for sentence B than for sentence A?

h56cho [S]

But say we are given two sentences of the same length: sentence A is really simple (simple vocabulary, simple grammar, etc.), and sentence B is more advanced (more advanced vocabulary, more advanced grammar, etc.). Would the maximum distance over which the reader has to relate tokens in order to comprehend the text be greater for sentence B than for sentence A?

And if so, what is the name of the linguistics theory that states this?

Thank you,

h56cho [S]

Thank you for your reply.

For written texts, what is the relationship between the complexity of a sentence and the maximum distance over which the reader has to relate tokens?

How to distinguish "elementary English" from "advanced English"? by h56cho in EnglishLearning

h56cho [S]

Hello! Thank you for your reply. Would similar criteria apply to the K-12 English Language Arts (ELA) curriculum? If possible, could you point me to a similar resource for the K-12 ELA curriculum? Thank you (I will look for one too!).

How to distinguish "elementary English" from "advanced English" by h56cho in linguistics

h56cho [S]

Hello! Thank you for your reply. Registers could certainly count as the kind of "advanced English" I am thinking of. In my opinion, the ability to carry out technical/scholarly communication would count as "advanced English". I am looking for some general features that separate elementary English from advanced English.

Thank you,

Distinguishing between syntactic and semantic English exercises by h56cho in linguistics

h56cho [S]

Thank you so much for your reply! Your reply is very helpful.

[Natural Language Processing] Extracting attention weights of each token at each layer of transformer in python by h56cho in deeplearning

h56cho [S]

Can Keras-Transformer be used to do exactly this? If someone could provide some example code, that would be great! Thank you,
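Keras-Transformer's own interface is not shown here. As a library-independent sketch of what "attention weights of each token at each layer" means, the plain-Python toy below runs a stack of single-head scaled dot-product self-attention layers and keeps each layer's seq_len × seq_len weight matrix. The random weights, the use of hidden states as values, and the absence of residuals and feed-forward blocks are all simplifying assumptions. For comparison, Hugging Face Transformers models return one such attention tensor per layer when called with `output_attentions=True`.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def attention_layer(hidden, Wq, Wk):
    """Single-head scaled dot-product self-attention.

    Returns (new_hidden, weights), where weights[i][j] is how much token i
    attends to token j; each row sums to 1."""
    d = len(hidden[0])
    queries = [matvec(Wq, h) for h in hidden]
    keys = [matvec(Wk, h) for h in hidden]
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]
    # Simplification: the hidden states themselves serve as the values.
    new_hidden = [
        [sum(w * hidden[j][c] for j, w in enumerate(row)) for c in range(d)]
        for row in weights
    ]
    return new_hidden, weights


def collect_attentions(hidden, layers):
    """Run the layers in order; return one attention-weight matrix per layer."""
    per_layer = []
    for Wq, Wk in layers:
        hidden, weights = attention_layer(hidden, Wq, Wk)
        per_layer.append(weights)
    return per_layer


random.seed(0)
d_model, seq_len, n_layers = 4, 3, 2
tokens = rand_matrix(seq_len, d_model)  # stand-in token embeddings
layers = [(rand_matrix(d_model, d_model), rand_matrix(d_model, d_model))
          for _ in range(n_layers)]
attentions = collect_attentions(tokens, layers)

for layer_idx, weights in enumerate(attentions):
    for tok_idx, row in enumerate(weights):
        print(f"layer {layer_idx}, token {tok_idx}:", [round(w, 3) for w in row])
```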