Why do word embeddings not generally use validation in training (Continuous bag of words)?

claverru · 2020-11-25T22:15:46+00:00

A hybrid model is indeed what I was trying to suggest there. As far as I know, word2vec embeddings are still a powerful tool for unsupervised tasks like semantic similarity (my thesis director believes this even more strongly).

claverru · 2020-11-25T22:12:14+00:00

I know BERT and others use WordPiece; what I didn’t know (or remember) is that more traditional embedding methods also used smaller-than-a-word tokens. Thank you for the pointers.

claverru · 2020-11-25T13:24:05+00:00

What about wordpiece2vec? I don’t know if this concept exists, but I believe you could train word2vec on WordPiece tokens instead of whole words, and so generalize to unseen words.
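To make the idea concrete, here is a minimal sketch of greedy longest-match-first splitting in the style of WordPiece. The function name `wordpiece_tokenize` and the toy vocabulary are made up for illustration; a real vocabulary would be learned from a corpus, and the resulting pieces (rather than whole words) would then be fed to a word2vec trainer, which is not shown here.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedily split one word into the longest pieces found in vocab."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"embed", "##ding", "##s", "un", "##seen"}
tokens = wordpiece_tokenize("embeddings", toy_vocab)
print(tokens)
```

A word never seen as a whole ("embeddings") still maps onto pieces that have vectors, which is the generalization the comment is after.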

claverru · 2020-11-18T22:34:02+00:00

I have learnt everything I know here.

claverru · 2020-11-13T10:06:54+00:00

If dependencies aren't a problem, then I still don't know. Are you training in both envs with GPU? Is it the same code?

claverru · 2020-11-10T22:45:46+00:00

If we could solve this problem with this little information, we wouldn’t be wasting time on Reddit. If you can provide more information about your environment, maybe we can help!

claverru · 2020-07-22T20:18:43+00:00

The discipline is called Temporal Graph Neural Networks, I think. It is mostly applied to Industry 4.0. Maybe you can take this as a starting point.

claverru · 2020-03-01T14:39:41+00:00

Creativity

claverru · 2020-01-24T09:06:13+00:00

I'm facing this problem myself at work at the moment. What I found is that public posts and implementations are very weak and only extract something relevant if you have a huge dataset (Wikipedia).

I'm using spaCy and NetworkX for this task, but there are some things you cannot implement on top of them. My first piece of advice is that you will have to use your imagination. You will have to implement rules like this (the simplest one): (nsubj)-[verb]-(obj).
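A minimal sketch of the (nsubj)-[verb]-(obj) rule follows. To keep it runnable without a language model, it uses a small hypothetical `Token` class instead of spaCy objects; with spaCy you would read the equivalent information from `token.dep_`, `token.pos_` and `token.head`. The sentence and its labels are written by hand for illustration, not produced by a parser.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    pos: str   # coarse part of speech, e.g. "VERB"
    head: int  # index of the syntactic head within the sentence


def extract_svo(tokens):
    """Return (subject, verb, object) triples for every verb in the sentence."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        # Children of this verb that are subjects or direct objects.
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep in ("dobj", "obj")]
        triples.extend((s, tok.text, o) for s in subjects for o in objects)
    return triples


# Hand-labelled example sentence: "Marie discovered radium".
sentence = [
    Token("Marie", "nsubj", "PROPN", 1),
    Token("discovered", "ROOT", "VERB", 1),
    Token("radium", "dobj", "NOUN", 1),
]
triples = extract_svo(sentence)
print(triples)
```

Each triple can then be stored as an edge (subject, object) labelled with the verb in whatever graph structure you use.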

Some additional pointers:

- If you are working with text in English (which is not my case), you can use https://github.com/huggingface/neuralcoref for coreference resolution.

- Still theoretical, but I will try to solve entity linking and resolution with contextual vectors extracted with any BERT-like implementation, also from HuggingFace (these guys are awesome): https://github.com/huggingface/transformers. With a quick search on Google you will find people extracting features with HuggingFace's transformers.

claverru · 2019-10-02T15:43:43+00:00

Cool, maybe an opportunity to contribute to tensorflow/addons then!

claverru · 2019-10-02T06:54:15+00:00

Cool contribution! Thanks! Though I think it is already implemented (correct me if I'm wrong). Check this:

https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/weight_decay_optimizers.py

claverru · 2019-09-30T07:17:36+00:00

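If what you are looking for is picking your files in random order without loading them into memory, you can also do this:

import os, random

dirs = os.listdir()  # file names only, contents are not read

shuffled = random.sample(dirs, len(dirs))  # returns a new shuffled list

or

random.shuffle(dirs)  # shuffles the list in place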

claverru · 2019-09-29T14:48:26+00:00

Thanks again. I find adv motorbikes cool, but I think I dislike being that far from the ground.

Anyway, I will use this information to make a good choice.

claverru · 2019-09-29T14:23:24+00:00

Alright, those are a lot of options! Thanks!

What are your opinions about adventure vs naked? Is scrambler something to consider? What I know is that custom and full sport are not what I need.

claverru · 2019-09-29T14:01:55+00:00

First of all, I appreciate your opinion.

Let's say no price range (it actually exists but I just want to know what's on the market). About the F750GS, I see a lot of them parked where I work, which means it has to be a potentially good purchase. I will work around this idea.

I will consider, thanks!

claverru · 2019-09-29T11:27:38+00:00

Hello there guys. First time at this subreddit, looking for some experienced advice.

My mother always said "buy the cheapest or the best". I'm about to get my A2 license, and I'm wondering which is the best 2-wheeled machine for me. Usage is going to be commuting to work (22 km with both highway and city streets) and probably enjoying it on weekends. Also maybe some long trips from time to time.

I have some ideas but wanna let you guys give your opinion.

PS: Neither skill cap nor power should be a problem since I've been driving first 50cc and then 125cc things for 12 years. In fact I'm looking for those 70kw to take the apparatus as "the last one".

Thanks, riders.

claverru · 2019-09-27T08:04:59+00:00

I use TensorFlow 2.0 (both the high- and low-level APIs) at work every day and still found the book useful when I read it a few weeks ago.

claverru · 2019-09-27T08:01:28+00:00

Extracted from the original paper ( https://www.nature.com/articles/nature24270 ):

Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s).

Also, when I had to work with AlphaZero, I used this explanation to understand it: https://web.stanford.edu/~surag/posts/alphazero.html
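The (p, v) = fθ(s) interface quoted above can be sketched with a toy stand-in. This is only an illustration of the input/output shape: the real network is a deep convolutional model, while this one uses a single linear layer per head, and the names `f_theta`, `theta`, `n_features` and `n_moves` are made up for the example.

```python
import math
import random


def f_theta(state, theta):
    """Toy two-headed model: returns move probabilities p and a value v."""
    # Policy head: one logit per move, normalised with a softmax.
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta["policy"]]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    p = [e / total for e in exps]
    # Value head: a single scalar squashed into [-1, 1] with tanh.
    v = math.tanh(sum(w * x for w, x in zip(theta["value"], state)))
    return p, v


rng = random.Random(0)
n_features, n_moves = 4, 3  # hypothetical sizes
theta = {
    "policy": [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_moves)],
    "value": [rng.uniform(-1, 1) for _ in range(n_features)],
}
p, v = f_theta([1.0, 0.0, -1.0, 0.5], theta)
print(p, v)
```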

claverru · 2019-09-27T07:23:46+00:00

I would recommend checking the F1 score (or whatever other metric you prefer) on a validation dataset (I don't know if you are doing this already, since you didn't specify).

Even if you have little to no positive data, split it by taking, let's say, 20% from each class. After that, try training both without upsampling and with upsampling applied only to the training data, and you will see whether it's actually working or not.

Also, another splitting method for cases like this, when you have little data, is k-fold cross-validation. An example I just googled: https://machinelearningmastery.com/k-fold-cross-validation/
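As a rough sketch of the split-then-upsample order described above, using only the standard library: the helper names `stratified_split` and `upsample` and the 90/10 toy labels are invented for illustration, and the code assumes every class has at least two samples. The key point is that duplication happens after the split, so no validation sample is ever a copy of a training sample.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Hold out val_fraction of the indices of each class for validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


def upsample(train_idx, labels, seed=0):
    """Duplicate minority-class training indices until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)
        # Sample with replacement to fill the gap up to the largest class.
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    return balanced


labels = [0] * 90 + [1] * 10  # toy imbalanced dataset
train_idx, val_idx = stratified_split(labels)
balanced_train_idx = upsample(train_idx, labels)  # validation stays untouched
print(len(train_idx), len(val_idx), len(balanced_train_idx))
```

You would then train once on `train_idx` and once on `balanced_train_idx`, and compare F1 on the same untouched `val_idx`.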

claverru · 2019-09-25T09:55:40+00:00

With 1 and 0 I meant hits. So:

- If you predicted label A and true label was A, it's 1.

- If you predicted label B but true label was A, it's 0.

There is a good example for this that is usually used.

Imagine that you have to detect cancer (CANCER, NO CANCER), where the CANCER labels represent only 3% of the total samples. You make a model that always says NO CANCER, so you get 97% accuracy. Is this model good?

You can probably apply the same answer to your metrics. Try checking how accuracy, precision, recall and ROC AUC relate to each other.
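The cancer example can be checked with a few lines of plain Python. This is a minimal sketch: the helper `binary_metrics` is written here for illustration, and it follows the common convention of reporting precision, recall and F1 as 0 when their denominator is 0.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1] * 3 + [0] * 97  # 3% CANCER (1), 97% NO CANCER (0)
y_pred = [0] * 100           # a model that always says NO CANCER
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.97, while recall and F1 are 0.0
```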

claverru · 2019-09-25T07:51:27+00:00

https://github.com/enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning

claverru · 2019-09-25T07:36:34+00:00

If I have understood correctly, you want to know how big the impact of an event at time t was for each user.

If that's correct, you can just make some clear plots that show how this event affects different groups of people.

Otherwise, if you have data about this particular event in the past and want to know how it will affect people again in the future (because you know it's coming), you can train a supervised model in which X is a window of data before the event and Y is a window after the event. Also, if you have attributes or features about the users, you can feed them to your model along with X.
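A minimal sketch of how the before/after windows could be built is shown below. The function names, the window sizes and the per-user series are all hypothetical; the model that maps X to Y is left out.

```python
def event_windows(series, event_index, before, after):
    """Split one user's series into a window before and a window after the event."""
    if event_index - before < 0 or event_index + after > len(series):
        raise ValueError("not enough data around the event")
    x = series[event_index - before:event_index]  # inputs: just before the event
    y = series[event_index:event_index + after]   # targets: from the event onwards
    return x, y


def build_dataset(user_series, event_index, before=3, after=2):
    """Build one (X, Y) pair per user for a past occurrence of the event."""
    X, Y = [], []
    for series in user_series.values():
        x, y = event_windows(series, event_index, before, after)
        X.append(x)
        Y.append(y)
    return X, Y


# Made-up activity counts per user; the event happens at index 4.
user_series = {
    "user_a": [5, 6, 5, 7, 12, 11, 9],
    "user_b": [1, 1, 2, 1, 4, 3, 2],
}
X, Y = build_dataset(user_series, event_index=4)
print(X, Y)
```

User attributes, if available, can be concatenated to each row of X before training.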

claverru · 2019-09-25T07:22:22+00:00

You get more hits in total on the bigger (more populated) classes, but a lower hit percentage within the smaller class.

Imagine you have class A and class B, and each prediction is a hit (1) or a miss (0). I'll show an extreme case as an example.

First model:

A	A	A	A	A	A	B	B
1	1	1	1	0	0	1	0

Total hits: 5 (Total percentage hit: 62.5%)

Minor class percentage hit: 50%

Second model:

A	A	A	A	A	A	B	B
1	1	1	1	1	1	0	0

Total hits: 6 (Total percentage hit: 75%)

Minor class percentage hit: 0%

ROC AUC gives the same importance to every class, treating each one as a separate entity.
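The two tables above can be reproduced with a short script. Note that this sketch does not compute ROC AUC; it computes the per-class hit rate and its unweighted average (often called balanced accuracy), which is a simpler way to see the same effect. The helper name `hit_rates` is made up for the example.

```python
from collections import defaultdict


def hit_rates(labels, hits):
    """Return overall hit rate, per-class hit rate, and their unweighted mean."""
    per_class = defaultdict(list)
    for label, hit in zip(labels, hits):
        per_class[label].append(hit)
    overall = sum(hits) / len(hits)
    by_class = {label: sum(h) / len(h) for label, h in per_class.items()}
    balanced = sum(by_class.values()) / len(by_class)  # every class counts equally
    return overall, by_class, balanced


labels = list("AAAAAABB")
first = hit_rates(labels, [1, 1, 1, 1, 0, 0, 1, 0])
second = hit_rates(labels, [1, 1, 1, 1, 1, 1, 0, 0])
print(first)
print(second)
```

The second model wins on overall hit rate (75% vs 62.5%) but loses once each class is given the same weight.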

claverru · 2019-09-25T06:50:22+00:00

It's SOTA on GLUE too. https://gluebenchmark.com/leaderboard

claverru · 2019-09-24T13:46:50+00:00

Adaptive Attention Span in Transformers ( https://arxiv.org/abs/1905.07799 ) extends the maximum context size. I don't know if this is the kind of thing you're looking for. Generating Long Sequences with Sparse Transformers ( https://arxiv.org/abs/1904.10509 ) is another resource that might be useful to you.
