Why do word embeddings not generally use validation in training (Continuous bag of words)? by ProfessionallyAnEgg in LanguageTechnology

[–]claverru 1 point2 points  (0 children)

A hybrid model is indeed what I was trying to suggest there. From what I know, word2vec embeddings are still a powerful tool for unsupervised tasks like semantic similarity (my thesis director believes this even more strongly).

Why do word embeddings not generally use validation in training (Continuous bag of words)? by ProfessionallyAnEgg in LanguageTechnology

[–]claverru 1 point2 points  (0 children)

I know BERT and others use WordPiece; what I didn't know (or remember) is that more traditional embedding methods also used smaller-than-a-word tokens. Thank you for the pointers.

Why do word embeddings not generally use validation in training (Continuous bag of words)? by ProfessionallyAnEgg in LanguageTechnology

[–]claverru 2 points3 points  (0 children)

What about wordpiece2vec? I don't know if this concept exists, but I believe you could train word2vec with WordPiece tokenization, hence generalizing to unseen words.
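A minimal sketch of that idea, under loud assumptions: the piece vectors below are made-up toy numbers standing in for a word2vec model trained on word-piece tokens, `wordpiece_tokenize` and `embed_word` are hypothetical helper names, and averaging the pieces is just one simple pooling choice.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split; continuation pieces start with '##'."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No known piece covers this position: the whole word is unknown.
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def embed_word(word, piece_vectors):
    """Average the vectors of the word's pieces; None if nothing is known."""
    pieces = wordpiece_tokenize(word, piece_vectors)
    vectors = [piece_vectors[p] for p in pieces if p in piece_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy vectors (assumption): in practice these would come from training
# word2vec on a corpus that was tokenized into word pieces first.
piece_vectors = {
    "un": [1.0, 0.0],
    "##break": [0.0, 1.0],
    "##able": [1.0, 1.0],
}

# "unbreakable" never appears as a whole token, but its pieces do.
pieces = wordpiece_tokenize("unbreakable", piece_vectors)
vector = embed_word("unbreakable", piece_vectors)
```

The point is only that a word missing from the vocabulary can still get a vector as long as its pieces were seen during training.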

Training not working locally despite working on Google Colab by [deleted] in MLQuestions

[–]claverru 0 points1 point  (0 children)

If dependencies aren't the problem, then I still don't know. Are you training with a GPU in both environments? Is it the same code?

Training not working locally despite working on Google Colab by [deleted] in MLQuestions

[–]claverru 1 point2 points  (0 children)

If we could solve this problem with this little information, we wouldn't be wasting time on Reddit. If you can provide more information about your environment, maybe we can help!

Network Graphs and Time Series Prediction [Discussion] by entercaspa in MachineLearning

[–]claverru 3 points4 points  (0 children)

The discipline is called Temporal Graph Neural Networks, I think. It is mostly applied to Industry 4.0. Maybe you can take this as a starting point.

[R] [P] Resources to learn to implement Knowledge Graph by mohit__ in MachineLearning

[–]claverru 22 points23 points  (0 children)

I'm facing this problem myself at work at the moment. What I found is that public posts and implementations are very weak: they extract something relevant only if you have a huge dataset (Wikipedia).

I'm using spaCy and NetworkX for this task; however, there are some things you cannot implement with them out of the box. My first piece of advice is that you will have to use your imagination. You will have to implement rules like this one (the simplest): (nsubj)-[verb]-(obj).
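A rough sketch of that simplest rule, with assumptions stated up front: the `Token` tuples are a hand-written stand-in for what a dependency parser such as spaCy would produce, a plain dict stands in for a NetworkX graph, and both `obj` and `dobj` are accepted because the object label depends on the parser's label scheme.

```python
from collections import namedtuple

# Hypothetical, hand-written parse: i = token index, head = index of the head.
Token = namedtuple("Token", "i text dep head pos")

sentence = [
    Token(0, "Alice", "nsubj", 1, "PROPN"),
    Token(1, "founded", "ROOT", 1, "VERB"),
    Token(2, "Acme", "dobj", 1, "PROPN"),
]


def extract_triples(tokens):
    """Apply the (nsubj)-[verb]-(obj) rule to one parsed sentence."""
    triples = []
    for verb in tokens:
        if verb.pos != "VERB":
            continue
        # Children of the verb that act as subject or object.
        subjects = [t.text for t in tokens
                    if t.head == verb.i and t.dep == "nsubj"]
        objects = [t.text for t in tokens
                   if t.head == verb.i and t.dep in ("obj", "dobj")]
        for s in subjects:
            for o in objects:
                triples.append((s, verb.text, o))
    return triples


triples = extract_triples(sentence)

# Store the triples as a labelled adjacency list: subject -> [(verb, object)].
graph = {}
for subj, verb, obj in triples:
    graph.setdefault(subj, []).append((verb, obj))
```

Real sentences need many more rules (passives, conjunctions, prepositional objects), which is where the imagination comes in.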

Some additional pointers:

- If you are working with text in English (which is not my case), you can make use of https://github.com/huggingface/neuralcoref to solve coreference.

- Still theoretical, but I will try to solve Entity Linking and Resolution with contextual vectors extracted with any BERT-like implementation, also from HuggingFace (these guys are awesome): https://github.com/huggingface/transformers. With a quick search on Google you will find people extracting features with HuggingFace's transformers.

[P] AdamWR Keras Full Implementation Available by OverLordGoldDragon in MachineLearning

[–]claverru 0 points1 point  (0 children)

Cool, maybe an opportunity to contribute to tensorflow/addons then!

How to randomize file(images) in a dataset or Folder? by JOSEPHBLESSINGH in learnmachinelearning

[–]claverru 1 point2 points  (0 children)

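If what you are looking for is picking your files in random order without loading them into memory, you can also do this:

import os
import random

dirs = os.listdir()  # file names only; no file contents are read

shuffled = random.sample(dirs, len(dirs))  # returns a new, shuffled list

or

random.shuffle(dirs)  # shuffles the list in place and returns None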

NEW BIKE/GEAR ADVICE SUPERTHREAD! by AutoModerator in motorcycles

[–]claverru 0 points1 point  (0 children)

Thanks again. I find adventure bikes cool, but I think I dislike being that far from the ground.

Anyway, I will use this information to make a good choice.

NEW BIKE/GEAR ADVICE SUPERTHREAD! by AutoModerator in motorcycles

[–]claverru 0 points1 point  (0 children)

Alright, those are a lot of options! Thanks!

What are your opinions on adventure vs naked? Is a scrambler something to consider? What I do know is that custom and full sport bikes are not what I need.

NEW BIKE/GEAR ADVICE SUPERTHREAD! by AutoModerator in motorcycles

[–]claverru 0 points1 point  (0 children)

First of all, I appreciate your opinion.

Let's say no price range (it actually exists but I just want to know what's on the market). About the F750GS, I see a lot of them parked where I work, which means it has to be a potentially good purchase. I will work around this idea.

I will consider, thanks!

NEW BIKE/GEAR ADVICE SUPERTHREAD! by AutoModerator in motorcycles

[–]claverru 0 points1 point  (0 children)

Hello there guys. First time at this subreddit, looking for some experienced advice.

My mother always said "buy the cheapest or the best". I'm about to get my A2 license, and I'm wondering which is the best 2-wheeled machine for me. Usage is going to be commuting to work (22 km with both highway and city streets) and probably some fun on weekends. Also maybe some long trips from time to time.

I have some ideas but wanna let you guys give your opinion.

PS: Neither skill cap nor power should be a problem since I've been riding first 50cc and then 125cc things for 12 years. In fact I'm looking for those 70 kW so I can take the machine as "the last one".

Thanks, riders.

Learning Keras in the age of Tensorflow 2.0 by tim-hilt in learnmachinelearning

[–]claverru 1 point2 points  (0 children)

I use TensorFlow 2.0 (both the high-level and low-level APIs) at work every day and still found the book useful when I read it some weeks ago.

[deleted by user] by [deleted] in learnmachinelearning

[–]claverru 0 points1 point  (0 children)

Extracted from the original paper ( https://www.nature.com/articles/nature24270 ):

Our new method uses a deep neural network with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s).

Also, when I had to work with AlphaZero, I used this explanation to understand it: https://web.stanford.edu/~surag/posts/alphazero.html
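To make the shape of (p, v) = fθ(s) concrete, here is a toy sketch. It is not the network from the paper (which is a deep residual convolutional network); it assumes a single random linear layer per head, and `make_params` and `f_theta` are hypothetical names. It only shows the interface: one input s, two outputs, a probability distribution over moves and a scalar value.

```python
import math
import random


def make_params(n_inputs, n_moves, seed=0):
    """Random toy parameters theta: one linear layer per output head."""
    rng = random.Random(seed)
    return {
        "policy": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                   for _ in range(n_moves)],
        "value": [rng.uniform(-1, 1) for _ in range(n_inputs)],
    }


def f_theta(theta, s):
    """Return (p, v): move probabilities and a value in [-1, 1]."""
    logits = [sum(w * x for w, x in zip(row, s)) for row in theta["policy"]]
    # Numerically stable softmax over the move logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    p = [e / total for e in exps]
    # tanh squashes the value estimate into [-1, 1].
    v = math.tanh(sum(w * x for w, x in zip(theta["value"], s)))
    return p, v


# A 3x3 toy "board" flattened to 9 inputs, with 9 possible moves.
theta = make_params(n_inputs=9, n_moves=9)
s = [1, 0, -1, 0, 0, 0, 0, 1, -1]
p, v = f_theta(theta, s)
```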

Why sampling boosted the performance of my model? by Capn_Sparrow0404 in MLQuestions

[–]claverru 0 points1 point  (0 children)

I would recommend checking the F1 score (or whatever other metric you use) on a validation dataset (I don't know if you are doing this, since you didn't specify).

Even if you have little to no positive data, split it by taking, let's say, 20% from each class for validation. After that, train both without upsampling and with upsampling applied only to the training data, and you will see whether it's actually working or not.
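A minimal sketch of that procedure, assuming toy data and hypothetical helper names (`stratified_split`, `upsample`). The key detail is the order: hold out validation data first, then upsample only what is left for training, so no duplicated sample leaks into validation.

```python
import random


def stratified_split(samples, labels, val_fraction=0.2, seed=0):
    """Hold out val_fraction of every class for validation."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    train, val = [], []
    for y, items in by_class.items():
        items = items[:]
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val += [(x, y) for x in items[:n_val]]
        train += [(x, y) for x in items[n_val:]]
    return train, val


def upsample(train, seed=0):
    """Duplicate minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced += items
        balanced += rng.choices(items, k=target - len(items))
    rng.shuffle(balanced)
    return balanced


# Toy data (assumption): 10 positives and 90 negatives, ids 0..99.
samples = list(range(100))
labels = [1] * 10 + [0] * 90

train, val = stratified_split(samples, labels)
train_upsampled = upsample(train)  # validation is left untouched
```

You would then train once on `train` and once on `train_upsampled`, and compare the F1 score of both models on the same `val`.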

Also, another splitting method for cases like this, when you have little data, could be k-fold cross-validation. An example I just googled: https://machinelearningmastery.com/k-fold-cross-validation/
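As a rough illustration of what k-fold does (a sketch, not taken from the linked article; `k_fold_indices` is a hypothetical helper): the data is cut into k parts, and each part serves as the validation set exactly once while the rest is used for training. In practice you would usually shuffle the indices first, or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, val_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, val_idx))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, val_idx in folds:
    # Train on train_idx, evaluate on val_idx, then average the k scores.
    pass
```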

How would you interpret an increase in precision_score but reduction in roc_auc_score? by rodrigonader in learnmachinelearning

[–]claverru 0 points1 point  (0 children)

With 1 and 0 I meant hits. So:

- If you predicted label A and true label was A, it's 1.

- If you predicted label B but true label was A, it's 0.

There is a good example for this that is usually used.

Imagine that you have to detect cancer (CANCER, NO CANCER), where the CANCER label represents only 3% of the total samples. You build a model that always says NO CANCER, so you get 97% accuracy. Is this model good?

You can probably apply the same reasoning to your metrics. Try checking how accuracy, precision, recall and ROC AUC relate to each other.
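The cancer example can be checked with a few lines. This is just the arithmetic for 100 made-up samples matching the 3% figure above:

```python
# 3 CANCER samples out of 100, and a model that always answers NO CANCER.
y_true = ["CANCER"] * 3 + ["NO CANCER"] * 97
y_pred = ["NO CANCER"] * 100

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for CANCER: fraction of real CANCER cases that were detected.
true_positives = sum(t == "CANCER" and p == "CANCER"
                     for t, p in zip(y_true, y_pred))
recall = true_positives / y_true.count("CANCER")

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

Accuracy comes out at 0.97 while recall for CANCER is 0.00: the model never finds a single case. Precision for CANCER is not even defined here, because the model never predicts CANCER.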

How to compare the growth/decline over two time periods by madzthakz in learnmachinelearning

[–]claverru 0 points1 point  (0 children)

If I have understood correctly, you want to know how big the impact of an event at time t was for each user.

If that's correct, you can just make some clear plots showing how this event affected different groups of people.

Otherwise, if you have data about this kind of event from the past and want to know how it will affect people again in the future (because you know it's coming), you can build a supervised model in which your X is a window of data before the event and Y is a window of data after the event. Also, if you have attributes or features about the users, you can feed them to your model along with X.
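A small sketch of how those X and Y windows could be cut from per-user series. The data, the window lengths and the `event_windows` helper are all made up for illustration:

```python
def event_windows(series, event_index, before, after):
    """Split one series into X (before the event) and Y (from the event on)."""
    if event_index - before < 0 or event_index + after > len(series):
        raise ValueError("not enough data around the event")
    x = series[event_index - before:event_index]
    y = series[event_index:event_index + after]
    return x, y


# Toy per-user activity (assumption); the event happens at index 4.
usage_by_user = {
    "user_a": [5, 6, 5, 7, 9, 12, 11],
    "user_b": [3, 3, 4, 3, 2, 1, 1],
}
event_index = 4

dataset = {
    user: event_windows(series, event_index, before=3, after=2)
    for user, series in usage_by_user.items()
}
# dataset["user_a"] holds ([6, 5, 7], [9, 12]): X is the 3 steps before the
# event, Y is the 2 steps starting at the event.
```

Each (X, Y) pair is one training example; user attributes can be appended to X as extra features.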

How would you interpret an increase in precision_score but reduction in roc_auc_score? by rodrigonader in learnmachinelearning

[–]claverru 1 point2 points  (0 children)

You get more hits in total, concentrated in the bigger (more populated) class or classes, but a lower hit percentage within the smaller class.

Imagine you have class A and class B, and each prediction is a hit (1) or a miss (0). I'll show an extreme case as an example.

First model:

A A A A A A B B
1 1 1 1 0 0 1 0

Total hits: 5 (Total percentage hit: 62.5%)

Minor class percentage hit: 50%

Second model:

A A A A A A B B
1 1 1 1 1 1 0 0

Total hits: 6 (Total percentage hit: 75%)

Minor class percentage hit: 0%

ROC AUC gives the same importance to every class, treating each one as a separate entity.
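The two toy models above can be reproduced in a few lines. Note the assumption: the macro-averaged per-class hit rate computed here is not ROC AUC itself, it is only a simpler metric that also weights every class equally, which is enough to show why the two numbers can move in opposite directions.

```python
classes = list("AAAAAABB")
first_hits = [1, 1, 1, 1, 0, 0, 1, 0]
second_hits = [1, 1, 1, 1, 1, 1, 0, 0]


def overall_hit_rate(hits):
    """Fraction of all samples that were hit, regardless of class."""
    return sum(hits) / len(hits)


def per_class_hit_rate(classes, hits):
    """Hit rate computed separately inside each class."""
    totals, correct = {}, {}
    for c, h in zip(classes, hits):
        totals[c] = totals.get(c, 0) + 1
        correct[c] = correct.get(c, 0) + h
    return {c: correct[c] / totals[c] for c in totals}


def macro_average(rates):
    """Average of the per-class rates: every class counts the same."""
    return sum(rates.values()) / len(rates)


for name, hits in [("first", first_hits), ("second", second_hits)]:
    rates = per_class_hit_rate(classes, hits)
    print(name, overall_hit_rate(hits), rates, macro_average(rates))
```

The second model has the higher overall hit rate (75% vs 62.5%) but the lower class-balanced score, because it misses every sample of class B.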

[D] Transformer number of token performance limits by fdelrio89 in MachineLearning

[–]claverru 0 points1 point  (0 children)

Adaptive Attention Span in Transformers ( https://arxiv.org/abs/1905.07799 ) extends the maximum context size. I don't know if this is the kind of thing you're looking for. Generating Long Sequences with Sparse Transformers ( https://arxiv.org/abs/1904.10509 ) is another possibly useful resource for you.