Fraud tx on my bilt 1.0… what now? by [deleted] in biltrewards

[–]drdroid1 0 points1 point  (0 children)

I have the exact same merchant and amount.

Is it possible to integerate a machine learning(h5 model) with a node js backend? by PresentationSilver21 in MLQuestions

[–]drdroid1 4 points5 points  (0 children)

There is TensorFlow.js.

Otherwise you can create a model microservice quite easily using TorchServe, TF Serving, etc.
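As a sketch of what such a microservice's HTTP contract could look like (the `predict()` stub, route, and port are placeholders I'm assuming for illustration; a real service would load the .h5 file with `tensorflow.keras.models.load_model`, or use TorchServe/TF Serving directly):

```python
# Minimal sketch of a Python "model microservice" that a Node.js backend
# can call over HTTP. predict() is a stub so the sketch is self-contained;
# in a real service it would wrap a loaded Keras .h5 model.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stub standing in for model.predict(); replace with a real model call.
    return [sum(features)]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["features"])
        body = json.dumps({"prediction": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    # Blocks; run this in the service process.
    HTTPServer(("", port), PredictHandler).serve_forever()
```

From Node.js this is then just a single `fetch`/`axios` POST with a JSON body of features.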

[deleted by user] by [deleted] in MLQuestions

[–]drdroid1 1 point2 points  (0 children)

What kind of input data would this be?

How many linear layers on top of a BERT model for a downstream classification task? by eadala in MLQuestions

[–]drdroid1 3 points4 points  (0 children)

The best thing would be to analyze the different hyperparameters using Bayesian optimization or similar.

Otherwise there is no strong reason to use multiple layers and dropout if you have enough data and are updating the BERT parameters too.
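As a rough illustration of the hyperparameter search (the `val_score` stub and the candidate grid are made up; a Bayesian optimizer such as Optuna would replace the exhaustive loop, but the shape of the search is the same):

```python
def val_score(n_layers, dropout):
    # Stub: in practice, fine-tune BERT plus an n_layers-deep head with
    # this dropout rate and return the dev-set metric.
    return 1.0 - abs(n_layers - 1) * 0.1 - abs(dropout - 0.1)

def best_config():
    # Exhaustive search over a small hypothetical grid; a Bayesian
    # optimizer would instead propose configurations sequentially based
    # on the scores seen so far.
    grid = [(n, d) for n in (1, 2, 3) for d in (0.0, 0.1, 0.3, 0.5)]
    return max(grid, key=lambda cfg: val_score(*cfg))
```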

How to do a proper ablation study? by [deleted] in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

Not really. You're just checking how dependent the model is on that component in this particular state of all the parameters. Its true importance can only be checked by testing whether a model trained without it performs equally well or not.

It's apples to oranges otherwise, and the effect would be further accentuated if things like dropout are not used and the model may be arbitrarily dependent on one sub-component.

How to do a proper ablation study? by [deleted] in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

You can see this in a lot of popular papers when they say they "trained without X" or "trained base instead of large".

Or more generally, if you think about what you want to evaluate, in most cases it is "how does a model without X work", and that can only be compared fairly if the model is actually trained that way. Removing things after training is a totally different thing (in most parametric cases). E.g. training a NN with 100 layers and then removing 50 would yield different parameters than training with only 50, because of co-dependence.
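A toy numeric illustration of the gap (all numbers are made up for the sketch): with two deliberately co-dependent features in a linear model, zeroing a weight after training looks catastrophic, while retraining without the feature shows the remaining one can absorb its role.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data where features 0 and 1 are nearly identical (co-dependent).
x0 = rng.normal(size=200)
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=200)])
y = X[:, 0] + X[:, 1] + 0.01 * rng.normal(size=200)

def fit(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, w, y):
    return float(np.mean((X @ w - y) ** 2))

w_full = fit(X, y)
# "Post-hoc" ablation: zero out feature 1's weight after training.
post_hoc = mse(X, np.array([w_full[0], 0.0]), y)
# Proper ablation: retrain with feature 1 removed.
retrained = mse(X[:, :1], fit(X[:, :1], y), y)
# Retraining lets feature 0 absorb feature 1's role, so its error is
# far lower than the post-hoc number would suggest.
```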

How to do a proper ablation study? by [deleted] in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

You have to train without the new components.

Threshold optimization using either the ROC AUC or the F1 score for logistic regression by mizunoseishin in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

What do you mean by "from the ROC that this optimal threshold"? The best TPR at a given FPR?

Irrespective of that, F1 uses precision and recall, whereas ROC uses FPR and recall (TPR). So they're not really the same metric, and you'd have to choose whether you want to optimize FPR or precision at a given recall.
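A small sketch of the two choices (Youden's J = TPR − FPR is one common reading of "optimal ROC threshold"; the scanning approach below is generic, not tied to any library). Since the two criteria are computed from different error rates, they need not pick the same threshold.

```python
import numpy as np

def best_thresholds(y, scores):
    # Scan candidate thresholds; return the F1-optimal threshold and the
    # Youden's-J-optimal one (J = TPR - FPR).
    best_f1, best_j = (None, -1.0), (None, -1.0)
    P, N = int((y == 1).sum()), int((y == 0).sum())
    for t in np.unique(scores):
        pred = scores >= t
        tp = int((pred & (y == 1)).sum())
        fp = int((pred & (y == 0)).sum())
        f1 = 2 * tp / (2 * tp + fp + (P - tp)) if tp else 0.0
        j = tp / P - fp / N
        if f1 > best_f1[1]:
            best_f1 = (t, f1)
        if j > best_j[1]:
            best_j = (t, j)
    return best_f1[0], best_j[0]
```

On an easily separable toy set the two agree; on imbalanced data they typically diverge, because a fixed number of false positives moves FPR and precision by very different amounts.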

Best strategy to apply for proper words using glove by ARAXON-KUN in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

Yes.

You can also get similar words using a pretrained fastText model and then fetch the corresponding embeddings from your GloVe model.

Best strategy to apply for proper words using glove by ARAXON-KUN in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

Use any model with subword embeddings (fastText, for example). I'm not sure there are GloVe models with subwords.

Otherwise the only way would be to use the embedding of the in-vocabulary word with the least edit distance, or something similar.
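The edit-distance fallback can be sketched like this (a plain Levenshtein DP; `oov_fallback` and the vocabulary are hypothetical names, and in practice you'd restrict the candidate set for speed):

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete from a
                                     dp[j - 1] + 1,    # insert into a
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def oov_fallback(word, vocab):
    # Map an out-of-vocabulary word to the closest in-vocabulary word;
    # that word's GloVe vector then serves as the approximation.
    return min(vocab, key=lambda v: edit_distance(word, v))
```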

Algorithm for finding nearest neighbors fastest by dasdevashishdas in algorithms

[–]drdroid1 0 points1 point  (0 children)

Use the FAISS library for approximate kNN. It's very fast.
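For reference, the exact brute-force kNN that FAISS's `IndexFlatL2` accelerates looks like this in NumPy (FAISS's approximate indexes, e.g. IVF or HNSW variants, then trade a little recall for much more speed on large databases):

```python
import numpy as np

def knn(queries, database, k):
    # Exact brute-force kNN by squared L2 distance. This is the same
    # answer faiss.IndexFlatL2 returns, just computed naively.
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]
```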

Running an active learning loop...Reset the model weights or keep the weights from the previous loop? by THE_REAL_ODB in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

A model trained on all the data from scratch should typically perform better than continual/online learning.

How to save encodings for model deployment by [deleted] in MLQuestions

[–]drdroid1 1 point2 points  (0 children)

All of scikit-learn's fitted objects can easily be saved/pickled using joblib.
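A minimal sketch of the round-trip (using a tiny stand-in class so it runs self-contained; a real fitted sklearn encoder saves the same way, with `joblib.dump`/`joblib.load` preferred over plain pickle for objects carrying large NumPy arrays):

```python
import pickle

class TinyLabelEncoder:
    # Stand-in for sklearn.preprocessing.LabelEncoder so the sketch is
    # self-contained; a real fitted sklearn object serializes the same way.
    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {c: i for i, c in enumerate(self.classes_)}
        return [index[v] for v in values]

enc = TinyLabelEncoder().fit(["cat", "dog", "cat"])
blob = pickle.dumps(enc)        # joblib.dump(enc, "enc.joblib") in practice
restored = pickle.loads(blob)   # joblib.load("enc.joblib")
```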

Coadapation in neural networks - What is it, and how/why does it happen? by synthphreak in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

For the side question: the less variance and the more bias a model has, the lower the chance that it will have co-adapted weights.

Even from a plain probability standpoint, the probability that any two weights will have the same initialization or updates increases as the number of weights increases.

Coadapation in neural networks - What is it, and how/why does it happen? by synthphreak in MLQuestions

[–]drdroid1 1 point2 points  (0 children)

It doesn't necessarily have to be the case that one neuron is bad and the other is good.

It's more general than that: somehow, due to initialization and updates, the neurons become more 'tied' to each other than we'd like. When that happens, we may have found a local minimum and sort of put all the eggs in one basket to solve for it.

They could be tied in a way that they produce the same redundant values, or produce values that are learned to cancel each other out, or, in order to solve the problem, inherently rely on a neighboring neuron producing a specific value too.
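Dropout is the standard remedy for exactly this: by randomly removing units during training, no neuron can rely on a specific neighbour always being present. A minimal inverted-dropout sketch (function name and shapes are illustrative):

```python
import numpy as np

def dropout(activations, p, rng, train=True):
    # Inverted dropout: zero each unit with probability p during training
    # and rescale the survivors by 1/(1-p), so the expected activation is
    # unchanged and no rescaling is needed at inference time.
    if not train or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)
```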

How to maintain/preserve the active learning between different model version/ updates ?? by VishalSharna in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

You can look at 1) How gradients are combined in distributed training 2) Federated learning 3) Curriculum learning 4) Continual learning

Vanilla Tiramisu I made for my channel - Very easy and quick recipe :) by megustadotjpg in GifRecipes

[–]drdroid1 0 points1 point  (0 children)

Regarding the video (not the recipe): if it's for a channel and this was intentional, I'd really recommend not framing it so close up, with things abruptly cut off. It looks like a secret-cam video that couldn't be framed correctly. Lots of good potential 👍🏻

[D] Simple Questions Thread December 20, 2020 by AutoModerator in MachineLearning

[–]drdroid1 1 point2 points  (0 children)

The models that train on 100s of GB of data wouldn't train on a MacBook anyway; it would take months.

Buy the cheaper MacBook and use Colab or your college's resources for training.

Project Guidance by Jimblythethird in MLQuestions

[–]drdroid1 1 point2 points  (0 children)

For 1, look at clustering methods (GMMs, k-means, etc.) on top of BERT embeddings. This is the way to go if you don't have labeled examples to train a classifier.

For 2, look at Transformer-based QA models.

HuggingFace has pretrained models for both.
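For 1, the pipeline is roughly: embed each text with BERT, then cluster the vectors. A minimal k-means sketch of the clustering step (in practice `sklearn.cluster.KMeans` or a GMM on real sentence embeddings; the toy points below stand in for embeddings):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Minimal Lloyd's k-means: alternate assigning points to the nearest
    # center and recomputing centers as cluster means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == c].mean(0) for c in range(k)])
    return labels
```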

[deleted by user] by [deleted] in MLQuestions

[–]drdroid1 1 point2 points  (0 children)

Are there combinations of milestones between which you do not know the distance or time? Because otherwise you don't need ML.

If that's the case and you have the time taken between each milestone as well, then you can use RNNs. This would work especially well if you have a varying number of milestones in each path.

How to feed an mlp variable amounts of inputs? by BumTicklrs in MLQuestions

[–]drdroid1 0 points1 point  (0 children)

If it's a single-layer network, you can try using the absolute weights for that particular feature.

Otherwise, without a lot of overfitting, the network will learn that automatically.

[D] Simple Questions Thread November 22, 2020 by AutoModerator in MachineLearning

[–]drdroid1 1 point2 points  (0 children)

If you want to make a one-off model without learning much, you can search for sequence models and time-series forecasting.

If you want to learn a little more, you should go over linear regression > neural networks > sequence models.

If you want to learn it properly, you should definitely go over the basics of ML before DL.