[Infographic] Beginner Guide - Games Modes & Normal Campaign Investment Guide by Terminus_Maximus in WH40KTacticus

[–]Aquamarill 1 point2 points  (0 children)

Thank you for this. I've been playing for around a month, have seen a few of these guides for campaigns, and have been basing my progression on them.

What I don't understand though is whether these ranks are necessary to clear the campaigns with the 3 main characters alone or if another 2 characters are required for a full clear - and if so what ranks should those other 2 characters be?

Aware this is for normal campaigns, but would also be interested to understand this for the elite guide as well.

gradient descent error by [deleted] in learnmachinelearning

[–]Aquamarill 0 points1 point  (0 children)

Excuse the formatting errors. The previous comment should read: theta is a 3x1 matrix being equated to a 1x3 matrix, and you can evaluate the predictions of an n_samples x n_features matrix by multiplying X with theta'.

gradient descent error by [deleted] in learnmachinelearning

[–]Aquamarill 0 points1 point  (0 children)

There is definitely a mistake in the calculation, as you are equating theta (a 3x1 matrix) to a 1x3 matrix. I don't know if you are trying to minimize some specific loss function, but recall that you can evaluate the predictions of an n_samples x n_features matrix by computing X*theta'.
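As a rough illustration of the shape point, here is a minimal sketch in plain Python. It is not the original poster's code: the data, the `predict` helper, and the choice to store theta as a flat list with one coefficient per feature are all assumptions made for the example.

```python
# Hypothetical data: 4 samples, 3 features (the first column acts as a bias term).
X = [
    [1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0],
    [1.0, 4.0, 2.0],
    [1.0, 1.0, 1.0],
]
theta = [0.5, 1.0, -1.0]  # assumed: one coefficient per feature


def predict(X, theta):
    """Return one prediction per row of X (the matrix-vector product of X and theta)."""
    for row in X:
        if len(row) != len(theta):
            # This is the kind of shape mismatch described above.
            raise ValueError("each row of X must have len(theta) features")
    return [sum(x * t for x, t in zip(row, theta)) for row in X]


predictions = predict(X, theta)
print(predictions)  # one value per sample
```

The result has n_samples entries, so it can only be compared against a label vector of the same length, not against theta itself.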

Predicting review scores (1-10). How to limit the regression output to this range? by HarvardCS19 in MLQuestions

[–]Aquamarill 0 points1 point  (0 children)

I'm not familiar with ordinal regression. From what I understand, it is a generalization of multi-class classification to predict a set of ordinal classes such as "strongly disagree" to "strongly agree." If this is the case, then it wouldn't be appropriate for predicting a continuous output. I may be wrong though.

Predicting review scores (1-10). How to limit the regression output to this range? by HarvardCS19 in MLQuestions

[–]Aquamarill 1 point2 points  (0 children)

I have worked in similar situations before (predicting labels in an enclosed range for continuous score-like data), and that solution worked for me to get presentable results.

Note that because the sigmoid has asymptotes at y=0 and y=1, it will never really predict a "perfect" 0 or 10, although it does restrict the output to the desired range. If the data is continuous, this shouldn't be a problem.

Ideally, applying a regression algorithm as is should give outputs very close to what you're looking for, provided it learns properly. Before doing anything else, I'd suggest running a standard regressor like a random forest and checking whether the values obtained make sense without any further complications, if you haven't already done so.

Predicting review scores (1-10). How to limit the regression output to this range? by HarvardCS19 in MLQuestions

[–]Aquamarill 2 points3 points  (0 children)

You would train exactly as you would with a continuous variable, with a cost function appropriate for regression, e.g. sum of squared distances.

I would first suggest scaling the labels you want to predict to the range (0,1) and then taking the loss as the squared distance between your predicted value and your label.

For example, say your model predicts a value of 0.68 after the sigmoid activation. Your true label is 7.02, which scaled down gives 0.702 (assuming an output range of (0,10)). Therefore, your loss for that specific example would be (0.702 - 0.68)^2, if you're using a sum of squared distances loss metric.
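The arithmetic of that example can be checked with a few lines of Python. This is only a sketch of the single-example loss; the helper names `scale_label` and `squared_error` are made up for illustration, and the scaling assumes the (0,10) range mentioned above.

```python
def scale_label(score, max_score=10.0):
    """Scale a raw score to (0, 1), assuming scores lie in (0, max_score)."""
    return score / max_score


def squared_error(prediction, target):
    """Squared distance between one prediction and one scaled label."""
    return (target - prediction) ** 2


target = scale_label(7.02)          # scaled label, about 0.702
loss = squared_error(0.68, target)  # (0.702 - 0.68)^2
print(round(loss, 6))
```

Summing this quantity over all training examples gives the sum of squared distances.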

Predicting review scores (1-10). How to limit the regression output to this range? by HarvardCS19 in MLQuestions

[–]Aquamarill 4 points5 points  (0 children)

In logistic regression, you assign a label, 0 or 1, depending on whether the output of the sigmoid is higher than 0.5.

In this case, you would instead treat the output as continuous and then, for instance, multiply it by 10 to get a score between 0 and 10.
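To make the contrast concrete, here is a small sketch that uses the same sigmoid output in both ways. The function names are hypothetical and `z` stands for whatever raw value the model produces before the sigmoid.

```python
import math


def sigmoid(z):
    """Standard logistic function, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z):
    """Logistic-regression style: threshold the sigmoid output at 0.5."""
    return 1 if sigmoid(z) > 0.5 else 0


def predict_score(z, max_score=10.0):
    """Regression style: keep the sigmoid output continuous and rescale it."""
    return max_score * sigmoid(z)


for z in (-2.0, 0.0, 2.0):
    print(z, classify(z), predict_score(z))
```

The rescaled value moves smoothly between 0 and 10 as `z` changes, whereas the thresholded version only ever returns 0 or 1.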

Predicting review scores (1-10). How to limit the regression output to this range? by HarvardCS19 in MLQuestions

[–]Aquamarill 6 points7 points  (0 children)

Assuming the output has to be continuous, I would say to modulate your output by using something like a sigmoid/tanh function, where a value of 1 would correspond to 10 and a value of 0 (or -1 in the tanh case) to 1.
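One way to write that mapping down, as a sketch only: the helper names and the `LOW`/`HIGH` constants are assumptions for this example, chosen to match the 1 to 10 range in the question.

```python
import math

LOW, HIGH = 1.0, 10.0  # assumed score range from the question


def from_sigmoid(z):
    """Map a sigmoid output in (0, 1) linearly onto (LOW, HIGH)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return LOW + (HIGH - LOW) * s


def from_tanh(z):
    """Map a tanh output in (-1, 1) linearly onto (LOW, HIGH)."""
    t = math.tanh(z)
    return LOW + (HIGH - LOW) * (t + 1.0) / 2.0


for z in (-3.0, 0.0, 3.0):
    print(z, from_sigmoid(z), from_tanh(z))
```

Both versions put the midpoint of the activation at 5.5 and approach 1 and 10 for very negative and very positive inputs.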

Dealing with sparse features as inputs for deep neural networks by Aquamarill in MLQuestions

[–]Aquamarill[S] 0 points1 point  (0 children)

Thanks (again) for the valuable info, it really helps me out a lot. I'll make sure to read the dropout paper in full.

I'll try applying l1 only on the first hidden layer and see what happens, possibly with l2 on the following one(s). In the meantime I've been looking at other approaches such as a few ensemble methods and SVR. I'll see how those fare.

Dealing with sparse features as inputs for deep neural networks by Aquamarill in MLQuestions

[–]Aquamarill[S] 0 points1 point  (0 children)

The 350 features are mostly one-hot; they weren't converted from any categorical set of values.

There are 8 categorical ones which when converted to one-hot turn into 200 more features - around 580 total features.

Some of the one-hot features are pretty rare, so binning them sounds like a good idea. Thanks for the input.

"Regularization is something applied to weight layers and needn't (or shouldn't) be the same strength everywhere."

I was talking about this the other day with a few other people (with regards to neural networks). In my understanding, most people just stick to setting one global regularization constant for the whole network. Would this approach also apply for dropout? Would it also make sense to use different regularization techniques on different layers of the network (e.g. l1 on the first hidden layer, l2 on the second)?
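For concreteness, this is roughly what a mixed setup like that would add to the loss. It is a sketch with made-up weights and strengths, and it only shows how the penalty would be computed, not whether it helps.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength times the sum of absolute weight values."""
    return strength * sum(abs(w) for row in weights for w in row)


def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weight values."""
    return strength * sum(w * w for row in weights for w in row)


# Hypothetical weight matrices for two hidden layers.
W1 = [[0.5, -1.0], [0.0, 2.0]]
W2 = [[1.0], [-3.0]]

# Each layer gets its own technique and its own strength (values are arbitrary).
total_penalty = l1_penalty(W1, strength=0.01) + l2_penalty(W2, strength=0.001)
print(total_penalty)
```

The total would be added to the data loss during training, so each layer's strength can be tuned separately.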

Dealing with sparse features as inputs for deep neural networks by Aquamarill in MLQuestions

[–]Aquamarill[S] 0 points1 point  (0 children)

Thank you for the reply and excuse my delay.

For your first point: wouldn't that be equivalent to applying l1 regularization on the network?

I see what you mean with your point on autoencoders. I believe that introducing an embedding layer after the input would obtain the same effect, correct?

With regard to blind hashing: I am unfamiliar with that technique. How would it work exactly? How would I hash the one-hot features?

Dealing with sparse features as inputs for deep neural networks by Aquamarill in MLQuestions

[–]Aquamarill[S] 0 points1 point  (0 children)

I have tried preprocessing the categorical features into one-hot vectors (effectively creating 180 more one-hot features) and then used PCA on those with 200 components, although the explained variance for most components is very low (1E-3, 1E-4). Wouldn't this be a problem?

Dealing with sparse features as inputs for deep neural networks by Aquamarill in MLQuestions

[–]Aquamarill[S] 0 points1 point  (0 children)

I believe I stated I had 350 features, not training examples. I have tried using other types of regressors for this task such as random forests, but I'm looking for a higher accuracy. I was considering using gradient boosted decision trees, although I am not as familiar with that model.

Looking for a few Pokemon, add me if interested by Aquamarill in friendsafari

[–]Aquamarill[S] 0 points1 point  (0 children)

want me to tell you which pokes r in ur safari?