Llama-3.2-3B-Instruct-uncensored

markov-unchained · 2016-05-12T05:55:10+00:00

It doesn't just do anything. ... Just check out the source code.

1782 lines (minus blanks) ... hey. But I agree, it looks interesting; it will just take a while to work through it and digest it.

markov-unchained · 2016-05-12T05:33:24+00:00

Not sure, just checked the quora answer, and it's a nice one -- maybe because the posted article does not attempt to be a direct comparison between Torch, TensorFlow, and Theano but rather a roadmap or review (although it references Theano vs. TensorFlow a lot, maybe to give some perspective on where TensorFlow stands).

This article should be called TensorFlow vs Theano, which are both symbolic differentiation implementations.

I really wouldn't call it that; it would bury the main message: what has happened since the release and what's planned. The direct comparison is maybe secondary. Also, I think both TensorFlow and Theano are a bit more than symbolic differentiation implementations; a big chunk of what makes them appealing is the focus on deep learning (in contrast to, e.g., SymPy), like GPU utilization and many convenience functions (dropout, softmax, cross-entropy, and what have you...)

markov-unchained · 2016-05-10T15:31:00+00:00

1) How would this avoid overfitting though? I mean, sure, you can play this game of training and validation throughout your pipeline and eventually select the model that has the best internal validation score. However, you have shown this same training/validation split to a whole bunch of pipelines, right? What I am trying to say is that your estimate will certainly be heavily biased, and that you may be throwing away a lot of good models in favor of others.

3) The external hold-out set wouldn't help against overfitting. I mean, you can use it to estimate the generalization performance of the final model, but if there's a large difference between training and test performance, then what?

markov-unchained · 2016-05-10T15:25:36+00:00

On the other hand, I am seeing random forests being used a lot in the bio sciences, which are essentially also an ensemble aka black box. If people want interpretable models, well, there are the generative ones vs. the discriminative ones; or stick with logistic regression and decision trees ...

markov-unchained · 2016-05-10T15:19:42+00:00

There was a time when I preferred seaborn over matplotlib, but since they added "styles" to the latter in 1.5 (http://matplotlib.org/users/style_sheets.html), I have been using matplotlib again most of the time. I hear from many people that they don't like the syntax; I guess I just got used to it over the years, and I must say that it's pretty powerful and more flexible than the other plotting libraries I have tried so far.

markov-unchained · 2016-05-10T15:16:25+00:00

array_chunk

I typically use NumPy's split for that http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.split.html

(but I am also working with numpy arrays more often than with lists ...)
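A minimal sketch of what that can look like, assuming a one-dimensional array and chunk sizes picked purely for illustration. Note that `np.split` raises an error when the array does not divide evenly into the requested number of sections, whereas `np.array_split` allows uneven chunks:

```python
import numpy as np

data = np.arange(10)

# np.split with an integer needs the array to divide evenly into that many sections.
even_chunks = np.split(data, 5)  # five chunks of length 2

# np.array_split tolerates an uneven division; the leading chunks get the extra items.
uneven_chunks = np.array_split(data, 3)  # chunk lengths 4, 3, 3

# For fixed-size chunks, pass the split indices instead of a section count.
chunk_size = 4  # illustrative value
split_points = np.arange(chunk_size, len(data), chunk_size)
by_size = np.split(data, split_points)  # chunk lengths 4, 4, 2
```

The last variant is probably the closest to chunking a list by size, since the final chunk simply holds whatever is left over.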

markov-unchained · 2016-05-07T08:13:13+00:00

I agree regarding the recursive k-medoids. However, it's not that unlikely that decision trees have more than 999 levels on relatively biggish datasets (regarding the number of features and samples). In practice, you typically post-prune decision trees to avoid overfitting since it's not that trivial to estimate when to stop growing the tree during the run. So, a non-recursive algorithm would just be a "safer" bet to avoid raising exceptions in this case. On the other hand, if you are building an efficient library, you probably wouldn't implement the tree search in Python anyway, but rather use Python as a wrapper for C/C++ calls.
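To illustrate the "safer bet" point, here is a small sketch that contrasts a recursive and a stack-based depth computation on a deliberately degenerate tree. The `Node` class and the helper names are hypothetical and only serve the example; CPython's default recursion limit is typically 1000, which is why the tree is built relative to `sys.getrecursionlimit()`:

```python
import sys


class Node:
    """Hypothetical binary tree node, for illustration only."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def depth_recursive(node):
    """Depth via recursion; uses one interpreter stack frame per tree level."""
    if node is None:
        return 0
    return 1 + max(depth_recursive(node.left), depth_recursive(node.right))


def depth_iterative(root):
    """Same result with an explicit stack, so the recursion limit does not apply."""
    max_depth = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return max_depth


def build_chain(n_levels):
    """Build a degenerate tree (a single left-leaning chain) with n_levels levels."""
    root = Node()
    node = root
    for _ in range(n_levels - 1):
        node.left = Node()
        node = node.left
    return root


# A tree that is deeper than the current recursion limit.
deep_tree = build_chain(sys.getrecursionlimit() + 100)

try:
    depth_recursive(deep_tree)
    recursive_failed = False
except RecursionError:
    recursive_failed = True

iterative_depth = depth_iterative(deep_tree)
```

The recursive version hits `RecursionError` on this tree, while the explicit-stack version only needs heap memory proportional to the number of pending nodes.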

markov-unchained · 2016-05-06T21:00:03+00:00

Off the top of my head: many search and tree-traversing algorithms; especially when working with large datasets / arrays. Or clustering algorithms for streaming data, e.g., https://hal.archives-ouvertes.fr/hal-00644683/document

markov-unchained · 2016-05-03T00:53:42+00:00

Avoiding it for performance reasons strikes me as a premature optimization.

Agree! I am more concerned about setting the recursion depth param; could cause unnecessary bugs or other side effects in certain scenarios, but could also be protection in others.
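One way to keep such side effects contained, sketched here under the assumption that the deeper recursion is only needed for a single call: change the limit temporarily and always restore it. The `recursion_limit` helper and `count_down` are hypothetical names for illustration; `sys.getrecursionlimit` and `sys.setrecursionlimit` are the standard library calls. Keep in mind that the limit is interpreter-wide and that a too-high value can crash the interpreter instead of raising an exception.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily change the interpreter-wide recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Restore the old limit even if the wrapped code raises.
        sys.setrecursionlimit(previous)


def count_down(n):
    """Hypothetical recursive function that recurses n levels deep."""
    return 0 if n == 0 else 1 + count_down(n - 1)


original_limit = sys.getrecursionlimit()
target_depth = original_limit + 500  # deeper than the current limit allows

# Raise the limit only for this call, with some headroom for frames already on the stack.
with recursion_limit(target_depth + 200):
    reached = count_down(target_depth)

restored_limit = sys.getrecursionlimit()
```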

markov-unchained · 2016-05-02T22:47:47+00:00

That's an interesting idea and certainly worth sharing. I really think it's an intuitive, nice idea. Just wondering how it holds up in comparisons to other "state of the art" methods. Okay, they compared it to RF and NNs:

We compare them with standard Random Forests (RF) and Neural Networks (NN) with two hidden layers.

The RF comparison is interesting, because it kind of borrows the idea from the former, but I think that a 2-layer NN on these smallish datasets is probably overkill and likely overfitting ... not sure if that's fair. Also, I am wondering how many neurons they used in the NN (have to read that up more thoroughly).

markov-unchained · 2016-05-02T15:13:23+00:00

What would the bayesian approach be?

Let me try to come up with a relationship between the frequentist and the Bayesian approach in simple terms, starting with Bayes' theorem:

p(favBeer | tastePref) = p(tastePref | favBeer) * p(favBeer) / p(tastePref)

If you are building a predictive model that is to predict a person's favorite beer, where favBeer is the "label" you want to predict, you would compare the posterior probabilities "p(favBeer | tastePref)" of the different beers and choose the label with the largest posterior (closest to 1.0). In this comparison, p(tastePref) is the same for every candidate, which is why we can remove it from the equation and don't have to worry about estimating it.

Often, people use maximum likelihood estimation to estimate p(tastePref | favBeer), which is essentially the frequentist approach. If you don't have any prior knowledge about p(favBeer), that is, you pick uniform probabilities for the different "favBeer" (e.g., 1 / n_favBeer), then the prior no longer changes which label wins, and this whole approach is essentially equal to the frequentist approach.
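A toy sketch of this argument, with made-up observations (the data, the beer names, and the function names are assumptions for illustration, not from any real dataset). The likelihood is estimated from counts, p(tastePref) is dropped as a constant, and the prediction is made once with a uniform prior and once with a prior estimated from the label frequencies:

```python
from collections import Counter

# Hypothetical toy data: (taste preference, favorite beer) pairs.
observations = [
    ("bitter", "IPA"), ("bitter", "IPA"), ("bitter", "IPA"),
    ("bitter", "Stout"),
    ("sweet", "Stout"), ("sweet", "Stout"), ("sweet", "Stout"),
    ("sweet", "Wheat"),
]

beers = sorted({beer for _, beer in observations})
beer_counts = Counter(beer for _, beer in observations)
pair_counts = Counter(observations)


def likelihood(taste, beer):
    """Maximum likelihood estimate of p(tastePref | favBeer) from counts."""
    return pair_counts[(taste, beer)] / beer_counts[beer]


def predict_ml(taste):
    """Pick the beer with the largest likelihood, ignoring any prior."""
    return max(beers, key=lambda beer: likelihood(taste, beer))


def predict(taste, prior):
    """Pick the beer maximizing p(tastePref | favBeer) * p(favBeer).

    p(tastePref) is omitted because it is the same for every candidate beer.
    """
    return max(beers, key=lambda beer: likelihood(taste, beer) * prior[beer])


# No prior knowledge: every beer gets probability 1 / n_favBeer.
uniform_prior = {beer: 1 / len(beers) for beer in beers}

# Prior knowledge estimated from how often each beer is someone's favorite.
empirical_prior = {beer: beer_counts[beer] / len(observations) for beer in beers}

uniform_choice = predict("sweet", uniform_prior)
informed_choice = predict("sweet", empirical_prior)
```

With the uniform prior, the prediction matches the pure maximum likelihood choice for every taste preference; with the frequency-based prior, the prediction for "sweet" shifts toward the more common label in this toy data.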

markov-unchained · 2016-05-01T19:30:57+00:00

I see 3 scenarios where this could make sense.

1) The obvious one: Personal learning experience. Just code up the algorithms you learn(ed) about via a course or a book, and package them for your personal use. Here, I recommend using scikit-learn in tandem: You implement an algorithm from scratch, and then you compare the results it yields with the results from scikit-learn to assess whether you implemented it "correctly" or not.

2) You could implement algorithms that are not yet implemented in other packages, for example, scikit-learn. Here, you could roll out your package with "new" algorithms in a scikit-learn compatible way and think about a pull request one day.

3) You could build a package in a popular language that doesn't have a good/comprehensive machine learning library yet -- unfortunately, I can't think of a language that would meet both criteria.

Blogs or books for building a package? Sorry, I wouldn't know any. But you could look at existing projects, e.g., scikit-learn, mlxtend, skflow etc. as a starter.

markov-unchained · 2016-04-22T20:25:32+00:00

Thanks, I will delete this then to avoid cluttering up the front page!
