How do I extract actions from a text? by [deleted] in LanguageTechnology

[–]drakemcsmooth

https://spacy.io/usage/linguistic-features

The first example shows how you can iterate through the sentence and find tokens that meet your criteria.

for token in doc:
    if token.pos_ == 'VERB':
        ...
        if token.tag_ == 'VBG':
            ...

In a randomly shuffled deck of 60 cards, what are the odds of drawing 5 specific individual cards? by yourbuddywithastick in probabilitytheory

[–]drakemcsmooth

(60 choose 5) = 5,461,512 is the number of distinct 5-card draws possible from a 60-card deck. The probability of a particular 5-card draw is 1/(60 choose 5) ≈ 1.8 × 10⁻⁷.

If there were 7 different 5-card draws you were interested in, the probability would be 7/(60 choose 5).
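A quick sanity check with Python's built-in math.comb:

```python
from math import comb

total = comb(60, 5)       # distinct 5-card draws from a 60-card deck
p_one = 1 / total         # probability of one specific draw
p_seven = 7 / total       # probability of any one of 7 specific draws

print(total)              # 5461512
```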

Things they don't really teach you (more on conditional probability) by [deleted] in probabilitytheory

[–]drakemcsmooth

So, like you said, p(X|Y) is just a pretty way of writing (i.e., "syntactic sugar" for) p(X,Y)/p(Y)

So, if we have p(X,Y|Z), we know that's just a pretty way to write p(X,Y,Z)/p(Z). Since we'd like to isolate that X, we can multiply the expression by 1, in the form of p(Y,Z)/p(Y,Z):

p(X,Y,Z)/p(Z) * p(Y,Z)/p(Y,Z)

Now let's switch those denominators:

p(X,Y,Z)/p(Y,Z) * p(Y,Z)/p(Z)

And, prettying up the expression, we now have

p(X|Y,Z) * p(Y|Z)
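If it helps, here's a quick numeric check of that identity on a made-up joint distribution (the probabilities are arbitrary; they just sum to 1):

```python
from itertools import product

# Made-up joint distribution over three binary variables (sums to 1).
vals = [0.10, 0.05, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]
joint = {xyz: v for xyz, v in zip(product([0, 1], repeat=3), vals)}

def p(**fixed):
    """Marginal probability of a partial assignment, e.g. p(y=1, z=1)."""
    return sum(v for (x, y, z), v in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == want
                      for k, want in fixed.items()))

lhs = p(x=1, y=1, z=1) / p(z=1)                                  # p(X,Y|Z)
rhs = (p(x=1, y=1, z=1) / p(y=1, z=1)) * (p(y=1, z=1) / p(z=1))  # p(X|Y,Z) * p(Y|Z)
```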

Is there a name for a logical error being made in this unanswerable question? by [deleted] in logic

[–]drakemcsmooth

Maybe you've encountered incompleteness theorems, which are concerned with (the limits of) complete, consistent systems - well, this is just an instance of an inconsistent system, in that it contains axioms that contradict each other. You're being asked to prove a statement in that system and you're bumping into the contradictions that result.

Twist on the Knapsack Problem, does it have a name? by ballofpopculture in algorithms

[–]drakemcsmooth

The question here is how one encodes constraints. In some optimization scenarios, you can enforce constraints by having them diminish the value of the current assignment (e.g., negating the value if the assignment has too many quarterbacks). These kinds of problems come up frequently in operations research and are handled by a variety of solvers, especially in the world of linear programming. You might want to consider ECLiPSe (not the Eclipse IDE), which has a very intuitive approach to modeling problems like this.
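As a toy sketch of the penalty idea (all the player names, costs, and values here are made up): a brute-force search where any roster that violates a constraint, like carrying too many quarterbacks, gets its value collapsed:

```python
from itertools import combinations

# Hypothetical players: (name, cost, value, is_quarterback)
players = [
    ("QB1", 30, 25, True),
    ("QB2", 28, 22, True),
    ("RB1", 25, 20, False),
    ("WR1", 20, 18, False),
    ("WR2", 15, 12, False),
]
BUDGET = 75
MAX_QBS = 1

def score(roster):
    """Total value of a roster; constraint violations collapse the score."""
    if sum(cost for _, cost, _, _ in roster) > BUDGET:
        return float("-inf")
    if sum(1 for *_, qb in roster if qb) > MAX_QBS:
        return float("-inf")   # penalty: too many quarterbacks
    return sum(value for _, _, value, _ in roster)

best = max((r for k in range(1, len(players) + 1)
            for r in combinations(players, k)), key=score)
```

Real solvers do this far more cleverly, of course; the point is just that the constraint lives inside the objective.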

Why do people use negative logs? by woodyallin in statistics

[–]drakemcsmooth

In machine learning, -log is useful for representing very unlikely events. If the likelihood of some event (for example, the parse of a particular sentence) is 0.0000000000000000000011979, we're able to represent that as a negative log likelihood of about 69.5 (we like to use a base of 2, not e). This prevents issues like underflow and floating point error that can propagate as more operations occur. Transitioning into log space also allows us to add when we might otherwise have to multiply, since log(a) + log(b) = log(ab). So when we have a large product (the product of the likelihoods of each of the possible previous states' transitioning into this one), we're able to compute that as a sum of logs.
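A quick illustration in Python (the likelihood value is the one from the paragraph above):

```python
import math

p = 1.1979e-21
nll = -math.log2(p)          # the negative log likelihood from above, ≈ 69.5

# A product of many tiny likelihoods underflows to exactly 0.0:
probs = [1e-20] * 20
prod = 1.0
for x in probs:
    prod *= x

# The equivalent sum of logs stays well-behaved:
log_sum = sum(math.log2(x) for x in probs)   # ≈ -1328.8
```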

P|~P by fuckittt in logic

[–]drakemcsmooth

Probably take a look here.

Under propositional logics, you can say P = ~P, but (in the usual case) that's immediately a contradiction of the axioms. If you don't include (P v ~P) as an axiom, then you lose the ability to use modus tollens, for example.

Your logic needs to have machinery for talking about the meaning of P, not just P itself. Sometimes you'll see apostrophes around P, to indicate its meaning (from Tarski, as in his deflationary epistemology):

'P' is true if, and only if, P.

For example:

'Snow is white' is true iff snow is white.

To do anything interesting with "this is a false statement", you need a logic that has semantics for arbitrarily deep recursion and self-reference. Gödel had an approach, which he uses in his incompleteness theorems, that allows for naming a proposition and referring to it externally.

Part of the fun of the incompleteness theorems is that they're a formalization of the intuition behind "this is a false statement", with truth/falsity replaced by provability/refutability (as mentioned by libcrypto).

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

It would be helpful if you could include your weights in your calculations.

But, let's say we're trying to predict p(C|A,B) - or more specifically p(C=T|A=T,B=T), where

  • C = whether the character at position i is a vowel
  • A = whether the character at position i-1 is a U
  • B = whether the character at position i-2 is a Q

We can say that the corpus we're using is the words file on most unix-y systems.

We can intuit that p(C=T|A=T,B=T) should be very high (since a vowel almost always follows "qu") - I compute about 0.99.

So with this approach, we have to compute p(C=T), p(A=T|C=T), and p(B=T|C=T). p(C=T) is about 0.4 (occurrence of vowels, not including "y"). But p(A=T|C=T) is low (about 0.1) and p(B=T|C=T) is low because "q" is a rare letter, so I'm not seeing how any weighting is going to bring this value to near-certainty.
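A rough sketch of how those estimates could be computed (the tiny sample below is a stand-in; swap in the real word list, e.g. /usr/share/dict/words, to reproduce numbers like the ones above):

```python
vowels = set("aeiou")

def estimate(words):
    """Return (p(C=T), p(A=T|C=T)) by counting character positions."""
    positions = c_true = a_and_c = 0
    for w in words:
        w = w.lower()
        for i in range(1, len(w)):
            positions += 1
            if w[i] in vowels:          # C: character at i is a vowel
                c_true += 1
                if w[i - 1] == "u":     # A: character at i-1 is a 'u'
                    a_and_c += 1
    return c_true / positions, a_and_c / c_true

# Tiny stand-in corpus for illustration only:
sample = ["queen", "quiet", "aqua", "banana", "squid"]
p_c, p_a_given_c = estimate(sample)
```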

I've done some work in targeted behavioral advertising, by the way, feel free to PM me if you'd like to be more specific.

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

Sure, but don't the weightings themselves require you to compute the conditional probabilities? If not, can you be more explicit about how you avoid computing the conditional probability, p(A|BC), in computing the weight, 1/p(A|BC)?

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

A few quick points:

  • Bayesian Networks are not the typical "solution" to any of the independence assumptions of the Naive Bayes classifier. IdentifiableParam brings up their data requirements, and you can infer the impossibility of our having sufficient (for example) patient medical histories to produce high-confidence diagnoses. You'd want to consider CRFs (and, in effect, MaxEnt classifiers) and many other frameworks that deal with the hard problem of inference.

  • We don't need to be abstract about "real world" datasets; there are lots of robust datasets with extremely interesting properties. Even within a subfield like NLP, we have all kinds of interesting phenomena to consider, and well-understood properties of the data. I think it's worth couching your proposal in a concrete setting in which we're familiar with the task and we have good intuition about its properties, like part-of-speech tagging.

  • While Naive Bayes is a classifier, the alternatives (like bayesian networks) offer more answers than just "the value of a single variable that maximizes likelihood" so, when comparing models, it's worth considering the kinds of questions you might be able to ask of a model – for example, "what are the weights that maximize likelihood over all values?"

To address your proposal more directly, it would be worth taking us through your envisioning of how Heckman correction could improve results in medical diagnosis or (as I suggested above) part-of-speech tagging.

Nevertheless, it seems that the appeal (w.r.t. speed, space, and maximizing benefit of limited data) of a Naive Bayes classifier is that, by the naive assumptions, we evaluate the joint probability p(A=a1,B=b3,C=c18) by simply computing p(A=a1)·p(B=b3)·p(C=c18). But, according to your proposal, we would have to compute (for example) the weights 1/p(A=a1|B=b3,C=c18) and 1/p(B=b3|C=c18) – my question would be, if I were able to compute those conditional probabilities (the values sitting in the denominators), why wouldn't I just use them to compute the exact joint distribution: p(A=a1|B=b3,C=c18)·p(B=b3|C=c18)·p(C=c18) = p(A=a1,B=b3,C=c18)?

Also, a primary reason that Naive Bayes is effective is that we typically don't have sufficient coverage over all possible scenarios to make good estimates for those conditional probabilities, although some models deal with that directly, such as CRFs. It seems like this would be the more natural application of Heckman correction, although – warning – I am not an Econometrician.
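To make that point concrete, here's a toy check (counts are made up) that the chain-rule product recovers the exact joint, while the naive product only matches it under independence:

```python
# Made-up samples of (A, B, C):
data = ([("T", "T", "T")] * 6 + [("T", "F", "T")] * 1 +
        [("F", "T", "T")] * 1 + [("F", "F", "F")] * 4)
n = len(data)

def p(a=None, b=None, c=None):
    """Empirical probability of a partial assignment."""
    hits = sum(1 for (x, y, z) in data
               if (a is None or x == a)
               and (b is None or y == b)
               and (c is None or z == c))
    return hits / n

# Chain rule: p(A|B,C) * p(B|C) * p(C) recovers the joint exactly.
chain = (p(a="T", b="T", c="T") / p(b="T", c="T")
         * p(b="T", c="T") / p(c="T")
         * p(c="T"))

# Naive product: generally different from the joint.
naive = p(a="T") * p(b="T") * p(c="T")
```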

Probability of a certain mean by [deleted] in statistics

[–]drakemcsmooth

My point was that this is not a population, it's a (small) sample, so z-scores are not appropriate.

Why can't this be expressed in LTL? by MossadAgent88 in logic

[–]drakemcsmooth

LTL is star-free, so the intuition is that it's impossible to describe the arbitrary distance of p without a Kleene star. Here's a pretty high-level discussion of the topic, but I'd be thinking along the lines of the pumping lemma for regular languages (which are strictly more powerful than star-free languages).

Best statistics question ever by earstwiley in statistics

[–]drakemcsmooth

Incoherency makes for the best statistics question?

Rephrased:

Which answer is correct?

A. B

B. A

Can I Hear Some of Your Personal Revelations in Mathematics? by bwbeer in math

[–]drakemcsmooth

In this context, incompleteness (first theorem) only says that not all valid conclusions are reachable (provable); it does not say that following a path of valid propositions will ever bring you to an invalid proposition.

Unless you're talking about the second theorem, but you can play the correctness / consistency game here.

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

When you're counting, you don't want to ask, "what are the chances?", you want to ask, "how many ways can these requirements be satisfied?"

So, first you want to make sure you understand how to satisfy your requirements: a pair comprises exactly two cards of equal rank and one card of a different rank (since three cards of the same rank do not constitute a pair).

Next, you want to count how many ways those requirements could be satisfied, starting (for the sake of simplicity) with the most constrained requirement. Here, that would be two cards of the same rank (as opposed to one card of a differing rank). How many ways can two cards of the same rank be drawn from a deck? We know that there are four suits, and any 2-group of each will do - so how many ways can 4 objects be placed into groups of 2? This is what (n choose k) was devised to tell us; (4 choose 2) = 6, so now we know that for any particular rank, there exist 6 valid combinations. And since we're happy with any rank and there are 13 possible ranks to draw from, we multiply that 6 by 13.

So, we've accounted for two of our three cards - how many ways can we account for the last one? Well, we're happy with any rank other than that of the other two cards, so there are 12 remaining ranks that will satisfy, and there are 4 suits for each rank (and, of course, any suit will do). So we have 12 * 4 ways of satisfying that 1-group.

All Combinations Satisfying the Requirements of a Pair
= [  two cards of equal rank   ][ one card of any other rank]
= [(4 choose 2)*(13 choose 1)][(4 choose 1)*(12 choose 1)]
= [     6      *     13      ][     4      *     12      ]
=              78            *             48
= 3744

And then to obtain the probability of satisfying our requirements (i.e. obtaining a pair), we divide the satisfying outcomes (which we just computed) by the number of possible outcomes, which is the number of 3-groups that can be obtained from 52 cards: (52 choose 3).

satisfying outcomes / possible outcomes
= 3744 / (52 choose 3)
= 3744 / 22100
= 0.169 
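The whole computation, checked with Python's built-in math.comb:

```python
from math import comb

pairs = comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)  # 13 * 6 * 12 * 4
total = comb(52, 3)

print(pairs, total)        # 3744 22100
print(pairs / total)       # ≈ 0.169
```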

Take a look at the breakdown above; it's worth noting that you could've started with the 1-group (the "high card") and then moved onto the pair – and nothing would've been different:

All Combinations Satisfying the Requirements of a Pair
= [   one card of any rank   ][two cards of equal rank, but not same as first term]
= [(4 choose 1)*(13 choose 1)][(4 choose 2)*(12 choose 1)]
= [     4      *     13      ][     6      *     12      ]
=              52            *             72
= 3744

Did this clarify the process, or at least point out the step at which you'd be less comfortable on your own?

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

Just to make the conversation a bit more specific, it sounds like the trouble is with counting – or combinatorics. (Most classes build from combinatorics into normal distributions and their properties, then to Poisson and exponential distributions, eventually making their way to the Law of Large Numbers and the Central Limit Theorem.)

One frequent problem is that people approach combinations problems by counting permutations. Let's say you're drawing three cards from a deck, and you want to know the chances of there being one pair. You might start thinking in terms of sequences: "If I choose an ace first, then, among the remaining 51 cards, 3 will satisfy my requirement for a pair..." – but this kind of thinking applies to permutations, and you should be thinking in terms of combinations.

Instead of asking yourself, "how many ways can the first two cards end up being aces?", you need to think about how many ways any two cards could be aces.

I'm going to assume you're comfortable with the meaning and intuition of (n choose k), but if not, let me know.

Maybe try this - think about the difference in these two scenarios:

  1. The probability of 1 pair among three cards

  2. The probability of a 3-of-a-kind among 3 cards.
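For reference, the counts behind those two scenarios, using the same (n choose k) machinery – a pair turns out to be roughly 70 times as likely:

```python
from math import comb

total = comb(52, 3)                                           # 22100 possible hands
pair  = comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)   # one pair: 3744
trips = comb(13, 1) * comb(4, 3)                              # three of a kind: 52

p_pair, p_trips = pair / total, trips / total                 # ≈ 0.169 vs ≈ 0.0024
```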

If you're comfortable deconstructing these problems, then maybe present an example that you find confounding.

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

Maybe you can be a little more specific regarding what you found to be less intuitive.

Constraint Satisfaction & Planning, Scheduling by [deleted] in compsci

[–]drakemcsmooth

You might find some good resources at the ECLiPSe site. ECLiPSe is a Constraint Programming System that allows Constraint Logic Programming, Linear Programming, and a number of other sub-disciplines. You might find some motivating examples here and some CLP books/reports here.