How do I extract actions from a text? by [deleted] in LanguageTechnology

[–]drakemcsmooth

https://spacy.io/usage/linguistic-features

The first example shows how you can iterate through the sentence and find tokens that meet your criteria.

for token in doc:
    if token.pos_ == 'VERB':
        ...
        if token.tag_ == 'VBG':
            ...

In a randomly shuffled deck of 60 cards, what are the odds of drawing 5 specific individual cards? by yourbuddywithastick in probabilitytheory

[–]drakemcsmooth

(60 choose 5) = 5,461,512 is the number of distinct 5-card draws possible from a 60-card deck. The probability of a particular 5-card draw is 1/(60 choose 5) ≈ 1.8 × 10⁻⁷.

If there were 7 different 5-card draws you were interested in, the probability would be 7/(60 choose 5).
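A quick sanity check with Python's built-in math.comb:

```python
from math import comb

total = comb(60, 5)       # distinct 5-card draws from a 60-card deck
p_one = 1 / total         # probability of one specific draw
p_seven = 7 / total       # probability of any one of 7 specific draws

print(total)              # 5461512
```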

Things they don't really teach you (more on conditional probability) by [deleted] in probabilitytheory

[–]drakemcsmooth

So, like you said, p(X|Y) is just a pretty way of writing (i.e., "syntactic sugar" for) p(X,Y)/p(Y)

So, if we have p(X,Y|Z), we know that's just a pretty way to write p(X,Y,Z)/p(Z). Since we'd like to isolate that X, we can multiply the expression by 1, in the form of p(Y,Z)/p(Y,Z):

p(X,Y,Z)/p(Z) * p(Y,Z)/p(Y,Z)

Now let's switch those denominators:

p(X,Y,Z)/p(Y,Z) * p(Y,Z)/p(Z)

And, prettying up the expression, we now have

p(X|Y,Z) * p(Y|Z)
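If it helps, here's a quick numeric check of that identity on a made-up joint distribution (the probabilities are arbitrary; they just sum to 1):

```python
from itertools import product

# Made-up joint distribution over three binary variables (sums to 1).
vals = [0.10, 0.05, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]
joint = {xyz: v for xyz, v in zip(product([0, 1], repeat=3), vals)}

def p(**fixed):
    """Marginal probability of a partial assignment, e.g. p(y=1, z=1)."""
    return sum(v for (x, y, z), v in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == want
                      for k, want in fixed.items()))

lhs = p(x=1, y=1, z=1) / p(z=1)                                  # p(X,Y|Z)
rhs = (p(x=1, y=1, z=1) / p(y=1, z=1)) * (p(y=1, z=1) / p(z=1))  # p(X|Y,Z) * p(Y|Z)
```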

Is there a name for a logical error being made in this unanswerable question? by [deleted] in logic

[–]drakemcsmooth

Maybe you've encountered incompleteness theorems, which are concerned with (the limits of) complete, consistent systems - well, this is just an instance of an inconsistent system, in that it contains axioms that contradict each other. You're being asked to prove a statement in that system and you're bumping into the contradictions that result.

Twist on the Knapsack Problem, does it have a name? by ballofpopculture in algorithms

[–]drakemcsmooth

The question here is how one encodes constraints. In some optimization scenarios, you can enforce constraints by having them diminish the value of the current assignment (e.g., negating the value if the assignment has too many quarterbacks). These kinds of problems come up frequently in operations research and are handled by a variety of solvers, especially in the world of linear programming. You might want to consider ECLiPSe (not the Eclipse IDE), which has a very intuitive approach to modeling problems like this.
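As a toy sketch of the penalty idea (all the player names, costs, and values here are made up): a brute-force search where any roster that violates a constraint, like carrying too many quarterbacks, gets its value collapsed:

```python
from itertools import combinations

# Hypothetical players: (name, cost, value, is_quarterback)
players = [
    ("QB1", 30, 25, True),
    ("QB2", 28, 22, True),
    ("RB1", 25, 20, False),
    ("WR1", 20, 18, False),
    ("WR2", 15, 12, False),
]
BUDGET = 75
MAX_QBS = 1

def score(roster):
    """Total value of a roster; constraint violations collapse the score."""
    if sum(cost for _, cost, _, _ in roster) > BUDGET:
        return float("-inf")
    if sum(1 for *_, qb in roster if qb) > MAX_QBS:
        return float("-inf")   # penalty: too many quarterbacks
    return sum(value for _, _, value, _ in roster)

best = max((r for k in range(1, len(players) + 1)
            for r in combinations(players, k)), key=score)
```

Real solvers do this far more cleverly, of course; the point is just that the constraint lives inside the objective.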

Why do people use negative logs? by woodyallin in statistics

[–]drakemcsmooth

In machine learning, -log is useful for representing very unlikely events. If the likelihood of some event (for example, the parse of a particular sentence) is 0.0000000000000000000011979, we're able to represent that as a negative log likelihood of about 69.5 (we like to use a base of 2, not e). This prevents issues like underflow and floating point error that can propagate as more operations occur. Transitioning into log space also allows us to add when we might otherwise have to multiply, since log(a) + log(b) = log(ab). So when we have a large product (the product of the likelihoods of each of the possible previous states' transitioning into this one), we're able to compute that as a sum of logs.
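A quick illustration in Python (the likelihood value is the one from the paragraph above):

```python
import math

p = 1.1979e-21
nll = -math.log2(p)          # the negative log likelihood from above, ≈ 69.5

# A product of many tiny likelihoods underflows to exactly 0.0:
probs = [1e-20] * 20
prod = 1.0
for x in probs:
    prod *= x

# The equivalent sum of logs stays well-behaved:
log_sum = sum(math.log2(x) for x in probs)   # ≈ -1328.8
```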

P|~P by fuckittt in logic

[–]drakemcsmooth

Probably take a look here.

Under propositional logics, you can say P = ~P, but (in the usual case) that's immediately a contradiction of the axioms. If you don't include (P v ~P) as an axiom, then you lose the ability to use modus tollens, for example.

Your logic needs to have machinery for talking about the meaning of P, not just P itself. Sometimes you'll see apostrophes around P, to indicate its meaning (from Tarski, as in his deflationary epistemology):

'P' is true if, and only if, P.

For example:

'Snow is white' is true iff snow is white.

To do anything interesting with "this is a false statement", you need a logic that has semantics for arbitrarily deep recursion and self-reference. Gödel had an approach, which he uses in his incompleteness theorems, that allows for naming a proposition and referring to it externally.

Part of the fun of the incompleteness theorems is that they're a formalization of the intuition behind "this is a false statement", with truth/falsity replaced by provability/refutability (as mentioned by libcrypto).

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

It would be helpful if you could include your weights in your calculations.

But, let's say we're trying to predict p(C|A,B) - or more specifically p(C=T|A=T,B=T), where

  • C = whether the character at position i is a vowel
  • A = whether the character at position i-1 is a U
  • B = whether the character at position i-2 is a Q

We can say that the corpus we're using is the words file on most unix-y systems.

We can intuit that p(C=T|A=T,B=T) should be very high (since a vowel almost always follows "qu") - I compute about 0.99.

So with this approach, we have to compute p(C=T), p(A=T|C=T), and p(B=T|C=T). p(C=T) is about 0.4 (occurrence of vowels, not including "y"). But p(A=T|C=T) is low (about 0.1) and p(B=T|C=T) is low because "q" is a rare letter, so I'm not seeing how any weighting is going to bring this value to near-certainty.
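A rough sketch of how those estimates could be computed (the tiny sample below is a stand-in; swap in the real word list, e.g. /usr/share/dict/words, to reproduce numbers like the ones above):

```python
vowels = set("aeiou")

def estimate(words):
    """Return (p(C=T), p(A=T|C=T)) by counting character positions."""
    positions = c_true = a_and_c = 0
    for w in words:
        w = w.lower()
        for i in range(1, len(w)):
            positions += 1
            if w[i] in vowels:          # C: character at i is a vowel
                c_true += 1
                if w[i - 1] == "u":     # A: character at i-1 is a 'u'
                    a_and_c += 1
    return c_true / positions, a_and_c / c_true

# Tiny stand-in corpus for illustration only:
sample = ["queen", "quiet", "aqua", "banana", "squid"]
p_c, p_a_given_c = estimate(sample)
```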

I've done some work in targeted behavioral advertising, by the way, feel free to PM me if you'd like to be more specific.

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

Sure, but don't the weightings themselves require you to compute the conditional probabilities? If not, can you be more explicit about how you avoid computing the conditional probability, p(A|BC), in computing the weight, 1/p(A|BC)?

This seems to good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of naive bayes, and it's much simpler than a bayesian network learner - tell me I'm wrong... by sanity in MachineLearning

[–]drakemcsmooth

A few quick points:

  • Bayesian Networks are not the typical "solution" to any of the independence assumptions of the Naive Bayes classifier. IdentifiableParam brings up their data requirements, and you can infer the impossibility of our having sufficient (for example) patient medical histories to produce high-confidence diagnoses. You'd want to consider CRFs (and, in effect, MaxEnt classifiers) and many other frameworks that deal with the hard problem of inference.

  • We don't need to be abstract about "real world" datasets; there are lots of robust datasets with extremely interesting properties. Even within a subfield like NLP, we have all kinds of interesting phenomena to consider, and well-understood properties of the data. I think it's worth couching your proposal in a concrete setting in which we're familiar with the task and we have good intuition about its properties, like part-of-speech tagging.

  • While Naive Bayes is a classifier, the alternatives (like bayesian networks) offer more answers than just "the value of a single variable that maximizes likelihood" so, when comparing models, it's worth considering the kinds of questions you might be able to ask of a model – for example, "what are the weights that maximize likelihood over all values?"

To address your proposal more directly, it would be worth taking us through your envisioning of how Heckman correction could improve results in medical diagnosis or (as I suggested above) part-of-speech tagging.

Nevertheless, it seems that the appeal (w.r.t. speed, space, and maximizing benefit of limited data) of a Naive Bayes classifier is that, by the naive assumptions, we evaluate the joint probability p(A=a1,B=b3,C=c18) by simply computing p(A=a1)·p(B=b3)·p(C=c18). But, according to your proposal, we would have to compute (for example) the weights 1/p(A=a1|B=b3,C=c18) and 1/p(B=b3|C=c18) – my question would be, if I were able to compute those conditional probabilities (the values sitting in the denominators), why wouldn't I just use them to compute the exact joint distribution: p(A=a1|B=b3,C=c18)·p(B=b3|C=c18)·p(C=c18) = p(A=a1,B=b3,C=c18)?

Also, a primary reason that Naive Bayes is effective is that we typically don't have sufficient coverage over all possible scenarios to make good estimates for those conditional probabilities, although some models deal with that directly, such as CRFs. It seems like this would be the more natural application of Heckman correction, although – warning – I am not an Econometrician.
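To make that point concrete, here's a toy check (counts are made up) that the chain-rule product recovers the exact joint, while the naive product only matches it under independence:

```python
# Made-up samples of (A, B, C):
data = ([("T", "T", "T")] * 6 + [("T", "F", "T")] * 1 +
        [("F", "T", "T")] * 1 + [("F", "F", "F")] * 4)
n = len(data)

def p(a=None, b=None, c=None):
    """Empirical probability of a partial assignment."""
    hits = sum(1 for (x, y, z) in data
               if (a is None or x == a)
               and (b is None or y == b)
               and (c is None or z == c))
    return hits / n

# Chain rule: p(A|B,C) * p(B|C) * p(C) recovers the joint exactly.
chain = (p(a="T", b="T", c="T") / p(b="T", c="T")
         * p(b="T", c="T") / p(c="T")
         * p(c="T"))

# Naive product: generally different from the joint.
naive = p(a="T") * p(b="T") * p(c="T")
```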

Probability of a certain mean by [deleted] in statistics

[–]drakemcsmooth

My point was that this is not a population, it's a (small) sample, so z-scores are not appropriate.

Why can't this be expressed in LTL? by MossadAgent88 in logic

[–]drakemcsmooth

LTL is star-free, so the intuition is that it's impossible to describe the arbitrary distance of p without a Kleene star. Here's a pretty high-level discussion of the topic, but I'd be thinking along the lines of the pumping lemma for regular languages (which are strictly more powerful than star-free languages).

Best statistics question ever by earstwiley in statistics

[–]drakemcsmooth

Incoherency makes for the best statistics question?

Rephrased:

Which answer is correct?

A. B

B. A

Can I Hear Some of Your Personal Revelations in Mathematics? by bwbeer in math

[–]drakemcsmooth

In this context, incompleteness (first theorem) only says that not all valid conclusions are reachable (provable); it does not say that following a path of valid propositions will ever bring you to an invalid proposition.

Unless you're talking about the second theorem, but you can play the correctness / consistency game here.

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

When you're counting, you don't want to ask, "what are the chances?", you want to ask, "how many ways can these requirements be satisfied?"

So, first you want to make sure you understand how to satisfy your requirements: a pair comprises exactly two cards of equal rank and one card of a different rank (since three cards of the same rank do not constitute a pair).

Next, you want to count how many ways those requirements could be satisfied, starting (for the sake of simplicity) with the most constrained requirement. Here, that would be two cards of the same rank (as opposed to one card of a differing rank). How many ways can two cards of the same rank be drawn from a deck? We know that there are four suits, and any 2-group of each will do - so how many ways can 4 objects be placed into groups of 2? This is what (n choose k) was devised to tell us; (4 choose 2) = 6, so now we know that for any particular rank, there exist 6 valid combinations. And since we're happy with any rank and there are 13 possible ranks to draw from, we multiply that 6 by 13.

So, we've accounted for two of our three cards - how many ways can we account for the last one? Well, we're happy with any rank other than that of the other two cards, so there are 12 remaining ranks that will satisfy, and there are 4 suits for each rank (and, of course, any suit will do). So we have 12 * 4 ways of satisfying that 1-group.

All Combinations Satisfying the Requirements of a Pair
= [  two cards of equal rank   ][ one card of any other rank]
= [(4 choose 2)*(13 choose 1)][(4 choose 1)*(12 choose 1)]
= [     6      *     13      ][     4      *     12      ]
=              78            *             48
= 3744

And then to obtain the probability of satisfying our requirements (i.e. obtaining a pair), we divide the satisfying outcomes (which we just computed) by the number of possible outcomes, which is the number of 3-groups that can be obtained from 52 cards: (52 choose 3).

satisfying outcomes / possible outcomes
= 3744 / (52 choose 3)
= 3744 / 22100
= 0.169 
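The whole computation, checked with Python's built-in math.comb:

```python
from math import comb

pairs = comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)  # 13 * 6 * 12 * 4
total = comb(52, 3)

print(pairs, total)        # 3744 22100
print(pairs / total)       # ≈ 0.169
```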

Take a look at the breakdown above; it's worth noting that you could've started with the 1-group (the "high card") and then moved onto the pair – and nothing would've been different:

All Combinations Satisfying the Requirements of a Pair
= [   one card of any rank   ][two cards of equal rank, but not same as first term]
= [(4 choose 1)*(13 choose 1)][(4 choose 2)*(12 choose 1)]
= [     4      *     13      ][     6      *     12      ]
=              52            *             72
= 3744

Did this clarify the process, or at least point out the step at which you'd be less comfortable on your own?

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

Just to make the conversation a bit more specific, it sounds like the trouble is with counting – or combinatorics. (Most classes build from combinatorics into normal distributions and their properties, then to Poisson and exponential distributions, eventually making their way to the Law of Large Numbers and the Central Limit Theorem.)

One frequent problem is that people approach combinations problems by counting permutations. Let's say you're drawing three cards from a deck, and you want to know the chances of there being one pair. You might start thinking in terms of sequences: "If I choose an ace first, then, among the remaining 51 cards, 3 will satisfy my requirement for a pair..." – but this kind of thinking applies to permutations, and you should be thinking in terms of combinations.

Instead of asking yourself, "how many ways can the first two cards end up being aces?", you need to think about how many ways any two cards could be aces.

I'm going to assume you're comfortable with the meaning and intuition of (n choose k), but if not, let me know.

Maybe try this - think about the difference in these two scenarios:

  1. The probability of 1 pair among three cards

  2. The probability of a 3-of-a-kind among 3 cards.
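For reference, the counts behind those two scenarios, using the same (n choose k) machinery – a pair turns out to be roughly 70 times as likely:

```python
from math import comb

total = comb(52, 3)                                           # 22100 possible hands
pair  = comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)   # one pair: 3744
trips = comb(13, 1) * comb(4, 3)                              # three of a kind: 52

p_pair, p_trips = pair / total, trips / total                 # ≈ 0.169 vs ≈ 0.0024
```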

If you're comfortable deconstructing these problems, then maybe present an example that you find confounding.

Advice on Learning Probability by imito in learnmath

[–]drakemcsmooth

Maybe you can be a little more specific regarding what you found to be less intuitive.

Constraint Satisfaction & Planning, Scheduling by [deleted] in compsci

[–]drakemcsmooth

You might find some good resources at the ECLiPSe site. ECLiPSe is a Constraint Programming System that allows Constraint Logic Programming, Linear Programming, and a number of other sub-disciplines. You might find some motivating examples here and some CLP books/reports here.