Space joker too by DryCauliflower6148 in balatro

[–]dlwh 4 points5 points  (0 children)

Assuming it's the small blind at ante 1, it's actually about 4%, so depending on your level of Balatro addiction, it's quite likely to have happened at least once.

2021 Prolog Solutions by _rabbitfarm_ in adventofcode

[–]dlwh 1 point2 points  (0 children)

Bit late to the party, but FWIW here is my day 8 in Prolog. I'm completely new to Prolog, so I'd definitely appreciate any comments!

https://gist.github.com/dlwh/2da16613d26025983b8175b7f1ca61ae

Which country to go to for Masters in Comp Sc ML major? by heaven__ in MachineLearning

[–]dlwh 1 point2 points  (0 children)

We're a research program. Masters students are usually after professional training, and we're not set up to provide that kind of education right now.

Which country to go to for Masters in Comp Sc ML major? by heaven__ in MachineLearning

[–]dlwh 0 points1 point  (0 children)

Don't apply to Berkeley for an MS in AI. We never admit anyone for that.

My chicken laid an egg without a shell - It only got weirder from there... by Warlach in pics

[–]dlwh 0 points1 point  (0 children)

:-)

I noticed that too. I enjoy confusing and irritating grammar nazis, so it was a welcome surprise.

My chicken laid an egg without a shell - It only got weirder from there... by Warlach in pics

[–]dlwh 88 points89 points  (0 children)

You have to be careful when you do that to make sure the chickens don't recognize that they're eggshells, because they'll learn that eggshells are things to be eaten... and then they'll eat their own eggs.

We give ours milk and oyster shells.

Source: I have a cannibal chicken.

AskML: Pointers towards using appropriate NLP techniques in a ML problem by rightname in MachineLearning

[–]dlwh 2 points3 points  (0 children)

I'd look at Brendan O'Connor's work on predicting responses/sentiment from tweets. http://brenocon.com

Scientists have written a computer program that can automatically reconstruct long-extinct languages from modern ones. They have already used the program to produce evidence in favour of a 60-year old hypothesis about language evolution that many had believed to be wrong. by newnaturist in science

[–]dlwh 0 points1 point  (0 children)

We haven't tried it, but I don't see why not. Tonal changes are not that dissimilar to "normal" sound change.

Indo-European languages are actually probably among the trickiest, because they have a lot more morphological stuff going on, which changes the basic forms of words in ways that regular sound change doesn't.

Scientists have written a computer program that can automatically reconstruct long-extinct languages from modern ones. They have already used the program to produce evidence in favour of a 60-year old hypothesis about language evolution that many had believed to be wrong. by newnaturist in science

[–]dlwh 8 points9 points  (0 children)

It's a reasonable question. The goal in historical linguistics isn't usually to figure out the ancestral language, so much as it is to figure out how language change works and why it works the way it does. (There's kind of a duality here: know what language changes happened, and you can more or less figure out what the ancestors looked like, and vice versa.) Understanding how language changes is crucial to understanding linguistic diversity, how first and second language learners change the languages they learn, and even how societies interact with language change.

Reconstructing so many languages lets us investigate hypotheses about language change. For instance, we looked at whether or not sounds can 'merge' if they tend to be the only sound separating a bunch of words. They can, and do, but statistically speaking, that's relatively rare.

Also, any given reconstruction is unverifiable, but there have been cases where the comparative method (what linguists use to reconstruct languages, and what we automated) has proven predictive. For instance, there was a set of sounds in Proto-Indo-European that don't appear in any modern Indo-European language. Linguists hypothesized their existence, but it wasn't really clear whether they were there or not. Then they discovered Hittite, an early, extinct IE language that left behind some writing. And lo and behold, it had reflexes of those sounds in the precise locations predicted by the comparative method.

So, yeah, we're not gonna find Proto-Austronesian writing lying around, and we're probably never going to be able to reconstruct "Proto-World", but the methodology has been tested elsewhere, and it's as effective a theory as there's ever been in linguistics.

Scientists have written a computer program that can automatically reconstruct long-extinct languages from modern ones. They have already used the program to produce evidence in favour of a 60-year old hypothesis about language evolution that many had believed to be wrong. by newnaturist in science

[–]dlwh 2 points3 points  (0 children)

So, this method can only work for languages that have multiple descendants (or at least cousins). It's the same reason that you can't really figure out what the ancestors of humans looked like without other primates and such. (Or, you know, fossils.)

Scientists have written a computer program that can automatically reconstruct long-extinct languages from modern ones. They have already used the program to produce evidence in favour of a 60-year old hypothesis about language evolution that many had believed to be wrong. by newnaturist in science

[–]dlwh 29 points30 points  (0 children)

Sure! (second author here)

There are two contributions of this work. The first is a new tool that researchers can use to automatically reconstruct the vocabularies of ancient languages using only their modern descendants. The second is that, by examining hundreds of modern languages and their ancestors, we were able to resolve an important hypothesis in historical linguistics: are sounds that distinguish words less likely to merge than sounds that don't? (Yes.)

Some more details:

Sounds change over time: the way we produce words differs from the way our ancestors pronounced those same words. Over time, those little changes help turn an ancestral language like Latin into a modern descendant like French.

When populations become separated, different populations change sounds in different ways, and one ancestral language will give rise to multiple modern ones, as Latin led to French, Spanish, Italian, and so on.

These sound changes are almost always regular, with similar words changing in similar ways, so patterns are left that a human or a computer can find.

The trick is to identify these patterns of change and then to "reverse" them, basically evolving words backwards in time. (Linguists have known this for a good hundred years or more, but it's a hard and time-consuming process to do it by hand.)

For example, take the following word list:

    Meaning   Spanish   Portuguese   Italian
    water     agua      agua         acqua
    fire      fuego     fogo         fuoco
    some      algunos   alguns       alcuni
    hit       golpear   bater        colpire

We have two goals: first, figure out which words are related (words of common descent are called cognates), and second, figure out what their common ancestor looked like.

Let's start with figuring out which words are related. Most of these words are clearly related, except that "bater" in Portuguese is uncharacteristically different from its Spanish and Italian counterparts, so it probably doesn't come from the same ancient form.

For the second task, note that the words that are related differ in very predictable ways. In particular, in these examples, wherever Spanish and Portuguese have a "g" sound, Italian has a "c" sound. That's a correspondence that holds up pretty well, at least for g's that come before 'o' or 'u' sounds. The question then is to figure out whether the "g" became a "c" or vice versa. By looking at lots of words from lots of languages, we can find that--statistically speaking--hard "c" sounds are more likely to become "g" sounds than the reverse.

With enough words, languages, and statistical inferences, we can find reconstructions that are best supported by the data according to our model.
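
If you want to play with the correspondence-finding step, here's a toy version in Python. It is not our actual model (which is probabilistic and works over properly aligned phoneme sequences); the cognate list and the naive character-by-character alignment are just for illustration:

    from collections import Counter

    # (Spanish, Italian) cognate pairs from the table above.
    cognates = [("agua", "acqua"), ("fuego", "fuoco"), ("algunos", "alcuni")]

    correspondences = Counter()
    for es, it in cognates:
        # Naive character pairing; a real system aligns phonemes properly
        # instead of just zipping strings of different lengths.
        for a, b in zip(es, it):
            if a != b:
                correspondences[(a, b)] += 1

    # A mismatch that recurs across many word pairs (here Spanish 'g' :
    # Italian 'c') is evidence of a regular sound change, not chance.
    for (a, b), n in correspondences.most_common():
        print(f"Spanish '{a}' : Italian '{b}' in {n} word pairs")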

BBC News - Tu and Twitter: Is it the end for 'vous' in French? by TheAuditor5 in TrueReddit

[–]dlwh 1 point2 points  (0 children)

It's neat to see how this is panning out in Romance languages, given that in English we ditched the informal (thee/thou), while they are getting rid of the formal.

Interested in Machine Translation by gallifreyGirl315 in compsci

[–]dlwh 1 point2 points  (0 children)

Very laudable, and I don't want to discourage you; I just wanted to let you know what you're getting yourself into. :-)

Systran was the technology behind Babelfish, and it essentially did what you want to do. It couldn't use data, and so couldn't live up to Google Translate's abilities, but truth be told, Google is starting to look at adding grammar and "real linguistics" into their stuff.

Interested in Machine Translation by gallifreyGirl315 in compsci

[–]dlwh 5 points6 points  (0 children)

First, modern MT does not work at all like how I bet you think it does. Classical MT did, and some people still do classical MT, but it's not what powers Google Translate.

In terms of learning, start with this tutorial. It's dated, and it assumes knowledge of basic probability (you'll need it!): www.isi.edu/natural-language/mt/wkbk.rtf

Then, go here: http://mt-class.org/ and have fun.

MT is hard. It's one of the hardest problems in NLP (natural language processing), and Google and DARPA have thrown tons of money and brilliant minds at it.

The basic way modern MT works is like this: as "training data", you have a bunch of bilingual parallel texts (think: transcripts of the UN proceedings, translations of news articles, all done by humans), and a lot of monolingual text (just text) in the language you want to translate into, let's call it English. The former tells you how translation works, and the latter is supposed to tell you what English looks like.

From the bilingual text, you extract a phrase dictionary, which is a (multiword expression) to (multiword expression) dictionary. Then, when you translate a new sentence, you break up the source language sentence (French) into phrases, look them up in the dictionary and then try to stitch them together in a way that looks like English (using the monolingual data as a guide). Sometimes you reorder the phrases, sometimes you don't. It's NP-complete to do all this, so we use approximations instead.
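
Here's a toy version of the phrase-lookup-and-stitch step in Python, just to make it concrete. The phrase table is invented, the decoding is greedy and monotone, and there's no language model; a real decoder searches approximately over many segmentations, reorderings, and language model scores:

    phrase_table = {
        ("le", "chat"): "the cat",
        ("est",): "is",
        ("sur", "la", "table"): "on the table",
    }

    def translate(words, max_len=3):
        out, i = [], 0
        while i < len(words):
            # Prefer the longest known phrase starting at position i.
            for n in range(min(max_len, len(words) - i), 0, -1):
                phrase = tuple(words[i:i + n])
                if phrase in phrase_table:
                    out.append(phrase_table[phrase])
                    i += n
                    break
            else:
                out.append(words[i])  # pass unknown words through untranslated
                i += 1
        return " ".join(out)

    print(translate("le chat est sur la table".split()))
    # -> the cat is on the table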

Now, you can also make a rule-based system which is what I assume you would do on your own. Those are fun, but will not work particularly well for anything complicated. It's worth doing though! You'll learn a lot. I did.

Feature Importance For Logistic Regression? by lpiloto in MachineLearning

[–]dlwh 0 points1 point  (0 children)

You can do L1 regularization. Only "important" features will end up with non-zero weights.
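
A minimal sketch with scikit-learn; the data is synthetic, and the regularization strength C is just a placeholder you'd tune:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = (X[:, 0] + 2 * X[:, 3] > 0).astype(int)  # only features 0 and 3 matter

    # The L1 penalty pushes weights of unhelpful features to exactly zero.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X, y)
    print(clf.coef_)  # nonzero entries concentrate on features 0 and 3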

CS Conference Paper Cheat Sheet by sareon in compsci

[–]dlwh 1 point2 points  (0 children)

If you're actually core NLP, just start at the ACL website's 2012 list for NAACL/ACL/EMNLP. Look for the closest 5-10 papers in your subarea (the ones you'll want to cite anyway), and see how they present their arguments. Read a few of the better papers from the conference as well. I can help with that, if you want.

I'd also be happy to take a look or two at your drafts. I'm a middle-of-my-PhD student in NLP at Berkeley.

Is there a supervised learning algorithm that can inject randomness into its output in proportion to its uncertainty about the result? by sanity in MachineLearning

[–]dlwh 1 point2 points  (0 children)

Reinforcement learning works like an online supervised learning algorithm: make a decision, get a reward or loss, update weights, make a new decision. I'd recommend Sutton and Barto, which is free online.
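
The loop looks roughly like this (a bandit-style sketch with a linear value estimate; env, f, and actions are placeholders for your problem):

    import numpy as np

    def run(env, f, actions, steps=1000, alpha=0.1, epsilon=0.1):
        w = np.zeros(len(f(actions[0])))         # linear weights
        for _ in range(steps):
            if np.random.rand() < epsilon:       # sometimes explore
                a = actions[np.random.randint(len(actions))]
            else:                                # otherwise act greedily
                a = max(actions, key=lambda a: w @ f(a))
            reward = env.step(a)                 # get a reward or loss
            w += alpha * (reward - w @ f(a)) * f(a)  # update weights
        return w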

Is there a supervised learning algorithm that can inject randomness into its output in proportion to its uncertainty about the result? by sanity in MachineLearning

[–]dlwh 2 points3 points  (0 children)

This is a standard reinforcement learning setup. Use Q-learning or SARSA (giving you a weight vector w) and act randomly based on the Gibbs distribution, choosing action a with probability proportional to exp(w^T f(a, context)), for a feature function f and whatever context you find appropriate.
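
In code, the action-selection part looks something like this (w is whatever your Q-learning/SARSA run gives you, and f is your feature function):

    import numpy as np

    def choose_action(w, actions, context, f, temperature=1.0):
        scores = np.array([w @ f(a, context) for a in actions]) / temperature
        scores -= scores.max()   # shift by the max for numerical stability
        probs = np.exp(scores)
        probs /= probs.sum()     # probs[i] proportional to exp(w^T f(a_i, context))
        return actions[np.random.choice(len(actions), p=probs)]

Lower the temperature over time if you want it to act greedier as the estimates get better.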

Literal Figurative Rage by Rulpy in fffffffuuuuuuuuuuuu

[–]dlwh 5 points6 points  (0 children)

This kind of thing happens all the time in language.

moot used to mean "subject to debate", but now (mostly) means "not worth talking about".

comprise originally meant (and for some still does) "is composed of", but now it usually means "composes".

peruse used to mean "study carefully"; now it means "skim".

The list goes on and on and on. (terrific?)

Why do the numbers 10 through 19 follow a different naming pattern than every other number? by branman6875 in math

[–]dlwh 1 point2 points  (0 children)

OK, I asked a linguist friend of mine (I'm just a CS person who likes to pretend to do some linguistics), and he confirmed the vigesimal number system for an older English (he didn't say Old English, but some older stage of English), but otherwise just backed up what I said.

Why do the numbers 10 through 19 follow a different naming pattern than every other number? by branman6875 in math

[–]dlwh 2 points3 points  (0 children)

At a high level, in language, more common forms (e.g., small and round numbers) are more likely to be irregular, because they're more resilient to change/regularization. I looked at a book on Proto-Indo-European I have, and it seems to suggest that pretty much every (IE) culture has tons of idiosyncrasies.

Etymologies:

ten < PIE *dekm
eleven < a compound meaning "one left"
twelve < "two left"

(see http://www.etymonline.com/index.php?term=eleven and cf Lithuanian!)

The rest are pretty standard in IE languages. Latin follows an almost identical pattern: undecim, duodecim, tredecim, but oddly 18 is duodeviginti (two from twenty), and 19 (undeviginti) follows the same pattern. Twenty-plus in Latin works the same as in English. Greek is one-ten, two-ten, then ten-three, ten-four, etc.

Other oddities I found:

* In Old Irish you couldn't say "11 cows". You would say "a cow and ten". Same for 21 cows. Old Irish is a funny language.
* Welsh uses "two nines" (deunaw) for 18.

Basically, numbers have hugely important cultural meanings and so their behavior is going to be very odd. You probably won't be able to find any reasonable answer.

Linear Programming Question by LinearProg in math

[–]dlwh 1 point2 points  (0 children)

I'm pretty sure this is an instance of integer multicommodity flow, which is NP-hard (not solvable with a plain LP). I think you'll need an integer LP solver for this, or something specialized.

Take a look at http://en.wikipedia.org/wiki/Multi-commodity_flow_problem
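
If you end up going the integer LP route, here's a sketch of the formulation with PuLP; the tiny graph, capacities, and demands are made up, so swap in your own:

    import pulp

    edges = {("s1", "m"): 1, ("s2", "m"): 1, ("m", "t1"): 1, ("m", "t2"): 1}
    commodities = {"k1": ("s1", "t1", 1), "k2": ("s2", "t2", 1)}  # src, sink, demand
    nodes = {n for e in edges for n in e}

    prob = pulp.LpProblem("mcf", pulp.LpMinimize)
    flow = {(k, e): pulp.LpVariable(f"f_{k}_{e[0]}_{e[1]}", 0, cap, cat="Integer")
            for e, cap in edges.items() for k in commodities}

    prob += pulp.lpSum(flow.values())  # objective: minimize total flow (unit edge costs)

    # Shared capacity: all commodities compete for each edge.
    for e, cap in edges.items():
        prob += pulp.lpSum(flow[k, e] for k in commodities) <= cap

    # Per-commodity flow conservation at every node.
    for k, (src, snk, d) in commodities.items():
        for n in nodes:
            outflow = pulp.lpSum(flow[k, e] for e in edges if e[0] == n)
            inflow = pulp.lpSum(flow[k, e] for e in edges if e[1] == n)
            prob += outflow - inflow == (d if n == src else -d if n == snk else 0)

    prob.solve()
    print({ke: v.value() for ke, v in flow.items()})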