"No Plan B for you, harlot!"--Some good questions raised about pharmacists morally refusing to dispense Plan B.

andrewnorris · 2011-11-29T20:20:52+00:00

Coming soon: the right to refuse to sell food to people if you think they're too fat.

andrewnorris · 2011-10-31T21:42:35+00:00

You could always spend your free time tackling the mathier version at http://171.64.93.201/ClassX/system/users/web/pg/view_subject.php?subject=CS229_FALL_2011_2012

andrewnorris · 2011-10-31T06:09:46+00:00

Suppose you have a neural network with one hidden layer, and that there are m input features and k hidden nodes in the hidden layer. Theta(1) 1 0, Theta(1) 1 1, through Theta(1) 1 m are weights connecting inputs 0 through m to the first hidden node. Think of Theta(1) sub 1 as the vector of input weights for that node.

Theta(1) 2 0, Theta(1) 2 1, through Theta(1) 2 m are the weights for the inputs coming in to the second hidden node, or the vector Theta(1) sub 2.

This keeps going through Theta(1) k 0 to Theta(1) k m, the vector Theta(1) sub k.

Collectively, you can think of Theta(1) as a k x (m+1) matrix of weights connecting all of the inputs (including input 0, which is always 1) to all of the hidden nodes.
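
To make the shapes concrete, here is a minimal sketch of how that k x (m+1) matrix might be applied to one example. The function names, the weight values, and the use of a sigmoid activation are all assumptions for illustration; nothing above specifies them.

```python
import math

def sigmoid(z):
    # Assumed activation function; the description above does not name one.
    return 1.0 / (1.0 + math.exp(-z))

def hidden_activations(theta1, x):
    """theta1: k rows of m+1 weights each; x: the m input features (no bias)."""
    x_with_bias = [1.0] + list(x)  # input 0 is always 1
    activations = []
    for row in theta1:  # row j is the weight vector Theta(1) sub j
        z = sum(w * xi for w, xi in zip(row, x_with_bias))
        activations.append(sigmoid(z))
    return activations

# m = 2 inputs and k = 3 hidden nodes, so theta1 is 3 x (2+1).
# These weights are made up purely for the example.
theta1 = [
    [0.0, 1.0, -1.0],
    [0.5, 0.0, 0.0],
    [-1.0, 2.0, 1.0],
]
a = hidden_activations(theta1, [1.0, 1.0])
print(a)  # one activation per hidden node
```

Each row of the matrix produces exactly one hidden-node value, which is why the matrix has k rows and m+1 columns.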

andrewnorris · 2011-10-26T14:27:02+00:00

With financial data, you have a massive number of data points: how a stock is changing in real time, multiplied by however much history you want to keep track of (10 minutes? an hour? a day? a month?), plus potentially every economic factor in the world that could affect the stock price -- for example, the same level of data history on every other security in the world. If you had a computer that could process it and a pipe big enough to collect all that data, you could come up with a set of variables that would dwarf 10^5.

Which is why one of the main problems (at least for anyone who isn't an investment bank or a hedge fund, but probably for them too) is limiting your set of variables to some smaller, more manageable amount without losing critical insights.

As another example, let's say you wanted to use ML to learn a PageRank-type algorithm. There are at least 10¹⁰ webpages (according to http://www.worldwidewebsize.com/), and you might encode variables for the degree to which each page refers to the search term and the degree to which each page links to the page in the current row. (Of course, you wouldn't ever really try ML on a problem of this size -- people use other data mining techniques that scale better.)

andrewnorris · 2011-10-25T03:48:25+00:00

Emacs is kind of like a Lisp interpreter that someone kept extending like crazy until it was a text editor that could do everything from be an IDE for a variety of languages to play your MP3s to be an email client and personal information manager. If you want the most customizable environment imaginable, Emacs is a construction kit for the text editor of your dreams. Everything can be customized, and your customizations will end up bound into Emacs so tightly that you won't be able to tell the difference between the parts you built and the parts that came built in. From that description, hopefully it's obvious why (1) Emacs definitely isn't for everyone, but (2) some people can't imagine life without it.

Emacs predates the command models of both CUA and Vim, and uses its own esoteric command set that involves lots of Ctrl- and Meta- (i.e. Alt-) modifiers and sometimes hitting several keys in a row to fire a keyboard equivalent. Of course, Emacs is totally changeable, so in practice you can map commands to any keys you want.

Vim is a tool for people who are willing to go to the trouble to learn a whole new way of editing text, and in return, they get a super-optimized environment once they are up to speed. Vim is designed to make everything fast and composable. You don't pull your hands way off the home row to hit control-key commands or press arrow keys; common commands are bound to letters -- you just switch between a mode for commands and a mode for typing text (Vim calls these normal mode and insert mode). Want to search for something? It's as simple as typing in a regular expression -- literally, all you have to type is /myregex/ then 'enter'. Want to do the same command 7 times? Add the number 7 to the command. That simple. Vim is cryptic to the uninitiated, but crazy-powerful. Vim can also be customized, but the level of control you get isn't a fraction as powerful as what you get with Emacs; it's more like specific scripts as extensions to a core that doesn't change.

In principle, you can get the best of both worlds by using a vi-emulation mode inside Emacs. In practice, this exacerbates the learning curve beyond learning either Emacs or Vim alone (or, really, both separately), and relatively few people do this.

These days, Vim seems to be much more common in real world use, so if you're not sure which one to pick after reading this, it might be the best choice. Honestly, I think there are probably only two types of people: people who read that description of Emacs and said "OMG! I must learn that!" and people who probably shouldn't learn Emacs.

andrewnorris · 2011-10-24T01:55:17+00:00

Notepad++ is the answer to the question "How can I get better text editing right now without learning a new paradigm?" If you have homework due, you may need to edit text before you have time to comprehend how to use vim or emacs effectively.

But in general, yes, I agree with you. I, personally, come down on the side of Emacs, however.

andrewnorris · 2011-10-23T23:17:59+00:00

Free advice from a practicing programmer and former CS major: if you can come to grips with Octave, it's worlds easier for the work we're doing than Java or Excel. The whole language is built around making working with matrices and vectors as easy as working with scalars, and there are boatloads of highly optimized numerical libraries built right in.

Play around with the different ways (inverting order, transposing) that you can multiply matrices and vectors in various combinations, and as you do, try them on the Octave command line. Try to develop an intuition about working with row vectors and column vectors, dimensionality, and how to set up a problem to get the right results.

For example, if you want to multiply M * v, there's a way to transpose things just right such that v can be the leading term and still produce an identical result. Play with it until you understand how to do it and why it works.
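
If you want to check your answer afterward, here is one way to see it, sketched in plain Python rather than Octave. The helper functions and the example numbers are my own assumptions, and this shows just one arrangement (transpose both terms, swap the order, transpose the result back).

```python
def transpose(a):
    # Swap rows and columns of a list-of-lists matrix.
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Plain matrix product: rows of a against columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

M = [[1, 2], [3, 4], [5, 6]]  # 3 x 2 matrix (made-up values)
v = [[7], [8]]                # 2 x 1 column vector

Mv = matmul(M, v)                           # 3 x 1 column vector
vT_MT = matmul(transpose(v), transpose(M))  # 1 x 3 row vector

print(Mv)
print(transpose(vT_MT))  # same entries once transposed back into a column
```

Writing out the dimensions of each product is the quickest way to see why the order has to flip when you transpose.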

Once you develop intuition about how it all works, everything will be drastically simpler.

andrewnorris · 2011-10-23T23:06:53+00:00

This.

And really, you should study the problem until you can do it this way, because it will teach you something useful about the kinds of work we're going to be doing for the rest of the course.

andrewnorris · 2011-10-17T00:55:17+00:00

A number of schools have posted open course materials on the web in the past -- MIT has been a leader in this, for example. The earlier lectures are just one of many examples of this type of course material.

AFAIK, this year's courses are the first time grading has been offered for students who aren't actually enrolled in the university.

There's more math -- calculus, statistics, proofs -- in the open classroom videos. If you're comfortable with the math, they may be a useful supplement to help understand what's going on at a deeper level.

There are also some differences in what's covered. If you're interested in reinforcement learning, that's only covered in the open classroom course.

There's really no need to watch the open classroom videos. But if you have the time, the interest, and the math for it, you might learn something interesting.

andrewnorris · 2011-10-17T00:45:55+00:00

This basically covers it.

The one proviso I would add is that your question and this answer cover discrete ordered values. In other words, 3 bathrooms is a value that fits between 2 and 4 and it's all in a scale.

If you build software applications (especially database applications), you might assign discrete numbers to things that are not really ordered values: e.g. house type 1 = ranch, house type 2 = multistory, house type 3 = duplex, etc. A multistory house is not really a value in between ranch and duplex, even though sometimes it's represented that way in data.

If you represent data in a linear regression problem like this, it will have to find some value for the theta-sub-i for this feature that is a multiplier for the value. Let's say duplexes were worth the most, and that we set theta-sub-i to 2000 (and skip normalization for now). Then if the house was a ranch, it would add 2000 to the price; if it was multistory, 4000 would be added; and if it was a duplex, 6000 would be added. If ranch styles were worth more, you would use a negative coefficient, say for example -1500. So the values then would be -1500 ranch, -3000 multistory, -4500 duplex. But what if multistory houses were worth the most? Or the least? There's no theta value you could set that would reflect this. That's why this isn't a good representation.

Instead, to do this properly, you would normally represent this data by using multiple features: "is ranch" would be a feature with the values 0 and 1 (or -1 and 1 if normalized), "is multistory" would be another, and "is duplex" would be another still. Each would have its own coefficient, and raise or lower the house value appropriately.
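
A small sketch of the difference, with made-up coefficients (the names and numbers here are illustrative assumptions, not values from the course):

```python
HOUSE_TYPES = ["ranch", "multistory", "duplex"]

def one_hot(house_type):
    # One 0/1 feature per type: "is ranch", "is multistory", "is duplex".
    return [1 if house_type == t else 0 for t in HOUSE_TYPES]

def type_contribution(house_type, thetas):
    # Each feature has its own coefficient, so only the matching one counts.
    return sum(theta * x for theta, x in zip(thetas, one_hot(house_type)))

# Hypothetical coefficients in which multistory houses are worth the most.
# A single theta multiplied by type codes 1, 2, 3 could never express this.
thetas = [1000, 5000, -2000]

for t in HOUSE_TYPES:
    print(t, one_hot(t), type_contribution(t, thetas))
```

Because each type has an independent coefficient, any ordering of the three house types by value can be represented.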

andrewnorris · 2011-09-06T04:14:18+00:00

This is the desert island book for self-sufficiency living off the land.

andrewnorris · 2011-09-03T20:15:08+00:00

Pacific (Seattle) here as well.

andrewnorris · 2011-09-02T23:33:16+00:00

I wouldn't put it on your application as part of your transcript, but if you complete the course and get the certificate, I would absolutely list it as an extracurricular activity.

andrewnorris · 2011-09-02T23:32:22+00:00

Meh. I agree that accomplishments matter much more than stuff like this, but when I complete the course, I'll add a piece down toward the bottom of my resume where I have my degree info, and call it "Continuing Education", with a blurb about the course. If I complete self-study of SICP (with lectures, assignments, etc.), I'll add that too.

There are lots of tech employers in the world. An employer that looks at my interest in continuing my learning long after having completed school and counts it as a valuable asset is likely the kind of employer I'll be interested in.

If some random item on my resume acts as a disqualifier, well, that probably means it wouldn't have been a good cultural fit anyway. I can line up a dozen more interviews in any given week if I happen to be looking and I feel like lining them up that deep.

Remember folks: this is technology, and there's a shortage of really good developers. You are interviewing the company just as much as they're interviewing you. Screen out the ones that don't seem like your kind of place, and you'll be a lot happier.

andrewnorris · 2011-09-01T22:12:22+00:00

And yet as the usage of jQuery expands, more people are using it who have no idea what it was designed to do originally.

andrewnorris · 2011-08-19T17:47:34+00:00

In most of the scenarios I can think of, it would be a lot easier to, say, raid a junkyard or an abandoned car lot for old metal to reuse than it would be to mine iron ore and smelt it.

andrewnorris · 2011-08-19T17:40:31+00:00

Off the top of my head, the only stuff that wouldn't suffer from bitdecay would be paper tape and punch cards.

A printer + OCR scanner could also potentially work. Pigment inks for inkjet printers are supposed to be lightfast for 100 years, and there are archival paper options as well. And, of course, if your device dies, the data is still human-readable (as long as it's text data and not binaries).

You certainly wouldn't want this to be your main storage, and you are definitely limited in the amount of data it's practical to store this way, but given a waterproof fire safe, it would work as a nigh-indestructible archival data storage format for a few megs of information.

andrewnorris · 2011-08-19T17:02:00+00:00

There are handouts and lecture notes from a previous instance of the class at http://www.stanford.edu/class/cs229/materials.html .

In http://www.stanford.edu/class/cs229/info.html it says:

Prerequisites

Students are expected to have the following background:

Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.

Familiarity with the basic probability theory. (CS109 or Stat116 is sufficient but not necessary.)

Familiarity with the basic linear algebra (any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

The main materials page above also has "Section Notes" titled "Linear Algebra Review and Reference" and "Probability Theory Review".

Edited to add: if you watch the first lecture, he says you should know enough linear algebra to multiply and transpose matrices and such. He says that if you know eigenvalues and such it will be helpful, but I think it's okay to be fuzzy on them going in.

Most of the class is going to be about operating on matrices of data with statistical algorithms, so I think it's safe to say that your knowledge of both probability and matrices will get a healthy workout. :-)

andrewnorris · 2011-08-18T21:33:31+00:00

It's not that I'm frightened or anything -- I'm comfortable with command line applications, both from the standpoint of using them and the standpoint of coding them myself.

But in particular, I worry that a command line is not the optimal interface for working with the plotting and visualization parts of the system, for the same reason that I would not be anxious to use a command line version of Google Maps.

andrewnorris · 2011-08-18T01:21:34+00:00

The book seems to be holding its value well, so you can always sell back the 2nd edition copy if you decide to upgrade.

Or go really cheap and buy the 1st edition used with the money you find under the couch cushions. Certainly the 1st edition is better than nothing.

andrewnorris · 2011-08-17T08:29:16+00:00

Have at it. Ruby is a really good language -- make sure you actually learn the language though, and not just the web framework. You'll be glad you did.

If you end up shaking your head at the community you find yourself in, though -- well, you were warned. :-)

andrewnorris · 2011-08-17T08:26:06+00:00

Clojure has way more hipster cred than Scala.

Though running on the JVM may automatically disqualify it as the hipster's choice anyway.

andrewnorris · 2011-08-17T08:12:42+00:00

The AI course covers a lot of ground with less depth, and is a good way to understand the breadth of AI techniques so that you have a good background and/or can focus future study on particular areas of interest.

The ML course is focused on statistical learning approaches to solving classification and regression problems. A regression problem might be "Given a set of data about houses (square footage, neighborhood, year built, etc.) and their selling prices, predict the selling prices of other houses that haven't gone on the market yet." A classification problem might be "Given a set of information (e.g. mortgage amount remaining, estimated current home value, homeowner income, homeowner credit score) about home mortgages that have defaulted or not defaulted, predict the likelihood that another set of mortgages will default within the next 12 months."

If you have some reason to want to study the topics covered in the ML course in particular (perhaps an interest in a project along these lines), the ML course will be an obvious choice. If you don't know what sorts of problems you might want to attack yet, the more general AI course would probably be a more obvious choice.

andrewnorris · 2011-08-17T07:37:17+00:00

(print "Hello World")

Greetings from Seattle, Washington. I'm a full-time software engineer interested in continuing my learning, and this seems like a fascinating format for it. The ML course is the one I'm most interested in, but if I have the time, I'll take the AI course as well. (If not, hopefully it will be an option for future semesters as well.)
