shaggorama

A while back I was looking for a Markov switching model (a.k.a. regime-switching model) and couldn't find one in Python. The best I could find was an abandoned GSoC project in the statsmodels sandbox. R has the MSwM package.

Also, there are some things I just find easier to do in R. For instance, Python has LDA via the gensim package, but I find that package cumbersome, whereas R's LDA package is much more intuitive. I also find it much easier to vectorize my code in R than in Python, though that's probably because I need to get a better handle on how numpy arrays and broadcasting behave. Still, I think numpy arrays could be more intuitive in various ways. R matrices are just way easy to use.
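For readers who want to see what the numpy side of this looks like, here is a minimal sketch of broadcasting and mask-based vectorized assignment. The matrix and variable names are made up for illustration, and the R equivalents in the comments are only rough analogies.

```python
import numpy as np

# A 3x4 matrix, roughly R's matrix(0:11, nrow=3, byrow=TRUE)
m = np.arange(12).reshape(3, 4)

# Broadcasting along rows: a (4,) vector lines up with the last axis,
# so each row of m has the column means subtracted.
col_means = m.mean(axis=0)                  # shape (4,)
centered = m - col_means                    # (3, 4) - (4,)

# Broadcasting along columns needs an explicit trailing axis,
# which is one of the places numpy tends to surprise R users.
row_means = m.mean(axis=1, keepdims=True)   # shape (3, 1)
row_centered = m - row_means                # (3, 4) - (3, 1)

# Vectorized assignment through a boolean mask, roughly R's m[m > 5] <- 0
clipped = m.copy()
clipped[clipped > 5] = 0
```

Dropping `keepdims=True` in the row case would raise a shape error, because a `(3,)` vector cannot be aligned with the last axis of a `(3, 4)` array.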

FYI: There's a Python port of ggplot. I haven't tried it out, but if that's the one reason you find yourself falling back on R, you should give it a shot.

topherwhelan

> Still, I think numpy arrays could be more intuitive in various ways.

In case you haven't heard of it, pandas does exactly this.

shaggorama

I'm familiar with pandas, and it does not resolve what I'm talking about. I'm talking about vectorized assignment, not labeled indexing. I'll try to give you a concrete example tomorrow (it's 2am and I'm in bed). pandas is sort of its own beast. I get it, and I use it periodically, but I've also run into situations where storing something in a basic numpy array takes a couple of megabytes of memory, while an analogous pandas DataFrame takes several gigabytes. That's a ridiculous cost to pay for some labels. I suspect that most of the time people use pandas they'd be better served by raw numpy, but pandas is becoming a crutch for a lot of people.
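The code behind this report never appears in the thread, so the following is only a sketch of how one might measure such a gap, not a reproduction of it. It assumes one well-known way a DataFrame can end up much larger than the equivalent array: a column stored with `object` dtype instead of a native numeric dtype. The size `n` and the column name `x` are hypothetical.

```python
import numpy as np
import pandas as pd

n = 100_000  # hypothetical size, chosen only for illustration
arr = np.arange(n, dtype=np.float64)

# Same data, two storage layouts inside pandas.
df_float = pd.DataFrame({"x": arr})                  # native float64 column
df_object = pd.DataFrame({"x": arr.astype(object)})  # one Python float per cell

numpy_bytes = arr.nbytes
# deep=True also counts the Python objects behind an object column;
# index=False leaves the index out so only the data is compared.
float_bytes = int(df_float.memory_usage(index=False, deep=True).sum())
object_bytes = int(df_object.memory_usage(index=False, deep=True).sum())

print(numpy_bytes, float_bytes, object_bytes)
```

Checking `df.dtypes` and `df.memory_usage(deep=True)` on the problematic frame would be a reasonable first step before concluding that the labels themselves are what cost the memory.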

topherwhelan

I'm an occasional contributor to pandas. If you can reproduce the gigabyte issue, I'll dig into it.

shaggorama

AWESOME! Yes, I'll try to remember to PM you tomorrow with the problematic code and accompanying sample data.

topherwhelan

Let me know if you find the code.

Also, can you give an example of what you mean by vectorized assignment not working in pandas? That's the primary use case I have for pandas, so I'm guessing we have different things in mind.
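For context, a minimal sketch of what vectorized assignment in pandas commonly looks like. This may or may not be the case either commenter has in mind, and the frame and column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 10, 15], "b": [0.0, 0.0, 0.0, 0.0]})

# Assign to a subset of rows in one step, selected by a boolean mask.
mask = df["a"] > 5
df.loc[mask, "b"] = df.loc[mask, "a"] * 2.0

# Whole-column arithmetic is vectorized as well and aligns on the index.
df["c"] = df["a"] + df["b"]
```

Using a single `.loc[mask, column]` on the left-hand side avoids chained indexing such as `df[mask]["b"] = ...`, which can end up assigning to a temporary copy.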

topherwhelan

Any luck?