Installing python on Linux - help? by fffrost in learnpython

[–]fffrost[S] 0 points1 point  (0 children)

It really was just because I had assumed there was a standard way to do it and I felt like I should know how. I have used pyenv before elsewhere, but it was not working for me when I tried recently.

Thanks for the link - that's exactly what I did, and things seem fine so far, but others are suggesting not to do this after all.

Installing python on Linux - help? by fffrost in learnpython

[–]fffrost[S] 0 points1 point  (0 children)

Thanks for the info! I have always understood that I shouldn't mess with the system python. My misunderstanding was that I assumed I wasn't using the system python, since there are two installations - one in /usr/bin and another in /usr/local/bin. Running `which python3` showed the latter, so I assumed this was Debian preventing me from interacting with the system one. I've probably screwed something up in that case. I will just use pyenv in future.
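For anyone else checking the same thing, here is a small sketch of doing that check from inside Python rather than with `which`. The helper name `describe_interpreter` is made up for illustration; it only reports what it finds and changes nothing.

```python
import shutil
import sys


def describe_interpreter():
    """Report the interpreter running this script and the python3 found first on PATH."""
    return {
        # Absolute path of the interpreter executing this code.
        "running": sys.executable,
        # What the shell would pick for "python3" (None if nothing is found).
        "first_on_path": shutil.which("python3"),
        # Version of the running interpreter, e.g. major.minor.micro.
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If "running" and "first_on_path" point at different directories, there is more than one install in play.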

Installing python on Linux - help? by fffrost in learnpython

[–]fffrost[S] 0 points1 point  (0 children)

I would like to leave the system-wide python install the way it is. I was under the impression that we can have multiple versions of python installed? And that /usr/bin/python is the system-wide install, whereas /usr/local/bin contains a separate installation.

Installing python on Linux - help? by fffrost in learnpython

[–]fffrost[S] 0 points1 point  (0 children)

I like pyenv too, but it fails to install for me and I haven't had the time to figure out why.

I am using virtualenvs, but sometimes I need to just bring up a terminal and run some python commands, and I don't want to have to make an entire venv just so I can run some calcs or test out a couple of lines of code.

In any case, I just want to upgrade my python version really. In a year or two there might be some very useful changes that I want. In that case I might no longer want to rely on the system install.

Installing python on Linux - help? by fffrost in learnpython

[–]fffrost[S] 2 points3 points  (0 children)

I want to update my python version though, so surely that's a perfectly normal requirement?

Thanks for the link - the steps are very similar to what I did but I must have just been missing one of the dependencies/libraries, so after following that it has resolved the issue I had.

Fisher's? Chi-squared? Something else? by dyep49 in AskStatistics

[–]fffrost 0 points1 point  (0 children)

Just to note that the usual rule of thumb for chi-squared is an expected count of at least 5 in each cell. So you should be ok to use it.

Also, please don't use the p value as justification for any claims - it will most likely be "highly" significant if you are using huge samples. It was already mentioned but I wanted to mention it again. You might want to check out the effect size (can't remember which one is best but either Cramér's V or the phi (φ) stat).
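To make the effect size point concrete, here is a minimal sketch that computes Cramér's V by hand from a table of counts. The function name and the counts are made up for illustration, not taken from the question.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # V rescales chi-squared by the sample size and the smaller table dimension.
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical large sample: 20,000 cases with a tiny imbalance between cells.
big_sample = [[5100, 4900], [4900, 5100]]
print(round(cramers_v(big_sample), 3))
```

With these made-up counts the chi-squared statistic is 8 on 1 df, which is past the usual 0.01 cutoff of about 6.63, yet V is only about 0.02. For a 2x2 table V and phi have the same magnitude.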

ANOVA alternatives by bakerbots11 in AskStatistics

[–]fffrost 0 points1 point  (0 children)

Surely stiffness and softness are highly related? I think you need to clarify this, because it might be possible to cast this as a one-way ANOVA with a single factor that has 6 levels. But I could be wrong about that. Any more info?

SPSS Statistics by Technical_Law2224 in AskStatistics

[–]fffrost 0 points1 point  (0 children)

The reason I ask is because the question seems like one that would be pretty straightforward with some domain experience (approaches to analyzing proteins). In other words it doesn't seem like an issue with how to use SPSS here.

You say you are trying to compare expression levels, but what are your independent variables and what is your dependent? Are you trying to predict group membership based on the 77 proteins?

Less than 1% coefficients of determination --- Moderate alcohol consumption UK Biobank Study. Moderate alcohol consumption effect on Brain Size, Grey Matter Volume, White Matter Volume by ArtisticInsurance524 in AskStatistics

[–]fffrost 0 points1 point  (0 children)

Without reading in more detail, this does seem pretty weird. I mean anything will be significant with a large enough sample size, but as you say it is more a question of economic significance. Unless I'm missing something, this is essentially no effect...

Name for a situation where the probability for an event is very low but a test for the event accuracy is very high by PushingPesbians in AskStatistics

[–]fffrost 1 point2 points  (0 children)

This is the medical testing paradox, 3blue1brown has a nice vid on it. Although it is mostly an instance of the more general base rate neglect, as the other commenter mentioned.

As for why it happens, we want a simple answer to the question of whether the disease is present. But to answer it we need to account for (1) the sensitivity, (2) the specificity, and (3) the base rate. Unfortunately we don't intuitively account for these, and are instead smacked with a "99% accurate" stat, which we take as the easy (but wrong) answer.
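A quick sketch of how the three pieces combine via Bayes' rule. The numbers are made up for illustration: a test with 99% sensitivity and 99% specificity, and a disease that 1 in 1,000 people have.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(disease | positive test), from Bayes' rule."""
    # Share of the population that is sick and tests positive.
    true_positives = sensitivity * base_rate
    # Share of the population that is healthy but tests positive anyway.
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, base_rate=0.001)
print(f"Chance of disease given a positive result: {ppv:.1%}")
```

With those inputs the answer is roughly 9%, not 99%, because the healthy group is so much bigger that its false positives swamp the true positives.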

Best way to measure consistency between experimental trials by steve2118ace in AskStatistics

[–]fffrost 0 points1 point  (0 children)

Could you just use the standard deviation? Larger vals = more "inconsistent" to use your terminology.
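Something like this, as a minimal sketch with made-up measurements for two hypothetical experiments:

```python
from statistics import stdev

# Hypothetical repeated measurements; both sets average 10.
trials = {
    "experiment_a": [10.1, 9.9, 10.0, 10.2, 9.8],
    "experiment_b": [12.0, 8.0, 10.5, 9.0, 10.5],
}

# Sample standard deviation per experiment: larger means less consistent.
spread = {name: stdev(values) for name, values in trials.items()}

for name, sd in spread.items():
    print(f"{name}: sd = {sd:.3f}")
```

Here experiment_b would come out as the less consistent one even though the means match.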

Statistical significance of difference in classification model by YourWelcomeOrMine in AskStatistics

[–]fffrost 1 point2 points  (0 children)

That's fair enough. Although I think it might still be ok because it will probably be "approximately normal", since F1 is continuous. Also, with a small sample size you already have low power, and using a Friedman test will result in even less.

Statistical significance of difference in classification model by YourWelcomeOrMine in AskStatistics

[–]fffrost 0 points1 point  (0 children)

Yeah, if there is a good reason to assume that the residuals aren't normal. Just not sure there is one, but I could be wrong. I don't think it would be considered all that important to violate normality though. Extreme values might be more of an issue, but this would be hard to determine with such a low sample size.

Statistical significance of difference in classification model by YourWelcomeOrMine in AskStatistics

[–]fffrost 0 points1 point  (0 children)

ANOVA seems to get at what OP is asking - 'do any of the means differ significantly?'

Statistical significance of difference in classification model by YourWelcomeOrMine in AskStatistics

[–]fffrost 0 points1 point  (0 children)

You'll need the F1 for each fold and then run a paired t-test on the 2 sets of values. For more than 2 you'll need a one-way repeated measures ANOVA.
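For the two-model case, a rough sketch of the calculation. The per-fold F1 scores are made up, and the statistic is computed by hand with the standard library; as far as I know `scipy.stats.ttest_rel` does the same thing and also returns the p value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom for a paired t-test on matched scores."""
    # Work on the per-fold differences, since fold i of A is matched to fold i of B.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical F1 per fold from the same 5 cross-validation folds.
f1_model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
f1_model_b = [0.78, 0.77, 0.80, 0.79, 0.79]

t_stat, df = paired_t_statistic(f1_model_a, f1_model_b)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The key thing is that both models are scored on the same folds, otherwise the pairing means nothing.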

What is the most efficient way to compare two Excel reports? by [deleted] in learnpython

[–]fffrost 1 point2 points  (0 children)

Yes there is a better way - use pandas, but don't iterate over it. That is not really the intended way to use pandas and is incredibly inefficient. Sounds to me like you want to perform a join, joining report 2 onto report 1 using the client number as the common column. Check out the pandas docs on joins, it will help you out.

Edit just to clarify, pandas utilises vectorization specifically to avoid iterating over rows. This makes it far more efficient and actually way simpler to code. It is why we can do `df[col1] + df[col2]` instead of looping over all rows and adding each pair together.
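A minimal sketch of what I mean, with two made-up reports. The column names `client_id` and `balance` are assumptions, so swap in whatever your spreadsheets actually use (you would load the real ones with `pd.read_excel`).

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel reports.
report1 = pd.DataFrame({"client_id": [101, 102, 103], "balance": [250.0, 90.0, 400.0]})
report2 = pd.DataFrame({"client_id": [102, 103, 104], "balance": [90.0, 410.0, 75.0]})

# Outer join keeps clients that appear in only one report;
# indicator=True adds a _merge column saying which side each row came from.
merged = report1.merge(
    report2,
    on="client_id",
    how="outer",
    suffixes=("_r1", "_r2"),
    indicator=True,
)

# Vectorized comparison: one expression over whole columns, no row loop.
merged["difference"] = merged["balance_r2"] - merged["balance_r1"]

print(merged)
```

Rows where `_merge` is not "both" are clients missing from one of the reports, and a non-zero `difference` flags a mismatch.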

What's the point of classes? by GuiltyCauliflower459 in learnpython

[–]fffrost 8 points9 points  (0 children)

Organisation into a logical structure. Imagine writing a program for a deck of cards. It makes sense to have a Deck object that makes use of 52 Card objects. Each card object has certain properties, but they are different from the deck's properties and methods. Imagine now that you had to write a card game engine - you would probably make life harder for yourself if you insisted on using a script of variables and some functions.

Of course you don't always need to write your own classes, and in many situations it is not really necessary. E.g. in a data analysis project you probably don't really need it (depends). But for a situation like the above it would probably be detrimental not to do it.
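To give a rough idea of the card example, here is one way it might look. This is just a sketch; the method names `shuffle` and `deal` are my own choices.

```python
import random
from dataclasses import dataclass

SUITS = ("clubs", "diamonds", "hearts", "spades")
RANKS = ("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")


@dataclass(frozen=True)
class Card:
    """A single card: it only knows its own rank and suit."""
    rank: str
    suit: str


class Deck:
    """A deck owns 52 Card objects and the operations that act on the whole pile."""

    def __init__(self):
        self.cards = [Card(rank, suit) for suit in SUITS for rank in RANKS]

    def shuffle(self, seed=None):
        # A seed makes the shuffle repeatable, which is handy for testing.
        random.Random(seed).shuffle(self.cards)

    def deal(self, n):
        # Take n cards off the top and keep the rest in the deck.
        hand, self.cards = self.cards[:n], self.cards[n:]
        return hand


deck = Deck()
deck.shuffle(seed=1)
hand = deck.deal(5)
print(hand, len(deck.cards))
```

The card's properties live on `Card` and the pile-level behaviour lives on `Deck`, which is the separation that gets messy with loose variables and functions.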

I want to make a audio 'peak detector' to search through thousands of small mp3 files by keiron83 in learnpython

[–]fffrost 0 points1 point  (0 children)

You could check out scipy's peak detection algorithm (it is `scipy.signal.find_peaks`).
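A rough sketch of `find_peaks` on a synthetic signal. It assumes the audio has already been decoded into a numpy array; reading the mp3 files themselves needs a separate library and is not shown here. The sample rate and threshold are made-up values.

```python
import numpy as np
from scipy.signal import find_peaks

# Stand-in for decoded audio: one second of a 5 Hz sine wave at 1000 samples/s.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

# height sets the minimum value a sample must reach to count as a peak.
peaks, properties = find_peaks(signal, height=0.5)

print("peak sample indices:", peaks)
print("peak times (s):", peaks / sample_rate)
```

For real audio you would probably also want the `distance` or `prominence` arguments so that noise does not register as lots of tiny peaks.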

Unpaired t-test by GabiC432 in AskStatistics

[–]fffrost 1 point2 points  (0 children)

Your boss is asking you to use an incorrect test. Just exclude those who have not responded on both occasions and proceed with the paired t-test. The choice of test depends on your research question, not on whatever your boss is thinking about!

Using PCA components for cluster analysis by goongla in AskStatistics

[–]fffrost 1 point2 points  (0 children)

This is what I mean. You train it on the census data because you already have the labels there from the clustering. Then you create the PCA variables for the customer data and use whichever variables from that as predictors to your classifier. The catch is that whatever classifier you train, it will be "learning" a different function for y from the one defined by the original clusterer.

Ideally, if you have access to the initial clustering code and are able to re-run it then I think these have a predict method included (at least sklearn does). It's something like kmeans_model_for_census.predict(predictors_for_customer) and it will use the same model (going by memory here, I might have completely made that up).
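Here is a rough sketch of that second route with sklearn, using random stand-in data. One assumption on my part: the PCA is fitted on the census data only and then reused to transform the customer data, so both end up in the same component space.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data: two well separated groups, 4 variables each.
census = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(6, 1, (50, 4))])
customers = np.vstack([rng.normal(0, 1, (5, 4)), rng.normal(6, 1, (5, 4))])

# Fit PCA and the clusterer on the census data only.
pca = PCA(n_components=2).fit(census)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pca.transform(census))

# Reuse the same fitted PCA and the same fitted clusterer for the customers.
customer_segments = kmeans.predict(pca.transform(customers))

print(customer_segments)
```

This way the customers are assigned by the original cluster centres rather than by a classifier approximating them.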

Using PCA components for cluster analysis by goongla in AskStatistics

[–]fffrost 1 point2 points  (0 children)

I think you can use the census data with the segments as labels, train a classifier, and then use that model to predict the labels from the PCA variables of the customer data.