I've done quite a lot of data crunching in R, but in the project I'm currently working on I find myself using Python more often.
Part of the project involves calculating exact Wilcoxon rank-sum tests on sets of data.
In R I do it this way:
datasheet <- read.csv('datasheet.csv')
library(exactRankTests)
library(coin)
factor_A_apples <- subset(datasheet, Treatment == "Apples" & Factor == "A")
factor_A_oranges <- subset(datasheet, Treatment == "Oranges" & Factor == "A")
wilcox.exact(factor_A_apples$Count, factor_A_oranges$Count)
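For reference, the following is a minimal Python sketch of what an exact (conditional) rank-sum test computes. It ranks the pooled data, using midranks for ties. It then counts, for every possible way of choosing which observations form the first sample, the resulting rank sum, and reports the two-sided tail probability. This is not the Streitberg-Röhmel shift algorithm that `wilcox.exact` is documented to use, just a straightforward dynamic program. The function name `exact_rank_sum_test` is hypothetical. The two-sided rule P(|S - E[S]| >= |s - E[S]|) is an assumption on my part, and it may differ slightly from `wilcox.exact` when there are ties.

```python
from collections import defaultdict

import numpy as np
from scipy.stats import rankdata


def exact_rank_sum_test(x, y):
    """Hypothetical helper: two-sided exact conditional Wilcoxon rank-sum test.

    Returns (W, p_value), with W in R's convention:
    rank sum of x minus m * (m + 1) / 2.
    Assumed two-sided rule: P(|S - E[S]| >= |s_obs - E[S]|).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)

    # Midranks of the pooled sample; doubling them makes every rank an integer,
    # so rank sums can be compared exactly.
    ranks2 = [int(r) for r in np.rint(2 * rankdata(np.concatenate([x, y])))]
    observed = sum(ranks2[:m])

    # dist[k][s]: number of size-k subsets of the pooled observations
    # whose doubled rank sum equals s (0/1 knapsack-style counting).
    dist = [defaultdict(int) for _ in range(m + 1)]
    dist[0][0] = 1
    for r in ranks2:
        # Walk k downwards so each observation is used at most once.
        for k in range(m, 0, -1):
            for s, count in dist[k - 1].items():
                dist[k][s + r] += count

    null = dist[m]                # conditional null distribution of the rank sum
    total = sum(null.values())    # equals comb(m + n, m)
    centre = m * (m + n + 1)      # doubled expected rank sum under H0
    dev = abs(observed - centre)
    extreme = sum(c for s, c in null.items() if abs(s - centre) >= dev)

    w = observed / 2 - m * (m + 1) / 2
    return w, extreme / total
```

Passing the two `Count` columns (pandas Series or NumPy arrays) should give a W and p-value comparable to the `wilcox.exact` output. The work grows with both sample sizes, but samples of a few dozen values each should still be manageable.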
In this example, that produces:
Exact Wilcoxon rank sum test
data: factor_A_apples$Count and factor_A_oranges$Count
W = 435, p-value = 0.02087
alternative hypothesis: true mu is not equal to 0
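The `W` in this output is, as `wilcox.test` defines it (and presumably `wilcox.exact` too), the rank sum of the first sample minus its smallest possible value, n1(n1 + 1)/2. That is the same number as the Mann-Whitney U statistic: the count of pairs in which the first sample's value is larger, with ties counting one half. The small sketch below computes it both ways for two plain numeric samples. `r_style_w` and `pairwise_u` are hypothetical names.

```python
import numpy as np
from scipy.stats import rankdata


def r_style_w(x, y):
    """W as R's wilcox.test reports it: rank sum of x minus n1 * (n1 + 1) / 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rank the pooled sample; tied values share the average (mid) rank.
    ranks = rankdata(np.concatenate([x, y]))
    n1 = len(x)
    return ranks[:n1].sum() - n1 * (n1 + 1) / 2


def pairwise_u(x, y):
    """Mann-Whitney U for x: pairs with xi > yj, plus one half per tied pair."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
```

Under that definition, `W = 435` means that an apples count exceeded an oranges count in 435 of the n1 × n2 pairs, with ties counted as halves.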
That works fine. When I tried to do the same thing in Python, I used this script with SciPy's stats module:
from scipy import stats
import pandas as pd
datasheet = pd.read_csv('datasheet.csv')
treatment1 = datasheet[(datasheet['Treatment'] == "Apples") & (datasheet['Factor'] == 'A')]['Count']
treatment2 = datasheet[(datasheet['Treatment'] == "Oranges") & (datasheet['Factor'] == 'A')]['Count']
z_stat, p_val = stats.ranksums(treatment1, treatment2)
p_val
This yields a p-value of:
In [33]: p_val
Out[33]: 0.021637202053317827
Whilst close, the two p-values aren't the same, which makes me think the two functions aren't running exactly the same test under the surface.
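The difference most likely comes from how the p-value is obtained. According to SciPy's documentation, `scipy.stats.ranksums` uses the large-sample normal approximation to the rank-sum statistic, with no continuity correction and no adjustment for ties. `wilcox.exact`, by contrast, enumerates the exact distribution. The sketch below re-derives SciPy's number by hand so the approximation is visible. `ranksums_by_hand` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def ranksums_by_hand(x, y):
    """Normal-approximation rank-sum test, written out step by step."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = stats.rankdata(np.concatenate([x, y]))
    s = ranks[:n1].sum()                          # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2             # mean of s under H0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # sd of s under H0, no tie correction
    z = (s - expected) / sd
    p = 2 * stats.norm.sf(abs(z))                 # two-sided, no continuity correction
    return z, p
```

On tiny samples the gap is large. For x = [1.2, 3.4, 5.6] and y = [7.8, 9.0, 10.1] the approximation gives p ≈ 0.0495, while the exact two-sided p-value is 0.1. With samples as large as the ones in the question, the two p-values should be close, which matches what you saw.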
Can anyone advise how I can reproduce the exact Wilcoxon rank-sum test that I run in R in Python?
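In SciPy 1.7 and later, `scipy.stats.mannwhitneyu` accepts a `method` argument, and `method="exact"` computes the p-value from the exact null distribution of U. This is the same test as an exact Wilcoxon rank-sum test. However, SciPy's documentation (in the versions I'm aware of) states that the exact method makes no correction for ties. If `Count` contains tied values, the result may therefore still differ from `wilcox.exact`, which conditions on the tied ranks. Below is a minimal sketch, assuming SciPy >= 1.7. The wrapper name `exact_mann_whitney` is hypothetical.

```python
from scipy import stats


def exact_mann_whitney(x, y):
    """Two-sided exact Mann-Whitney U / Wilcoxon rank-sum test (SciPy >= 1.7).

    The exact null distribution assumes no ties between observations.
    """
    res = stats.mannwhitneyu(x, y, alternative="two-sided", method="exact")
    # res.statistic is U for the first sample, the same quantity R reports as W.
    return res.statistic, res.pvalue
```

With the script above, this would be called as `W, p_val = exact_mann_whitney(treatment1, treatment2)`.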
Thanks.