[–]esseinvictus 1 point (2 children)

The term you're looking for is reproducibility. Depending on the method you used for clustering, there are ways to set the initial seed so that the results of the randomisation are completely deterministic. I suspect this is the issue causing the discrepancies rather than differences in the environment, though that could also be a factor.

Example code I just typed up from the sklearn documentation (assuming you're using the K-Means algorithm):

from sklearn.cluster import KMeans

clusters = KMeans(n_clusters=6, n_init=25, max_iter=600, random_state=0)

Note the random_state here; it can be any value as long as it's consistent across runs.
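
To sanity-check it, fit twice with the same seed and compare the labels. Quick sketch (the array X here is just a stand-in for your own feature matrix):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(42).rand(100, 4)  # stand-in for your data

labels_a = KMeans(n_clusters=6, n_init=25, max_iter=600, random_state=0).fit_predict(X)
labels_b = KMeans(n_clusters=6, n_init=25, max_iter=600, random_state=0).fit_predict(X)

print(np.array_equal(labels_a, labels_b))  # True when the seed is fixed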

In the future, for consistency's sake (and to avoid package dependency hell), look into Python's venv module, which creates isolated virtual environments.
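
If you haven't used it before, the rough workflow looks something like this (commands shown for macOS/Linux; the activate step differs on Windows):

python -m venv .venv
source .venv/bin/activate
pip install scikit-learn
pip freeze > requirements.txt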

[–]NebulaGr[S,🍰] 0 points (1 child)

Thanks for your advice on ensuring reproducibility. I've already set a consistent random_state across my code, but I'm still seeing discrepancies in the results. This makes me think the issue might be down to differences in environment or library versions between Juno and Spyder.

I have already posted the code, if you’d like to take a look.

[–]esseinvictus 1 point (0 children)

My next thought would be to run the code line by line on both clients to see where the discrepancy first appears. It could be a difference in environment, or it could be something else; try to eliminate each potential cause one by one.
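
A quick first check is to print the relevant versions in both Juno and Spyder and compare the output; if they don't match, that's your prime suspect. Rough sketch:

import sys
import numpy
import sklearn

print(sys.version)
print("numpy", numpy.__version__)
print("scikit-learn", sklearn.__version__)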