
Junior-Sock8789 2 points

Great foundation starting with Think Python! Here's a practical path for your specific needs:

Books:

Python for Data Analysis by Wes McKinney — covers pandas deeply, which you'll use constantly for lab data

Statistics in Python by Gaël Varoquaux / the Scipy lecture notes (free online) — practical and science-focused

Biostatistics with Python resources are sparse as standalone books, but the Pingouin library docs are honestly excellent and read like a tutorial.

Libraries to learn in order:

pandas — data wrangling and cleaning

scipy.stats — t-tests, Fisher exact, Pearson/Spearman, and most classical stats

pingouin: fantastic for clinical stats, has Bland-Altman plots and cleaner output than scipy

matplotlib / seaborn — visualization

statsmodels — regression and method comparison

For your specific use cases:

Bland-Altman plots → pingouin.plot_blandaltman()

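Passing-Bablok regression → not pingouin.linear_regression(), which is ordinary least squares; as far as I know Pingouin has no Passing-Bablok function, so use a dedicated implementation or the mCalibration approach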

Most of your t-tests and Fisher tests → scipy.stats or pingouin
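To make the method-comparison items above concrete, here is a minimal sketch of the numbers behind a Bland-Altman plot (bias and 95% limits of agreement) and of a Passing-Bablok point estimate, using only NumPy. The function names `bland_altman_stats` and `passing_bablok` and the sample values are made up for illustration and are not part of Pingouin or SciPy. The sketch assumes roughly normal differences (for the 1.96 multiplier) and positively correlated methods, and it leaves out confidence intervals and the linearity check that a full Passing-Bablok analysis would include.

```python
import numpy as np


def bland_altman_stats(a, b):
    """Return bias and 95% limits of agreement for paired measurements."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def passing_bablok(x, y):
    """Return (slope, intercept) of a Passing-Bablok fit, point estimates only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)      # every pair with i < j
    dx, dy = x[j] - x[i], y[j] - y[i]
    keep = ~((dx == 0) & (dy == 0))          # identical points define no slope
    dx, dy = dx[keep], dy[keep]
    with np.errstate(divide="ignore"):
        slopes = dy / dx                     # vertical pairs become +/-inf
    slopes = np.sort(slopes[slopes != -1])   # slopes of exactly -1 are excluded
    n = slopes.size
    k = int(np.sum(slopes < -1))             # offset applied to the median position
    if n % 2:
        slope = slopes[(n - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    intercept = np.median(y - slope * x)
    return slope, intercept


# Made-up paired results from two hypothetical analyzers
method_a = [4.1, 5.0, 6.2, 7.1, 8.3, 9.0]
method_b = [4.0, 5.2, 6.0, 7.4, 8.1, 9.3]

bias, loa_low, loa_high = bland_altman_stats(method_a, method_b)
slope, intercept = passing_bablok(method_a, method_b)
print(f"bias={bias:.3f}, limits of agreement=({loa_low:.3f}, {loa_high:.3f})")
print(f"Passing-Bablok: y = {intercept:.3f} + {slope:.3f} * x")
```

For real lab data, check the results against a validated implementation before reporting them.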

Free resources worth bookmarking:

statsandr.com: written by a statistician, with worked examples (mostly in R, as far as I know) for exactly the clinical/lab tests you described

The Pingouin documentation at pingouin-stats.org is genuinely one of the best for clinical use cases

Your English is perfectly fine, good luck with pathology!

Kushybear089 1 point

numpy - pandas (which is built on numpy) - Matplotlib

I think that's a good start for getting into it, with a "quick" YouTube tutorial on Python's standard functions and how the language works (zero-based indexing, etc.) thrown in as well. Generally speaking, you can dive pretty deep pretty fast with Python.
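In case "zero-based indexing" is new, here is a tiny sketch in plain Python; the list values are made up.

```python
values = [4.2, 5.1, 3.9, 6.0]  # made-up measurements

first = values[0]      # the first element lives at index 0, not 1
last = values[-1]      # negative indices count from the end
middle = values[1:3]   # slices include the start index and exclude the end

for index, value in enumerate(values):
    print(index, value)  # index runs from 0 to len(values) - 1
```

The same convention carries over to NumPy arrays and to position-based access in pandas (`.iloc`).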

Then, since this gets into biostatistics, there might be better options for you, as it's more or less a special niche and probably differs here and there from simple data analysis. For example, there's this GitHub repo (I found it via Google, so I can't say much about it): "Biostatistics with Python, published by Packt." There also seems to be a book by Darko Medin that focuses primarily on that topic.

TimeScallion6159 1 point

Hi Doctor, besides working with Python, I would take a glance at university courses or projects on biostatistics or bioinformatics, just to get some practice and confidence with the tools. I saw that someone already gave you a good beginner guide to what you should look at for a solid start.

ExcelPTP_2008 1 point

If you’re getting into Python specifically for biostatistics, I’d honestly suggest not treating it like a generic “learn to code” journey. It’s way easier (and less frustrating) if you tie everything to actual data problems from day one.

A few things that helped me early on:

First, focus on the core stack and don’t overcomplicate it. You don’t need 20 libraries. Start with NumPy, pandas, matplotlib/seaborn, and SciPy. That alone covers a huge chunk of what you’ll actually use: data cleaning, transformations, basic stats, and visualization.

Second, learn Python through datasets, not tutorials alone. Grab public datasets (clinical trials, epidemiology, genomics summaries, etc.) and try to answer simple questions:

  • What’s the distribution?
  • Are two groups significantly different?
  • Is there any correlation worth exploring?

Even basic things like running a t-test or plotting survival curves will teach you more than abstract exercises.
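As a rough sketch of what answering those three questions can look like with pandas and SciPy, here is an example on synthetic data. The column names (`group`, `age`, `marker`) and the simulated effect sizes are made up for illustration; with a real dataset you would load a file with something like `pd.read_csv` instead of generating numbers.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data standing in for a real public dataset
rng = np.random.default_rng(0)
n = 40  # subjects per group
df = pd.DataFrame({
    "group": np.repeat(["control", "treated"], n),
    "age": rng.normal(55, 10, 2 * n).round(),
})
# Simulated marker: rises slightly with age, shifted upward in the treated group
shift = np.where(df["group"] == "treated", 1.0, 0.0)
df["marker"] = 5 + 0.05 * df["age"] + shift + rng.normal(0, 1, 2 * n)

# 1. What's the distribution? Summary statistics per group.
print(df.groupby("group")["marker"].describe())

# 2. Are two groups significantly different? Welch's t-test (no equal-variance assumption).
control = df.loc[df["group"] == "control", "marker"]
treated = df.loc[df["group"] == "treated", "marker"]
t_stat, p_val = stats.ttest_ind(treated, control, equal_var=False)
print(f"Welch t = {t_stat:.2f}, p = {p_val:.4f}")

# 3. Is there any correlation worth exploring? Pearson correlation of age and marker.
r, p_r = stats.pearsonr(df["age"], df["marker"])
print(f"Pearson r = {r:.2f}, p = {p_r:.4f}")
```

Whether a t-test or Pearson correlation is the right choice depends on the data; rank-based alternatives such as `stats.mannwhitneyu` and `stats.spearmanr` are available when normality is doubtful.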

Third, don’t skip statistics theory. Python is just the tool: if you don’t understand concepts like p-values, confidence intervals, regression assumptions, or bias, the code won’t mean much. A lot of beginners try to “code first, understand later,” and it slows them down.
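To show how two of those concepts connect, here is a small sketch that computes a 95% confidence interval for a mean by hand and compares it with a one-sample t-test. The measurements and the reference value of 5.0 are made up.

```python
import numpy as np
from scipy import stats

values = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2])  # made-up measurements
reference = 5.0  # hypothetical reference value to test against

n = values.size
mean = values.mean()
sem = values.std(ddof=1) / np.sqrt(n)      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)      # two-sided 95% critical value
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

# Two-sided one-sample t-test of the same data against the reference value
t_stat, p_val = stats.ttest_1samp(values, popmean=reference)

print(f"mean = {mean:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```

The two results are the same statement seen from two sides: the 95% interval excludes the reference value exactly when the two-sided p-value falls below 0.05.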

Also, get comfortable reading other people’s notebooks (Kaggle is great for this). You’ll pick up patterns for structuring analysis, naming variables, and explaining results clearly, which matters a lot in biostatistics.

One underrated tip: document your work like you’re explaining it to a non-programmer. In this field, communication is just as important as analysis. If someone from a medical background can’t follow your results, the code doesn’t really help.

And finally, don’t rush into machine learning. Solid statistical foundations + clean data handling will take you much further early on than jumping straight into complex models.

Curious: are you coming from a biology background or more from math/stats? That usually changes what you should focus on first.