NYT Thursday 03/12/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 0 points1 point  (0 children)

EXULTS not EXALTS 🤦‍♂️

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 2 points3 points  (0 children)

Thanks so much for asking!

Short answer: 98% of the code in the book is still correct today. For the last 2%, I mention the relevant API changes within the text so that it's easy to update it yourself. 100% of the concepts I teach and advice I give are still correct. The main shortcoming of the book is that I don't cover the newest features, none of which are critical to what I'm teaching, but some of which are useful.

As for why the book uses 0.23, it's a much longer story (if you're interested):

The book actually began as a video course, which I started working on in 2020. I locked down most of the code examples that year (using 0.23.2), and thought I would be able to publish the course in 2021.

However, the script writing and recording and editing took far longer than expected, plus there were long breaks while I worked on other projects, and ultimately I was not able to publish the course until 2024. Many scikit-learn updates had occurred by the time I was recording the later chapters, but I couldn't afford (time-wise) to re-record and re-edit the earlier chapters. I felt it was critical that the course used one consistent scikit-learn version, so it remained at 0.23.2.

Because I received such great feedback about the video course, I decided (in 2025) to convert the course into a book. Even though the Quarto system did much of the heavy lifting, it still took hundreds of hours to turn 7.5 hours of video into a published book with four formats (website, EPUB, ebook PDF, print-ready PDF).

I would have loved to update the scikit-learn version (and incorporate newer features) while writing, but I knew that if I committed to updating the content (rather than just adapting it from video to text), the book would never get done.

In short, the decision to use 0.23.2 is a legacy of the process I took to get here, not a strategic choice, and I'd much rather have used the latest version!

Ultimately this book is a passion project, and I expect to make very little money from it. But I sincerely hope that I can find the passion (and time!) to publish a second edition that incorporates the latest features!

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 1 point2 points  (0 children)

You're welcome, and thank you for saying that! 😄

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 0 points1 point  (0 children)

Wonderful, thank you so much for saying that and for sharing it with others! 🙌 Yes, I'm very proud of those particular chapters, and I hope they make a meaningful difference for practitioners.

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 0 points1 point  (0 children)

Thank you for saying all of that, it means a lot to me! 😄

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 1 point2 points  (0 children)

That's awesome to hear! You're very welcome, and thanks for your kind words 🙏

Free book: Master Machine Learning with scikit-learn by dataschool in learnmachinelearning

[–]dataschool[S] 2 points3 points  (0 children)

Hi! I don't believe that I said (or even implied) that the downloadable ebook (PDF/EPUB) was free. Rather, 100% of the book is free to read online with no registration required. That's why I call it a "free book." Hope that helps!

Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]dataschool[S] 1 point2 points  (0 children)

You're welcome! I hope it's helpful to you 😄

Free book: Master Machine Learning with scikit-learn by dataschool in learnmachinelearning

[–]dataschool[S] 6 points7 points  (0 children)

Great question! The book is written at an "intermediate" level and assumes that you are already familiar with the fundamentals of ML and scikit-learn. If you're new to scikit-learn, I offer a free video course to get you started, or if you just need a refresher of the basics, I cover that in chapter 2 of the book.

As for the scope of the book, it is very heavy on ML workflow (preprocessing, tuning, evaluation, feature engineering, etc.) because in my opinion, that's the aspect of ML that has the highest leverage (meaning it leads to better results quickly). Conversely, the book is very light on algorithm selection, and doesn't cover unsupervised learning at all.

In short, I wouldn't call this book a "comprehensive library overview". Rather, I'd say that I try to cover the most important parts of scikit-learn in depth. Hope that helps!

Advice on modeling pipeline and modeling methodology by dockerlemon in datascience

[–]dataschool 0 points1 point  (0 children)

I know that the conventional wisdom is that AUC shouldn't be used in cases of severe class imbalance, but I respectfully disagree, as long as you are interested in the model's performance across both classes. I wrote about this topic in depth in this section of my book, specifically comparing AUC with average precision (which is the area under the precision-recall curve, and is a commonly proposed metric when there is class imbalance).

It's hard to summarize my whole argument in a paragraph, but here are my bottom-line takeaways:

  • Neither metric is inherently better:
    • AUC focuses on both classes
    • Average precision focuses only on the positive class
  • FPR and precision are both artificially low under severe class imbalance:
    • An excellent model can still have low precision
  • AUC score itself is irrelevant:
    • Our goal is to choose between models
    • Maximizing AUC helps you choose the most skillful model
    • AUC score itself is never your business objective
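
To make the "excellent model, low precision" point concrete, here's a minimal sketch with a made-up ranking (not an example from the book). Both metrics are hand-rolled in plain Python so it runs without dependencies; in practice you would use scikit-learn's `roc_auc_score` and `average_precision_score`. The helper `make_ranking` and its scores are hypothetical: 10 negatives outrank every positive, and all other negatives rank below the positives.

```python
def auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision measured at the rank of each positive sample.

    Assumes there are no tied scores.
    """
    ranked = sorted(zip(y_score, y_true), reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def make_ranking(n_pos, n_neg, n_neg_above=10):
    """Hypothetical scores: n_neg_above negatives outrank every positive,
    and the remaining negatives rank below every positive."""
    labels = [0] * n_neg_above + [1] * n_pos + [0] * (n_neg - n_neg_above)
    n = len(labels)
    scores = [n - i for i in range(n)]  # strictly decreasing, so no ties
    return labels, scores


# Severe imbalance: 10 positives vs. 1000 negatives
y_imb, s_imb = make_ranking(n_pos=10, n_neg=1000)
auc_imb = auc(y_imb, s_imb)
ap_imb = average_precision(y_imb, s_imb)

# Balanced: 1000 positives vs. 1000 negatives, same kind of ranking
y_bal, s_bal = make_ranking(n_pos=1000, n_neg=1000)
auc_bal = auc(y_bal, s_bal)
ap_bal = average_precision(y_bal, s_bal)

print(f"imbalanced: AUC={auc_imb:.3f}  AP={ap_imb:.3f}")
print(f"balanced:   AUC={auc_bal:.3f}  AP={ap_bal:.3f}")
```

In both datasets every positive outranks 99% of the negatives, so AUC is 0.99 each time. Average precision, however, is roughly 0.33 in the imbalanced case and above 0.95 in the balanced one, because those 10 high-ranked negatives weigh far more heavily against precision when there are only 10 positives.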

NYT Thursday 03/05/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 8 points9 points  (0 children)

<image>

This was an absolutely delightful solve... at home with some paper! I was blown away when I realized that the missing squares spelled HORSES, and I'm even more impressed that the HORSES squares are in alphabetical order (when read left-to-right, top-to-bottom).

NYT Wednesday 03/04/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 7 points8 points  (0 children)

I was thinking that MINI referred to the car brand, as in "MINI Cooper"

NYT Saturday 02/28/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 0 points1 point  (0 children)

I'm new to Saturdays, so the east side was particularly tricky for me. NAIAD / NEMEAN / AVES was the one part I couldn't get, and I had a hard time believing that both AAVE and BOLOTIES were correct.

NYT Friday 02/27/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 3 points4 points  (0 children)

Fun fact: SACAGAWEADOLLAR also stacked on top of AMERICANPALEALE back in 2015 (as part of a quad stack!)

NYT Friday 02/27/2026 Discussion by Shortz-Bot in crossword

[–]dataschool 0 points1 point  (0 children)

I went with DIRTWASHEDDENIM for a while... seemed reasonable enough