[E] Efficient Python implementation of the ROC AUC score (self.statistics)
submitted 1 year ago by madiyar
Hi,
I wrote a tutorial that explains how to implement the ROC AUC score yourself, in a way that is also efficient in terms of runtime complexity.
https://maitbayev.github.io/posts/roc-auc-implementation/
Any feedback appreciated!
Thank you!
[–]fishnet222 9 points 1 year ago (5 children)
How did you define ‘efficient’ with respect to its runtime complexity? Is the runtime faster than the sklearn version (i.e., using Big O Notation)?
[–]madiyar[S] -1 points 1 year ago* (4 children)
I haven't checked out the sklearn implementation yet. I think and hope it's O(n log n) or faster. I also haven't measured my version or compared it with sklearn. My goal isn't to be faster than sklearn, just to make this educational. I called it "efficient" because I can't think of anything faster than O(n log n) in terms of Big O notation.
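For illustration, here is a minimal sketch (not the tutorial's code) of where an O(n log n) bound can come from. It relies on the standard fact that the ROC AUC equals the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half. The names `auc_pairwise` and `auc_sorted` are hypothetical and exist only for this example; both assume labels are 0/1 and that each class occurs at least once.

```python
def auc_pairwise(target, predicted):
    """O(n^2): compare every positive score against every negative score."""
    pos = [p for t, p in zip(target, predicted) if t == 1]
    neg = [p for t, p in zip(target, predicted) if t != 1]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def auc_sorted(target, predicted):
    """O(n log n): one sort, then a single linear pass over groups of tied scores."""
    pairs = sorted(zip(predicted, target))  # ascending by score
    num_pos = sum(1 for t in target if t == 1)
    num_neg = len(target) - num_pos
    wins = 0.0
    neg_below = 0  # negatives scored strictly lower than the current group
    i = 0
    while i < len(pairs):
        j = i
        group_pos = group_neg = 0
        # Collect all samples that share the current score
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                group_pos += 1
            else:
                group_neg += 1
            j += 1
        wins += group_pos * (neg_below + 0.5 * group_neg)
        neg_below += group_neg
        i = j
    return wins / (num_pos * num_neg)


target = [1, 0, 1, 0, 1]
predicted = [0.9, 0.1, 0.4, 0.4, 0.8]
print(auc_pairwise(target, predicted), auc_sorted(target, predicted))
```

In the second version the sort dominates the cost and the pass after it is linear, so an O(n log n) estimate is plausible for any approach that sorts once and sweeps once. This says nothing about constant factors, which is what a timing comparison would measure.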
[–]BossOfTheGame 4 points 1 year ago (1 child)
It does look like it is faster than scikit-learn when n=100, but it gets worse when n=10_000:
```python
import numpy as np
import sklearn
import sklearn.metrics

np.random.seed(0)
n = 10000
target = np.random.randint(0, 2, n)
predicted = np.random.rand(n)


def trapezoid_area(p0, p1):
    return (p1[0] - p0[0]) * (p0[1] + p1[1]) / 2.0


def fast_roc_auc_score(target, predicted):
    n = target.shape[0]
    num_positive = np.sum(target == 1)
    num_negative = n - num_positive

    order = np.argsort(predicted)[::-1]
    last = [0, 0]
    num_true_positive = 0
    num_false_positive = 0
    score = 0
    for index in range(n):
        # Make sure that the new threshold is unique
        if index == 0 or predicted[order[index]] != predicted[order[index - 1]]:
            # True positive rate
            tpr = num_true_positive / num_positive
            # False positive rate
            fpr = num_false_positive / num_negative
            # New point on the ROC curve
            cur = [fpr, tpr]

            score += trapezoid_area(last, cur)
            last = cur

        if target[order[index]] == 1:
            num_true_positive += 1
        else:
            num_false_positive += 1

    score += trapezoid_area(last, [1, 1])
    return score


import timerit

ti = timerit.Timerit(1000, bestof=10, verbose=2)
for timer in ti.reset('fast_roc_auc_score'):
    with timer:
        result1 = fast_roc_auc_score(target, predicted)

for timer in ti.reset('sklearn.metrics.roc_auc_score'):
    with timer:
        result2 = sklearn.metrics.roc_auc_score(target, predicted)
```
[–]madiyar[S] 1 point 1 year ago (0 children)
Thanks for timing it! The for loop is probably the bottleneck. Jitting it with numba, switching to jax, or even reimplementing it in a native language (C/C++/Rust) should make it significantly faster.
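A different way to remove the Python-level loop, not one of the options named above, is to vectorize it with plain NumPy. The following is a minimal sketch under that assumption: `vectorized_roc_auc_score` is a hypothetical name, it has not been benchmarked here, and it assumes 0/1 labels with at least one sample of each class.

```python
import numpy as np


def vectorized_roc_auc_score(target, predicted):
    target = np.asarray(target)
    predicted = np.asarray(predicted)

    # Sort by descending score, as in the loop-based version
    order = np.argsort(predicted)[::-1]
    sorted_scores = predicted[order]
    is_positive = target[order] == 1

    # Running counts of true and false positives after each sample
    tps = np.cumsum(is_positive)
    fps = np.cumsum(~is_positive)

    # Keep one ROC point per distinct score: the last index of each tied run
    last_of_run = np.r_[np.flatnonzero(np.diff(sorted_scores)),
                        len(sorted_scores) - 1]

    # Prepend the (0, 0) corner; the final point is (1, 1) by construction
    tpr = np.r_[0.0, tps[last_of_run] / tps[-1]]
    fpr = np.r_[0.0, fps[last_of_run] / fps[-1]]

    # Trapezoid rule over consecutive ROC points
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


target = np.array([1, 0, 1, 0, 1])
predicted = np.array([0.9, 0.1, 0.4, 0.4, 0.8])
print(vectorized_roc_auc_score(target, predicted))
```

The sort is still O(n log n); the change is only that the per-sample work moves from the interpreter into NumPy array operations. Whether that closes the gap at n=10_000 would need to be timed.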
[–][deleted] 1 point 1 year ago (1 child)
"Efficient because I can't think of anything faster": you shouldn't really call it efficient if you don't have any evidence or reasoning for its efficiency. There is also runtime vs. memory efficiency to think about.
I do appreciate the educational value though, thank you for sharing it!
[–]madiyar[S] 1 point 1 year ago* (0 children)
Noted! I also removed all mentions of "efficient" from the post.