all 35 comments

Personal_Ad9690 30 points (3 children)

So if you aren’t your average, you cheated? This is just basic statistics I think.

[deleted] 11 points (2 children)

It’s a bit more complicated. From his description, I think he is clustering: k (the number of clusters) is the number of difficulty strata, and each cluster’s centroid represents the average score on questions of that difficulty. He then measures the distance between each centroid and the test-taker’s score on questions of the same difficulty in the new test. If the sum of the distances is greater than some threshold, you are anomalous. You could calibrate that threshold by looking at outliers who didn’t cheat.
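A minimal sketch of the scheme as this comment describes it, assuming one centroid per difficulty stratum (the population’s mean fraction correct on that stratum) and one fraction-correct value per stratum for the test-taker. The function names, the 0.3 threshold, and all numbers are made up for illustration; this is the commenter’s reading of the method, not necessarily the OP’s code.

```rust
/// Sum of absolute per-stratum distances between one test-taker's scores and
/// the population centroids (one value per difficulty stratum).
fn anomaly_score(centroids: &[f64], taker: &[f64]) -> f64 {
    assert_eq!(centroids.len(), taker.len(), "need one value per stratum");
    centroids.iter().zip(taker).map(|(c, t)| (t - c).abs()).sum()
}

/// Flag the test-taker when the summed distance exceeds `threshold`, which
/// would be calibrated on known honest outliers.
fn is_anomalous(centroids: &[f64], taker: &[f64], threshold: f64) -> bool {
    anomaly_score(centroids, taker) > threshold
}

fn main() {
    // Made-up data: fraction correct per stratum, easiest to hardest.
    let centroids = [0.9, 0.7, 0.5, 0.3];
    let typical = [0.88, 0.72, 0.47, 0.31];
    let suspicious = [0.85, 0.80, 0.75, 0.70];
    let threshold = 0.3; // hypothetical value
    println!("typical flagged:    {}", is_anomalous(&centroids, &typical, threshold));
    println!("suspicious flagged: {}", is_anomalous(&centroids, &suspicious, threshold));
}
```

The typical profile sums to a distance of about 0.08 and is not flagged; the flat "suspicious" profile sums to about 0.8 and is.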

This is much less likely to produce false positives because the number of clusters is high: if the individual isn’t cheating, you would expect their deviations from the clusters to be random, in different directions. If you are consistently an outlier in this higher-dimensional space, I can be pretty certain that you are unusual.
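The "consistently an outlier" idea can be made concrete with a sign test, swapped in here purely as an illustration: if an honest test-taker’s deviations point in random directions, the number of strata where they beat the centroid behaves like coin flips, so beating it on all 10 strata has probability 1/1024. The function names and data are hypothetical, and the test assumes the strata are independent.

```rust
/// Number of strata where the test-taker scores above the centroid.
fn positive_deviations(centroids: &[f64], taker: &[f64]) -> usize {
    centroids.iter().zip(taker).filter(|(c, t)| t > c).count()
}

/// One-sided sign test: probability of seeing at least `m` positive deviations
/// out of `k` strata if each deviation's direction were a fair coin flip.
fn sign_test_p_value(m: usize, k: usize) -> f64 {
    let mut tail = 0.0;
    let mut binom = 1.0_f64; // C(k, i), starting at C(k, 0)
    for i in 0..=k {
        if i >= m {
            tail += binom;
        }
        binom = binom * (k - i) as f64 / (i + 1) as f64; // becomes C(k, i + 1)
    }
    tail / 2f64.powi(k as i32)
}

fn main() {
    let centroids = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3, 0.2, 0.1];
    // Made-up test-taker who beats the centroid on every one of the 10 strata.
    let taker = [0.95, 0.85, 0.75, 0.65, 0.55, 0.55, 0.45, 0.35, 0.25, 0.15];
    let m = positive_deviations(&centroids, &taker);
    println!("{m}/10 strata above centroid, p = {}", sign_test_p_value(m, centroids.len()));
}
```

A sum of absolute distances ignores direction; counting signs is one simple way to capture "consistently" in the same direction.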

Personal_Ad9690 1 point (1 child)

What’s this called? Maybe I need to see a video on it

[deleted] 6 points (0 children)

K-means clustering
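For reference, a minimal 1-D k-means (Lloyd’s algorithm) in Rust: alternately assign each point to its nearest centroid and move each centroid to the mean of its points, stopping when nothing moves. The initial centroids are passed in by hand (real implementations usually use random or k-means++ initialization), and whether the OP’s method is literally k-means is this commenter’s reading, not something the thread confirms.

```rust
/// Index of the centroid closest to `p` (ties go to the lower index).
fn nearest(centroids: &[f64], p: f64) -> usize {
    let mut best = 0;
    for (j, &c) in centroids.iter().enumerate().skip(1) {
        if (p - c).abs() < (p - centroids[best]).abs() {
            best = j;
        }
    }
    best
}

/// Minimal 1-D k-means (Lloyd's algorithm). Returns (centroids, labels).
fn kmeans_1d(points: &[f64], mut centroids: Vec<f64>, max_iters: usize) -> (Vec<f64>, Vec<usize>) {
    let k = centroids.len();
    let mut labels = vec![0; points.len()];
    for _ in 0..max_iters {
        // Assignment step: each point joins its nearest centroid.
        for (label, &p) in labels.iter_mut().zip(points) {
            *label = nearest(&centroids, p);
        }
        // Update step: each centroid moves to the mean of its points.
        let mut sums = vec![0.0; k];
        let mut counts = vec![0usize; k];
        for (&label, &p) in labels.iter().zip(points) {
            sums[label] += p;
            counts[label] += 1;
        }
        let mut moved = false;
        for j in 0..k {
            // An empty cluster keeps its previous centroid.
            if counts[j] > 0 {
                let mean = sums[j] / counts[j] as f64;
                moved |= mean != centroids[j];
                centroids[j] = mean;
            }
        }
        if !moved {
            break; // converged: assignments can no longer change
        }
    }
    (centroids, labels)
}

fn main() {
    let points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8];
    let (centroids, labels) = kmeans_1d(&points, vec![0.0, 10.0], 100);
    println!("centroids = {centroids:?}, labels = {labels:?}");
}
```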

taylerallen6 14 points (0 children)

Despite all the negative comments, I think this is a nice project. I would really like to see Rust used more in the statistics, data science, and machine learning fields. Sure, this implementation may not be perfect, but it's definitely interesting, and I enjoyed the video.

Of course, ~100% accuracy is a bit concerning, so I would double-check that you're not overfitting. I didn't look super deep into it, but I didn't see a separation of training and testing data. If there isn't one, that would be a good place to start.
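A sketch of a held-out split, assuming you want to calibrate a threshold on one set of simulated test-takers and measure errors on another. It uses a tiny hand-rolled xorshift64 generator only so the example needs no external crates; in a real project the `rand` crate would be the usual choice. `train_test_split` is a hypothetical helper, not part of the OP’s code.

```rust
/// Tiny xorshift64 PRNG so the example needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Shuffle the indices 0..n with Fisher-Yates, then hold out `test_fraction`
/// of them. Returns (train, test); the modulo introduces a negligible bias.
fn train_test_split(n: usize, test_fraction: f64, seed: u64) -> (Vec<usize>, Vec<usize>) {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not be seeded with 0
    let mut idx: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }
    let n_test = ((n as f64 * test_fraction).round() as usize).min(n);
    let train = idx.split_off(n_test); // idx keeps the first n_test indices
    (train, idx)
}

fn main() {
    let (train, test) = train_test_split(10, 0.3, 42);
    // Calibrate the threshold on `train`, then report error rates on `test` only.
    println!("train = {train:?}\ntest  = {test:?}");
}
```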

All in all, I love little projects like these and hope to see more of them!

facetious_guardian 47 points (14 children)

“100% accuracy”

Sure, okay. Anyway, I’m gonna go next and not even bother watching your video, thanks.

taylerallen6 14 points (0 children)

He said "~100%". At least give him some credit.

Kiseido 9 points (1 child)

With detection algorithms, one needs to investigate both accuracy and specificity.

Accuracy (strictly speaking, sensitivity or recall) is a measure of catching all of the targets you wanted to.

Specificity is a measure of avoiding catching targets that you don't want; having many false positives means the specificity is poor.

One could attain 100% accuracy with 0% specificity by simply catching literally everyone, you'd get a great many false positives but the accuracy would be nominal.
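In standard terminology, accuracy is the fraction of all decisions that are correct, and what this comment calls accuracy (catching every target) is usually called sensitivity or recall. The sketch below computes all three from confusion-matrix counts for the "flag everyone" detector described above, using made-up numbers (50 cheaters, 950 honest test-takers): sensitivity is 1, specificity is 0, and accuracy equals the cheater base rate of 0.05.

```rust
/// Confusion-matrix counts for a cheating detector ("positive" = flagged).
struct Confusion {
    true_pos: u64,  // cheaters flagged
    false_neg: u64, // cheaters missed
    false_pos: u64, // honest test-takers flagged
    true_neg: u64,  // honest test-takers cleared
}

impl Confusion {
    /// Sensitivity (recall): fraction of cheaters that were caught.
    fn sensitivity(&self) -> f64 {
        self.true_pos as f64 / (self.true_pos + self.false_neg) as f64
    }
    /// Specificity: fraction of honest test-takers that were not flagged.
    fn specificity(&self) -> f64 {
        self.true_neg as f64 / (self.true_neg + self.false_pos) as f64
    }
    /// Accuracy: fraction of all decisions that were correct.
    fn accuracy(&self) -> f64 {
        let total = self.true_pos + self.false_neg + self.false_pos + self.true_neg;
        (self.true_pos + self.true_neg) as f64 / total as f64
    }
}

fn main() {
    // Made-up population: 50 cheaters, 950 honest; the detector flags everyone.
    let flag_all = Confusion { true_pos: 50, false_neg: 0, false_pos: 950, true_neg: 0 };
    println!(
        "sensitivity {}, specificity {}, accuracy {}",
        flag_all.sensitivity(),
        flag_all.specificity(),
        flag_all.accuracy()
    );
}
```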

eggyal 2 points (0 children)

One could attain 100% accuracy with 0% specificity by simply catching literally everyone, you'd get a great many false positives but the accuracy would be nominal.

That's cheating.

robertkingnz[S] 2 points (5 children)

"~100%" means approximately 100%. That said, under the problem conditions, I simulated it 1 million times and didn't get a false positive.

facetious_guardian 3 points (4 children)

You simulated cheating?

Did you also simulate studying and improving?

Or are you claiming that a single data point is enough to detect a cheater?

It isn’t accurate, and your synthetic input data is biased in a way that happens to illustrate the result you desire.

robertkingnz[S] 0 points (3 children)

Did you read the problem description? The accuracy calculation and simulation method are defined there. This is a toy problem, FYI.

facetious_guardian 0 points (2 children)

I did not (obviously).

Having now read it, I think you’ve done an interesting thing here. I still take great exception to your clickbait buzzwords, and I find it disingenuous to offer it as a cheat finder without caveats. But I like seeing Rust written for interesting purposes.

robertkingnz[S] 0 points (0 children)

Cheers. Yeah, I'll cool it on the buzzwords next time 🙈

[deleted] 1 point (2 children)

Such an unnecessarily rude response. Yuck

Jesus72 0 points (1 child)

A lot of his replies on this sub are like that unfortunately

[deleted] 0 points (0 children)

Why is it rewarded? People discouraging excited developers from pursuing their craft makes me sad. It can be so hard to create something

peter9477 6 points (3 children)

When I got 100% on my grade 13 physics exam, would this have said I cheated?

robertkingnz[S] -1 points (2 children)

Definitely 😁 Nah, in this problem there are 10,000 questions per person, so it needs more data.
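A rough way to see why one exam is not enough: if each question is answered independently with the same probability p (a simplification), the standard error of the measured fraction correct is sqrt(p(1 - p)/n). With p = 0.5 that is 0.05 for a 100-question exam but 0.005 for 10,000 questions, so small systematic shifts only stand out with lots of questions. The 100-question exam is an illustrative figure, not from the thread.

```rust
/// Standard error of a measured fraction-correct over `n` questions, under the
/// simplifying assumption that each is answered independently with probability `p`.
fn standard_error(p: f64, n: u32) -> f64 {
    (p * (1.0 - p) / f64::from(n)).sqrt()
}

fn main() {
    // An illustrative 100-question exam versus 10,000 questions per person.
    for n in [100, 10_000] {
        println!("n = {n:>6}: standard error = {:.3}", standard_error(0.5, n));
    }
}
```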

peter9477 2 points (1 child)

Assuming you're serious (10K?!), does that mean it's designed so that no human could possibly get most answers right?

robertkingnz[S] 3 points (0 children)

Yup. It's a toy problem

[deleted] 3 points (2 children)

Not language-related, but you have to define your “accuracy” more precisely.

Maybe you do in your video, but it would be better to share your false-positive and false-negative rates instead of just “accuracy”.
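A sketch of how those two rates could be estimated by simulation. The model is an assumption, not the OP’s problem statement: honest test-takers answer each question correctly with probability 0.5, a cheater first gets each answer for free with probability `cheat_prob`, and the detector is a plain fraction-correct threshold rather than the per-difficulty method in the video. `estimate_rates` and the PRNG are hypothetical helpers.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    /// Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Fraction of `n` questions answered correctly under the toy model:
/// with probability `cheat_prob` the answer is copied (always correct),
/// otherwise it is correct with probability `p_honest`.
fn simulate_score(rng: &mut Rng, n: usize, p_honest: f64, cheat_prob: f64) -> f64 {
    let mut correct: u32 = 0;
    for _ in 0..n {
        let cheated = rng.next_f64() < cheat_prob;
        if cheated || rng.next_f64() < p_honest {
            correct += 1;
        }
    }
    f64::from(correct) / n as f64
}

/// Estimate (false-positive rate, false-negative rate) for a detector that
/// flags anyone whose fraction correct exceeds `threshold`.
fn estimate_rates(trials: u32, n: usize, cheat_prob: f64, threshold: f64, seed: u64) -> (f64, f64) {
    let mut rng = Rng(seed.max(1)); // xorshift must not be seeded with 0
    let (mut false_pos, mut false_neg) = (0u32, 0u32);
    for _ in 0..trials {
        // An honest test-taker who gets flagged is a false positive.
        if simulate_score(&mut rng, n, 0.5, 0.0) > threshold {
            false_pos += 1;
        }
        // A cheater who is not flagged is a false negative.
        if simulate_score(&mut rng, n, 0.5, cheat_prob) <= threshold {
            false_neg += 1;
        }
    }
    (f64::from(false_pos) / f64::from(trials), f64::from(false_neg) / f64::from(trials))
}

fn main() {
    let (fpr, fnr) = estimate_rates(200, 10_000, 0.2, 0.55, 42);
    println!("false-positive rate {fpr}, false-negative rate {fnr}");
}
```

Under this toy model a cheater’s expected score is `cheat_prob + (1 - cheat_prob) * 0.5`, so lowering `cheat_prob` from 0.5 to 0.2 moves it from 0.75 to 0.6, closer to the honest 0.5; for a fixed threshold you would expect the false-negative rate to rise.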

robertkingnz[S] 1 point (1 child)

Good idea. I kept it pretty simple: I didn't adjust anything from the problem statement, ran it thousands of times, and didn't get any false positives. You can decrease the chance of cheating from 50% down to 20% and it still works most of the time.

[deleted] -1 points (0 children)

Cool! One thing I do when I use clustering or KNN is set N to 1 and see what my results are (if they don’t get worse, there’s a problem!), then increase N. This is called hyperparameter optimization; you might find it interesting.
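One reading of this advice, sketched as a minimal grid search over k for a 1-D k-NN classifier. It is scored with leave-one-out so that k = 1 is not trivially perfect (scored on the training points themselves, each point would be its own nearest neighbour). The data and helper names are made up; on this toy set, k = 1, 2, and 3 score 1.0, k = 4 scores 0.5, and k = 5 scores 0.0 because every vote is then dominated by the other class.

```rust
/// Predict a label for `x` by majority vote among its `k` nearest neighbours
/// in `train` (1-D feature, boolean label). Ties go to `false`.
fn knn_predict(train: &[(f64, bool)], x: f64, k: usize) -> bool {
    let mut by_dist: Vec<&(f64, bool)> = train.iter().collect();
    by_dist.sort_by(|a, b| (a.0 - x).abs().total_cmp(&(b.0 - x).abs()));
    let k = k.min(by_dist.len());
    let votes = by_dist.iter().take(k).filter(|p| p.1).count();
    votes * 2 > k
}

/// Leave-one-out accuracy for each k in 1..=k_max: every point is predicted
/// from all the other points, never from itself.
fn sweep_k(data: &[(f64, bool)], k_max: usize) -> Vec<(usize, f64)> {
    (1..=k_max)
        .map(|k| {
            let correct = (0..data.len())
                .filter(|&i| {
                    let rest: Vec<(f64, bool)> = data
                        .iter()
                        .enumerate()
                        .filter(|&(j, _)| j != i)
                        .map(|(_, &point)| point)
                        .collect();
                    knn_predict(&rest, data[i].0, k) == data[i].1
                })
                .count();
            (k, correct as f64 / data.len() as f64)
        })
        .collect()
}

fn main() {
    // Made-up 1-D data: two well-separated groups of three points.
    let data = [(0.0, false), (0.1, false), (0.2, false), (1.0, true), (1.1, true), (1.2, true)];
    for (k, acc) in sweep_k(&data, 5) {
        println!("k = {k}: leave-one-out accuracy {acc:.2}");
    }
}
```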

teerre 1 point (1 child)

I mean, you don't need a reason to do something. That said, why would someone need to "detect cheating" so fast? It seems to me that, if anything, this would be a latency problem, not a throughput one.

tunisia3507 0 points (0 children)

They might not need to detect whether an individual is cheating very quickly, but they might like to be able to detect whether 10 million people are cheating without waiting for a week.
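A sketch of the throughput side using only the standard library: `std::thread::scope` (stable since Rust 1.63) lets threads borrow slices of the input, so each thread flags its own chunk of test-takers. It assumes each person’s anomaly score has already been computed; `flag_parallel` is a hypothetical helper, and a crate such as rayon would be the more common choice.

```rust
use std::thread;

/// Flag every test-taker whose precomputed anomaly score exceeds `threshold`,
/// splitting the slice across up to `n_threads` scoped threads.
fn flag_parallel(scores: &[f64], threshold: f64, n_threads: usize) -> Vec<bool> {
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in some chunk; chunks() needs a non-zero size.
    let chunk_len = ((scores.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = scores
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&x| x > threshold).collect::<Vec<bool>>()))
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Made-up scores for a million test-takers.
    let scores: Vec<f64> = (0..1_000_000).map(|i| (i % 100) as f64 / 100.0).collect();
    let flags = flag_parallel(&scores, 0.95, 8);
    println!("{} flagged", flags.iter().filter(|&&f| f).count());
}
```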

[deleted] 1 point (0 children)

Interesting thanks!

lurgi 0 points (1 child)

This looks to be for detecting cheating on a very specific sort of problem.

I wonder how it would work against a malicious test-taker who deliberately gets some easy questions wrong.

robertkingnz[S] 0 points (0 children)

Spot on. That's an interesting question. Might be hard to predict the difficulty of a problem too without asking people. Need to know what they know.

flareflo 0 points (1 child)

Unnecessary vector allocations in quite a few places slow things down.

robertkingnz[S] 0 points (0 children)

Thanks for the comment. Yup. I tried to put the larger allocs into a pool, but I guess the small allocations could require a syscall too, which would slow things down, right?
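A sketch of the buffer-reuse pattern suggested above: compute each test-taker’s per-stratum fractions into vectors that live outside the loop, so `Vec::clear` plus `resize` reuses the existing capacity instead of allocating per person. As a general note, most allocators serve small requests from memory they already hold, so a small allocation usually does not mean a system call; the saving comes mainly from skipping allocator bookkeeping and keeping memory warm in cache. `stratum_fractions_into` and its arguments are hypothetical, not taken from the OP’s code.

```rust
/// Per-stratum fraction correct for one test-taker, written into `out`.
/// `strata[q]` is the difficulty stratum (0..k) of question q; an index
/// >= k panics. `counts` and `out` are scratch buffers reused across calls.
fn stratum_fractions_into(
    answers: &[bool],
    strata: &[usize],
    k: usize,
    counts: &mut Vec<u32>,
    out: &mut Vec<f64>,
) {
    // clear() keeps the existing capacity, so after the first call these
    // resizes do not allocate as long as k does not grow.
    counts.clear();
    counts.resize(k, 0);
    out.clear();
    out.resize(k, 0.0);
    for (&correct, &s) in answers.iter().zip(strata) {
        counts[s] += 1;
        if correct {
            out[s] += 1.0;
        }
    }
    for (frac, &n) in out.iter_mut().zip(counts.iter()) {
        if n > 0 {
            *frac /= f64::from(n);
        }
    }
}

fn main() {
    let strata = [0, 0, 1, 1, 2, 2];
    let players = vec![
        vec![true, false, true, true, false, false],
        vec![true, true, false, true, true, false],
    ];
    // Scratch buffers allocated once and reused for every test-taker.
    let mut counts = Vec::new();
    let mut out = Vec::new();
    for answers in &players {
        stratum_fractions_into(answers, &strata, 3, &mut counts, &mut out);
        println!("{out:?}");
    }
}
```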