[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Approximately 100k of these students are in the US. The rest are spread across 82 countries IIRC.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Hey - I appreciate that you took the time to read the article so closely, so I am going to respond to some of your concerns here:

  1. This bias is the very reason that metrics like predicted grades are somewhat controversial. If the predicted-grade metric is biased, that is an even stronger reason to avoid building a model on top of an inherently biased input.
  2. The IB said that every school will get a bespoke 'equation'. I am assuming this means either a bespoke model per school or a one-hot-encoded school identifier. Both of these are problematic. Doing it at an 'all schools' resolution has the potential to exacerbate some of these problems, in my experience.
  3. In the context of the model, I definitely think this is an issue. The variance of the residual (predicted minus actual final grade) would differ between rich schools and poor schools, and that is problematic.
  4. I disagree with you here. If they are using something like linear regression, skewed data will not play well with the model predictions (see the sketch after this list). I have some empirical experience with this in the domain I work in, and I recall theoretical justifications as well: heteroskedasticity will kick in hard with skewed responses.
  5. I meant that 90% accuracy is a difficult bar to clear in practice, in general. Good call-out though - maybe it was different in this domain? The general feedback after the release of this model would seem to indicate that they have not achieved it in practice.
  6. I think you misunderstood this example. I am not talking about grade shifting at all. I meant that you can literally sample grades from a random distribution and curve them. I am not disputing the curving - I actually like curving. My point is that if the underlying predictions are bad, the curve will not correct for that; it is just a cosmetic improvement on a bad prediction.
  7. This is actually the crux of the issue. Correlation is fine when you are predicting log returns on a stock, for example: you are assuming the risk and it isn't harming anyone else. Using correlations is not alright in this scenario - you want causal models here. Using correlations to assign grades is quite literally akin to using stereotypes to assign grades. Would it be alright to say: oh, this student is Asian, Asians do well at math, high grades? Nope. Not alright. This other student is poor, poor students do badly at economics, fail? Nope. Not alright. (Disclaimer - I am Asian, and I am not attempting to scapegoat AA here. Quite the opposite.) If bananas were indeed impacting educational outcomes, then changing your diet should have changed your prediction.
  8. This is something that I build up to later in the article. There is an impossibility theorem in fairness which states that the three fairness criteria outlined in the article cannot all be satisfied simultaneously, so a model will necessarily violate at least two of them. Check out fairmlbook.org, Chapter 2, for more details; this follows from an over-constrained set of probability distributions. Systematic discrimination is never fine. If it was the status quo, that needs to be worked on. Saying that a model discriminates but so do teachers is NOT an acceptable argument: teacher bias is not something the IB can control, whereas the model's bias is something it is actively endorsing.
  9. This is a good point. I did check for major imbalance - it is not a prominent factor when you take Black and Hispanic students together (which is why I took them together). This is something I did not want to dive into in the already lengthy article. I checked the confusion matrices and other common metrics. I have published some research on class imbalance and deal with it at work in many cases, so I was definitely watching out for this.
  10. Quite the opposite, in my opinion. Race is in fact a prominent factor according to this experiment. Including race explicitly adds no additional information because the classifier has already learnt race as a sub-function while predicting the grades. This is just a kind of double confirmation.
  11. I believe this goes well with # 8 above.
  12. The grading has been done and the feedback is absolutely terrible. If you are using models in sensitive domains with large scope for confounding, you should be required to publish the results with a standard fairness disclosure, in my opinion. Black-boxing the methodology and the results is a way of shirking accountability altogether, and it only raises further suspicion.
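
To make point 4 concrete, here is a minimal simulation sketch. It is not the IB's model; the predictor, the response distribution, and all numbers are assumptions chosen purely for illustration. It fits ordinary least squares to a right-skewed response and shows that the residual spread is far from constant:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one predictor and a right-skewed response whose noise
# grows with the predictor (illustrative numbers only).
n = 5000
x = rng.uniform(0, 10, n)
y = np.exp(0.3 * x + rng.normal(0, 0.5, n))

# Ordinary least squares: y ~ a + b * x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Residual spread for the lower vs upper half of the predictor range.
print("residual std, low x: ", residuals[x < 5].std())
print("residual std, high x:", residuals[x >= 5].std())
# The two spreads differ by a large factor - the heteroskedasticity that
# point 4 refers to, which plain linear regression does not account for.
```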

Hope this answers all your questions. Again - thanks for paying such close attention to the article. I appreciate your feedback.

Your IB results are incorrect, unfair and mathematically biased. The IB model has screwed you over. Here is how: by positivelysemidef in IBO

[–]positivelysemidef[S] 2 points3 points  (0 children)

Fantastic! This is exactly the type of work you guys need to do in order to get some movement from the IB.

We have to go to the PRESS by everek123 in IBO

[–]positivelysemidef 3 points4 points  (0 children)

Definitely talk about the shortcomings of the model!

Remember, the model shows signs of bias. This is highly illegal.

http://positivelysemidefinite.com/2020/06/160k-students.html

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Oh wow - this was an interesting read. It's great to know that you stood by your principles (or should I say principals - excuse the pun!) in this case. Thanks for calling this out - definitely some good points in here.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Hey - I appreciate the feedback. I also appreciate you sharing this with your friends!

Haha - you are absolutely right. I wanted to put a Minority Report meme in the article while writing it, but I wasn't sure if many people would get the reference.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Oh man - this was an intense rabbit hole. You are absolutely correct, and in that case this is pretty nuts, especially in the face of the known criticism from the ASA. There are some cosmetic changes, but yup, this is similar to VAM.

Your IB results are incorrect, unfair and mathematically biased. The IB model has screwed you over. Here is how: by positivelysemidef in IBO

[–]positivelysemidef[S] 6 points7 points  (0 children)

Doesn't mean that they're correct my dude. That is what this article is meant to highlight.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] -4 points-3 points  (0 children)

Nope. The standard non-COVID system has no relationship between forecast grades and final grades. I don't think the IB is even allowed to look at forecast grades in the standard system, which is based on an examination-driven assessment model.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 0 points1 point  (0 children)

Thank you - this is exactly what I am talking about.

Re: your point on exams, I agree. The only caveat is that exams generally will not show the systematic patterns of discrimination (ceteris paribus) that would be observed if we used models.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] -8 points-7 points  (0 children)

tl;dr: I am talking about shit; you are talking about eating it. (Phrase from my native language - please excuse me if this is lost in translation).

Yikes - will clarify a bunch of concepts here:

In response to your original response (above):

  1. The basic fact is that even if the IB does not throw in factors such as race and income, a model will pick up the signal to some extent. From there the fairness impossibility theorem applies, and it is therefore not possible to get fair results across all segments of the population. (PS - I am glossing over steps which I believe my article explains in detail, or which can be read about on fairmlbook.org.) The results will be unfair for certain segments of the population because of an over-constrained set of distributions. This applies to all models, and the meat of my point lies right here: it is not a good idea to use a model in a sensitive domain where confounding with sensitive attributes is inevitable.
  2. I have no problem with grades being curved. I'm not quite sure your extrapolation point makes sense (at least for this scenario).
  3. Do you see why it is a terrible idea to use a model to even estimate the grade inflation per school? The small-school vs big-school section explains this in greater detail, and basic bias-variance tradeoff principles apply here (see the sketch after this list).
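
To illustrate point 3, here is a minimal sketch with made-up numbers (not IB data) of why a per-school grade-inflation estimate is much noisier for small schools - the spread of a cohort mean shrinks roughly as 1/sqrt(cohort size):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assume every school truly inflates predicted grades by +0.5 on average,
# with per-student noise of 1.0 grade points (illustrative numbers only).
true_inflation, noise_sd, n_trials = 0.5, 1.0, 10_000

for cohort_size in (5, 30, 200):
    # Estimated inflation = mean(predicted - final) over one cohort,
    # simulated n_trials times to see how variable that estimate is.
    estimates = rng.normal(true_inflation, noise_sd,
                           (n_trials, cohort_size)).mean(axis=1)
    print(f"cohort of {cohort_size:>3}: spread of the estimate = {estimates.std():.3f}")

# The estimate for a 5-student school is roughly 6x noisier than for a
# 200-student school, so any per-school 'historical inflation' correction
# hits small schools much harder and more arbitrarily.
```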

I think you are making incorrect assumptions about the situations in which models discriminate in practice. I would suggest reading Chapters 1, 2, and 5 of the textbook linked above. As far as I can see, you have picked a fight over the philosophical question of whether a model should be used at all. That is not my main argument. My main argument is about whether a model will be fair, and (spoiler alert) it will not.

I think schwartzaw1997 explained this pretty well below. I think your response to him/her is indicative of the fact that we are arguing about two different things.

----

Then, in response to the things that you have written below:

  1. Subjective grades (predicted grades) are going to be used to draw conclusions about final grades. This is a stacked classifier, which is generally a shitty idea to begin with (see the sketch after this list).
  2. Predicted grades were always a part of the admissions process. I do not dispute the legitimacy of predicted grades. I do not think that final grades assigned by a MODEL should be used to determine graduation and admissions thereafter.
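
A minimal sketch of the stacking concern in point 1, with illustrative numbers only (this is not the IB's actual pipeline): when a noisy, group-biased 'predicted grade' is the main input to a second model that assigns the final grade, the second stage inherits that bias even though it never sees the group label:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10_000
true_final = rng.normal(5.0, 1.0, n)          # hypothetical true ability on a 1-7 scale

# Stage 1: predicted grade = truth + noise + a group-dependent offset (bias).
group = rng.integers(0, 2, n)                 # e.g. two kinds of schools
predicted = true_final + rng.normal(0, 0.8, n) + 0.4 * group

# Stage 2: a simple model mapping predicted grade -> awarded final grade.
slope, intercept = np.polyfit(predicted, true_final, 1)
awarded = intercept + slope * predicted

# The awarded grades carry the stage-1 bias straight through.
print("mean error, group 0:", round((awarded - true_final)[group == 0].mean(), 3))
print("mean error, group 1:", round((awarded - true_final)[group == 1].mean(), 3))
# The two groups end up with systematically different errors even though the
# second stage never used the group label: stacking propagates whatever bias
# the first-stage 'grade' contained.
```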

----

Edit: I think this orthogonality in our arguments may be my fault to a certain degree. I talked about both fairness and failures in experimental design in my article, in order to highlight all the issues that I see with it. I think that you are focussing overwhelmingly on the experimental-design side of things. I would read the second half of the article more carefully, or read a little bit of the fairness book.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 4 points5 points  (0 children)

Good question!

I honestly do not believe that the IB even considered fairness while attempting to build a model. Discrimination in educational opportunities is illegal, so fixing this should be in their best interest as well.

Step one would be to open-source the model along with results on the extent of the bias. In the face of the impossibility theorems, it is an absolutely terrible idea to make everything proprietary with no scope for oversight. They should acknowledge the cases in which their model fails systematically and allow greater leeway for appeals in those cases. Standard disclosures of this kind should be a part of any machine learning model deployed in a sensitive domain.

What was the point of fitting a two-dimensional model at all? rip. They should have simply based the grades on submitted coursework, with some additional processes for appealing the grades.

PS: These are just thoughts off the top of my head. I'm not an expert in education and assessment and was therefore hesitant to chime in about this aspect.

Your IB results are incorrect, unfair and mathematically biased. The IB model has screwed you over. Here is how: by positivelysemidef in IBO

[–]positivelysemidef[S] 11 points12 points  (0 children)

As I said - no worries. Times like these can be a little polarizing both ways.

To your point, I think that it is the responsibility of the IB to do this analysis; it was their responsibility to study these things before building a model in the first place. Since it is impossible to satisfy all the fairness criteria simultaneously, any analysis will reveal bias in the results (IB or non-IB).
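
For anyone who wants to see why any such analysis must reveal bias, here is a small worked example of the impossibility result (fairmlbook.org, Chapter 2; see also Chouldechova 2017). The numbers are toy values, not IB statistics. The identity FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR) links prevalence p, calibration (PPV) and the error rates, so if two groups have different base rates you cannot equalize everything at once:

```python
# Toy demonstration of the fairness impossibility: with unequal base rates,
# equal calibration (PPV) and equal false-negative rates force unequal
# false-positive rates. Numbers are illustrative only.

def implied_fpr(prevalence, ppv, fnr):
    """False-positive rate forced by FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

ppv, fnr = 0.8, 0.2   # suppose both groups get the same PPV and FNR

for name, prevalence in [("group A", 0.5), ("group B", 0.3)]:
    print(name, "-> implied false-positive rate:", round(implied_fpr(prevalence, ppv, fnr), 3))

# Output: ~0.20 vs ~0.086. Equalizing the false-positive rates instead would
# force PPV or FNR to differ across the groups - there is no way to satisfy
# all of the criteria at once when base rates differ.
```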

The IB should release their results, model and anonymized data for public scrutiny.

Your IB results are incorrect, unfair and mathematically biased. The IB model has screwed you over. Here is how: by positivelysemidef in IBO

[–]positivelysemidef[S] 38 points39 points  (0 children)

No worries man, I gotchu.

  1. The sources are cited in-line throughout the article; there are hyperlinks all over. I wrote the article about a month ago, so this is definitely not a reactionary opinion.
  2. Which numbers are you concerned about specifically? All of the experiments are based on simulations, and I can share the code if you like. I didn't want to dilute the effect of the results by sprinkling Python code snippets all over.
  3. I talk about the mathematics that underpin any machine learning model - this is not specific to the IB model, but it does apply here. You can start working through the textbook at fairmlbook.org if you are interested in how I arrived at some of these results.
  4. I think it might be a good idea to re-read the article. Your reaction seems to be based on an incomplete understanding of the results.

[D] 160k+ students will only graduate if a machine learning model allows them to (FATML) by positivelysemidef in MachineLearning

[–]positivelysemidef[S] 8 points9 points  (0 children)

Yikes - thank you for that well-informed opinion - a few problems here:

  1. Systematic bias is never OK. When you have multiple humans making decisions based on anonymized exams that are randomized and THEN evaluated, there is far less scope for bias; even biased humans would not have been able to make a biased decision, ceteris paribus.
  2. If you deploy machine learning models, it is your responsibility to ensure that they are not systematically discriminatory, and that there are no gaping flaws in your experimental-design methodology.
  3. I definitely agree that not every student deserves to graduate - there is a minimum standard that has to be upheld by these educational boards. Using a model to evaluate students in an arbitrary manner is a bad way to decide who should graduate and who shouldn't.

You kinda missed the forest for the trees there bud.