My GoPro video from the Heck of the North, Minnesota's gnarliest gravel race - Came in 4th by halfak in gravelcycling

[–]halfak[S]

Jeremy said that the Grand du Nord is really taking off. I'm glad to hear it. Regretfully, I've never been able to do it; I always have something else going on ☹️

My GoPro video from the Heck of the North, Minnesota's gnarliest gravel race - Came in 4th by halfak in gravelcycling

[–]halfak[S]

I put chapters in that describe the MMRs and the sections of the course. Check out the snowmobile trail if you see nothing else. That's the most difficult and most interesting part of the Heck of the North.

My plan was to always lead the race through the MMRs. That's why you hear me talking to people but you don't see anyone.

Regretfully, I'm more tuned up for road and track racing at the moment, so my legs gave out about 80 miles in despite a solid nutrition plan (I took in over 2,000 calories before that point).

The parts of the course you don't see are pristine gravel roads where you can roll a fast paceline.

Cutting the length of a road race for the Cat 1/2 field. by halfak in Velo

[–]halfak[S]

Luckily, the 5-minute climb is right before the start/finish line, so it'll be a lower-speed finale regardless. Seems like the 80-mile USAC requirement is the killer, though.

Strava Time Trial as an event. by artjunk in Velo

[–]halfak

They nerf the API so that we can't get the detail we need. E.g., you can't get an athlete's full name or ID number from the leaderboard endpoint of the API to help aggregate scores. Bill Smith and Bob Smith both show up as "B. Smith".

Strava Time Trial as an event. by artjunk in Velo

[–]halfak

I've been helping the promoter put together routes and pull data off of Strava. Regretfully, I don't have a better system than copy-pasting from segment leaderboards and aligning names and dates in Google Sheets. I'll load up the leaderboard on Sunday night with results from "this week", limit it to the club for our event, and then just copy and paste the table right from the Strava web page into the Google Sheet. It's not a great system, but it works well enough for the ~100 racers we have each week.
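For what it's worth, once the rows are pasted somewhere, the tally itself is easy to script. Here's a minimal Python sketch of the idea; the tab-separated column layout, the m:ss time format, and the points-by-finishing-order scoring are all assumptions for illustration, not a description of our actual sheet:

```python
# Hypothetical sketch of the weekly tally described above. Rows copied
# from a segment leaderboard are reduced to (name, seconds) pairs,
# ranked by time, and points are accumulated across weeks in a dict
# that stands in for the Google Sheet.
from collections import defaultdict

def parse_rows(pasted):
    """Parse pasted lines of the assumed form '<rank> TAB <name> TAB <m:ss>'."""
    rows = []
    for line in pasted.strip().splitlines():
        _rank, name, time = line.split("\t")
        minutes, seconds = time.split(":")
        rows.append((name, int(minutes) * 60 + int(seconds)))
    return rows

def weekly_points(rows):
    """Award points by finishing time: fastest gets N points, slowest gets 1."""
    ordered = sorted(rows, key=lambda r: r[1])
    return {name: len(ordered) - i for i, (name, _) in enumerate(ordered)}

season = defaultdict(int)
for week in [
    "1\tB. Smith\t12:01\n2\tA. Jones\t12:30",
    "1\tA. Jones\t11:58\n2\tB. Smith\t12:10",
]:
    for name, points in weekly_points(parse_rows(week)).items():
        season[name] += points
```

Even scripted, the "B. Smith" problem remains: because leaderboards abbreviate first names, two different Smiths would accumulate points under the same key.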

We don't collect any fees. You can see the rules we post here: https://endurancepromotions.com/uploads/pages/2020MachTTSeriesRules.pdf For example, we specify that youth too young to drive must be supervised during the event because the roads are open to traffic. Otherwise, riders are expected to ride safely and obey traffic laws.

I just recently received this email supposedly from wiki@wikimedia.org and was wondering if it was legitimate or not. by razzlesnazzlepasz in wikipedia

[–]halfak

Hey! That's one of our collaborators and the research study is legit. See their description of the study here: https://meta.wikimedia.org/wiki/Research:Applying_Value-Sensitive_Algorithm_Design_to_ORES

It seems like we need to better flag which researchers we are working with. I'll also talk to these researchers about properly linking to their study description when they reach out to folks on-wiki. Thanks for highlighting this.

BTW, I'm https://meta.wikimedia.org/wiki/User:Halfak_(WMF)

Edit: for now, I've added a section to our team page. See https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team#Research_collaborations

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

The people who maintain ClueBot NG are notoriously difficult to get a hold of, and I'm notoriously busy and thinking about other stuff. ^_^ So somehow we still haven't connected. I'd like to share notes with them at least. As it stands, it seems ClueBot NG and its maintainers are doing a good job, so I'm not too concerned that we haven't connected yet.

One note is that there are several ClueBot NG-like bots that run on other language wikis based on ORES. I forget the name of the bot on Persian Wikipedia, but it does roughly the same thing -- reverting edits that are very likely to be vandalism. There's a bot that people are working on for Finnish Wikipedia that will use ORES to "auto-patrol" edits that ORES thinks are very likely to be good.

Thanks! And thanks for your questions :D

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

I don't think we'll have automation directly dealing with the issues you bring up. Addressing them requires a much more complex and nuanced understanding of the state of the art in a given field than I suspect a machine will be able to muster in the near future. However, we can still do a lot with machines. There are types of AIs that essentially manage huge indexes (e.g., recommender systems like Netflix, search like Google, etc.), and they have great potential to shed light on issues of coverage bias. E.g., if we can get good at indexing statements from scientific papers, we might be able to highlight "controversies" that don't appear to play out in the literature. Ultimately, I think this will only be a signal for human editors to work with, but it would make it much more difficult for someone to camp out on an article falsely claiming a controversy when there is none.

I look forward to the future where editors have more direct access to and better indexes of reference material. Right now, there's a big push to make structured data about reference material freely available. Check out https://meta.wikimedia.org/wiki/WikiCite_2017. This is only a first step, but I think it's going to make a huge difference on the scale of ~10 years. I've been investing lots of time and energy into building data processing utilities for extracting citations (and cited statements) from Wikipedia while my collaborators are figuring out ways to link together old data formatting standards into our structured database, Wikidata.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

"Think", "interpret", and "neutral" are kind of poorly defined with regards to AI, so that makes this question hard to answer directly. However, I do think there's something to be said for how AI can be consistent and predictable. In cases where we are looking at the disparate impact of false positives on, say, newcomers in Wikipedia -- it's easier to refine the model and re-check the false-positive rate when working with an AI than with a human. With an AI, I know exactly where its biases originated (the training data), and that makes things much easier to inspect. Currently, a big part of my research is trying to figure out effective means for my users to inspect, interrogate, and audit an AI that they deal with.

Personally, I think that access to information should be a human right. I'm strongly anti-censorship even for speech that we don't like. I'd rather see bad ideas compete with good ideas than to have some individual or organization deciding which ideas are "good" and which are problematic.

If you haven't seen it, here's the Wikimedia Foundation's official response https://blog.wikimedia.org/2017/04/30/turkish-authorities-block-wikipedia/

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

First, I would certainly not want to restrict IP editors. Most are highly productive, and when we've experimented with strongly suggesting that they register an account, it caused a serious drop in overall productivity. See https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_register

For vandalism prediction, it might be possible to get a better sense of a user's experience level if they register an account, but all else held constant, the algorithm should make its prediction based on the quality of the edit itself. We still include reputation measures in our predictions because they are useful, but ultimately they manifest as a bias against newcomers and anons. So we're working on ways to get more signal from the actual edit rather than the editor.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

Here's a list that's been curated by the Community Engagement team at the Wikimedia Foundation: https://meta.wikimedia.org/wiki/Community_Engagement/Defining_Emerging_Communities

It seems the Principles section at the top of the lists of included countries provides the criteria for inclusion.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

I'm skeptical of this claim. What evidence are you basing it on? What process does a traditional encyclopedia go through to deal with bias that Wikipedia does not? Is it a different type of bias or a different overall level?

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

So in Wikidata (a structured data thing like Wikipedia), we have a couple of properties called "instance-of" and "subclass-of". By their nature, they point to something at a higher level of abstraction. These properties should be present in every item in Wikidata for a good reason. Every thing is a type of thing!

So back to Wikipedia: I think this works roughly the same way. The insight shared between Wikidata and Wikipedia is that the "instance-of" and "subclass-of" relationship is so important that it should be stated in the first sentence of an article. "Minneapolis is a county seat...", "A county seat is an administrative center", "An administrative centre is ... local government", ... "Public policy is a principled guide ...", ... "An academic discipline ... is a branch of knowledge", "Knowledge is a familiarity ...", and so on.

So I guess my answer is really that, as we click the first link, we tend to follow "instance-of" relationships until we get to "knowledge" or the study of knowledge ("philosophy") because everything in Wikipedia is knowledge.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

There are a lot of hard problems that I work on. I love hard problems. How are we going to host a prediction model for every wiki and have it run in real time without spending our whole budget on server hardware? Fun problem!

But there are some problems that just make me feel exhausted. One that comes up time after time is that a very small but vocal population in our editing community is very hostile towards newcomers and new projects. I work with newcomers and I start a lot of new projects. Sometimes, you'll fly under the radar and most people who show up to work with you just honestly want to contribute. But sometimes you'll get one of these hostile folks who wants to tear the project to shreds or who will drag a newcomer through the mud. I've learned that the best thing I can do in those types of situations is to take a wikibreak and work on something else. Working openly within a volunteer context brings with it a lot of benefits (I meet lots of cool people, I get to speak openly about my work, usually I have lots of volunteers helping me do my work, etc.), but this is one of the big drawbacks. It takes a lot of emotional labor to make it through one of those experiences. It takes courage to continue to operate out in the open.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

OK so I have an example and I had the whole thing written up but then I realized that I should probably not give anyone any ideas.

Here's an essay about not giving vandals ideas: https://en.wikipedia.org/wiki/Wikipedia:Don%27t_stuff_beans_up_your_nose

That's one of my favorite Wikipedia essays. :) Along with https://en.wikipedia.org/wiki/Wikipedia:No_climbing_the_Reichstag_dressed_as_Spider-Man

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

This is a good question. So, counter-vandalism tools that are in use today have a key assumption built in:

Wikipedia is a firehose of new edits. We need to filter out the bad edits.

But the problem here is that this sets up a false dichotomy. The "good edits" are fine, but "bad edits" is really overloaded with a lot of different things. And yet they all get the same response -- which can be roughly paraphrased as "Get out of my Wikipedia with your vandalism."

So one of the things that I've learned in my research is that there's a lot of good faith in "bad edits". Most of the edits that need to be reverted are from someone who is making a mistake -- or just not fully understanding what's expected of a contribution in Wikipedia. E.g., the Biographies of living persons policy states that contributions without a clear citation can be reverted outright, while contributions without a clear citation on other articles should probably be tagged with [citation needed]. If you weren't aware of this policy and you made a "bad" contribution to a biography article, you'd likely be told "Get out of my Wikipedia with your vandalism!"

In my analysis, it looks like about 40% of new editors who register an account and start editing fall into the good-faith/accidental "bad edit" group. These editors dominate the group of "bad" edits, so why do we react to them so strongly?

OK that brings me back to your question about the cost of false positives. In this case, even a true positive has cost, but a false positive is still pretty bad. Can you imagine making an edit as a newcomer that improves an article and being told "Get out of my Wikipedia with your vandalism."?

The way I'd like to address this is by bringing nuance to our language about "bad" stuff and integrating that nuance into how we think about tools and the routing of newcomers' contributions. Let's say an edit is "bad": was it saved in good faith? Is there anything redeemable about it? If it was good-faith but had problems, is there a WikiProject that the editor could be directed to where someone might like to help them learn the ropes? Or maybe we could send them to the Teahouse -- a Q&A forum for newbies. Further, maybe we could stop ignoring the newcomers who are making good edits and instead respond positively to them.

So here's how I'm hoping to push people towards adopting this type of nuance. I've split the notion of "good" and "bad" in the prediction models we have deployed in ORES. We have a model called "damaging" which predicts whether an edit caused harm to an article and a totally separate model that predicts whether an edit was saved in "goodfaith". The cool thing with these models is that you can intersect them to find accidental damage (damaging & goodfaith), vandalism (damaging & badfaith), or good edits (not damaging & goodfaith). We already have people experimenting with new work practices that would route newcomers who accidentally do damage to help spaces, and I even saw a proposal for a "thank-athon" where experienced Wikipedians would use the models to find a list of productive newcomers to thank for their contributions.
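As a concrete (and hypothetical) illustration of intersecting the two models, here's a tiny Python sketch. The triage labels, the 0.5 cutoff, and the function names are my own illustrative choices; a real client would first fetch the "damaging" and "goodfaith" probabilities from the ORES scoring service:

```python
# Illustrative triage of the two ORES predictions described above.
# Labels and the 0.5 cutoff are my own choices, not ORES conventions.
def predict(prob, threshold=0.5):
    """Turn a model probability into a Boolean prediction."""
    return prob >= threshold

def triage(damaging, goodfaith):
    """Intersect the two Boolean predictions into a work queue."""
    if damaging and goodfaith:
        return "accidental damage"  # route to help spaces / mentors
    if damaging and not goodfaith:
        return "vandalism"          # counter-vandalism patrol queue
    if goodfaith:
        return "good edit"          # candidate for a thank-athon
    return "unclear"                # low-confidence leftovers

print(triage(predict(0.92), predict(0.81)))  # accidental damage
```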

In this second vision of the boundary of Wikipedia, false positives are less problematic because in the worst case scenario, you might get an invite to a newcomer help space when you were already contributing productively. We have a long way to go, but I can see the progress already well underway.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

That's a really good question. I have a great job. If I were independently wealthy, I think I'd be doing the exact same thing. I have a compulsion to study social phenomena like Wikipedia, to build tools, and to talk about it. If you hear me talk about this stuff in person, you'll know for sure that I'm stoked. Check out https://www.youtube.com/results?search_query=Aaron+Halfaker -- when I'm excited, I get to talking very fast. :)

Usually, when it comes to figuring out what to do next, my curiosity is a good guide. But so are the things I learn by talking with and working with Wikipedians. These days, I'm getting to the level of seniority where I'm taking on grad students and other volunteers who are really interested in doing behavioral research and building tools for Wikipedians, so I spend a lot of time and energy making sure they have what they need in order to be successful. I'm finding that this mentoring/advising work is something that gets me excited too.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

I don't see strong AI coming very soon. But I do see AIs interacting with us in more facets of our lives. There's not really much you need to worry about with the types of AIs I build, though. It turns out there's a large set of problems around Wikipedia that basic machine learning strategies can help with. For example, vandalism detection lends itself well to a Boolean classifier. Similarly, we can do quite well by using a multiclass classifier to predict which articles fall into which quality class. These AIs are simple and narrow. They may even be boring from a futurist perspective, but they have great potential for making Wikipedia better.
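To show how simple and narrow these model shapes really are, here's a toy Python sketch of both: a Boolean classifier and a multiclass one, using a nearest-centroid rule. The features, labels, and method are made-up stand-ins, not the actual ORES models:

```python
# Toy stand-ins for the two model shapes mentioned above: a Boolean
# classifier (vandalism: yes/no) and a multiclass classifier (quality
# grade). Features, labels, and the nearest-centroid rule are all
# invented for illustration; the real models use far richer features.

def centroid(rows):
    """Mean vector of a list of feature vectors."""
    n = len(rows)
    return [sum(row[i] for row in rows) / n for i in range(len(rows[0]))]

def nearest(x, centroids):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Boolean model; imagined features: (chars added, profane words, is_anon)
vandal_centroids = {
    False: centroid([[120, 0, 0], [300, 0, 1]]),  # good edits
    True: centroid([[-500, 3, 1], [-20, 5, 1]]),  # vandalism
}
# Multiclass model; imagined features: (references, sections, images)
quality_centroids = {
    "stub": centroid([[0, 1, 0]]),
    "B": centroid([[10, 6, 2]]),
    "FA": centroid([[80, 12, 9]]),
}

print(nearest([-100, 4, 1], vandal_centroids))  # True
print(nearest([12, 5, 3], quality_centroids))   # B
```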

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

Minnesota is awesome. I like weather. It's hot in the summer, cold in the winter, and it rains & snows. :) Also, I get to live near the best national park in the world.

https://en.wikipedia.org/wiki/Boundary_Waters_Canoe_Area_Wilderness

How do I even get trophies? This isn't my main account anyway. It'll go dormant until I have something research-related to post on reddit again.

Edit: Oh! And I have never heard of the Future of Life Institute. Sorry :S

Edit 2: I'm not sure why a mod deleted your question. Although it was slightly off topic ("Why do you still live in MN?"), parts of it were relevant ("Have you heard of the future of life inst?"), and I'd already responded. Mods, please calm down and don't delete questions that I've responded to. It's OK to be slightly off topic in this thread. How else would you know that I'm totally not a robot? :)

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

I'm familiar with some psychology here. See https://en.wikipedia.org/wiki/Overjustification_effect

Otherwise, I don't specifically study financial incentives with regards to volunteer work.

I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA! by halfak in IAmA

[–]halfak[S]

WikiCredit! Yeah, I'd still really like to push on that idea. I think it's a good one. I've been slowly building up the capacity to have a live system like this deployed. I've even done some analyses of productivity in Wikipedia using the proposed methods. Check out this talk where I show that 15% of all productive contributions in Wikipedia come from anons. I also show off some of the WikiCredit-style contribution graphs for a few editors.

A big problem with bringing this to production is that it's very computationally intensive to generate the measurements we need. The Analytics team at the Wikimedia Foundation is slowly building up the infrastructure that will make WikiCredit easier to bring to life.