I’m the principal research scientist at the nonprofit behind Wikipedia. I study AIs and community dynamics. AMA!

Jun 1st 2017 by halfak • 9 Questions • 99 Points

Hello, I’m Aaron Halfaker, and I’m a principal research scientist at the Wikimedia Foundation, the nonprofit that supports and operates Wikipedia and several other free knowledge projects.

I’m here today to talk about the work I do with wiki knowledge communities. I study the work patterns of the volunteers who build Wikipedia and I build artificial intelligences and other tools to support them. I’ve been studying how crowds of volunteers build massive, high quality information resources like Wikipedia for over ten years.

A little background about me: I have a PhD in computer science from the GroupLens Research lab at the University of Minnesota. I research the design of technologies that make it easier to spot vandalism and support good-faith newcomers in Wikipedia. I think a lot about the dynamics between communities and new users, and about ways to make communities inviting.

I’m very excited to be doing an AMA today and sharing more details about how Wikipedia functions, how the community scaled their process to handle an internet’s worth of contribution and how we can use artificial intelligence to support open knowledge production work.

I’ll try to answer any questions you have about community dynamics, the ethics of AI and how we think about artificial intelligence on Wikipedia, and ways we’re working to counteract vandalism on the world’s largest crowdsourced source of knowledge. One of the nice things about working at Wikipedia is that we make almost all of our work public, so if you’re interested in learning more about this stuff, you can read about the team I run or the activities of the larger research team.

My Proof:

Edit 1: I confirm that I work with this /u/Ladsgroup_Wiki guy. He's awesome. :)

Edit 2: Alright folks, it's been a great time. Thanks for asking some great questions and engaging in some awesome discussion. I've got to go do some other things with my evening now, but I'll be back tomorrow morning (16 hours or so from now) to answer any more questions that come in. o/


Hi Aaron, thanks for doing the AMA. How does the ORES quality assurance service currently fit in with other vandalism detection methods, such as CluebotNG on English Wikipedia?


Good Q. So, all of the vandal fighting systems in Wikipedia rely on a machine learning model that predicts which edits are likely problematic. There's ClueBot NG that automatically reverts very very bad edits and tools like Huggle/STiki that prioritize likely bad edits for human review. Before ORES, each of these tools used its own machine learning model. This would have been fine, but it's actually quite a lot of work to stand up one of those models and maintain it so that it runs in real time. I think if it weren't so difficult, we'd see a lot more vandal fighting tools that use an AI. That's where ORES comes in.

ORES centralizes the problem of machine prediction so that tool/bot developers can think about the problem space and user interaction that they want to support rather than having to do the heavy lifting of the machine learning modeling stuff. Instead, developers only need to figure out how to use a simple API in order to get predictions in their tools. Currently, Huggle has switched over to using ORES, but I don't think ClueBot NG has. The developer of STiki was one of our key collaborators during the development of ORES. There are now many new tools that have come out in the past few years that use ORES.
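To make the "simple API" idea concrete, here is a minimal sketch of what a tool developer's side of that might look like. Treat the details as assumptions rather than a reference: the base URL, the `damaging` model name, and the response layout reflect how the ORES v3 scoring endpoint has been commonly documented, and the `triage` thresholds are hypothetical values picked only to illustrate the ClueBot-style "auto-revert" versus Huggle-style "human review" split described above. No network call is made; the sample response is hand-written.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout of the ORES v3 scoring endpoint.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_score_url(wiki, rev_ids, model="damaging"):
    """Build one request URL that asks for scores on several revisions."""
    query = urlencode({
        "models": model,
        "revids": "|".join(str(r) for r in rev_ids),
    })
    return f"{ORES_BASE}/{wiki}/?{query}"


def damaging_probability(response, wiki, rev_id, model="damaging"):
    """Extract P(edit is damaging) from a decoded JSON response.

    The nesting used here is an assumed response shape, not a guarantee.
    """
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]


def triage(probability, revert_at=0.95, review_at=0.5):
    """Map a probability to an action; both thresholds are hypothetical."""
    if probability >= revert_at:
        return "auto-revert"
    if probability >= review_at:
        return "human review"
    return "no action"


# Hand-written sample in the assumed response shape (not real ORES output).
SAMPLE_RESPONSE = {
    "enwiki": {
        "scores": {
            "123": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {"false": 0.9, "true": 0.1},
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(build_score_url("enwiki", [123, 456]))
    p = damaging_probability(SAMPLE_RESPONSE, "enwiki", 123)
    print(p, triage(p))
```

The point of the split is that the tool only decides what to do with a probability; training and serving the model stays on the service side.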


Hey, I'm Amir and I work with Aaron on the scoring platform team. This mostly depends on each wiki's policies and the topic of the article. For example, English Wikipedia has a page explaining when a page needs to be protected by volunteers (to clarify, protection is done by volunteers, and staff won't do this directly except in extreme cases). As a rule of thumb, when I have my volunteer hat on, I usually protect after three vandalizing edits.


Ladsgroup_Wiki's answer is great, but I just wanted to take the opportunity to share my favorite example of a protected page: Elephant

This page has been protected since 2006 when Colbert vandalized it on-air. Check out this awesome Wikipedia article: (Because of course there's a Wikipedia article about that)


Hi Aaron, sometimes part of the community could be "a bit reluctant" to embrace new technologies (i.e. edit interface, discussion, etc). In your experience, is there a similar behavior about studies? Have you ever faced a "no, that's not true!" comment from part of the community as a reaction to your finding? Thanks!


Yeah! Good question. In my most seminal study, The Rise and Decline, my collaborators and I highlight how the strong, negative reaction to newcomers in Wikipedia is causing a serious crisis for the editing community. I've been doing a lot of outreach to get the word out over the last 5 years, but I still come across people who are hard to convince. Still, I find that a healthy dose of empiricism (see the paper) is useful when talking to anyone who I want to convince. I also often end up learning new things through conversations. Sometimes a study misses the point and researchers aren't Right(TM). I like to do trace ethnography and interviews in combination with my analyses to make sure I'm not getting things totally wrong.

Edit: I should mention that the strong negative reaction towards newcomers wasn't malicious but rather a response to the huge quality control needs that Wikipedians faced. In a followup paper I talk about how it was that Wikipedia's quality control processes got so aggressive. A big part of my work today is trying to figure out how to have both efficient quality control and good newcomer socialization.


Mr. Halfaker, how often are you on Reddit and what are your favorite subreddits?


Hey! So I've been on reddit since ... 2006ish. I'm a daily user. Obviously I'm using this account so I don't mix my personal stuff with work stuff. ^_^ My favorite subreddit is /r/youtubehaiku. I think /r/HumansBeingBros is pretty awesome too.


What? I totally thought you would be a fan of r/totallynotrobots/


Oh! I like that sub too. :)


Hi Aaron, it's nice to find you here! From your point of view, what is missing in the Natural Language Processing or Artificial Intelligence area? I mean, are there any particular needs or tasks at the Wikimedia Foundation that you have not been able to solve due to the unreliable, or missing, state of the art in these science areas?


One of my main frustrations with the AI literature around Wikipedia/Wikidata/etc. is that the models that people build are not intended to work in realtime. There are a lot of interesting and difficult engineering problems involved in detecting vandalism in real time that disappear when your only goal is to make a higher fitness model that can take any finite amount of time to train and test. I often review research papers about an exciting new strategy <foo>, but there's either no discussion of performance or I find out at the end that scoring a single edit takes several minutes. :S

I guess one thing that I'd like is a nice way to process natural language into parse trees for more than just a couple of languages. E.g. spaCy only works for English and German. I need to be able to support Tamil and Korean too! It's hard to invest in a technology that's only going to help a small subset of our volunteers.


Hi Aaron, thanks for taking the time to speak with redditors!

I'm curious about how you, Wikimedia, and Wikimedians evaluate ORES. Based on your work with Huggle/Snuggle, it seems like there are two kinds of evaluation for any human-facing machine learning system: (a) recognition and (b) action.

The first way involves the classic machine learning questions of precision/recall: how well can the system detect what the Wikimedians consider to be abuse, and does it have unfair biases? As I understand it, you've designed ORES in such a way that community members can contest and reshape this part of the system.

The second way to evaluate a system has less to do with what it can recognize and much more to do with what people and computers do with that knowledge: as you wrote in your Snuggle paper, one could deploy all sorts of interventions at the end of an AI system that recognizes vandalism or abuse: one could ban people, remove their comments, or offer them mentorship. These interventions also need to be evaluated, but this evaluation requires stepping back from the question "did this make the right decision here" to ask "what should we do when we recognize a situation of a certain kind, and do those interventions achieve what we hope?"

As AI ethics becomes more of a conversation, it seems to me that almost all of the focus is on the first kind of evaluation: the inner workings and recognition of AI systems rather than the use that then follows that recognition. Is that also true for ORES? When you evaluate Wikimedia's work, do you think of evaluation in these two ways, and if so, what are your thoughts on evaluating the ethics of the outcomes of AI-steered interventions?


Hi Nate! Great to see you here and awesome question as expected.

So, re. the first way, we're investing a lot here, but I think we're very different from the rest of the field. Right now, I'm focused on improving auditing strategies that our users will be able to use to discover and highlight what predictions are working and what predictions are not working at all (e.g. false-positives in vandal fighting). Experience has shown that Wikipedians will essentially do grounded theory to figure out what types of things ORES does wrong. This is extremely valuable for repairing our training and testing process WRT what people discover in the field -- when they are using predictions within their tools.

For the second way -- I'd like to aim to be a bit less patriarchal. How should people use an AI? Well, the thing I'm worried about is that I'm not the right person to answer that. I don't know if any individual or group is. So my next thought is, what kinds of social infrastructures should we have to manage the way people use an AI? This is where I like to draw from standpoint epistemology and ask myself who gets access to the "conversation" and who doesn't. We do have conversations about the kind of technologies we do and do not want to operate around Wikipedia, but I'd like to extend the notion of "conversation" to the technology development as well. Who isn't involved in technology development but should be? Wikipedia is an incredible space to be exploring how this works out because it's open and there's no (or little) money to be made. We can openly discuss what techs we want, which ones we don't want, and we can experiment with new things. My bet is that when you lower barriers to both the human-language and technology conversations, we'll all become more articulate in discussing the technologies we want and don't want. And that out of this will come ethical systems. As a scientist, if I'm wrong, that'll be very interesting and I look forward to learning from that.


Hi Aaron, did you mean to post in the wrong kill the keg thread?


I didn't, but I'm still going to be at Stub and Herbs tonight. :D


Hi Aaron! I've wondered a lot lately, how big do you think Wikipedia will get? We're currently at 5.4 million articles on English Wikipedia, do you see that going to 50, or 500 million? If so, how do you think Wikipedia can get there?

Relatedly, do you think that the old camps of deletionist/inclusionist are still prevalent and relevant today? Do you tend toward one side or the other?


Fun question. Here's an essay I wrote on the subject.

Based on my estimates of labor hours into Wikipedia and how many articles we have yet to write, I'm guessing it'll take us about 5000 years to finish Wikipedia assuming we don't have a big population increase in the meantime.

See also this great essay by Emijrp:

Generally, I'm an inclusionist. A big part of my push towards building article quality prediction models is to make a clear differentiation between new articles that are clearly problematic (spam, vandalism, personal attacks) and new articles that are just of borderline notability. I think we do better when we focus on trimming the bad and less on trimming the not-notable-enough.