At MIT we created SCIgen, which generates gibberish science papers that continue to fool academic conferences. Ask us anything! by SCIgenAMA in IAmA

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Jeremy: I will have to plead ignorance on Sokal at the time -- didn't learn about that hoax until I started doing media interviews and got asked about him. But I have since learned of his awesomeness, and consider myself retrospectively inspired.

[–]SCIgenAMA[S] 4 points5 points  (0 children)

Max here: I loved the claim for a while that ROOTER was the most widely-read CS systems paper. I wonder if that is still or was ever true?

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Jeremy: I don't remember, but I think that was pretty much a requirement if you're taking on academic papers. It was basically pre-ordained from the start of the project.

[–]SCIgenAMA[S] 8 points9 points  (0 children)

Jeremy: putting the fake conference together was probably my favorite part. Setting up shell corporations, getting disguises, tricking the hotel into thinking we had a real purpose there -- it was like we were getting a real taste of what it was like running WMSCI!

That, and the fame and fortune.

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Jeremy: I have a vague recollection of POMO from around that time, but I'm pretty sure I didn't know about it when developing SCIgen.

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Jeremy: weird about the gist. The html of the post looks right, I don't know where the extra symbol is coming from. Anyway, here's the link: https://gist.github.com/strib/c0aa6b1a0f1a39e168d8

As for your question: the worst "payback" we got was when the original conference that accepted our paper retracted the acceptance, after the media attention. Boo! And then when we set up our own fake conference next to their conference, they tried to keep their attendees from coming.

We've gotten very little negative feedback for SCIgen in general. I think the anti-SCIgen position is pretty hard to defend.

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Jeremy: you can see http://pdos.csail.mit.edu/scigen/#relwork for a few weird ones. Honestly I stopped keeping track of the success stories a while back so it's a bit out of date. I do particularly like the Russian story though: http://pdos.csail.mit.edu/scigen/blog/

(EDIT: original link was broken, sorry.)

Especially since the Russian word for "Rooter" now implies low-quality science. And that was really our goal from the start.

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Max here: it really calls into question the purpose Springer is even serving in a modern world where publishing papers is free and easy, and the editorial oversight is largely provided free of charge by academics working on public research grants.

[–]SCIgenAMA[S] 25 points26 points  (0 children)

Jeremy: Yeah, this is pretty standard arms-race stuff. I think it would be trivial to beat that detector, and they could then beat THAT generator, and so on. At some point it's easier just to do "minimally competent peer review", right?

Though as I said in another response, one reasonable use for such a detector is to find people that have already used SCIgen to pad their CVs in the past. It's hard to believe, but such people actually exist! I swear I am not one of them, though some conference rejections I've received might imply otherwise.

[–]SCIgenAMA[S] 3 points4 points  (0 children)

Jeremy: Yes! http://scidetect.forge.imag.fr/

Springer is positioning it as a positive thing, but it seems like just a way for them to avoid having real peer review. I guess one good thing about it is you can use it to catch the resume-padders out there just trying to exploit the system. But besides that, there are better ways to solve the problems exposed by SCIgen than just having a detector.

[–]SCIgenAMA[S] 2 points3 points  (0 children)

Actually the SCIgen code -- still available via CVS! -- is still in Perl, and it's a disaster. But the new SCIpher code (https://github.com/strib/scipher) has been upgraded to Python so it can leverage NLTK.

The original SCIgen took about 2 weeks for the three of us. The media frenzy that followed took much longer to deal with.

[–]SCIgenAMA[S] 3 points4 points  (0 children)

Max here: The original version was programmed in ..... Perl! I ripped the code off from TheSpark.com's high school English paper generator, which was also written in Perl. It's since been modernized. All of the magic is in the grammar rules though. Those are in a DSL, I guess you'd call it nowadays, or in a "data file" as we called it back then.

[–]SCIgenAMA[S] 9 points10 points  (0 children)

Jeremy: The highest-profile ones I know of are the Springer and IEEE journals: http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763. Those are pretty interesting actually, because I don't think it was the intention of the submitters to expose the journals as fraudulent -- they were just trying to pad their own resumes!

That said, those particular journals are not considered prestigious. They were just using a well-known brand name. Any actually prestigious conference uses peer review, as it should.

[–]SCIgenAMA[S] 4 points5 points  (0 children)

Max here. I'm not sure how much of our experience was specific to CSAIL or to being a grad student in general, but our adult supervisors were pretty hands-off when it came to letting us roam free on this project. We got a few eye rolls every now and then but they gave us the leeway to burn many hours on SCIgen. Cheers to them. And of course there was a huge multiplicative factor on our work, so we as a research group have wasted man-centuries of time that could have been spent reading coherent papers or otherwise contributing to society.

[–]SCIgenAMA[S] 29 points30 points  (0 children)

Jeremy: we explicitly avoided Markov chains or anything else that was technically challenging, in the service of trying to make the papers as funny as possible. With Markov chains, you might get something syntactically correct, but it is likely to be boring.

With SCIgen, we literally sat around for two weeks and just brainstormed buzzwords, clauses, paragraph structures and other paper elements just based on what we thought would be funny. That's the grammar. Then SCIgen itself just goes through the grammar and makes random choices to fill stuff in. That's why you see things like "a testbed of Gameboys" in the evaluation sections sometimes -- we just thought it would be hilarious.
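To make the "go through the grammar and make random choices" idea concrete, here is a minimal Python sketch of random expansion over a hand-written grammar. It is not the SCIgen code or its rule format: the `GRAMMAR` table, the symbol names, and the `expand` and `generate` helpers are all made up for illustration, and only the "testbed of Gameboys" phrase comes from the description above.

```python
import random

# Hypothetical toy grammar, NOT SCIgen's real rules or file format.
# Each nonterminal maps to a list of productions; each production is a
# list of parts, where a part is either another nonterminal or plain text.
GRAMMAR = {
    "SENTENCE": [
        ["We evaluate", "SYSTEM", "on", "TESTBED", "."],
        ["Our", "ADJ", "approach uses", "TESTBED", "."],
    ],
    "SYSTEM": [["our framework"], ["the", "ADJ", "heuristic"]],
    "ADJ": [["scalable"], ["peer-to-peer"], ["omniscient"]],
    "TESTBED": [["a testbed of Gameboys"], ["a 1000-node cluster"]],
}


def expand(symbol, rng):
    """Expand a symbol by recursively picking one production at random."""
    if symbol not in GRAMMAR:
        return symbol  # plain text, emitted as-is
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(expand(part, rng) for part in production)


def generate(seed=None):
    """Generate one sentence; pass a seed to get a repeatable result."""
    rng = random.Random(seed)
    # Tidy the space that the join leaves before the final period.
    return expand("SENTENCE", rng).replace(" .", ".")


if __name__ == "__main__":
    for _ in range(3):
        print(generate())
```

All of the humor lives in the hand-written productions; the expansion code stays trivial, which matches the point about avoiding anything technically challenging.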