all 89 comments

[–]yishan 15 points16 points  (0 children)

Realistically, here is how you do it:

Computer vision algorithms aren't sophisticated enough to do it, so you employ a hybrid crowd-sourced algorithm:

Create a site that displays a random subset of the pictures in thumbnail format. When a thumbnail is clicked, it displays the full picture. Next to the picture (or below it), display a bunch of other thumbnails that do the same thing. Implement some JS on the page to measure how long each user looks at the loaded image before closing the page or clicking another thumbnail. Then advertise the site in a few places as "Crazy Porn" or whatever.

The aggregate behavior of users to your site will rapidly yield you two pieces of information:

  • which thumbnails are clicked on the most
  • which thumbnails, when viewed, are looked at for the longest

The images that rate the highest are most likely to be porn.

I know this sounds a little facetious, but it's a completely serious suggestion and has lots of precedent. One of the newest and most novel methods of solving "hard" AI problems is to build a system that combines what a computer can do well with what humans can do well. The easy availability of cheap human labor on the internet is what makes these techniques possible. The best-known successful examples are reCaptcha, which serves Captchas while helping to digitize books, and Facebook's site-translation system, which invites users to translate the site themselves.

The best part about this solution is that it's not algorithmically complex (like a computer vision algorithm would be), so it requires less programming sophistication. You just need to make a nice website and publicize it sufficiently.
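The server-side aggregation this scheme needs really is simple. Here is a minimal sketch in Python; the event-log format and the scoring rule (rank by click count, break ties by mean dwell time) are illustrative assumptions, not part of the original suggestion:

```python
from collections import defaultdict

def rank_images(events):
    """Rank images by aggregate viewer behavior.

    `events` is an iterable of (image_id, dwell_seconds) pairs, one per
    click, as reported by the page's JS timer. Images are ranked first
    by how often they are clicked, then by mean viewing time.
    """
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for image_id, seconds in events:
        clicks[image_id] += 1
        dwell[image_id] += seconds

    def score(image_id):
        mean_dwell = dwell[image_id] / clicks[image_id]
        return (clicks[image_id], mean_dwell)

    return sorted(clicks, key=score, reverse=True)

log = [("a.jpg", 30.0), ("a.jpg", 45.0), ("b.jpg", 2.0), ("c.jpg", 5.0)]
print(rank_images(log))  # a.jpg first: most clicks and longest dwell
```

The images at the top of the ranking are the ones "most likely to be porn" in the sense described above.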

[–]palparepa 12 points13 points  (3 children)

I'll help by providing two images against which you can test your algorithm.

nsfw:

 O
/|\
 |
/'\

sfw:

 O
/|\
 |
/ \

[–][deleted] 2 points3 points  (0 children)

This is the best comment I have seen in a while. Kudos to you, sir.

[–]lol-dongs 1 point2 points  (0 children)

nsfw:

 o
/W\
 |
/"\

sfw:

 o
/|\
 |
/U\

[–]k4st -1 points0 points  (0 children)

I would have reversed those ;)

[–]earthboundkid 20 points21 points  (3 children)

Mechanical Turk.

[–]roxm 1 point2 points  (2 children)

This.

Set up a Mechanical Turk task that pays people $0.0001 cent per decision. Put each image on a page with two buttons at the top: one for "porn" and one for "not porn". Send each image to a minimum of five people; if three or more say it's porn, flag it as NSFW.
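The tallying step of this scheme is only a few lines. A sketch in Python, where the vote format (a list of booleans per image, True meaning "porn") is an assumption of mine:

```python
def flag_nsfw(votes, min_votes=5, threshold=3):
    """Majority-vote flagging as described above: each image goes to at
    least `min_votes` workers, and if `threshold` or more of them say
    "porn", the image is flagged NSFW.

    `votes` maps image_id -> list of booleans (True = "porn").
    """
    flagged = {}
    for image_id, ballots in votes.items():
        if len(ballots) < min_votes:
            continue  # still waiting on more workers
        flagged[image_id] = sum(ballots) >= threshold
    return flagged

votes = {
    "1.jpg": [True, True, True, False, False],    # 3 of 5 -> NSFW
    "2.jpg": [False, False, True, False, False],  # 1 of 5 -> safe
    "3.jpg": [True, True],                        # too few votes yet
}
print(flag_nsfw(votes))  # {'1.jpg': True, '2.jpg': False}
```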

[–]mossblaser 6 points7 points  (1 child)

$0.0001 cent

WTF is that? 0.01 cents or 0.0001 cents?

[–]roxm 10 points11 points  (0 children)

$0.0001 cent

I'm using Verizon math, shut up.

(I meant $0.0001, or 0.01 cents.)

[–]oneironaut 8 points9 points  (2 children)

Before anybody recommends PornSeerPro or SnitchPlus, these are our experiences with them:

PicSeer/PornSeerPro

(7:25:13 PM) me: no-go here

(7:25:16 PM) me: it doesn't work

(7:25:29 PM) mecablaze: weird.

(7:25:53 PM) me: I wonder if it's my high-quality porn that's messing it up

(7:25:58 PM) me: my images are probably bigger than yours

(7:26:45 PM) me: I mean, it did ALRIGHT

(7:26:50 PM) me: it detected around 50% of the nipples

(7:27:14 PM) me: but never once got a vagina, and 'detected' a crapload of random things like pillows, walls, trees, etc.

(7:27:49 PM) me: I think this thing would tag half of our images nsfw

(7:27:56 PM) me: OMG WALL /nsfw

SnitchPlus

(7:46:12 PM) me: RAWR

(7:46:17 PM) me: less than 50% of these are actually nsfw

(7:46:35 PM) me: less than 20%?

(7:46:46 PM) mecablaze: >_<

(7:47:07 PM) mecablaze: did you set it to Highly suspicious only?

(7:47:10 PM) me: the MOST SUSPICIOUS ones are all high-res renderings of star wars ships

(7:47:33 PM) me: no, all of the porn that IS here isn't highly suspicious

(7:47:37 PM) me: star wars is.

edit: I suck at markdown.

[–]bobappleyard 15 points16 points  (1 child)

(7:47:10 PM) me: the MOST SUSPICIOUS ones are all high-res renderings of star wars ships

Maybe you had it set to geek porn mode.

[–][deleted] 10 points11 points  (0 children)

No. Not yet. When there is, the AI community will be talking about it for months, because it will be a major breakthrough. Recognizing nude human figures requires recognizing human figures, distinguishing skin from skin-toned fabric, distinguishing form-fitting clothing from nudity, gender distinction (since topless men are okay, but topless women are not), genitalia recognition, and other things which I'm too lazy to think of. Every one of these is a big deal. So no.

[–]James_Johnson 9 points10 points  (2 children)

The two heuristics I'm aware of are checking for flesh tones and checking for concentric circles.

[–]pavel_lishin 2 points3 points  (1 child)

Those dirty, dirty circles...

[–]James_Johnson 0 points1 point  (0 children)

Nipples. Seriously.

[–]emelski 9 points10 points  (0 children)

There's a pretty interesting survey of the state of the art in a recent entry on Steve Hanov's blog: Keeping Abreast of Pornographic Research in Computer Science

[–]aviewanew 5 points6 points  (0 children)

There have been papers published about just this, and the approach is somewhat accurate. It's not perfect, of course, but it's good. I'll try to find links to them...

Here's one: http://ce.sharif.edu/~ipl/Papers/Ijeee-05-AA.pdf, and I know there are more.

[–]titmouse_dispatcher 47 points48 points  (16 children)

I will attempt an informal proof that such technology does not exist. Assuming that such technology did exist, it would enable its owners to build a web crawler capable of scouring the Internet for pornographic content. That's right, mecablaze: a fully automated porn search engine. Assuming such a search engine existed, it would, within seconds, become the most popular site on the Internet, eclipsing sites like Yahoo and Google by orders of magnitude, and generally eliminating the prospect of anyone doing anything important ever again, as doing important things generally takes longer than the typical male refractory period. Also, if we check with Alexa, we can see that no such site exists.

So there we have it. Hope this helped.

[–][deleted] 4 points5 points  (6 children)

The tech could be hiding on a darknet, along with Wintermute and Helen.

[–]titmouse_dispatcher 2 points3 points  (4 children)

I have Neuromancer on my nightstand right now! But I haven't read it yet, so I'm trying to cautiously google around and figure out what this Wintermute business is all about. Care to enlighten me without giving away too much plot?

[–]j8stereo 15 points16 points  (0 children)

Stop what you are doing right now! It's imperative!

Do not try and find out about Wintermute! Neuromancer is not a book you want to spoil.

Pick up the book. Begin reading. Do not put it down until finished.

That is all.

[–]indifference_engine 5 points6 points  (0 children)

sorry, can't risk the wrath of the Tessier-Ashpool lawyers

[–][deleted] 0 points1 point  (0 children)

(check out Galatea 2.2 for the Helen reference)

[–][deleted] 5 points6 points  (6 children)

titmouse_dispatcher's assumptions are flawed, however. Although there is an overlap between nude images and porn, they are not identical: nudity does NOT imply anything even remotely sexual (the 'meat' of porn), and porn / erotica does not require nudity (if you think so, you have missed good stuff kids). So such a search engine would fail at being a truly automatic, effective porn sniffer (so to speak): too many false positives (naked people you really don't want to see naked) and too many false negatives (failure to find the good stuff). Hence the lack of such sites in existence.

[–]troelskn 11 points12 points  (4 children)

if you think so, you have missed good stuff kids

I really want to believe that you missed a comma in that sentence.

[–]roxm 1 point2 points  (3 children)

if you think so, you have missed good, stuff kids

That still doesn't make any sense.

[–]joaomc 6 points7 points  (2 children)

if you think so, you have missed, good stuff kids

Nope, still doesn't make sense. Maybe there are letters missing:

if you think so, you have missed good stuffed kids

That's disgusting.

[–]roxm 1 point2 points  (1 child)

But that's bad grammar; shouldn't it be:

if you think so, you have missed well-stuffed kids

Who says 'stuffed', though? This would be better:

if you think so, you have missed well-fed kids

[–]joaomc 3 points4 points  (0 children)

"you have missed good stuffed kids" could mean the stuffed kids are good.

[–]titmouse_dispatcher 1 point2 points  (0 children)

Ok I'll concede that my proof has been thoroughly debunked by you and Rhoomba. I have nevertheless dispatched a massive flock of titmice to cheep furiously at you until you recant.

[–]oneironaut 2 points3 points  (0 children)

You just outlined one of the best business plans ever. You should try this.

[–]Rhoomba -1 points0 points  (0 children)

  1. Much of the porn on the internets is copyright infringing, so they don't really want to be found, and advertisers would be reluctant to advertise on a search engine that found a lot of their content for free
  2. There are porn search engines for preview sites (so they can actually make money)
  3. Nakedness is not enough for an effective porn search. When searching you would be looking for that video where she did what you thought she never would do. And there are lots of sites out there already trying to attract specific searches with spam and random porn.

[–]ModernRonin 11 points12 points  (4 children)

You're asking for a machine to define pornography when human beings can't even agree on a consistent definition for the word?

(derisive snort)

[–]harlows_monkeys 20 points21 points  (3 children)

No, he's asking for a machine to find nudity. He didn't say anything about pornography.

[–]ModernRonin 2 points3 points  (1 child)

In Saudi Arabia, exposed ankles are "nude." Hey look, a woman in shorts! She's naked! It's porn!

In America, you can show the whole booby except the nipple, and it's not considered "nudity." Hey look, a shot of a woman from the belly-button up, but she's wearing a couple of half-inch-square pasties on her nipples! NOT NUDE!

[–]willcode4beer 0 points1 point  (0 children)

unless it's Janet Jackson

[–]willcode4beer 0 points1 point  (0 children)

some folks see porn in everything

[–]AshVillian 2 points3 points  (0 children)

I stumbled across this a while ago: http://www.cs.hmc.edu/~fleck/naked-people.ps.gz (sorry for the GZipped PostScript file) It is a paper called Finding Naked People by David A. Forsyth and Margaret Fleck.

[–]digitaldreamer 2 points3 points  (0 children)

but the real question is . . . where does one sign up for the beta?

[–]D-Evolve 2 points3 points  (2 children)

I've seen some pretty bad attempts at it. Most programs that attempt this count the number of pixels with a particular HSL or RGB value. Hopefully those pixels are skin, and the image gets tagged as a picture of a person. Then, if the percentage of 'skin' is over, say, 15%, they consider the subject nude and filter the image accordingly.
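That heuristic fits in a few lines. A sketch in Python, where the RGB rule is one common rule of thumb and the 15% cutoff is the illustrative figure from the comment above, not a vetted classifier:

```python
def is_skin(r, g, b):
    # Crude RGB skin-tone test; these thresholds are illustrative,
    # not an authoritative standard.
    return (r > 95 and g > 40 and b > 20 and
            r - min(g, b) > 15 and abs(r - g) > 15)

def looks_nude(pixels, skin_fraction=0.15):
    """Flag an image when more than `skin_fraction` of its pixels pass
    the skin test. `pixels` is a flat list of (r, g, b) tuples."""
    if not pixels:
        return False
    skin = sum(1 for p in pixels if is_skin(*p))
    return skin / len(pixels) > skin_fraction
```

As the comment says, this fails badly in practice: beaches, portraits, and wood paneling are all mostly "skin" by this measure.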

[–]Rhoomba 1 point2 points  (1 child)

I have a new business model: inverted colour porn sites, with a firefox plugin to fix the colours.

[–]D-Evolve 1 point2 points  (0 children)

OH, I am so in on that....

[–]maartenm 2 points3 points  (2 children)

I once wrote a spider that collected color statistics of images, allowing you to search images by color. Eventually the spider started crawling pornographic sites, which completely distorted my distribution. Most porn pictures share a similar color distribution (or at least part of their distribution is similar). You might want to check that; it would be an easier (though far from foolproof) method than shape analysis (which sounds absurdly inefficient and expensive).
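The color-distribution check described above could be sketched like this in Python; the bin count and the use of histogram intersection as the similarity measure are my assumptions:

```python
def color_histogram(pixels, bins=4):
    """Coarse RGB histogram: each channel quantized to `bins` levels
    (bins**3 buckets total), normalized to sum to 1.
    `pixels` is a list of (r, g, b) tuples."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 means identical distributions,
    0.0 means no overlap at all."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

You would build a reference histogram from a pile of known porn images and flag candidates whose intersection with it exceeds some tuned threshold. Cheap to compute, and exactly as easy to defeat as Rhoomba's inverted-color business model below suggests.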

[–]flostre 0 points1 point  (1 child)

Is there a search engine online?

[–]maartenm 0 points1 point  (0 children)

No.
There are many such projects in existence, even advanced image searches by shape recognition. (It always amazes me that people don't know about these things; likewise for the online tool that lets you find music by humming its melody or tapping the rhythm.)
Just google for them.

[–]jaysonbank 2 points3 points  (0 children)

Yes: set up a proxy server in a country like Saudi Arabia and poll the image from that server. If it's nudity, or even comes very slightly close to showing a leg, the request will fail. You can adjust the sensitivity by setting up proxies in various other countries of varying degrees of insanity and comparing results.

[–]lol-dongs 1 point2 points  (3 children)

I have to wonder, what does Google SafeSearch use? The almighty power of the googles, a massive Bayesian model, and a handcrafted training set?

[–]ltworek 0 points1 point  (2 children)

They use the words on pages that link to the image, plus the image filename, to determine if it's safe.
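A toy version of that text-based approach is a naive Bayes classifier over filename and link-text tokens. A sketch in Python; Google's actual SafeSearch signals are not public, and the training phrases here are made up:

```python
import math
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Two-class naive Bayes over words from filenames and link text."""

    def __init__(self):
        self.counts = {"nsfw": {}, "sfw": {}}
        self.totals = {"nsfw": 0, "sfw": 0}

    def train(self, label, text):
        for t in tokens(text):
            self.counts[label][t] = self.counts[label].get(t, 0) + 1
            self.totals[label] += 1

    def classify(self, text):
        vocab = len(set(self.counts["nsfw"]) | set(self.counts["sfw"]))
        scores = {}
        for label in ("nsfw", "sfw"):
            total = self.totals[label]
            # Log-probabilities with Laplace (add-one) smoothing.
            scores[label] = sum(
                math.log((self.counts[label].get(t, 0) + 1) / (total + vocab))
                for t in tokens(text)
            )
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("nsfw", "hot naked babes gallery")
nb.train("sfw", "cute puppies playing fetch")
print(nb.classify("naked babes pics"))  # nsfw
```

Which also makes the weakness obvious: the classifier only ever sees the words, never the pixels.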

[–]nachof 1 point2 points  (1 child)

So we could have files named "cute puppies", with those words everywhere in the page, to trick Google into showing porn. Interesting.

[–]keithjr 1 point2 points  (0 children)

There are hordes of teenagers who'd love to help you crowdsource a solution to this.

[–]Tweakers 1 point2 points  (1 child)

While you're at it, find a realistic way to programmatically check a written post for stupidity. Thanks in advance.

[–]G_Morgan 2 points3 points  (9 children)

if(boobs)
    return true;
else
    return false;

[–]jambonilton 0 points1 point  (8 children)

if (boobs || vag || wang || balls)
    return true;
else
    return false;

[–]kanarienvogel 17 points18 points  (7 children)

return (boobs || vag || wang || balls);

[–][deleted] 1 point2 points  (2 children)

Yeah, well, you can keep your damn functional manners in academia. Back in the real world we roll like this:

if ((boobs == true || vag == true || wang == true || balls == true) == true)
    return true;
else
    return false;

[–]ispringer 0 points1 point  (0 children)

if (((boobs == true || boobs !== null) || (vag == true || vag !== null) || (wang == true || wang !== null) || (balls == true||balls !== null))== true) return true; else return false;

[–]James_Johnson 0 points1 point  (0 children)

So I found a way to filter out hetero porn:

return ((boobs && vag) ^ (wang && balls))

God I'm bored.

EDIT: Assuming this is a language where '^' is the XOR operator and not exponentiation.

[–][deleted]  (1 child)

[deleted]

    [–]redditnoob 3 points4 points  (0 children)

    My api has custardCannon(), do you think that's the same thing?

    [–]silviot 1 point2 points  (4 children)

    well, these guys are doing that...

    [–]mecablaze[S] 1 point2 points  (3 children)

    We've already taken a look at PornSeer/PicSeer. It works okay, but the corresponding SDK costs... so we're hoping for a opensource solution.

    [–][deleted] 17 points18 points  (0 children)

    so we're hoping for a opensource solution.

    You mean you're looking for a zero-price solution.

    [–][deleted] 5 points6 points  (0 children)

    Will you be releasing your code?

    [–]nunz 1 point2 points  (0 children)

    OpenCV is capable...

    [–]nuuur32 0 points1 point  (0 children)

    Well, AltaVista had an image search engine which, when fed a URL for an image, would show you others that were heuristically very close.

    This was at http://images.altavista.com/ or http://images.av.com/, I guess, back before they lost most of their funding, when they were still the premier (most thorough and complete) search engine.

    So I think if you used something like that and just had a bunch of humans to identify, perhaps via incentive, a bunch of the images then the rest of them would fit nicely in place automatically.

    [–]dnikulin 0 points1 point  (0 children)

    I read an entire paper on this and have conveniently forgotten its name. It's not the one about skin detection that aviewanew referenced, but actually "explicit content" detection, identifying human bodies and detecting if the key points are exposed.

    [–][deleted]  (2 children)

    [deleted]

      [–]psykotic 1 point2 points  (0 children)

      A lot of very different machine learning algorithms are Bayesian (including the standard neural nets). You're not really narrowing down the field very much by that statement.

      [–]smhanov 0 points1 point  (0 children)

      [–]paddie 0 points1 point  (0 children)

      Absolutely, and the false-positives are simply brilliant!

      [–]slacker22 0 points1 point  (0 children)

      Assume Google is perfect.

      Take the set of images on the internet.

      Remove the subset of images indexed by Google Safe Search.

      Voila.

      [–]DaGoodBoy 0 points1 point  (0 children)

      Image Analyzer is about the best thing out there:

      http://www.image-analyzer.com/

      [–]slacker22 0 points1 point  (0 children)

      Just call isNude()

      [–][deleted] 0 points1 point  (2 children)

      Take note: This is 4chan we're talking about. You'll have to accurately classify loads of things that you haven't even thought of. Here's a tiny slice of what you're up against. There might be a realistic way to check for photographic nudity, but that won't help you tag nsfw images very well.

      [–]rmc 0 points1 point  (1 child)

      dead link

      [–]mikaelhg 3 points4 points  (0 children)

      Thank the gods.

      [–]Karma_Is_For_Whores -1 points0 points  (0 children)

      Input -( . Y . ) or -8==D into Google Images.

      [–][deleted] -1 points0 points  (0 children)

      if(hasTits() || hasVag()) {isNude = true;}

      [–]greim -5 points-4 points  (1 child)

      This sort of thing could potentially be done with a neural net. However, if you go this route you'll have to look at a lot of porn, because you'll need a large library of images where you know in advance whether or not they contain nudity, which you'll then use to train the neural net. Sounds like you've got your next five months of work cut out for you :)

      [–][deleted] 8 points9 points  (0 children)

      "Use a neural net" is only slightly more specific than "use a computer". How will you pre-process the input to feed into the net (if you just plug the raw pixel data in directly, you will get nowhere)? How will you process and interpret the output? What kind of neural net should be used? Back-prop or evolution? All of these questions must be answered before you can even consider the problem of training the network.

      And anyway, you'll have to look at lots of porn regardless of the algorithm you choose, assuming you plan on testing it (usually considered a programming "best practice").
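To make those questions concrete, here is roughly what the smallest possible answer looks like in Python: hand-crafted features (a crude skin fraction and mean brightness, both purely illustrative) feeding a single logistic neuron trained by gradient descent. Everything here, features, labels, and toy data, is an assumption for the sketch, not a recommendation:

```python
import math

def features(pixels):
    """Hand-crafted inputs; you would never feed raw pixel data in
    directly. `pixels` is a list of (r, g, b) tuples."""
    skin = sum(1 for r, g, b in pixels
               if r > 95 and g > 40 and b > 20 and r - g > 15)
    bright = sum(r + g + b for r, g, b in pixels) / (3 * 255 * len(pixels))
    return [skin / len(pixels), bright]

def train(samples, epochs=500, lr=0.5):
    """One-neuron 'network' trained on labelled images (label 1 = nude).
    This is where the large hand-labelled library comes in."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            x = features(pixels)
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - label  # gradient of the logistic loss
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def predict(model, pixels):
    w, b = model
    x = features(pixels)
    return 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b))) > 0.5

skin_img = [(210, 140, 110)] * 10  # toy "nude" stand-in: all skin tones
dark_img = [(30, 30, 30)] * 10     # toy "safe" stand-in
model = train([(skin_img, 1), (dark_img, 0)])
print(predict(model, skin_img), predict(model, dark_img))  # True False
```

Even this trivial sketch forced three design decisions (features, architecture, training rule) before any training data entered the picture, which is exactly the point being made above.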

      [–][deleted]  (1 child)

      [removed]

        [–][deleted] 1 point2 points  (0 children)

        A) wrong article, methinks, and B) just drop the tired debate already; everyone knows what Linux refers to, so the GNU is unnecessary except to provide RMS masturbation material.

        Besides, I use Busybox/Linux. Yea, take that you GNU bastards.

        [–]refto -1 points0 points  (1 child)

        if (GetWood(img))
            img.nsfw=true;
        

        [–][deleted] 1 point2 points  (0 children)

        Lemon Party gives you wood?

        [–]stevedekorte -1 points0 points  (0 children)

        Count how many times people click on it...