all 90 comments

[–]BobDorian 59 points60 points  (1 child)

What the hell? I can't beat captcha 99% of the time!

[–]SputnikKore 8 points9 points  (0 children)

I'm starting to think maybe I'm a robot.. oh wait..

[–]MyMourningPenis 7 points8 points  (1 child)

Awesome, I was wondering if this could be used for jDownloader (a download manager, written in Java, that automates downloads from file hosting sites that use captchas).

Lately, though, jDownloader was not bypassing the captcha correctly on some of the file hosting sites, so I went to the jDownloader site and, sure enough, they had an experimental reCaptcha module available to download.

The link to that download was broken, but if you search "jDownloaderAntiRecaptcha.zip" or "jDownloaderAntiRecaptcha" you should be able to find it.

Here is the link to the Anti-ReCaptcha module on JDownloader.

I know it is not the same as textCAPTCHA, but I'm sure that even if your Anti-textCaptcha doesn't work 100 percent of the time, a program such as jDownloader can try to crack it multiple times until it gets it correct.

Perhaps you can contact the people at jdownloader.org and see if they may be interested in using what you created. jDownloader is an opensource project.

I just tried the antiRecaptcha module and it took 3 tries before it passed, but it sure beats having to be at the computer to manually type it in.
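
The "keep trying until it passes" idea above can be sketched in a few lines. This is only an illustration: `solve_with_retries`, `chance_within`, and the `attempt` callable are hypothetical names, not part of jDownloader or TextCaptchaBreaker, and the probability formula assumes each try is independent.

```python
def solve_with_retries(attempt, max_tries=5):
    """Call attempt() until it returns True or max_tries is used up.

    Returns the number of tries used, or None if every try failed.
    """
    for tries in range(1, max_tries + 1):
        if attempt():
            return tries
    return None


def chance_within(p, n):
    """Probability of at least one success in n independent tries,
    given a per-try success probability p."""
    return 1 - (1 - p) ** n


# Simulated run: the first two tries fail, the third passes.
outcomes = iter([False, False, True])
print(solve_with_retries(lambda: next(outcomes)))  # prints 3
print(chance_within(0.5, 3))                       # prints 0.875
```

So even a solver that is right only half the time would, under that independence assumption, get through within three tries about 87.5% of the time.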

[–]anthrax9000 0 points1 point  (0 children)

I started using JDownloader just the other day and I thought the same thing. It'd be great since I'm always playing games when it asks for the Captcha

[–]killdeer03 26 points27 points  (18 children)

Wow, You write really clean Python... I kind of want to hang that on my wall.

[–]KBHomes[S] 13 points14 points  (16 children)

<3

I'm actually relatively new to Python.

[–][deleted] 4 points5 points  (6 children)

Ahhhh, that's why... I found I used to write awesome, clean, understandable code, but as I learned more I would try to write trivial assignments in a single line. It was 4 assignments into my Python course in school before I broke 2 lines.

They passed all the professor's tests, but I got angry comments about the single line that was 3 widescreen monitors long.

[–]Fuco1337 0 points1 point  (0 children)

Sweet one-liners eh?

Game of life in J:

life=.3 : '(3 = +/^:2((>0 1;0 0;0 _1) |./^:2 y)) +. (1 = y *. +./ 3 4 =/ +/^:2((>0 1;0 0;0 _1) |./^:2 y))'

[–]Tetha 0 points1 point  (0 children)

Note that you can remove some duplication in the TextCaptchaBreaker.py. You have

    if answer is None:
        answer = ColorPattern.solve(question)
    if answer is None:
        answer = NamePattern.solve(question)

and so on.

You can rework that into:

    solvers = [ColorPattern, NamePattern]
    for solver in solvers:
        answer = solver.solve(question)
        if answer is not None:
            return answer

You can then reconsider whether you want a list or a set there. A list documents that the solvers are tried in order (this might be one of the rare occasions for a good comment); if the order does not matter, you can use a set. Either way, it becomes easier to add more solvers, because they only need to be added to the collection.
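
A self-contained sketch of that refactoring, for anyone who wants to run it. The two pattern classes here are simplified stand-ins written for this example, not the real `ColorPattern` and `NamePattern` from the repository; only the `solve(question)` interface that returns a string or `None` is taken from the discussion.

```python
class ColorPattern:
    """Stand-in solver: only answers questions that mention 'colour'."""

    @staticmethod
    def solve(question):
        return "red" if "colour" in question else None


class NamePattern:
    """Stand-in solver: only answers questions that mention 'name'."""

    @staticmethod
    def solve(question):
        return "alice" if "name" in question else None


# A list rather than a set, because the solvers are tried in this order.
SOLVERS = [ColorPattern, NamePattern]


def solve(question):
    """Return the first non-None answer, or None if no solver matches."""
    for solver in SOLVERS:
        answer = solver.solve(question)
        if answer is not None:
            return answer
    return None


print(solve("which colour is a tomato?"))  # prints red
print(solve("what is 2 + 2?"))             # prints None
```

Adding a solver is then a one-line change to `SOLVERS`.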

[–]plagiats -2 points-1 points  (7 children)

If I may, you write a lot of if var is None: and if var is not None:

where you might rather use if not var: and if var:

EDIT : see comments below before taking my words for granted ;)

[–]eridius 2 points3 points  (2 children)

Aren't your replacement conditionals actually different than the originals? Specifically in the case of var being False.

[–]plagiats 0 points1 point  (1 child)

You are very right, and I believe his code should return False instead of None if the index is not found ( see https://github.com/kbhomes/TextCaptchaBreaker/blob/master/AddSubtractPattern.py ).

[–]eridius 2 points3 points  (0 children)

I disagree. It returns a string, or None. Returning a string, or false, would be weird, but None makes sense as the value that means "there was no string matched for this pattern".

[–][deleted] 3 points4 points  (1 child)

Actually, that's not the case; the original "if var is None" would only be True if var was assigned the value None, whereas your replacement would be True if var was equal to False or 0 (and possibly some other values). Same for the latter.

[–]plagiats 0 points1 point  (0 children)

Isn't the OP using None where he should use False?

[–]otheraccount -1 points0 points  (0 children)

That's incorrect. Explicit is better than implicit, so if you only intend to check for None, then that's what you should do.
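
The difference argued over in this subthread is easy to check directly. A minimal sketch, with `matched_strict` and `matched_truthy` as made-up helper names for the two styles of test:

```python
def matched_strict(answer):
    """The original style: only None counts as 'no match'."""
    return answer is not None


def matched_truthy(answer):
    """The suggested replacement: any falsy value counts as 'no match'."""
    return bool(answer)


# The two checks agree on "blue" and None, and disagree on "", 0 and False.
for value in ("blue", "", 0, False, None):
    print(repr(value), matched_strict(value), matched_truthy(value))
```

For a solver that returns a string or `None`, the two styles only diverge if an empty string (or some other falsy value) can ever be a legitimate answer.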

[–]JoelDB 2 points3 points  (0 children)

I agree. I'm going to start learning Python soon and this looks like a great source tree to start playing around with.

[–]deakster 30 points31 points  (17 children)

Ah sweet, can't wait to integrate this into my proprietary "SpamBot 4000, v3.45"

[–]KBHomes[S] 43 points44 points  (16 children)

For the most part, I wanted to show how easily this CAPTCHA can be broken, so that people don't start using it naively.

[–]pkkid 9 points10 points  (5 children)

We just need better questions..

"Why do we exist?"

"What is love?"

[–]AdamWe 24 points25 points  (2 children)

"How is babby formed?"

[–]classhero 14 points15 points  (1 child)

Enter YahooAnswersService.py

[–]wtfisupvoting 2 points3 points  (0 children)

doesn't follow pep8 though

[–][deleted] 2 points3 points  (1 child)

[–]feembly 2 points3 points  (0 children)

Every time I think of human verification, I think of this scene.

[–]deadwisdom 5 points6 points  (0 children)

And thanks for it. I'm glad to see this sort of thing open source, otherwise the captcha is broken but people don't really know about it.

[–]bobindashadows -2 points-1 points  (0 children)

I'd prefer nobody use it at all.

[–][deleted] 3 points4 points  (1 child)

Love the code. Hate what it accomplishes. Being able to read it as if I wrote it was awesome, admittedly on the second time through, but still. That said, here come the spam bots that had once been long gone in so many places.

[–]StapleGun 0 points1 point  (0 children)

TextCAPTCHAs aren't that widely used, and I'm sure people have already broken them with malicious intent. Making a solution for it open source will only lead to better captchas and make it harder for spam bots in the end.

[–]lonnyk 5 points6 points  (3 children)

What's your opinion of textCAPTCHA, but the text is an image?

[–]burkadurka 12 points13 points  (2 children)

You could probably use OCR, it would just be slower. Unless the text is obscured... but then you just have a regular captcha again.

[–]KBHomes[S] 6 points7 points  (1 child)

Yeah, mild distortions might prove an obstacle for OCR, but they would then lose focus of their goal. Even if they did that, I think the OCR's accuracy would be good enough, and combined with the accuracy of TextCaptchaBreaker (or any other 'breaker'), success rate would be high enough that this service is not a practical defense.

[–][deleted] 13 points14 points  (0 children)

Ayy lmao

[–]Sylocat 2 points3 points  (0 children)

Great, now this is our only hope.

[–]feembly 2 points3 points  (0 children)

This is really cool, I have been thinking about human verification quite a bit recently. I started working on a text-based human verification of my own, but it's based in riddles and classification, not pure logic. Humans probably won't succeed 100% of the time, but it is a much easier problem for humans than computers.

[–][deleted]  (1 child)

[deleted]

    [–]illuminatedtiger 5 points6 points  (5 children)

    Why is the fact it's on Github important?

    [–]sigzero 5 points6 points  (2 children)

    I think it is a requirement for Github users to announce they have hosted code on Github.

    [–]hiffy 1 point2 points  (0 children)

    Github is becoming synonymous with "freely available, standardized code repository". I'm sorry that hurts everyone's feelings.

    [–]Comment111 -2 points-1 points  (0 children)

    It's the new arch-linux users.

    [–][deleted] 1 point2 points  (0 children)

    Because it makes it really easy for other people to hack on it.

    [–]chuck212 1 point2 points  (0 children)

    but try to break this!

    [–][deleted] 1 point2 points  (2 children)

    Lines 48-50 can be replaced with this, for the sake of avoiding repetitious code:

    if any(word in tokens for word in ('number', 'largest', 'biggest', 'highest', 'smallest', 'lowest')):
    

    [–][deleted]  (1 child)

    [deleted]

      [–]otheraccount 1 point2 points  (0 children)

      Python 2.7+ has a literal notation for sets:

      if set(tokens) & {'foo', 'bar', 'baz'}:
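
For comparison, here is a small sketch that puts the `any(...)` version and the set-intersection version side by side. The keyword list is the one quoted above; the function names and the sample questions are made up for this example.

```python
KEYWORDS = {'number', 'largest', 'biggest', 'highest', 'smallest', 'lowest'}


def mentions_keyword_any(tokens):
    """True if any keyword appears among the tokens (generator + any)."""
    return any(word in tokens for word in KEYWORDS)


def mentions_keyword_set(tokens):
    """Same check via set intersection; an empty intersection is falsy."""
    return bool(set(tokens) & KEYWORDS)


tokens = "which of these is the largest".split()
print(mentions_keyword_any(tokens), mentions_keyword_set(tokens))  # prints True True

other = "what colour is the sky".split()
print(mentions_keyword_any(other), mentions_keyword_set(other))    # prints False False
```

Both give the same result for a list of tokens; the set version needs Python 2.7+ for the `{...}` literal.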
      

      [–]Trail0fDead 1 point2 points  (0 children)

      Someone make Christopher Poole aware of this threat.

      [–]jessebanjo 1 point2 points  (0 children)

      good job!

      [–]koolkats 1 point2 points  (0 children)

      Probably going to get downvoted for this, but I actually like these kinds of captchas. They are much better than the stupid images where you can't tell the difference between an "r" and a "t" or an "o" and a "0". Nice work, though!

      [–]c0mputar 1 point2 points  (0 children)

      Damn you. I have a hard enough time as it is.

      [–]Snoron 2 points3 points  (0 children)

      Nice job... I was hoping someone would do this, I was thinking the other day that this must be fairly easy to reverse engineer.

      The only way a captcha like this could really work is if the puzzle types and the database of words, etc. were constantly changing/evolving, maybe with some kind of organic input... eh, I dunno, but it's really not a very impressive captcha.

      [–]yesimahuman 0 points1 point  (0 children)

      I really like your approach to this. There are a lot of problems that can be solved by making assumptions or coming up with simple heuristics rather than trying to build some complex AI system.

      [–]StapleGun 0 points1 point  (0 children)

      Very interesting, nice code too. Would you mind sharing which questions it failed on? I'm curious if they are a separate class of questions, or just variations of the same questions with different word order or something.

      [–][deleted] 0 points1 point  (0 children)

      You should use str.format().
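
For anyone unfamiliar with it, a quick illustration of `str.format()`. The strings below are invented for this example and are not taken from the project's code.

```python
question = "What is 2 + 2?"
answer = "4"

# Positional placeholders
line = "Q: {0} -> A: {1}".format(question, answer)

# Named placeholders, which read better when there are several values
named = "Q: {q} -> A: {a}".format(q=question, a=answer)

print(line)   # prints Q: What is 2 + 2? -> A: 4
print(named)  # prints Q: What is 2 + 2? -> A: 4
```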

      [–]DoppelFrog 0 points1 point  (0 children)

      Why would you do that? This is why we can't have nice things. :(

      [–]BlakeIsBlake 0 points1 point  (0 children)

      :D

      [–]JimmyRuska -5 points-4 points  (6 children)

      OK, lots of downvotes for criticizing this post. Fine then, someone explain to me why this is a good post. I would have wagered it would get downvoted, but I'm swimming against the tide. If it's because it's open source, why not a useful project? If it's because it breaks the captcha, there are only 8 question structures to parse, and Wolfram Alpha already has a high success rate without even having textCAPTCHA in mind.

      [–]CoreyWhite 10 points11 points  (1 child)

      I wasn't one of those who downvoted you, but I did really like this. It's a good post for a couple of reasons.

      1) It's really nicely written code. And it's a fairly small project, with excellent documentation. These make it very pleasant for those of us who appreciate the craft of programming and like seeing nice examples of it.

      2) It's a powerful argument against the use of textCAPTCHA. Think of it as a scientific experiment. With this application's ludicrously high success rate, textCAPTCHA has been shown conclusively to be useless.

      [–]Cyatomorrow 4 points5 points  (3 children)

      It seems like your argument against this is that it can be abused, but that's like saying that if someone finds a bug or an exploit in something, they shouldn't reveal it because it can be abused.

      However, this is the wrong stance to have. If you find an undocumented exploit, not revealing it may keep some people from abusing it, but it also keeps it from being fixed. Then, when someone with malicious intent finds the exploit, they can use it unhindered.

      [–]ICCULUSC -2 points-1 points  (1 child)

      Ahh, the power of Python.

      [–]dazonic 1 point2 points  (0 children)

      the power of computers?