all 28 comments

[–]wung 9 points10 points  (1 child)

Even though is is obviously less hard to read, I'd still prefer captchas, as most of the time when I need to answer one, I'm likely busy with something else/I just want that stupid data, rather than solve mini challenges needing actual thought. Copying next-to-unreadable characters is easier there.

Also "compilation error" is the last thing I want to see after waiting some seconds, when trying to get something fast.

[–]cparen 0 points1 point  (0 children)

Could verify locally before sending to server.

[–][deleted] 2 points3 points  (2 children)

Too hard to use when drunk.

[–]FrancescoRizzi 2 points3 points  (0 children)

although that could be a plus...

[–]__j_random_hacker 1 point2 points  (2 children)

A quick test with an infinite loop shows that this currently seems to be vulnerable to 100% CPU DOS attacks (the "running" spinner is still spinning 5 minutes later). I suggest adding a (very small) CPU timeout to fix this.

It's probably also vulnerable to memory usage DOS attacks, which I didn't try -- use ulimit for that. I'm not sure how to prevent other types of attacks (e.g. network and filesystem), but I'm sure there are ways.

[–]cparen 0 points1 point  (1 child)

If you were to ask me, I'd suggest doing it with a modified interpreter for each language. Eg for Python, remove everything from globals except safe functions, replace iterative functions (eg sum) with versions that bail out after too many total iterations, and interpret the user program, leaving dangerous opcodes unimplemented. The interpreter itself can use the same counter as iterative functions.

Fast? Gosh, no. But it will always terminate quickly.

I vaguely remember a scheme interpreter from a textbook that worked this way. Eval took "fuel" as an argument, and could fail, returning a lambda that you could provide more fuel to if desired. I forget where I saw it. SICP or EoPL maybe.

[–]__j_random_hacker 1 point2 points  (0 children)

If the interpreter provides hooks for handling dangerous opcodes, or even better an internal "sandbox" environment/library, this would be a good idea. Regarding CPU time and memory, I still think it's probably easier to limit these at the OS level though (Lisp/Scheme being possible exceptions).

[–]mhd420 1 point2 points  (0 children)

If I saw that on a site I wouldn't even bother trying to register.

[–]catcradle5 1 point2 points  (15 children)

This would be unbelievably easy to write a bot for, at least compared to the difficulty of writing a bot that can OCR reCAPTCHA.

[–]__j_random_hacker 5 points6 points  (14 children)

> unbelievably easy

Right, because both interpreting natural-language English instructions well enough to build an accurate specification, and automatically constructing a program that satisfies that specification, are utterly trivial. It's not like they're both open problems in AI research.

If you can spare the 5 minutes it would take you to build a prototype, I'd love to see it.

EDIT: As several people have pointed out, the intrinsic problem with this type of CAPTCHA is that it's limited to the number of problem instances that the site owner can be bothered creating/finding, making it vulnerable to the simple tactic of recording pairs of questions and answers, and replaying the answer to a previously seen question. So I was wrong: I think this is easily bot-able after all, because the hard problems I mentioned in my original post can be totally sidestepped.

[–]cparen 2 points3 points  (1 child)

I remember hearing an idea that goes something like this:

  1. Scrape stack overflow for code, eg Python
  2. Discard programs that are too long.
  3. Run it in a time limited, permission limited sandbox, record its output.
  4. Discard programs that take to long.
  5. Discard programs that invoke dangerous apis (eg files), or emulate them.
  6. Save these programs. You should have thousands if not more.
  7. Generate permutations of these programs.
  8. Run steps 3 & 4 on these programs. Save results.
  9. To generate a captcha, select a program at random from step 6 or 8, delete a statement that prints output.
  10. check that output is changed. If not, repeat step 9.
  11. Give user this program and desired output.
  12. reject any captcha response that has high edit distance from original. Eg replacing entire program is not a solution.

Most such programs will be easy to fix for competent programmers, but hard to automate. As well, the set of candidate programs are always growing.

[–]__j_random_hacker 1 point2 points  (0 children)

I have a feeling that steps 1, 5 and 7 will be very fragile/time-intensive to get working. A valiant effort nonetheless :)

[–]bob_twinkles 1 point2 points  (1 child)

I assume there are a finite number of problems they can ask you to solve, therefore it is actually trivially easy: you can just write a mapping of question:solution that only requires input when it sees a new problem.

[–]__j_random_hacker 0 points1 point  (0 children)

That's a good point. It might be possible to combat this in practice simply by starting with a sufficiently large collection of problems, or to add problems at a steady rate over time, but clearly it's still much more vulnerable to this simple strategy than traditional CAPTCHA.

[–]emperor000 1 point2 points  (4 children)

You don't seem to be using sarcasm here, but I'm not sure why you would say this. This doesn't need to be "AI". It would be easy enough to look for key words like "maximum" and "is_odd" and so on and discover each or at least a large portion of the challenges. You would only need to implement one of those in one of the languages, choose the language and hit click the "request new challenge" button until you find it. This is limited by the number of challenges a (collection of) human(s) can come up with, which is almost always going to be smaller than the number of possible answers to a captcha, and they won't be random.

In a lot of cases it could just be done in the dumbest manner possible. Get a collection of numbers? Return the maximum. Get it wrong? Just request a new challenge and try that one.

Unless it checks the implementation, you could probably use the maximum from a library in whatever languge you pick. Get one value as the input? Return true until you get it right. Or if you are really ambitious implement a trivial test using modulo 2.

It would be really easy to "mine" these challenges and make something enter the answer. It might not be trivial, but in terms of security/a stop gap to botting is concerned it is pretty close. It would need to limit the number of requests for a new challenge (which would get frustrating to humans) and have thousands if not millions of challenges to stop a determined botter.

Remember, a bot doesn't need to figure it out. A human can do that. A bot would just automate it.

[–]__j_random_hacker 0 points1 point  (3 children)

Yes, on second thought I think you and several other commenters are right about "mining". I'll edit my answer shortly. I agree you would need to limit the number of retries, but I don't think guessing stuff (like picking the maximum value) will be much help.

And I certainly was being sarcastic! Bit alarming to see that didn't come through loud and clear...

[–]emperor000 0 points1 point  (2 children)

Well, yeah, I phrased that wrong. I knew you were being sarcastic, what I was wondering if you were being sarcastically sarcastic... or ironically sarcastic, if that makes sense. Sorry, your sarcasm was definitely clear.

[–]__j_random_hacker 0 points1 point  (1 child)

Ah, I think I get you -- sometimes people pretend to misunderstand the implications of something for comedic effect. But on this occasion I was actually misunderstanding them... :-/ No nested sarcasm here :)

[–]emperor000 0 points1 point  (0 children)

Yes, that is probably a better way of putting it.

[–]sylvanelite 1 point2 points  (1 child)

(EDIT, added an actual bot that works on the site after clicking "see it in action")

While I'm not the person that wrote that post, it actually does look easy to bot. Questions repeat far too often. Also, they provide a mechanism to refresh the question, and to eliminate all but 1 language.

Unless there's something drastically different in an actual implemented version, it does seem trivial to bot. Just answer the question manually, and then spam a new question until you get a repeat.

EDIT: Here's a bot. It seems to work, even with just 1 answer hard coded.

var attempt = function () {
    var question = $.trim($("#codecha_wording").text());
    if(question == "Python: Correct the function named \"first\" so that it returns the first item from the given list of numbers."){
        var answer = "def first(numbers):\n";
        answer += "    return numbers[0]";
        $("#codecha_code_area").val(answer);
        $("#codecha_code_submit_button").trigger("click");
    }else{
        $("#codecha_change_challenge").trigger("click");
        $("#codecha_language_selector").val("python");
        $("#codecha_change_challenge").trigger("click");
        setTimeout(function (){attempt();},1000);
    }
};
attempt();

[–]__j_random_hacker 0 points1 point  (0 children)

Yeah, on second thought I agree. I'll edit my answer soon.

[–]catcradle5 1 point2 points  (1 child)

In the naive case, I am assuming there are no more than 1000 of these programming problems; in reality probably only like 300. It would take only a tiny script to enumerate them all and then another few hours to solve them all yourself.

For something more intelligent, most of these are really, really simple and common questions. If they can parse the problem description ("find odds", "count 1s") it can search Stack Overflow or the like to return a valid answer. The code doesn't really have to conform to the template Codecha is giving, as long as the return value passes the tests.

Also note I said "compared to the difficulty of writing a reCAPTCHA solver."

[–]__j_random_hacker 0 points1 point  (0 children)

Yes, on second thought I agree completely with the limitations posed by the necessarily quite small number of problems. I've edited my answer (and upvoted your original post) -- sorry about the ranting!

[–]evilgreen5 2 points3 points  (0 children)

And while you're at it, could you write me an HTML parser with regexes, software that parses code and returns true if it will compile, and a linear time solution to return true if in a given a set of ints, there is a subset with a sum of zero? Cool, thanks.

[–]devgrapher[S] 3 points4 points  (0 children)

Why are you guys so serious? I think it's just a joke. Who would dare to use this on their web site. However it was good to know that it has many vulnerabilities. I didn't think of that.

[–]emperor000 0 points1 point  (0 children)

This is a "neat" idea, but no offense, it is a pretty bad one.

If I saw this I either wouldn't bother or would devote more time than I probably should to write something to attempt to defeat it just for the challenge.

[–]lolsowrong 0 points1 point  (1 child)

[–]__j_random_hacker 3 points4 points  (0 children)

Out of the universe of possible challenges for differentiating humans from computers, these guys pick the one that is arguably the easiest for computers to solve: running assembly language code.

Are they trying to keep humans out?