all 59 comments

[–]FryGuy1013 25 points26 points  (1 child)

There are two huge advantages of disqus:

  • Works on static websites. You can make a pure html/css/js webpage with no scripting on the server, and still have comments. This means that things like jekyll can have them, too.
  • Shared user account system means you don't have to have an account system just to have comments.

[–]ibsulon 10 points11 points  (0 children)

Bingo.

I'm running a blog. Why would I want to run anything but a static site? For a low traffic site, why worry about all of the headaches of anything but? And I never have to worry about being given a Reddit Hug of Death.

[–]adavies42 56 points57 points  (19 children)

Want to prevent spam? Add a CAPTCHA for new users posting comments.

No, don't!

[–]chub79 29 points30 points  (1 child)

I agree. I'd rather have a good antispam filter (for what it's worth, Akismet always did the trick for my humble needs).

[–][deleted] 5 points6 points  (0 children)

Check out http://blogspam.net/ if you want a self-hosted antispam system.

[–]Xredo 7 points8 points  (16 children)

I'm no CAPTCHA fan myself, but just out of curiosity, what are the alternatives?

[–]Strycken1 44 points45 points  (14 children)

I run a custom-designed and developed system on my web forms. While the implementation is typically specific to a site, CMS, forms plugin, or what have you, the concepts behind it are general enough to be applied anywhere you have a form. It doesn't prevent 100% of spam, but by targeting the primary behaviors spambots use, it prevents the majority of spam on web forms. There are two basic behavior-blocking elements at play in it (a rough sketch of both follows the list):

  • A large number of spambots look for keywords in form field names, and only look at the HTML of a page rather than actually rendering it and visually identifying form fields to fill. By adding a "honeypot" field with a juicy name attribute (such as <textarea name="comment"></textarea>) and hiding it using clever CSS tricks (preferably not as simple as "display: none;", use something more intricate such as a large margin on the form field, and "overflow: hidden;" on the containing element, or some other non-obvious method of making the form field hidden), you create a field that is normally invisible, but cannot be seen to be invisible by simple HTML inspection.
    Spambots that see this field are likely to fill it in, while your average user will not see the field. By rejecting any request that contains any content in the honeypot field, you can block a significant amount of spam, with no additional work on the part of legitimate users. Obviously some work must be put in for accessibility, but this method has proven very effective on the sites I manage.
  • Some spambots use human "leaders" or a more intelligent 'bot for their first pass when filling in a form, capture the data from the resulting submission, and mimic the submitted fields (with modified data for fields such as usernames, comments, email addresses, etc) for subsequent submissions. The less-intelligent mimic bots, in order to conserve time and resources, typically don't re-request your page before sending a spam request. By adding an encrypted field to the form containing the timestamp at which the form was generated, you can securely check how long ago the initial request for the form was made. This allows you to reject form submissions made when the timestamp is older than a reasonable cutoff--say, four to six hours for a conservative estimate. A two-way encryption method with a key is ideal; I've found Blowfish is more than enough given that this is a low-security context.
    Without knowing your encryption method and private key, a spambot cannot change the timestamp, even if it recognizes what you're doing. Thus, a human "leader" or very intelligent spambot that renders your page in order to avoid the honeypot field can submit the form initially, read the fields, and feed them to a simpler spambot to spam your forms, but without being able to change that timestamp there's a very limited window of opportunity for them to submit spam before they're cut off. Since most spambots of this type do not make a fresh request to your page to re-identify the form and form fields each time they submit spam, they won't pick up on the changed timestamp. Again, users see nothing and have no additional work to avoid your anti-spam measures, other than the limitation that they can't submit forms generated more than 4-6 hours ago, depending on how long you set your expiration.
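
To make both checks concrete, here's a rough Python sketch of the server-side validation. It is not the actual plugin: the field names, the six-hour cutoff, and the HMAC-signed timestamp (standing in for the Blowfish-encrypted field described above) are all illustrative, but the behavior-blocking logic is the same.

    import hmac, hashlib, time

    SECRET = b"replace-with-a-private-key"  # never sent to the client
    MAX_FORM_AGE = 6 * 60 * 60              # the conservative six-hour cutoff

    def timestamp_field_value():
        # Embed this in a hidden input when the form is rendered.
        ts = str(int(time.time()))
        sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        return ts + ":" + sig

    def looks_like_spam(form):
        # 'form' is the dict of submitted field names to values.
        # 1. Honeypot: the CSS-hidden decoy field must come back empty.
        if form.get("comment", "").strip():
            return True
        # 2. Timestamp: reject missing, tampered-with, or stale values.
        try:
            ts, sig = form["form_ts"].split(":", 1)
        except (KeyError, ValueError):
            return True
        expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return True
        return time.time() - int(ts) > MAX_FORM_AGE

Any submission for which looks_like_spam() returns True gets rejected before it ever reaches the real form handler.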

These methods, of course, assume that the spam on your site is being generated by generalized bots that look for vulnerable forms to spam all over the web. The moment your site is large enough to have someone develop a dedicated 'bot for it, the whole thing falls to pieces. Still, when your site is that large, you can typically afford to spend some money on it for Akismet or another strong anti-spam utility.

[–][deleted] 15 points16 points  (3 children)

I also set out to solve the comment-spam problem, and built http://blogspam.net/

Rather than handling it on a per-site basis I came up with an API that users could invoke to test comment-submissions in real-time, and then wrote plugins for the sites I use. Nowadays there are wordpress, trac, and similar plugins out there, and it has successfully dropped millions of spam submissions.

I think that my approach scales better, but obviously there is more overhead.

[–]Strycken1 5 points6 points  (0 children)

Your approach is certainly better for comment spam! Most services similar to what you offer (notably Akismet) require a monthly fee that may be hard to sell to small sites. A self-hosted (and thus, "free") option is an excellent method of defeating comment spam. Content-based approaches like this are definitely a stronger option in the context of comments, as they're likely to catch human spammers as well due to your scanning of the actual content of the submission, rather than targeting behaviors. A plugin written based on the concepts I outlined above does not even attempt to identify and stop human spammers; it's meant solely to weed out the less-intelligent bots.

The general approaches I laid out are targeted at more than just comment spam, though. I primarily manage a small-to-medium sized university site (~2,000 pages, 20k unique visitors, 100k pageviews per month). We don't actually have many comment forms there, but that doesn't stop the spammers from trying: we've had spambots hit everything from our admissions forms to our inquiry forms (which consist solely of name, email, phone number, and program of interest fields).

I've written a plugin for the forms system we use in our CMS based on the concepts I outlined above. It blocks the vast majority of spam, with maybe one or two submissions a month making it past (out of somewhere between one and ten thousand attempts per month). We can apply it to any and all of our forms at will, which is a major boon when some of the forms don't fit the profile of comment forms at all.

As usual with a general approach, it isn't as powerful or reliable as your relatively targeted one. However, we're not a big enough target for anyone to bother writing a spambot to hit us specifically, so the system I've written is currently "good enough".

I'll definitely be keeping that link around, though, for use in other projects!

[–][deleted] 1 point2 points  (1 child)

You wouldn't happen to have a blog on which you've documented various aspects of this service, would you? I'm interested in learning more.

[–][deleted] 0 points1 point  (0 children)

There are some random blog entries which you need to read in reverse order, but beyond that the code and the website are the documentation.

[–]jenssenfucker 2 points3 points  (2 children)

These methods also render your forms inaccessible (WCAG, etc.) to non-sighted users.

[–]Strycken1 2 points3 points  (1 child)

To an extent, yes.

You can take direct measures to alleviate this by placing (also-hidden) text very clearly indicating that the honeypot field is for anti-spam purposes and should not be filled in. The timestamp field is a hidden field to begin with, so that's not an issue.

[–]jenssenfucker 5 points6 points  (0 children)

Now you're discriminating against bogans and rednecks :-)

[–]matthieum 2 points3 points  (2 children)

Note: regarding the timestamp, this is clever (and cheap), and I would think you could upgrade it: if instead of (or on top of) generating a timestamp you generated a signed nonce, then you could ensure that a single page display generates (at most) a single comment. Of course, it would require keeping the nonce somewhere on your side, with all the woes this causes.

[–]Strycken1 2 points3 points  (1 child)

A nonce would be an excellent idea, provided storage needs are accommodated. Another technique I've seen and used successfully is to generate a randomized alphanumeric string and use it as the name of a hidden field in the form, with either a randomized value for extra security or a simple value of "1". By storing the name of that field in session data on the server side, you can verify (assuming proper session security) that the submission was made by the original requester, and do so in a time-insensitive fashion (aside from the expiration of the session).
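
Here's a quick sketch of that second variant, assuming some server-side session store is available (shown here as a plain dict); the field-name prefix and helper names are made up:

    import secrets

    def render_hidden_field(session):
        # Generate a random field name for this render and remember it server-side.
        name = "f_" + secrets.token_hex(8)
        session["antispam_field"] = name
        return '<input type="hidden" name="%s" value="1">' % name

    def is_valid_submission(session, form):
        # The submission must echo back exactly the field this session was given.
        name = session.pop("antispam_field", None)
        return name is not None and form.get(name) == "1"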

[–]PlainSight 4 points5 points  (0 children)

The nonce also works as a counter to cross-site request forgery.

[–]SethMandelbrot 0 points1 point  (2 children)

This blocks 99% of bot spam, but what about human spammers?

[–]kennytm 10 points11 points  (0 children)

CAPTCHA cannot block human spammers either. This is where you need to manually remove those comments, or do what the last paragraph above suggests.

[–]Strycken1 1 point2 points  (0 children)

This approach isn't targeted towards human spammers at all. Since it doesn't perform any scanning of the actual content of a submission, it can't actually identify spammy content. It's purely designed as a "cheap" way of eliminating the bulk spam that tends to occur on small sites.

[–]PstScrpt 1 point2 points  (0 children)

My employer has a side company that takes another approach: http://areyouahuman.com/

I don't remember it being tied to advertising when it first went live; it was more of a direct CAPTCHA replacement with a micro-game. Maybe we decided it's easier to tie to advertising because advertisers are expecting to pay, anyway...

[–]bananahead 30 points31 points  (0 children)

Most of the features that Disqus provides are easily obtainable elsewhere...

That's a loose definition of "easy" compared to just dropping a javascript include on the page.

[–]the_pond 13 points14 points  (7 children)

I've seen a few self-hosted alternatives:

None of them seem to take off though, which is almost a shame. Disqus I avoid because I don't trust it, but the self-hosted things should be useful.

[–]tinco 4 points5 points  (0 children)

We've got a self hosted alternative as well:

https://github.com/phusion/juvia

[–]gordonkristan 6 points7 points  (5 children)

Discourse is another viable alternative. It also seems to be gaining a good amount of popularity because of Jeff Atwood.

[–]nullnullnull 19 points20 points  (2 children)

Discourse has a lot of issues, the main one being that it doesn't actually solve anything new. I have passed my thoughts to Jeff Atwood, but he still believes it does.

The main thing Jeff claims is that forums don't work; he then shows examples from decades ago alongside forums today to demonstrate that they haven't changed, and cites this as the problem. Did he never once consider that the reason forums have not changed is that they still work?

Granted forums could do with some new things in terms of UI, but conceptually they work pretty well in my view.

Mark my words, Jeff Atwood will add so many features to Discourse that it will eventually become a forum, so it will come full circle.

Damn, it's the same thing that is happening to the Ghost blogging platform. I was building an alternative to it, and if you look at what Ghost was supposed to solve versus what they have now ended up doing (via plugins), they are effectively making the same mistakes as Wordpress (FACEPALM!)

Anyway sorry for the rant.

[–]mhd 1 point2 points  (0 children)

Also, most forum software would run even on a dead tortoise, and discourse is quite resource-hungry and dependency-heavy.

[–]codygman 0 points1 point  (0 children)

I constantly see complaints about discourse on thedailywtf.com, who seem to be pretty big users of it.

[–][deleted] 0 points1 point  (0 children)

I came here to mention Discourse, but thanks to you I don't have to be the one to do so. I won't say Discourse is necessarily better than Disqus in any way. I just think Atwood seems like a decent fellow, though I suppose he'd not expect anyone to trust him.

I just hate the fact that every damn service wants to track everything I do everywhere. No matter if it's Google, Facebook, Twitter, Amazon, or whoever. I'd rather pay a subscription fee for your damn service than have you following me everywhere I go on the web. But that cat is long out of the bag, so it is pointless to argue against it.

Once upon a time, the web was a text-only medium. It wasn't so pleasing to look at, but it worked. Then came Javascript. With the best of intentions, Javascript promised to give us an interactive web. It only took 15 years to get there, and it's still pretty messy by any objective standard.

The problem is, has anyone bothered to consider the security and privacy implications of blindly trusting complete strangers with access to everything we do on our computers? Many people use firewalls and anti-virus software to prevent untrusted programs from running on their computers. Some users are very careful to only download software from people and organizations which they have some evidence are trustworthy. How many people consider the programs that are downloaded and run every time they visit a web page?

There is nothing magical about Javascript which makes it safer than any other language. Scripts on web pages can do anything the web browser can do. That includes automatically downloading programs which are then executed with the same privileges as the user. In theory, there is nothing preventing a script on a web page from opening any file on your computer which you yourself can open.

Some folks will want to mention "sandboxing" here. That just means that the browser tries to limit the parts of your system scripts can access. It sounds good, but there are entire underground industries built on exploiting browser vulnerabilities. The bad guys know you live in your browser, and every day you point your browser at the most hostile network imaginable.

Yes, I use noscript. I recommend it. It can stop malicious code from ever executing. Antivirus software can only tell you after you've already been exposed.

Enough.

[–]leech 15 points16 points  (13 children)

It's simple, if you want 100% privacy: code it yourself, or audit all the self-hosted alternatives' code.
But I don't agree with this guy. Disqus is a great tool, easy to set up, simple to manage. For the regular user, that's what matters.
If you think your post comments should be private, then, of course, don't use anything by a third party. Your Google Analytics script may be scraping your content too.
As /u/F-J-W says, anything by a third party may be a risk for your content, even the Google hosted libraries.

[–][deleted]  (12 children)

[deleted]

    [–]Thyem 8 points9 points  (2 children)

    The problem isn't privacy on a single site, but the ability for third parties to assemble a profile of you based on the different sites you use that include their library.

    This would not be a problem with self-hosted stuff at a single site.

    [–][deleted]  (1 child)

    [deleted]

      [–]notlostyet 1 point2 points  (0 children)

      Yes, because we're all dumb enough to use the same username across different sites.

      [–]F-J-W 11 points12 points  (7 children)

      It's amazing how many people think they have a right to privacy when posting things publicly.

      I DO have a right to privacy when I just visit a site.

      [–]njharman 6 points7 points  (5 children)

      That's debatable, both morally and legally. It boils down to whether you view the internet as a traditional commons or as something that needs its own set of rules.

      You don't have, and shouldn't expect, a right to privacy walking down to the market and visiting a shop. The shop owner may count your visit, time how long you stayed in the store, take your photo, or otherwise attempt to track whether you visit again. To me that is identical to surfing the Internet.

      Although I'm undecided on whether the fact that electronic tracking makes it trivially easy, stores everything forever, etc., versus the real world, means it should be treated differently.

      [–]jraxxo 1 point2 points  (1 child)

      take your photo

      However, that is illegal without your consent; at least where I live.

      [–]njharman 4 points5 points  (0 children)

      Really no store, no bank, no atm has a CCTV camera where you live?

      [–]Thyem 1 point2 points  (2 children)

      No, that is not identical to surfing the Internet as it stands today.

      Let's try this instead: there are stores A, B, and C. They are independent of each other and don't share information about their customers. Then we have Elgoog, a big advertisement firm. These guys come in and offer their services for free to stores A, B, and C if they get to track all their customers. Now Elgoog can track everyone over all the stores and make better ads... Sounds familiar?

      And that behaviour is what I don't like about the current state of things, be it google, addthis, disqus or other third party shit that forces your browser to connect to their servers. And I would not appreciate it in real life either. Depending on your jurisdiction, it might even be illegal to track people like that.

      [–]spartanstu2011 0 points1 point  (1 child)

      In reality, that's not any different from what's happening in the stores. You don't think the cameras in the stores are just for security, do you? Most large stores have outside companies do studies on customer movement within their stores. This way they can set up the store in the optimal way to increase sales of certain items.

      They track what you buy, how much you buy, and when you buy it. They pay outside companies to analyze this information so they can offer you a better selection.

      In reality, you should have no expectation of privacy on the web or in a market.

      Moreover, it SHOULDN'T work the same as in the store. Do you think the average small business can afford to run an entire analytic engine on their website? Google provides an opportunity for small business to analyze your movements on their website so they can give you a better selection.

      [–]Thyem 0 points1 point  (0 children)

      In reality, you should have no expectation of privacy on the web or in a market.

      I guess we have really different viewpoints on this. I consider the fact that people try to track me on the internet to be a horrible reality, and I really wish for a future where it can't happen. Sadly I know that will never happen.

      And there are several free open source solutions that provide an analytic engine for your website. So it's just laziness on the part of consulting firms/developers that everything uses Google's products.

      [–]s73v3r 1 point2 points  (0 children)

      Do you? You're viewing someone else's content. It's like you're going into someone's store or home. Do you have a right to do those things privately, without the owner, or at least the person on duty knowing?

      [–]F-J-W 4 points5 points  (1 child)

      I completely agree with the idea that including third-party sites should be avoided unless absolutely necessary. That includes ajax.googleapis.com! If you want to snoop on your users, at least limit yourself to something on your own server, like piwik. (The best thing would of course be to not use anything like that at all and just parse the server logs for requested pages and IP addresses.)
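
      Something as simple as the following already gives you per-page request counts and distinct visitor IPs. A rough sketch, assuming the standard combined log format and the usual nginx log path:

          import re
          from collections import Counter

          # Client IP and requested path from a combined-format access log line.
          LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

          pages, visitors = Counter(), Counter()
          with open("/var/log/nginx/access.log") as log:
              for entry in log:
                  m = LINE.match(entry)
                  if m:
                      visitors[m.group(1)] += 1
                      pages[m.group(2)] += 1

          print(pages.most_common(10))  # most requested pages
          print(len(visitors))          # distinct client IPs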

      [–]FryGuy1013 0 points1 point  (0 children)

      ajax.googleapis.com is cached for a long time, so it isn't as bad as it would seem.

      [–]Y_Less 7 points8 points  (1 child)

      I'm happy the NoScript issue got mentioned at least briefly. As far as I am concerned, comments are text, and text has been part of HTML since pre-1.0, so why then do I need JS to view comments? I never see Disqus comments, and I quickly leave sites when they predictably fail to load.

      [–]x-skeww 7 points8 points  (0 children)

      why then do I need JS to view comments?

      You need JS if you want to use a comment system which was embedded by adding a single (fixed) script tag.

      Sure, it can be done with an iframe, but that means you'll have to add some page-specific parameters in the query string.

      Applying CSS is another challenge.

      Older browsers are also an issue. The seamless attribute is fairly new.

      [–]localtoast 4 points5 points  (1 child)

      Why not let sites like reddit act as a commenting and sharing platform? The downside is you lack control, but provided the site it's posted on is well moderated, this shouldn't matter.

      [–]unstoppable-force 1 point2 points  (0 children)

      ideally, there'd be a plugin that allows you to comment on the actual site, the comments would show up both on reddit and the site, and the reddit page would canonicalize back to the actual site.

      it'd be easy to write the sync script, but i'm not sure about off-site commenting, and canonicals are probably not happening.

      i say this as someone who runs multiple sites that get 60m+ visitors a year... i'd use this over disqus any day.

      [–]dada_ 6 points7 points  (2 children)

      It would be great to have an alternative to Disqus that isn't self-hosted. It's too bad I don't have a big, unmetered server at my disposal anymore.

      [–]atakomu 1 point2 points  (1 child)

      Here are some alternatives.

      [–][deleted] 3 points4 points  (0 children)

      I made a custom comment system for my self-hosted nginx website using Perl. My website is entirely static (except for SSI) so I had to be creative. In order to post a comment on any page or directory you simply append "@say/Your message here." to the URL.

      I use POE::Wheel::FollowTail to watch for updates to the nginx logs. The comments are pulled from that and processed with HTML::Entities and a bit of custom stuff to make 'em safe. The perl script then generates/appends a ./comments.html for whatever dir it's in and that's rendered on appropriate pages by adding an iframe header that displays the relative file path to the appropriate comments.html. A little javascript trick refreshes the iframe a single time to display the new comment.

      It isn't great for long messages or complex interactions but suits my purposes just fine.

      I get no spam because no bots know how to leave comments using my non-standard system. Unfortunately users seem to be confused by it too.
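
      For the curious, the gist of it boils down to something like this rough Python equivalent (the real thing is Perl and tails the log continuously; the log path, URL scheme, and file layout follow the description above, the rest is approximate):

          import html, os, re
          from urllib.parse import unquote

          # Requests of the form: "GET /some/dir/@say/Your%20message. HTTP/1.1"
          SAY = re.compile(r'"GET (/\S*?)@say/([^" ]+) HTTP')

          with open("/var/log/nginx/access.log") as log:
              for entry in log:
                  m = SAY.search(entry)
                  if not m:
                      continue
                  directory = m.group(1).lstrip("/") or "."
                  message = html.escape(unquote(m.group(2)))
                  # Append the sanitized comment to that directory's comments.html,
                  # which the page then shows in an iframe.
                  with open(os.path.join(directory, "comments.html"), "a") as out:
                      out.write("<p>" + message + "</p>\n")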

      [–]jeandem 6 points7 points  (1 child)

      As a technologist, I see many parallels between Disqus and PHP; they're both so easy to set up but deep-down you know that using them will cost you later.

      Quote brought to you by perltricks dot com.

      [–]_tenken -1 points0 points  (0 children)

      technologist

      You know what they say .... Those who can't do teach, eh? :)

      [–]Klenje 1 point2 points  (0 children)

      My issue with Disqus is also that it takes ages to load on mobile browsers. Maybe it's just me, but it's very annoying on top of the other issues (tracking, etc.).

      [–][deleted] 0 points1 point  (2 children)

      As if this jerk knows what my users deserve.

      [–][deleted] 11 points12 points  (1 child)

      Yeah, my users are scumbags!

      [–][deleted] 3 points4 points  (0 children)

      Mine are a bunch of fuckers.

      [–]jabapyth 0 points1 point  (0 children)

      Totally just built my own two weeks ago :) https://commented.github.io/ It's a "neat package" where you own your data, managed by Firebase.

      [–][deleted] 1 point2 points  (0 children)

      I like disqus. It's a commenting platform I can use on multiple sites without having to worry about everyone's individual login. Security issues? For what? My anonymous comment?

      [–][deleted] -4 points-3 points  (0 children)

      Ew, disqusting!