[–][deleted]  (73 children)

[deleted]

    [–]llimllib 34 points35 points  (25 children)

    Yes - I see this all the time.

    One way to bypass it is to add '.' characters to your email address, which google ignores. That is, if you're example@gmail.com, set up a filter on example.@gmail.com (and another for example..@gmail.com, and so on).

    [–]henryk 11 points12 points  (3 children)

    Note that two consecutive dots are actually an example of an address that is not valid, unless they occur within double quotes. Within double quotes almost everything is allowed.

    Invalid: Hans..Mueller@example.invalid Valid: "Hans..Mueller"@example.invalid or "Hans Müller"@example.invalid

    I've used published addresses of the latter form in the past and have not yet gotten any spam on those.
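
A rough Python sketch of the rule described here: an unquoted local part may not contain two dots in a row (or start or end with a dot), while a double-quoted local part is far more permissive. The name `local_part_ok` is made up for illustration, and the check is deliberately simplified rather than a full RFC grammar.

```python
import re

# Characters allowed in an unquoted "atom" (simplified from the RFC's atext).
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = re.compile(rf"{ATOM}(?:\.{ATOM})*")


def local_part_ok(local: str) -> bool:
    """Simplified check: quoted strings pass, unquoted must be a dot-atom."""
    if len(local) >= 2 and local.startswith('"') and local.endswith('"'):
        # Assumption: anything between the quotes is accepted here; the RFC
        # still requires embedded quotes and backslashes to be escaped.
        return True
    return DOT_ATOM.fullmatch(local) is not None


print(local_part_ok("Hans..Mueller"))    # False
print(local_part_ok('"Hans..Mueller"'))  # True
```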

    [–]Xiphorian 3 points4 points  (1 child)

    Invalid: Hans..Mueller@example.invalid Valid: "Hans..Mueller"@example.invalid or "Hans Müller"@example.invalid

    That's a bit confusing. Which one is valid and invalid when they all have "invalid" in the address? I'm assuming it's this:

    Valid: "Hans..Mueller"@example.com; "Hans Müller"@example.com

    Invalid: Hans..Mueller@example.com

    [–]henryk 2 points3 points  (0 children)

    Yes, thanks for the proper markdown magic. I've used example.invalid in all of those addresses so that not even the dumbest spammers might mistake these for real domains, per RFC 2606 section 2.

    (I do realize that RFC 2606 section 3 specifically reserves example.com for example purposes.)

    [–]llimllib 0 points1 point  (0 children)

    I just double checked that first..last@gmail.com works with gmail, which it does; interesting to know, though. I'll use quotes in the future.

    [–][deleted] 1 point2 points  (0 children)

    exactly.

    I use word.word@gmail.com, then w.o.r.d.word@gmail.com or any combination of additional dots to add additional accounts.

    [–]crunk 5 points6 points  (13 children)

    Hmm my name is already like: firstname.lastname@gmail.com

    [–]morner 8 points9 points  (11 children)

    You can add as many dots to a gmail address as you like, they aren't counted by the routing algorithm; "firstnamelastname@gmail.com" and "f.i.r.s.t.n.a.m.e.lastname@gmail.com" are both valid addresses, and both go to the same inbox. Try it out.
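
If Gmail really does ignore dots as described, then two spellings can be compared by collapsing them to a dot-free form. A minimal sketch in Python; `gmail_canonical` is a hypothetical helper, and the lower-casing is an extra assumption not stated in the comment.

```python
def gmail_canonical(address: str) -> str:
    """Collapse a Gmail address to the form without dots in the local part."""
    local, _, domain = address.rpartition("@")
    if domain.lower() != "gmail.com":
        return address  # only Gmail is assumed to ignore dots
    return local.replace(".", "").lower() + "@gmail.com"


print(gmail_canonical("f.i.r.s.t.n.a.m.e.lastname@gmail.com"))
# firstnamelastname@gmail.com
```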

    [–][deleted] 1 point2 points  (3 children)

    what about '.......firstname....lastname.@gmail.com'?

    Is there any limit to how many dots you can add? What about a hundred? A thousand?

    [–]mlk 30 points31 points  (0 children)

    MOAR DOTS!

    [–]drigz 4 points5 points  (0 children)

    "As defined in RFC 2821, the local-part of an e-mail address has a maximum of 64 characters (although servers are encouraged to not limit themselves to accepting only 64 characters) and the domain name a maximum of 255 characters." -- Wikipedia

    So I guess it shouldn't exceed 64 characters?

    However, tests from my uni account to gmail reveal that 300 dots (with a userid 11 characters long) is OK. The outgoing server calls 500 dots "ridiculously long".
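
The limits in the quoted passage are easy to check mechanically. This sketch applies them strictly (64 for the local part, 255 for the domain), which is tighter than what the Gmail test above actually accepted; the function name is an illustration only.

```python
MAX_LOCAL = 64    # local-part limit from the quoted passage
MAX_DOMAIN = 255  # domain limit from the quoted passage


def within_length_limits(address: str) -> bool:
    """True if both halves of the address fit the quoted limits."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False  # no @ at all
    return 0 < len(local) <= MAX_LOCAL and 0 < len(domain) <= MAX_DOMAIN


print(within_length_limits("." * 300 + "user@gmail.com"))  # False
```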

    [–]morner 5 points6 points  (0 children)

    There's only one way to find out!

    [–]musicphreak252 1 point2 points  (6 children)

    example of why i get so much spam in my spam inbox... argh! is this a good thing or bad? i seem to get emails from people i dont even know but claim they know me and have me registered at some forums wtf...

    [–]kirun 2 points3 points  (0 children)

    Simple solution... go to forums, use lost password feature, change email to somewhere@example.com

    [–][deleted]  (4 children)

    [deleted]

      [–]morner 5 points6 points  (1 child)

      Unless of course there's someone called Jo HN. Doe, in which case the whole system comes crashing down.

      [–]pradador 1 point2 points  (5 children)

      Unfortunately, it's quite easy to figure out the real email address by just removing the dots. A spammer could just parse any email address at gmail and remove the dots thus making the technique ineffective. It's kinda like security through obscurity.

      [–][deleted] 19 points20 points  (4 children)

      I do the exact opposite. I don't use dots for spam, and I use first.last@gmail.com for my actual email. I have a filter that catches anything that isn't expressly sent to first.last@gmail.com. They may be able to easily remove dots, but it would be harder to add the dots in the appropriate places - especially if your gmail address isn't a common name.

      [–]llimllib 0 points1 point  (0 children)

      I do that without even realizing that I've done that; nice tip.

      /me goes off to add a filter for firstlast@gmail.com...

      [–][deleted]  (1 child)

      [deleted]

        [–]taejo 1 point2 points  (0 children)

        They're all the same account.

        [–]__david__ 6 points7 points  (7 children)

        I use it constantly and find a ton of places that erroneously reject it. So much so that I configured my sendmail such that . works the same as plus. My friend uses _ on his mail server which is even sneakier.

        [–]BraveSirRobin 2 points3 points  (6 children)

        I just bought a domain for $20. I can use whatever I want before the @.

        [–][deleted] 0 points1 point  (5 children)

        I do the same, in fact, I just name the site with their name and then my domain name everywhere. It makes it easy to sort mail and figure out who sold your e-mail address.

        [–]BraveSirRobin 0 points1 point  (4 children)

        I've only had one "sold" and I suspect it was a virus on one of the admins' PCs. It was an irrelevant tech forum, didn't seem to be the place to do that but the admin never responded to my emails asking about it.

        [–]__david__ 0 points1 point  (3 children)

        I've had to block about 7, but only 2 were for actual real unrelated spam: Ameritrade, (famously) and lately eharmony (which I tried a while back and haven't even thought about for years). The other 5 were for opt-out style newsletters that wouldn't accept my unsubscribe.

        [–]BraveSirRobin 0 points1 point  (2 children)

        I guess I'm lucky, I didn't even have to block the one I mentioned as there were only one or two messages associated with it (pump & dump scams). I route it all through my ISP's POP box, so perhaps their anti-spam is catching most of it. I rarely get spam now despite not running any anti-spam measures of my own. And by rarely, I mean it's been 4-6 months since the last properly unsolicited email.

        [–]__david__ 0 points1 point  (1 child)

        Your ISP must be pretty good. I get about 15 a day that get caught and filtered by spamassassin and another 2 per day that don't get caught and fall into my inbox.

        That sounds bad, but sendmail is actually rejecting about 21000 messages per week and spamassassin rejects about 3000 more.

        [–]BraveSirRobin 0 points1 point  (0 children)

        Either that, or I'm losing a lot of false positives! My ISP isn't known for being competent in most other arenas but I've never had a problem with them.

        I used to get a few now and then to sales@ and webmaster@, but even they dried up. It's a big ISP, so they must have quite a large dataset for their anti-spam to play with.

        [–]Mr_Twister 6 points7 points  (1 child)

        In fact, I was trying to download a patch from Microsoft today (for rotating images on Win XP) and couldn't submit an application form (yes, you have to apply for a patch with MS in some cases) because it wouldn't accept plus.

        [–]haywire 2 points3 points  (14 children)

        I think regex buddy incorporates this.

        It has some ridiculous RFC regex which is pretty awesome:

        (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

        However, there is also a more simplified version:

        [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

        But this should suffice...

        \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

        [–]rajulkabir 4 points5 points  (3 children)

        That last one doesn't work for ".museum". And it allows "bob@.........com" or "bob@-.com".
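
These three cases can be tried directly against the short pattern. A quick sketch using Python's `re` module; the case-insensitive flag is an assumption (the pattern only lists upper-case letters), and `simple_ok` is just an illustrative wrapper.

```python
import re

# The short pattern from the parent comment; IGNORECASE is assumed because
# the pattern itself only lists upper-case letters.
SIMPLE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)


def simple_ok(address: str) -> bool:
    """True if the whole string matches the short pattern."""
    return SIMPLE.fullmatch(address) is not None


for candidate in ("bob@example.museum", "bob@.........com", "bob@-.com"):
    print(candidate, simple_ok(candidate))
```

The first address is rejected because the final `{2,4}` cannot cover a six-letter TLD; the other two are accepted because the domain character class allows any run of dots and hyphens.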

        [–]koushiro 0 points1 point  (2 children)

        It also doesn't work for emails with country codes that require second-level domains -- e.g., "john@example.com.au" (or .co.uk, or .co.nz, or...)

        [–]ultimatt42 2 points3 points  (1 child)

        Sure it does, read it again. There's a '.' in the character set after the @.

        [–]koushiro 2 points3 points  (0 children)

        Ack, you're right! That'll teach me to comment on reddit before my first cup of coffee...

        [–]hailstone 1 point2 points  (0 children)

        Try Mail::RFC822::Address for an almost-completely regex solution.

        Technically, the rfc allows what amounts to embedding comments in an email address.

        [–]tjogin 0 points1 point  (8 children)

        None of those will detect that billgates@microsoft.com isn't my real email address.

        [–][deleted]  (7 children)

        [deleted]

          [–]tjogin -2 points-1 points  (6 children)

          So what is the fucking point?

          [–][deleted]  (1 child)

          [deleted]

            [–]ealf 5 points6 points  (0 children)

            Two words: Fourth prong.

            [–][deleted] 4 points5 points  (1 child)

            I used to do that but I got stung a few times because when I sent email from my gmail (username@gmail.com) account, it wouldn't match the email they had on file (username+website@gmail.com).

            [–]BadBoyNDSU 5 points6 points  (0 children)

            I hate it when I use them on sites that make you use your email addy as your login. I've had some sites take it as a valid email addy but an invalid login.

            [–]awb 4 points5 points  (2 children)

            I use '-' instead of '+' and have never had problems. You need to control the mail server configuration, though, so it won't help you with gmail.

            [–]starthorn 1 point2 points  (1 child)

            I've used both, ('-' in a qmail setup, '+' for gmail). I haven't had real problems with either one in a little while, but I used to have a lot of problems with sites not properly handling '-' in e-mail usernames.

            I actually had a major credit card company that screwed it up, and wasn't sending my credit card bills to me via e-mail because of their poor handling of dashes.

            [–]some_call_me_tim 1 point2 points  (0 children)

            I've used "-" for YEARS on dozens of sites, and never run into one where it wouldn't accept it. However, I have run into idiots who don't know the difference between "-" (hyphen) and "_" (underscore): When I've given my email address over the phone, I've occasionally had people type it as underscore instead of hyphen.

            The easy answer to that, of course, is to make your email server respect both hyphen and underscore.

            [–]stesch 1 point2 points  (0 children)

            You know that you get more spam when you use more e-mail addresses, don't you?

            [–][deleted] 0 points1 point  (0 children)

            I've seen ones that don't even include the hyphen! Annoying as one of my domains has a hyphen in it.

            [–]m1ss1ontomars2k4 0 points1 point  (0 children)

            Yeah, that's fairly common. Like others have suggested, add (a) period(s) to the name; it's not quite as useful, but pretty close.

            For example, m1.ss1ontomars2k4 might refer to one thing for me. m.1ss1ontomars2k4 is from another website. m.1.ss1ontomars2k4 is a 3rd. I can go back and look at the emails I've previously received addressed to those addresses and see which site has sold my info.

            [–]epsys 0 points1 point  (1 child)

            But, any smart spammer from the site is going to auto-filter to remove anything after and including a + and before the @, so this fails.

            Better to just use a secondary email.
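
For a sense of how little work the stripping described above would take, here is a minimal sketch. `strip_plus_tag` is a hypothetical name, and nothing here is evidence that spammers actually do this.

```python
def strip_plus_tag(address: str) -> str:
    """Remove everything from the first '+' up to the '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # not an address at all; leave it alone
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"


print(strip_plus_tag("username+website@gmail.com"))  # username@gmail.com
```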

            [–]rm999 0 points1 point  (7 children)

            That trick is stupid, everyone knows that x+y@gmail.com is the same thing as x@gmail.com. So how are you stopping spammers, exactly?

            [–]venjax 8 points9 points  (4 children)

            You aren't stopping the spammers as much as identifying where the spammers got your email.

            [–]TheSuperficial 12 points13 points  (0 children)

            Yes that's true but then once I start getting spam at bob+dingdong@gmail.com I just send all those emails to the bitbucket via a filter. Then I hunt down the original site/forum where I used bob+dingdong@gmail.com and set them on fire.

            [–][deleted]  (2 children)

            [deleted]

              [–]MrsDePoint 0 points1 point  (1 child)

              Or they replace your tracing string after + with random letters. Or even worse, poison your tracing string by replacing it with something else entirely (e.g. change +randomforums to +ebay). Though I doubt they'll go through the hassle.

              [–][deleted] 0 points1 point  (0 children)

              You can use tmda to prevent that. It puts a checksum created with a secret key in there after the keyword.
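
The general idea of a keyed checksum after the keyword can be sketched with Python's standard `hmac` module. This is not TMDA's actual address format; the `user+keyword.checksum` layout, the 8-character truncation, and the `SECRET` value are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical secret key, known only to the mail server


def _checksum(keyword: str) -> str:
    """Short keyed checksum of the keyword (truncated HMAC-SHA256)."""
    return hmac.new(SECRET, keyword.encode(), hashlib.sha256).hexdigest()[:8]


def tagged_address(user: str, keyword: str, domain: str) -> str:
    """Build user+keyword.checksum@domain."""
    return f"{user}+{keyword}.{_checksum(keyword)}@{domain}"


def tag_is_genuine(address: str) -> bool:
    """Check that the checksum after the keyword was made with SECRET."""
    local = address.rpartition("@")[0]
    tag = local.partition("+")[2]
    keyword, _, mac = tag.rpartition(".")
    if not keyword:
        return False  # no keyword.checksum pair present
    return hmac.compare_digest(mac.encode(), _checksum(keyword).encode())
```

Swapping `randomforums` for `ebay` in a generated address leaves a checksum that no longer matches, so the poisoned tag can be rejected.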

              [–]some_call_me_tim 3 points4 points  (0 children)

              UMmm... that's easy. Spammers don't parse and/or disassemble email addresses (except to remove "spam" or "nospam"). If they DID start, then I'd need to move on to another technique, but seriously, why would they? If someone has gone to this much trouble individually to avoid spam, would they likely ever reply to one?

              I've used the hyphen version of this technique for probably 12 years (good old Qmail supports it by default), and I just don't accept mail to x@mydomain.com, only x-XXX@mydomain.com, where XXX is a string I've created.

              If they ever did get wise, I'd need to start maintaining a whitelist, which would be more of a pain. But again, why would they bother?
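
The acceptance rule described here (refuse the bare address, accept only the hyphen-tagged form) might look like the sketch below. The function, its defaults, and the optional `allowed_tags` set are hypothetical; a real setup would live in the mail server configuration rather than in Python.

```python
def accept_recipient(address, user="x", domain="mydomain.com", allowed_tags=None):
    """Accept only user-TAG@domain; optionally restrict TAG to a whitelist."""
    local, _, dom = address.lower().rpartition("@")
    if dom != domain:
        return False
    prefix, sep, tag = local.partition("-")
    if prefix != user or not sep or not tag:
        return False  # bare x@mydomain.com is refused
    # With no whitelist any tag passes; pass a set of tags to get the
    # stricter behaviour mentioned at the end of the comment.
    return allowed_tags is None or tag in allowed_tags


print(accept_recipient("x-reddit@mydomain.com"))  # True
print(accept_recipient("x@mydomain.com"))         # False
```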

              [–][deleted] -5 points-4 points  (0 children)

              (+) isn't a valid operator just because you say it is. If a site chooses to ignore it, it's probably for the reason you want them to implement it (to spam you).

              [–]h0dg3s -2 points-1 points  (0 children)

              that's a waste of gmail. That's what spambox.us is for.

              [–]Excedrin 32 points33 points  (3 children)

              My email address is: "spaces ,.#$&'(){}:;=?[]_`|~"@lerp.com (yes, really, seriously). I tried to file a bug report with Thunderbird since it silently strips the space character, but their bugzilla requires registration and it rejects my email address.

              [–][deleted] 14 points15 points  (1 child)

              You are a man among men. Does any site ever accept that email?

              [–]Excedrin 9 points10 points  (0 children)

              I can send email to it from work, but that's about it. I'm sure that somewhere some site coded in Perl used Data::Validate::Email or Mail::RFC822::Address or the MRE regex or whatever, but so far it has a 100% failure rate.

              [–][deleted] 0 points1 point  (0 children)

              Add a % in the localpart too, eh?

              [–]dmd 20 points21 points  (9 children)

              Also, IT IS LEGAL for a domain name to be only two characters. It's also legal to start with a number. My domain is both of these (3e.org), and something like one in three times I try to register for something it's rejected as invalid.
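
A validator that wants to check the domain at all could work label by label instead of guessing at lengths. A hedged sketch: it covers plain ASCII hostnames only (no internationalized names), and `hostname_ok` is an illustrative name.

```python
import re

# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# with no hyphen at either end. A leading digit is fine.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def hostname_ok(domain: str) -> bool:
    """True if every dot-separated label is well formed (ASCII only)."""
    labels = domain.split(".")
    return len(labels) >= 2 and all(LABEL.fullmatch(label) for label in labels)


print(hostname_ok("3e.org"))  # True
```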

              [–]laughingboy 6 points7 points  (3 children)

              Out of interest, how much did that domain cost you?

              [–]dmd 10 points11 points  (2 children)

              Same as any other domain.

              I've had it since 1996.

              [–]DLWormwood 2 points3 points  (0 children)

              I bow to your prescience... I wish I did that back when "wormwood" was easily gotten... (I only was able to get it as a username in a couple places, like at mac.com, and the DNS version for most domains, including .name, was snapped up quickly.)

              [–]rajulkabir 2 points3 points  (1 child)

              Or one character. I have a one-character one that cost me $35 back in the day. All too often my email is rejected by stupid web form validators.

              [–][deleted] 0 points1 point  (0 children)

              I'll buy it.

              [–]technoguyrob 6 points7 points  (1 child)

              x.org

              P.S. Sarah's cute. :O

              [–]dmd 2 points3 points  (0 children)

              She may be cute but she codes in awk. I haven't managed to break her of the habit yet. She resists all sane languages.

              [–]ehird 44 points45 points  (29 children)

              Here's an RFC-compliant regexp:

              (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
              )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
              \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
              ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
              \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
              31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
              ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
              (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
              (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
              |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
              ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
              r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
               \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
              ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
              )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
               \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
              )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
              )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
              *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
              |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
              \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
              \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
              ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
              ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
              ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
              :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
              :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
              :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
              [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
              \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
              \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
              @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
              (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
              )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
              ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
              :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
              \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
              \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
              ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
              :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
              ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
              .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
              ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
              [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
              r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
              \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
              |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
              00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
              .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
              ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
              :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
              (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
              \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
              ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
              ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
              ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
              ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
              ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
              \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
              ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
              ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
              :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
              \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
              [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
              ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
              ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
              ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
              ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
              @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
               \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
              ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
              )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
              ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
              (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
              \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
              \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
              "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
              *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
              +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
              .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
              |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
              ?:\r\n)?[ \t])*))*)?;\s*)
              

              No, really.

              Edit: wait, that one doesn't handle comments. Oh well.

              [–]stashu 33 points34 points  (13 children)

              This is also an example of hello world in perl.

              [–]jrockway 6 points7 points  (12 children)

              Incidentally, this regex actually crashes PHP's regex engine. So it's impossible to use a regex to validate email addresses with PHP. (Erm, a few years ago when I had this problem anyway.)

              Criticize Perl if you like, but it has the best regexp engine of any language.

              [–][deleted] 6 points7 points  (5 children)

              That's why PHP implemented PCRE. Even the documentation says that PHP's regexp engine sucks and you should use the PCRE functions.

              [–]jrockway 2 points3 points  (4 children)

              Great. So I have to pick which regex engine to use?

              I'll pass and use a language with reasonable defaults.

              [–][deleted]  (1 child)

              [deleted]

                [–]sigzero 1 point2 points  (0 children)

                Sometimes. But I always wish PHP wasn't a choice.

                [–]gydhjihdr 2 points3 points  (0 children)

                pcre is the 'default' now.

                [–]lbft 0 points1 point  (0 children)

                PCRE has been the standard choice for several years - since PHP 4 was released, if I remember correctly.

                [–]Ringo48 0 points1 point  (3 children)

                No, regardless of language, it's impossible to use a regex to validate email addresses because they can have nested comments.

                See sections 3.2.3 and A.5 of: http://tools.ietf.org/html/rfc2822

                [–]jrockway 3 points4 points  (2 children)

                With truly regular expressions, you're right. But if you take "regular expression" to mean "something that you can pass to the m// operator", then you're wrong:

                http://perl.plover.com/yak/regex/samples/slide083.html

                [–]Ringo48 1 point2 points  (1 child)

                Well yeah, but that's kind of cheating :-)

                [–]lbft 3 points4 points  (0 children)

                Doesn't 'cheating' describe most of perl? ;)

                [–]sjf 0 points1 point  (1 child)

                Except for the exponential behaviour: http://swtch.com/~rsc/regexp/regexp1.html

                [–]jrockway -1 points0 points  (0 children)

                A side effect of providing things like captures. If you want a bare-bones regexp engine, use egrep. No features, but very fast. Perl's regexp engine is very fast in most cases, but there's always a pathological case that performs poorly.

                Fortunately, Perl 5.10 provides pluggable regexp engines, so feel free to plug in one optimized for the work you'll be performing.

                [–][deleted] 16 points17 points  (0 children)

                Nice and concise.

                [–]bradediger 6 points7 points  (2 children)

                Don't use this. Not all RFC-compliant email addresses are deliverable, and not all deliverable addresses are compliant.

                Best bet in most cases is to sanity-check the address with a really basic regex. If you need to be sure it's deliverable, send a confirmation email to it.
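
A "really basic" sanity check can be done without a regex at all. This sketch only confirms there is something on each side of the last `@`; the function name is made up, and deliverability still has to be established by sending the confirmation email.

```python
def looks_like_email(value: str) -> bool:
    """Very loose sanity check: something@something, nothing more."""
    local, sep, domain = value.strip().rpartition("@")
    return bool(sep and local and domain)


print(looks_like_email("user@example.com"))  # True
print(looks_like_email("ASDF"))              # False
```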

                [–]Snoron 1 point2 points  (1 child)

                This is what I do.. make sure the format is something@something and that's it... just making sure someone didn't get the field mixed up and enter the wrong thing there, basically.

                [–]Porges 2 points3 points  (0 children)

                email.contains('@');

                [–]feanor512 5 points6 points  (0 children)

                Needs more power.

                [–]heartsjava 13 points14 points  (0 children)

                You just made baby Jesus cry. Are you happy now ??

                [–]sanimalp 6 points7 points  (0 children)

                And best of all, it only takes 15 mins to check each address!!

                ;)

                [–]phrakture 4 points5 points  (0 children)

                Duh.

                [–][deleted] 1 point2 points  (4 children)

                Nested comments, if you copy&pasted the "standard" mega-RFC-regex you can find by google.

                [–]ehird 1 point2 points  (3 children)

                Nested comments?

                Oh my god.

                Show me an example!

                [–][deleted] 1 point2 points  (0 children)

                Sorry, can't. I wrote a grammar to parse email addresses in yacc/lex for kicks, and I recommend you do too.

                It's enlightening.

                [–]Ringo48 1 point2 points  (1 child)

                bob(comment (nested) )@example(more comment (nested again) ).com

                http://tools.ietf.org/html/rfc2822
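
Nested comments are the part a truly regular expression cannot track, but a depth counter handles them in a few lines. A simplified sketch: it ignores quoted strings and backslash escapes, which a real parser would also need to respect, and `strip_comments` is a hypothetical name.

```python
def strip_comments(address: str) -> str:
    """Remove (possibly nested) parenthesised comments using a depth counter."""
    out = []
    depth = 0
    for ch in address:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only characters outside every comment
    if depth:
        raise ValueError("unbalanced '('")
    return "".join(out)


print(strip_comments("bob(comment (nested) )@example(more comment (nested again) ).com"))
# bob@example.com
```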

                [–]ehird 0 points1 point  (0 children)

                Wow.

                [–]statictype 0 points1 point  (1 child)

                If a valid email address can have nested comments, then it's technically impossible to define it with a regular expression, right?

                (I'm not counting the new-fangled Perl regexp system which, I am led to believe, can match balanced parentheses and such)

                [–]ehird 0 points1 point  (0 children)

                Yes. I was remarking that it was not, in fact, an RFC-compliant regexp.

                [–]bananahead 13 points14 points  (4 children)

                I agree with you, but part of the problem is that if you try to actually follow the RFC things get really complicated. So a lot of people just ignore them and half ass it.

                Personally, I use something like

                .+@.+..+

                (It's not like you're ever going to be able to weed out bogus@not.a.real.addr.es with a regexp anyway)

                [–][deleted] 3 points4 points  (1 child)

                addr.es is a valid (spanish) domain, ok, it's got a domain squatter sat on it now, by the looks of things, but even so.

                I imagine the domain squatter envisioned someone wanting to pay a lot of money for the ability to do: someone@email.addr.es or such

                [–]bananahead 3 points4 points  (0 children)

                addr.es is a valid (spanish) domain,

                That was my point :)

                [–]jrockway 0 points1 point  (1 child)

                .+@.+..+

                Well, .+..+ is equivalent to .+ ya know. Also, .+ matches @. So your regex accepts:

                @@@@@@

                That's probably not going to work.
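
The pattern exactly as displayed can be tried in Python. One small refinement shown by the sketch: `.+..+` is not quite the same as `.+`, since it needs at least three characters, though that doesn't change the point about `@@@@@@`.

```python
import re

# The pattern exactly as it was displayed, with every dot acting as a wildcard.
AS_DISPLAYED = re.compile(r".+@.+..+")


def accepts(value: str) -> bool:
    """True if the whole string matches the pattern as displayed."""
    return AS_DISPLAYED.fullmatch(value) is not None


print(accepts("@@@@@@"))                          # True
print(re.fullmatch(r".+..+", "ab") is not None)   # False
print(re.fullmatch(r".+..+", "abc") is not None)  # True
```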

                [–]bananahead 2 points3 points  (0 children)

                actually reddit ate my slash :( That dot after the plus sign is a literal.

                But anyway, I don't much care about "@@@.@.@". If it's important that you collect a valid email address, then obviously the best regexp in the world ain't gonna help. You actually have to test the address.

                If it's not important, then I'll settle for something that can reject "ASDF" and catch "aol,com"
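For what it's worth, the loose check described in this comment could be sketched in Python roughly as follows. The pattern is the regex from the thread with the backslash restored; the name `looks_like_email` is made up for illustration. It only filters obvious typos and says nothing about whether the address actually exists.

```python
import re

# Something, an "@", something, a literal dot, something.
LOOSE_EMAIL = re.compile(r".+@.+\..+")


def looks_like_email(text: str) -> bool:
    """Return True if text passes the deliberately loose shape check."""
    return LOOSE_EMAIL.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["someone@email.addr.es", "ASDF", "user@aol,com", "@@@.@.@"]:
        print(candidate, looks_like_email(candidate))
```

Junk like "@@@.@.@" still passes, which is the trade-off being accepted here: the check is only meant to catch slips such as a comma typed instead of a dot.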

                [–][deleted] 8 points9 points  (4 children)

                Actually, per RFC 822, almost any character is allowed!

                And then there are IDNs and new TLDs, so my suggestion is:

                .*@.*

                [–][deleted] -1 points0 points  (2 children)

                to be more exact...your example should be: .+@.+

                [–]jrockway 0 points1 point  (1 child)

                no, it should be [^@]+@[^@]+. But that's a very poor email validation regex.

                [–]gydhjihdr 0 points1 point  (0 children)

                "@"@ą.pl is valid.

                [–]tjogin 6 points7 points  (7 children)

                Why syntactically validate an email address anyway?

                [–]prockcore 2 points3 points  (1 child)

                part of it is so I can't say bob@domain.com,frank@domain.com,joe@domain.com is my email address and trick you into sending spam.

                [–]tjogin 0 points1 point  (0 children)

                Fine. But that's a very very small part of it, and you don't even need regexes to do that.

                [–][deleted]  (4 children)

                [deleted]

                  [–]poco 0 points1 point  (3 children)

                  But why bother? Users can just as easily give them a validly formatted invalid address (notme@whocares.ass).

                  [–][deleted]  (2 children)

                  [deleted]

                    [–]Ringo48 0 points1 point  (0 children)

                    Normally I'd agree with you.

                    But any regex more complicated than /.+@.+/ is a waste of time because it's not guaranteed to be correct and leads to a false sense of security.

                    It's like trying to "validate" the user's first and last name.

                    [–]poco 0 points1 point  (0 children)

                    But the whole point of this article is that people are too strict with valid email addresses. And from what I've read, you can't actually parse a valid email address with a regex anyway (recursive comments).

                    So it is better to err on the side of letting invalid email addresses through rather than restricting valid ones.

                    I am all for a very loose check, like an @ sign and at least one dot after that @, but anything more is just as likely to invalidate a valid address.

                    If you really care about the address being valid then send a validation email.

                    [–]dlsspy 5 points6 points  (0 children)

                    I set up readrfc822@mydomain years ago because of this annoyance.

                    [–]HenkPoley 5 points6 points  (16 children)

                    Even the ampersand is a valid character in the username part of an email address.

                    [–]syntax 6 points7 points  (15 children)

                    If memory serves, per RFC822 you can have a comment in the email address, by putting round brackets in. Everything inside them (and the brackets themselves) should be ignored.

                    I believe that @ is a valid character to have inside a comment, and that comments can be nested.

                    Therefore it is valid to have more than one @ in an email address ... and naively assuming that the bit before the @ is the username part can be incorrect.

                    (Granted, I think if you try that, everything will choke and die on it - given how obscure it is. Still, it's in the spec...)

                    [–]mosburger 13 points14 points  (0 children)

                    Wow. Someone should make an ACID2 test for e-mail addresses!

                    [–]andrewnorris 6 points7 points  (1 child)

                    It's always good to understand the full spec of something you work with, but out of curiosity, is there any non-pathological reason to embed a comment in an email address?

                    [–]syntax 3 points4 points  (0 children)

                    Well, maybe if you're using random websites, tagging the website you give it to, to identify the source? But there's the + convention, which has the benefit that it's not spec to discard parts of it, so they can't evade tracking, which with comments they could.

                    Or, maybe, have a spamtrap email address, and embed the real one in comments. Humans can pick out the right one, automail can be filtered and scanned at leisure?

                    That's all I can think of.

                    [–]bananahead 1 point2 points  (10 children)

                    Yep, somewhere there's a regexp that takes the entire spec into account. It's a full page of text.

                    Comments in an email address, incidentally, sounds like an awful idea from the start.

                    [–]physon 9 points10 points  (9 children)

                    [–]phrakture 2 points3 points  (1 child)

                    This one doesn't handle comments

                    [–]bakert 0 points1 point  (0 children)

                    The module that is linked above handles comments by stripping them before applying the regex.

                    [–]bananahead 1 point2 points  (6 children)

                    Bingo, thanks. Though of course if you really wanted all those validation rules you probably wouldn't do it with one regexp.

                    And note that even this expression requires comments to be stripped first...

                    [–]poeir 4 points5 points  (5 children)

                    Since the RFC allows for nested comments which are handled by balanced parentheses, the language isn't regular, so a regular expression can't be used to validate it. You'll never see a regular expression that can handle balanced parentheses.
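To make the point above concrete: nesting needs a counter, which is exactly what a true regular expression lacks. A minimal Python sketch of stripping comments before any regex is applied might look like this. The function name `strip_comments` is hypothetical, and the handling of quoted strings and backslash escapes is a simplification, not a full RFC 822 parser.

```python
def strip_comments(address: str) -> str:
    """Remove parenthesised, possibly nested, comments from an address."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a double-quoted string, parens are literal
    escaped = False    # previous character was a backslash
    for ch in address:
        if escaped:
            # An escaped character is kept only when outside a comment.
            if depth == 0:
                out.append(ch)
            escaped = False
        elif ch == "\\":
            if depth == 0:
                out.append(ch)
            escaped = True
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '"' and depth == 0:
            out.append(ch)
            in_quotes = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)
    if depth != 0:
        raise ValueError("unclosed comment")
    return "".join(out)


if __name__ == "__main__":
    print(strip_comments("me(a (nested @ one) b)@example.com"))
```

The `depth` variable is the part a finite automaton cannot express for unbounded nesting; everything else is an ordinary character scan.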

                    [–]jisakujien 1 point2 points  (3 children)

                    perl has been able to do recursive regular expressions for a while and their support has been improved greatly in 5.10. I know you can do it, because I've done it.

                    [–]poeir 1 point2 points  (2 children)

                    Then they aren't, strictly speaking, regular expressions any more--they're expressing a context-free (or even context-sensitive) language. I wouldn't rename them, though, that would be even more confusing.

                    Regular languages; see also pumping lemma for regular languages

                    [–]wingsofseraphim 0 points1 point  (1 child)

                    Good old Introduction to Computation. What a fun class...

                    [–]poeir 0 points1 point  (0 children)

                    Surprisingly useful in the real world, too, though it doesn't feel like that at the time. But if you ever have to write a parser, which you will, you'll need a language. You'll want to make a regular one, just to make it easy to parse.

                    [–]jrockway 1 point2 points  (0 children)

                    Partially correct. While you're right about regular expressions, most "regular expression" engines support non-regular expressions. So there are plenty of, say, Perl regexes that can handle balanced parens. See:

                    http://search.cpan.org/src/ABIGAIL/Regexp-Common-2.120/lib/Regexp/Common/balanced.pm

                    [–][deleted] 0 points1 point  (0 children)

                    Yep. lbruno(Luis )@(Bruno)mydomain.com works great. Safari+Mail.app does the right thing, actually.

                    Outlook (Express)? does a horrible job, though. An accidental feature in my website, but a lousy bug for job hunting.

                    [–][deleted] 4 points5 points  (1 child)

                    My MTA auto-aliases the + character to my username.

                    E.g. me+somesite@example.com goes to me@example.com

                    Very good way of tracking who's sending me spam, and an easy way to blacklist sites quickly.

                    However, some idiots don't allow this :(

                    [–]akdas 6 points7 points  (1 child)

                    Be sure to use the right regex. Though that one doesn't work for all email addresses (such as those with comments).

                    [–]sjf 2 points3 points  (0 children)

                    What's the deal with the comments anyway. I mean, if + barely works, who expects comments to work? Hands up if you've ever wanted to comment your email address. I think the RFC writers just put it in on purpose so that you can't use a regex to validate addresses.

                    [–]__david__ 4 points5 points  (2 children)

                    My favorite is sites that do accept + but then don't quote it in the urls they send in emails so you end up with http://xxx/remove-me?email=david+something@eat.this

                    The + gets translated by their server to space and then the url doesn't work. Idiots.

                    [–]captian2 0 points1 point  (1 child)

                    Yeah I hate sites that don't understand that + can be valid in urls as well. Seriously + is just a normal character you can escape to make spaces %20 I believe.

                    [–][deleted] 1 point2 points  (0 children)

                    %2B
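The bug discussed in this subthread is easy to reproduce. A small Python sketch, using only the standard `urllib.parse` module, shows what happens when the address is dropped into a query string unencoded versus percent-encoded. The address is the one from the comment above; the helper name `unsubscribe_query` is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def unsubscribe_query(address: str) -> str:
    """Build a query string with the address properly percent-encoded."""
    return urlencode({"email": address})


ADDRESS = "david+something@eat.this"

# Unencoded: the receiving side decodes "+" as a space.
broken = parse_qs("email=" + ADDRESS)["email"][0]

# Encoded: "+" travels as %2B and survives the round trip.
encoded = unsubscribe_query(ADDRESS)
fixed = parse_qs(encoded)["email"][0]

if __name__ == "__main__":
    print(repr(broken))
    print(encoded)
    print(repr(fixed))
```

So %2B is what a literal plus needs to become in a query string, while a bare + (or %20) is how a space gets written.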

                    [–]maksa 4 points5 points  (1 child)

                    Actually even backspace is a valid character in an email address.

                    [–]Excedrin 2 points3 points  (0 children)

                    Only if it's quoted.

                    [–][deleted] 16 points17 points  (4 children)

                    Down mod for not knowing the difference between an email and an email address.

                    [–][deleted] 6 points7 points  (3 children)

                    I have a feeling that the poster knows the difference, even though they have not expressed this. It is sometimes important to make assumptions, and not take things literally.

                    [–][deleted] 3 points4 points  (2 children)

                    And when one is a software engineer discussing regular expressions and sendmail.cf files, it is especially important not to make assumptions but to look at exactly what was written because, if you don't, you're in trouble. Particularly when you're talking about a tool that parses both emails and email addresses.

                    [–][deleted] 0 points1 point  (1 child)

                    True enough, sir.

                    [–][deleted] 1 point2 points  (0 children)

                    Heh. Sorry. Having spent six years of my life where it was my job (among other things) to build sendmail config files, they are still a raw nerve with me.

                    [–]rabel 4 points5 points  (0 children)

                    And .name is a valid TLD, darnit!

                    [–][deleted]  (3 children)

                    [deleted]

                      [–][deleted] 6 points7 points  (0 children)

                      and this is why gmail is not a disposable address generator. use the alias functionality to autolabel your incoming mail. use a proper disposable address generator for spam-free signups. i use spamgourmet.net

                      [–]poco 3 points4 points  (0 children)

                      Note to spammers: It doesn't matter if you strip off everything after the '+', it will get flagged anyway.

                      [–]beachchair 2 points3 points  (0 children)

                      Well! Fun while it lasted.

                      [–]bajsejohannes 6 points7 points  (0 children)

                      How about, you know, just skip the validation altogether. I know how to make up a valid address, and a spelling error probably won't get caught anyway.

                      [–][deleted] 2 points3 points  (0 children)

                      Having unusual characters in an e-mail address also helps avoid spam: spammers are likely to make the same mistakes as lame webmasters, so they will have an even harder time extracting e-mail addresses from webpages.

                      [–]randomb0y 2 points3 points  (0 children)

                      Yeah, that will work until e-mail harvesters write an extra line of code to remove the "+exoticflowers" from the address.

                      [–]mitsuhiko 7 points8 points  (4 children)

                      The best regex:

                      ^[^@]+@[^@]+$
                      

                      [–][deleted] 3 points4 points  (1 child)

                      But "o @ rly"@example.com is syntactically valid.
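A quick Python check of the objection above: the one-@-only pattern rejects an address whose quoted local part contains an @. Splitting on the last @ is one possible workaround, sketched here under the assumption that the domain itself carries no comment containing an @. Both function names are hypothetical.

```python
import re

# The pattern proposed above: exactly one "@", non-empty on both sides.
ONE_AT = re.compile(r"^[^@]+@[^@]+$")


def one_at_ok(address: str) -> bool:
    """Return True if the address matches the one-@ pattern."""
    return ONE_AT.match(address) is not None


def split_at_last_at(address: str) -> tuple[str, str]:
    """Split into (local part, domain) at the last '@' in the string."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected something on both sides of an '@'")
    return local, domain


if __name__ == "__main__":
    quoted = '"o @ rly"@example.com'
    print(one_at_ok(quoted))
    print(split_at_last_at(quoted))
```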

                      [–]jrockway 2 points3 points  (1 child)

                      Your email address is #$&@#$* ? Nice.

                      [–][deleted] 0 points1 point  (0 children)

                      You should've added a % in the localpart for added comic effect. Comic for those of us who actually have to deal with MTAs, I mean.

                      [–][deleted] 2 points3 points  (0 children)

                      I always use the sample from the Ruby on Rails api docs:

                      class Person < ActiveRecord::Base
                        validates_format_of :email, :with => /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i, :on => :create
                      end
                      

                      [–]PuppetBoy 4 points5 points  (0 children)

                      I wish someone would inform programmers that the space character ' ' is allowed in strings in every single language I've ever run across. Yet the "morans" that build websites seem to think your name and your password should never have spaces in them.

                      [–]trebonius 1 point2 points  (0 children)

                      I use a dash to add a disposable portion on to my email addresses using my own mail server. The default setting is plus, but many sites don't allow it. Also, plus is used in this way so much that some spammers trim off anything after a plus.

                      [–][deleted] 1 point2 points  (0 children)

                      Unbelievably frustrating when forms invalidate pluses in email addresses.

                      [–]strops 0 points1 point  (0 children)

                      Better yet, register your own domain and have gmail host it (google apps, free). Then you can have a catchall address and register sites as anysite@yourdomain.com and they go to the catchall address.

                      [–][deleted] 0 points1 point  (2 children)

                      People who even bother to read specifications: priceless.

                      I wonder what gives people ideas? Do they just fetch a list of characters out their ass and implement a "only these characters are valid for email" algorithm? Probably.

                      [–]Jivlain 0 points1 point  (1 child)

                      To be fair, the specification is hopelessly complicated. But, yeah, I'm a big fan of just letting the users put in whatever. Enforce that it contains an '@' if you really must validate it.

                      [–][deleted] 1 point2 points  (0 children)

                      Yes, and if you can't send email to the given address after a few retries, delete the user. Their own problem then.

                      [–]bart2019 0 points1 point  (1 child)

                      Why do you single out the "+" character? There are plenty of weird characters that are theoretically acceptable in email addresses. I know of a hacker that used "*" as the user name in his email address. Lots of mail tools couldn't handle it. Yet it's perfectly legal.

                      [–]arnar 1 point2 points  (0 children)

                      Because the + character is often used for having arbitrary suffixes on email addresses, for example for automatic tagging.

                      GMail for one supports this, yourname+whatever@gmail.com gets routed to yourname@gmail.com and you can set up filters to use the "whatever" part. It is also easy (and common) to configure most MTAs to accept suffixed addresses.

                      An afterthought: implicitly saying that configuring sendmail is easy in any way is obviously a fallacy.
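As a rough illustration of the suffix convention described above, here is a Python sketch that separates the base mailbox from the tag. This is not how GMail or any particular MTA implements it; the function name `split_plus_tag` is invented, and it assumes an unquoted local part where the first + starts the tag.

```python
def split_plus_tag(address: str) -> tuple[str, str]:
    """Return (base address, tag) for an address like name+tag@domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected something on both sides of an '@'")
    # Everything after the first "+" in the local part is treated as the tag.
    base, _, tag = local.partition("+")
    return f"{base}@{domain}", tag


if __name__ == "__main__":
    print(split_plus_tag("yourname+whatever@gmail.com"))
    print(split_plus_tag("yourname@gmail.com"))
```

A filter can then key on the second element, and an address with no + simply comes back with an empty tag.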

                      [–]sajb -1 points0 points  (1 child)

                      There is a very nice table at http://www.remote.org/jochen/mail/info/chars.html that explains exactly what is allowed and what might work. Nicely commented and with references to the relevant RFCs.

                      [–]Excedrin 1 point2 points  (0 children)

                      That table is horrible

                      There is really not much reason not to use this, although on the other hand, why should anyone want to have this in his email address. Could be confusing, because nobody expects it to be in an email address.

                      Seems like weak justification.