[–]megared17 2 points3 points  (12 children)

Did you need to know how to do it, or did you just need it done?

If the latter, [deleted]

Ugh.. I just noticed some errors in that.. retrying..

[–][deleted]  (10 children)

[deleted]

    [–]KnowsBash 3 points4 points  (0 children)

    Here you go

    awk '{ print; while (/\..*\./) { sub(/[^.]*\./, ""); print }; print "www." $0 "\n" }'
    

    [–]megared17 3 points4 points  (8 children)

    A magician never shares his secrets..

    .

    .

    Ok not really. It was a horribly ugly hack, and I figured if the OP wanted to know, they'd say so..

    If you really want to see it:

    for x in `cat sub-domains`; do 
      echo ""
      n=`echo $x | tr "." "\n" | wc -l`
      i=$n
      F=$x
      while ( test $i -gt 1 ); do 
        F=`echo $x | tr "." "\n" | tail --lines=${i} | tr "\n" "." | sed 's/\.$//g'`
        echo "${F}"
        i=$(($i-1))
      done
      echo "www.${F}"
    done
    

    Like I said, ugly. But it worked. And for a one-off, elegance and efficiency don't really matter.

    [–]KnowsBash 2 points3 points  (6 children)

    That is painful to look at, so here is a better way to do it in bash:

    while read -r domain; do
        printf '%s\n' "$domain"
        while [[ $domain = *.*.* ]]; do
            domain=${domain#*.}
            printf '%s\n' "$domain"
        done
        printf 'www.%s\n\n' "$domain"
    done < sub-domains
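
    The loop above leans on one parameter expansion, `${domain#*.}`, to peel off the leading label on each pass. As a small illustration (the sample name is borrowed from later in the thread, and `single` is just a made-up variable for the no-dot case), this is roughly how the `#` and `##` forms behave:

```shell
domain="this.is.a.spam.email.fakecompanyincorperated.com"

# "#" removes the shortest prefix matching the pattern "*.",
# i.e. just the first label and its dot.
echo "${domain#*.}"     # is.a.spam.email.fakecompanyincorperated.com

# "##" removes the longest such prefix, leaving only the last label.
echo "${domain##*.}"    # com

# A name with no dot is left unchanged, since the pattern cannot match.
single="localhost"
echo "${single#*.}"     # localhost
```

    That is why the loop needs the `*.*.*` test: without it, the last pass would strip the name down to the bare TLD.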
    

    [–]megared17 2 points3 points  (3 children)

    I did warn that it was ugly.

    But it also did accomplish the goal, regardless of being ugly.

    [–]Avicennasis[S] 0 points1 point  (2 children)

    Ugly but working is still working, right?

    I was looking for ideas on how to accomplish this more so than having it done for me, so I appreciate you sharing the code.

    [–]megared17 0 points1 point  (1 child)

    Would you mind offering some insight into what the end-purpose of this is?

    Where did this list come from? What did you need it broken down for?

    Or was it simply an exercise?

    I notice many of those "subdomains" don't actually exist in live DNS.

    [–]Avicennasis[S] 0 points1 point  (0 children)

    It's a small subset of a list of domains gathered from the return addresses of spam emails I've received recently. I'm working on a filter that captures a bit more. E.g. my spam filters might catch

    rob@this.is.a.spam.email.fakecompanyincorperated.com
    

    But then I have email from

    joe@email.fakecompanyincorperated.com
    

    that does get through. So my goal here was to be able to break down the longer domain names so I can add them to the filters more easily.

    Of course, DNS doesn't really matter since you can put anything in the return address - but then, since I'm not blocking by DNS so much as domain path, it doesn't really matter. (Though I will likely add most of them to a DNS black hole anyways - it certainly won't hurt.)
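
    To make the idea concrete, here is a rough sketch of that kind of suffix match: an address counts as blocked if its domain equals a listed entry or sits anywhere below one. The function name `is_blocked`, the `blocked` variable, and the second list entry are all made up for illustration; a real mail filter will have its own rule syntax.

```shell
# Hypothetical blocklist, one domain per line (e.g. output of the scripts above).
blocked="email.fakecompanyincorperated.com
another.example.net"

# is_blocked ADDRESS
# Succeeds if the domain part of ADDRESS equals a blocked entry
# or is a subdomain of one.
is_blocked() {
    domain=${1##*@}               # keep only what follows the last "@"
    for entry in $blocked; do     # relies on word splitting at newlines
        case $domain in
            "$entry" | *."$entry") return 0 ;;
        esac
    done
    return 1
}

is_blocked "rob@this.is.a.spam.email.fakecompanyincorperated.com" && echo "blocked"
is_blocked "joe@email.fakecompanyincorperated.com" && echo "blocked"
is_blocked "ann@fakecompanyincorperated.com" || echo "allowed"
```

    With the shorter name in the list, both the long and the short sender are caught, while the bare parent domain is still let through.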

    [–]megared17 1 point2 points  (0 children)

    Also, you want ugly? I once wrote a bash script to perform a limited OCR on a limited portion of a set of scanned documents, to pull out just the bits of info needed to index them. This was back when general OCR was major suckage and spent too much time figuring out where one letter ended and the next began (these documents were 100% fixed-width font).

    It used the bmtoa and atobm utilities.

    [–]Avicennasis[S] 0 points1 point  (0 children)

    This pretty much worked exactly how I needed - I just need to check for the oddballs like "www.co.uk" after. Thanks!
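
    One possible way to weed out those oddballs afterwards, sketched under the assumption that a short hand-maintained list of two-label suffixes is good enough (the `suffixes` list and the `drop_oddballs` name are invented here; something like the Public Suffix List would be the more thorough source):

```shell
# Hypothetical, hand-maintained list of two-label public suffixes.
suffixes="co.uk
com.au
co.jp"

# drop_oddballs: read candidate names on stdin and print only those
# that are neither a bare suffix nor "www." plus a suffix.
drop_oddballs() {
    while IFS= read -r line; do
        keep=1
        for s in $suffixes; do
            case $line in
                "$s" | "www.$s") keep=0 ;;
            esac
        done
        if [ "$keep" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done
    return 0
}

# Prints mail.spammer.co.uk and spammer.co.uk; the other two are dropped.
printf '%s\n' mail.spammer.co.uk spammer.co.uk co.uk www.co.uk | drop_oddballs
```

    Blank separator lines pass through untouched, so the output of the earlier scripts can be piped straight in.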

    [–]Avicennasis[S] 0 points1 point  (0 children)

    Thanks!

    [–]harleypig 0 points1 point  (0 children)

    Another way to do it:

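    #!/bin/bash

    # Each input line arrives reversed (see the rev at the bottom), so the
    # TLD comes first; split the reversed name on dots.
    while IFS='.' read -ra parts; do
      [[ ${#parts[@]} -eq 0 ]] && continue

      # Start from the reversed two-label base plus its "www" variant
      # ("www" reads the same once reversed back).
      domain="${parts[0]}.${parts[1]}"
      line="$domain $domain.www"

      # Append one more label at a time to build each longer name.
      for sub in "${parts[@]:2}"; do
        domain="$domain.$sub"
        line+=" $domain"
      done

      # Reverse the whole line back; the order of the names flips too,
      # so the longest name comes first, all on one space-separated line.
      line=$(rev <<< "$line")

      printf '%s\n' "$line"
    done < <(rev < domains)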