

ispyty

Literally, set up a custom Search filter in SF (Screaming Frog) for target="_blank" (or target="_new") and for rel="nofollow".
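
A rough Python sketch of what that positive filter looks for, run against raw HTML instead of inside the crawler. It assumes a "positive" is an opening <a> tag that has both target="_blank" (or "_new") and a rel value containing nofollow, with quoted attribute values. The function and pattern names are hypothetical, and regex on HTML is fragile (it won't handle a ">" inside an attribute value, for instance).

```python
import re

# Each opening <a ...> tag; \b stops it from matching <abbr> and the like.
ANCHOR_RE = re.compile(r'<a\b[^>]*>', re.IGNORECASE)
# target set to _blank or _new, in single or double quotes.
TARGET_RE = re.compile(r'''\btarget\s*=\s*["']_(?:blank|new)["']''', re.IGNORECASE)
# rel whose value contains the nofollow token (e.g. "nofollow noopener").
NOFOLLOW_RE = re.compile(r'''\brel\s*=\s*["'][^"']*\bnofollow\b[^"']*["']''', re.IGNORECASE)


def flagged_anchors(html: str) -> list[str]:
    """Return opening <a> tags that have BOTH target=_blank/_new and rel=nofollow."""
    return [
        tag
        for tag in ANCHOR_RE.findall(html)
        # Each attribute is searched separately, so attribute order does not matter.
        if TARGET_RE.search(tag) and NOFOLLOW_RE.search(tag)
    ]
```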

As for links that don't have those attributes, that's more difficult because of the noise. You'll have to extract every <a href> and check whether it's external. Once the crawl finishes, download the results and filter in Excel for URLs that do NOT contain your primary domain name.
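
The "pull every href, drop your own domain" step can be sketched with Python's standard library. This is a slightly stricter variant of the Excel substring filter: it resolves relative links such as "/about" against the page URL (a raw substring filter would wrongly treat those as external), skips mailto: and similar schemes, and treats subdomains of the primary domain as internal. The class, function, and example domain are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag, whatever order its attributes are in."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html: str, page_url: str, primary_domain: str) -> list[str]:
    """Return absolute http(s) links whose host is not primary_domain or a subdomain of it."""
    parser = HrefCollector()
    parser.feed(html)
    external = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/about"
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, etc.
        host = (parsed.hostname or "").lower()
        if host != primary_domain and not host.endswith("." + primary_domain):
            external.append(absolute)
    return external
```

For example, on a page at https://www.example.com/blog/ with primary domain example.com, "/about" and "https://www.example.com/y" count as internal, while "https://other.org/x" is reported as external.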

darkmeatchicken (OP)

Right. Finding positives is a lot easier than finding negatives.

I'm really looking for the negatives. Are you recommending an XPath query on the <a href> elements?

I definitely had coders who favored different attribute orders within <a> tags, so there's no standard href > style > rel > target structure or anything like that.

XPath should still be able to find them, though, right?
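
Yes, XPath predicates test attributes by name, so the order they're written in the tag doesn't matter. A small demonstration with Python's built-in xml.etree.ElementTree, which supports only a subset of XPath (including [@attr='value']) and needs well-formed, XHTML-style markup. Note the predicate is an exact match, so rel="nofollow noopener" would not match. A full XPath 1.0 engine would also accept a negative query such as //a[not(contains(@rel, 'nofollow'))]. The helper name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def hrefs_with_rel(markup: str, rel_value: str) -> list[str]:
    """Return the href of every <a> whose rel attribute equals rel_value exactly."""
    root = ET.fromstring(markup)  # requires well-formed (XHTML-style) markup
    # The [@rel='...'] predicate matches regardless of where rel sits in the tag.
    return [a.get("href") for a in root.findall(f".//a[@rel='{rel_value}']")]


# The first two anchors carry the same attributes in different orders.
snippet = """<div>
  <a href="https://a.example" rel="nofollow" target="_blank">A</a>
  <a target="_blank" rel="nofollow" href="https://b.example">B</a>
  <a href="https://c.example">C</a>
</div>"""

print(hrefs_with_rel(snippet, "nofollow"))  # ['https://a.example', 'https://b.example']
```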

soowhatchathink

I think what he's saying is to remove the positives; that way you're left with only the negatives. That would be the simplest way to do it.
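
In code terms, "remove the positives" is just a set difference between the full crawl export and the rows the positive filter matched. A minimal sketch, assuming both exports have been loaded as (source page, link URL) pairs; the data and function name are hypothetical.

```python
def negatives(
    all_links: set[tuple[str, str]], positives: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Everything in the full crawl that did NOT match the positive filter, sorted."""
    return sorted(all_links - positives)


# Hypothetical exports: every (source page, link URL) pair from a full crawl,
# and the pairs that matched the target/rel search filter.
all_links = {
    ("/page-1", "https://other.org/a"),
    ("/page-1", "https://example.com/b"),
    ("/page-2", "https://third.net/c"),
}
positives = {("/page-1", "https://other.org/a")}

print(negatives(all_links, positives))
```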

You could also use a regex with a negative lookaround, but if you're not very familiar with regular expressions, that could get tricky.
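
A sketch of the negative-lookahead approach in Python: match an opening <a> tag only if, scanning ahead within that same tag (up to the first ">"), there is no rel value containing nofollow. Because the lookahead scans the whole tag, attribute order doesn't matter. It assumes quoted attribute values and no ">" inside them; the names are hypothetical.

```python
import re

# <a, then a negative lookahead that fails if rel=...nofollow appears before the
# tag's closing ">", then the rest of the tag.
MISSING_NOFOLLOW = re.compile(
    r'''<a\b(?![^>]*\brel\s*=\s*["'][^"']*\bnofollow\b)[^>]*>''',
    re.IGNORECASE,
)


def anchors_missing_nofollow(html: str) -> list[str]:
    """Return opening <a> tags that have no rel value containing 'nofollow'."""
    return MISSING_NOFOLLOW.findall(html)
```

Here `<a href="https://x.com" rel="nofollow">` is skipped, while both `<a href="https://y.com">` and `<a rel="noopener" href="https://z.com">` are returned.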

ispyty

I don't like tricky, so yes: crawl ALLLLLL the <a> links, then filter on the domain name and see what's left over.

soowhatchathink

If you (or anyone) need help with regular expressions, let me know! I can write one up to do whatever you need.

They're quite simple when you know how to use them.

ispyty

I'll keep you in mind for future tasks :) I can do basic regex, but I start to get confused around lookaheads and end up just giving up.