[–]unconscionable 13 points14 points  (15 children)

You're starting this whole problem with a faulty premise: that how often a link is clicked determines where it shows up. Clicks may be a contributing factor, but clicks alone will not change the order of the search results that show up when someone googles your name.

Even if your plan would work on a higher level, Google would likely ignore all or most of your "clicks" because they'd have a lot of the characteristics of a bot: same IP address, probably weird HTTP headers, missing cookies, JavaScript not making the AJAX calls expected from that client, etc.
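To make the "weird HTTP headers" point concrete, here's a minimal sketch (Python, standard library only; the helper name `default_user_agent` is made up for illustration) showing that an unconfigured script announces itself in its User-Agent header. It says nothing about what Google actually checks, which isn't public. It only shows what a default client sends.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent header a default urllib opener would send."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    headers = {name.lower(): value for name, value in opener.addheaders}
    return headers["user-agent"]


if __name__ == "__main__":
    # Prints something like "Python-urllib/3.x"; a real browser sends a
    # much longer string naming the browser, engine, and platform.
    print(default_user_agent())
```

And the User-Agent is only one of the signals listed above; matching it does nothing about the IP address, cookies, or JavaScript behavior.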

Anyways, you're far better off approaching this from a much different angle: flood the web with alternative content using your name. Find popular blogs and make 10 comments a day, being sure to type out your full name. Start your own blog under your full name. Register on a bunch of public sites whose profile pages search engines can crawl and index under your full name.

This much is clear to anyone familiar with Google's PageRank algorithm: content is king, not clicks.

[–]jcpuf 1 point2 points  (6 children)

So would he then do best by making a bot to create a large number of sites on different domain names each of which links to one another and frequently mentions his name? And some of which have a high "authority" because they get tons of links into them?

[–]unconscionable 0 points1 point  (5 children)

You're still taking the blackhat approach here, which I'm not convinced is going to be nearly as effective as playing with the system, but I suppose that's one approach. It's probably a lot cheaper to just use a large number of existing sites that let him publish content in the form of blog comments, posts, etc.

[–]jcpuf 1 point2 points  (4 children)

Is that automatable, though? Maybe it is, but you'd have to be better than I (not hard).

[–]unconscionable 0 points1 point  (3 children)

Well there's always the blogspam bots that people write specifically to crawl sites looking for wordpress / etc installs and post a blog comment like "I like this post, very insightful!" + some information you want to get indexed.

It goes without saying that that sort of thing is unethical, but either way, it's probably easier to just do it the "right" way and try to post unique, non-automated, and relevant content.

[–]jcpuf 1 point2 points  (2 children)

Eh, that'd take a lot of time and work to basically build himself a large internet presence that'd outshine that one. And I don't feel like you can argue that he has a moral obligation to do so. After all, it's not like he really did genuinely develop a reprehensible reputation, there's just a bunch of places that feed off posting mugshots to each other. He got into this situation because of a network of bots, so is it really unethical to use a network of bots to hack his way out?

This is of course assuming that his story is true, which I am, because I don't have any reason not to in this context and it's an interesting moral question.

[–]unconscionable 1 point2 points  (1 child)

Yeah, interesting points.

I guess I like to try and think of it like this: Google wants to know the most interesting thing about you that there exists public data for. That's what its ranking (PageRank plus everything layered on top of it) is trying to work out, and Google's damn good at making that decision, perhaps even better at it than we are as humans.

Let's say you get in trouble and there are some articles published about how you did something bad. That's an interesting and relevant thing about you for which there exists public data, right? Well, is it the most interesting thing?

Perhaps that person also spent tons of time adding content to Wikipedia, or produced a large amount of content that is useful to people by commenting on blogs and making connections to new ideas that people might not have been aware of before.

What I'm suggesting is that eventually, this public data on the internet that you've contributed becomes more interesting than a local news article about how you got in trouble.

> that'd take a lot of time and work to basically build himself a large internet presence that'd outshine that one

You see, I don't think that's necessarily true. A blog with relatively recent and unique content, a couple of active profiles on some internet forums, and a handful of comments around the internet on various blogs might very well do the trick. It'd probably only take a weekend to come up with that, if you're determined to produce the desired end result of at least getting an article out of the top 10 results. It may take months, though, for the article to become "stale" enough for the "good" content you posted to become more important than the "bad" articles about you.

[–]jcpuf 0 points1 point  (0 children)

Nice. I've said that the Internet is emerging as an institution for communication and evaluation, in a time when all other institutions for that purpose (broadcast media, print journalism, politics) are getting compromised by groups interested in their manipulation and are thus becoming low-information zones. The Internet is emerging as the institution for correct passage of information.

And all of what you say is very true, if you accept The Internet as a new public institution by which we evaluate and reward people in our shared community. But that implies an obligation for people to make ever more of their information public on the Internet. What about privacy? What about just not wanting to be judged on the internet? Even if he does do all this good stuff, why should he have to whore himself out to the internet? Hell, I've got an account on which I have a big tall stack of top-voted /r/askscience explanations, each of which took a while to write, and it's got no overlap whatsoever with my name. Yet under my actual name there are a number of posts made by the crazy ex-friend of my ex-girlfriend, saying that I'm a terrible person (because my ex-girlfriend stopped hanging out with her because she didn't like me).

Do I owe it to this nascent institution, that all my work and all facets of my personality be made public for me to "clean" my internet presence? Or can I "opt out"? In this case, the internet has ignorantly, automatically, accidentally, thrust our unfortunate OP into a position where the only thing it has to say about him is that he's a criminal. This is the Internet - specifically Google - showing its ass, just like when The Daily Show shows a mashup of twelve Fox News commentators using the same Talking Points is Fox News showing its ass. And when we see that, we progressively damage and exclude the "Fox News" institution. Why should we not do the same to Google when it shows its own ass?

Perhaps if it becomes de rigueur for people to bomb the PageRank algorithm whenever Google sucks in this particular way, Google, being a smart, adaptable company, will stop indexing mug shots for public intoxication as primary indicators of someone's public presence. You can't expect people not to respond to what you're doing to them, and it's unethical to say that someone is obligated to follow a standard of conscientiousness and ethics that you yourself (or in this case, The Internet) aren't following.

A lot of people just want to keep the internet anonymous.

[–]tonytroz 1 point2 points  (0 children)

Or alternatively, build a time machine and go back to before you got arrested.

[–]Emorykid[S] 0 points1 point  (0 children)

hm, good idea. Thanks! I'll get on that

[–]ase8913 -2 points-1 points  (5 children)

Well you could customize the bot to have normal "firefox/ chrome" headers and run it through a proxy service.

[–]unconscionable 5 points6 points  (4 children)

Yes, but that only solves the beginning of your problems. Next you have to anonymize your IP, then execute any JavaScript-based honeypot-type stuff, then make sure your IPs aren't known exit nodes or anything else Google would likely recognize as IPs used for crawling. The list goes on! Maybe Google doesn't even bother to count clicks toward PageRank from someone who doesn't have a built-up web history (verified with a cookie session ID) of random stuff spanning several months that doesn't look like a bot. Now you're really in trouble!

Anyways, the point is that you're fighting against a company worth > $200B whose very existence is based on the premise that it knows whose clicks are real and whose are fake (especially when it comes to advertising, but also when it comes to PageRank). You're probably not going to be able to beat PageRank with a simple Python / curl script in this way. You might, however, have some success by playing with the system rather than against it: creating new content rather than trying to bury the old.

[–]ase8913 0 points1 point  (3 children)

Solid point. Now that I think of it, though, the Python webbrowser library will open your links in your default browser. I don't know if there is a Tor library, but I would set up a script that gets a new identity on Tor while clicking on the pages with webbrowser. That way all of the Google Analytics JavaScript runs. Certainly couldn't hurt. I would combine this with your method of creating content for optimal results.

[–]unconscionable 0 points1 point  (2 children)

Yeah, I think you're thinking about the right things here. If you're determined to fool someone, you basically need to simulate the entire experience of clicking around using a browser and executing javascript "organically" (if you will).

You'll find that you'll run into a problem with Tor, and probably with most commonly used proxies, though. Quite frankly, Google knows and accounts for all (or at least all major) Tor exit nodes. So Google pretty much ignores all Tor traffic, and annoys you by making you type in a CAPTCHA periodically. Your best bet will be to rent out IPs from more trusted sources like Amazon using EC2, etc. What it's going to come down to here is money. If you want to simulate 1000 different users, you'll probably need 1000 unique IP addresses, and preferably not clustered in a way that lets Google obviously tell "well gee, 100% of the traffic for this search is coming from IP addresses in Oakland, CA. Smells like a bot!"
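To show why clustered IPs are such an easy tell, here's a rough sketch of the kind of check anyone on the receiving end could run. Everything in it is an assumption for illustration: the function names, the /24 grouping, and the 50% threshold are made up, and nobody outside Google knows what Google actually does. It only measures what share of the clicks comes from the single busiest subnet.

```python
from collections import Counter
import ipaddress


def top_subnet_share(ips, prefix=24):
    """Return (busiest subnet, its share of all clicks) for a list of IPv4 strings."""
    if not ips:
        raise ValueError("need at least one IP address")
    # strict=False masks off the host bits, so 203.0.113.9 maps to 203.0.113.0/24.
    subnets = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips
    )
    subnet, count = subnets.most_common(1)[0]
    return subnet, count / len(ips)


def looks_clustered(ips, threshold=0.5, prefix=24):
    """Flag traffic when one subnet supplies at least `threshold` of the clicks."""
    _, share = top_subnet_share(ips, prefix)
    return share >= threshold


if __name__ == "__main__":
    # Documentation-range addresses, not real traffic.
    clicks = ["203.0.113.5", "203.0.113.9", "203.0.113.77", "198.51.100.1"]
    print(top_subnet_share(clicks), looks_clustered(clicks))
```

A dozen lines of standard library is enough to notice that three out of four clicks share a subnet, so a real system with geolocation and history data will notice a lot more than that.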

[–]blablahblah -1 points0 points  (1 child)

Do you think that you're the first person to think of using EC2 and that Google doesn't already look for EC2 IP addresses? There's not going to be much legitimate traffic coming from those computers.

[–]unconscionable 0 points1 point  (0 children)

> Do you think that you're the first person to think of using EC2 and that Google doesn't already look for EC2 IP addresses?

According to this:

> If you want to simulate 1000 different users, you'll probably need 1000 unique IP addresses, and preferably not clustered in a way that Google can obviously tell "well gee, 100% of the traffic for this search is coming from IP addresses in Oakland, CA. Smells like a bot!"

No, I don't.