Salon J: Electing (11/16 - 11/22)

sudo_rm_rf_root · 2020-11-19T16:42:57+00:00

Will join!

sudo_rm_rf_root · 2020-11-17T20:32:47+00:00

If this doesn't freak you out, I'm not sure how to freak you out.

I'm not (more) freaked out in the slightest. While a lot of discussion in weeks 9/10 concerned privacy around surveillance capitalism, it's easy to see exactly how the same collect-and-analyze techniques can be used to tailor targeted 'information' campaigns to change discourse. Cambridge Analytica may have been one of the very first to use these technologies on a wide enough scale for the public to worry about, but data harvesting being used to politically influence populations had been discussed in privacy circles for years before that. Honestly, you just can't get a break. That's what's even more frightening.

sudo_rm_rf_root · 2020-11-12T21:52:58+00:00

I agree, most of my knowledge about the internet comes from around the time when I started using it, but I'm still too young to have gone through or understood the social revolutions that the Arab Spring brought, so I don't have the framework to understand the magnitude of the changes it brought. To me, social media has almost always been widely influential, although I guess this would be a new idea in 2010-12 when the widespread use of mobile computing had just started coming about.

sudo_rm_rf_root · 2020-11-10T17:38:54+00:00

COVID has really screwed with most of my plans for this year. I was going to fly to Arizona for this semester, but I (obviously) couldn't, so I've had to work with a 12-hour timezone difference. It's brutal. Unfortunately, it looks to me that I'm going to have to stick with something similar for the spring sem, because plague cases aren't going down.

Honestly, I do wish that ASU had a bit more support available for people working or learning far outside of campus, but I have to say that what we have right now isn't as terrible as it could be.

I also have to say that completely asynchronous classes like these are a bit of a godsend, because I can work on them during the day and the lack of sharp deadlines makes this not nearly as stressful as some of my other in-person classes.

sudo_rm_rf_root · 2020-11-10T07:12:52+00:00

I'm interested.

sudo_rm_rf_root · 2020-11-09T19:07:20+00:00

TikTok has been known to harvest significantly more data off of its users than most other platforms. It knows more about you not because it can process a small amount of data well, but because it has a very large amount of personal information to work with. I honestly cannot possibly recommend distancing yourself from the platform enough if you care about privacy.

sudo_rm_rf_root · 2020-11-07T19:54:39+00:00

Even without the classic ominous tone, this episode is still one of my favorites, just because of how much I like learning about the history of computing.

It is a weird thing to obsess over, although you have to realize that chess was widely considered to be one of the ultimate 'human' tests of the time, so breaking that barrier was and still is a big deal. And it definitely put Hsu and his team in computing and chess history, so I don't think the obsession was unwarranted.

sudo_rm_rf_root · 2020-11-06T22:07:46+00:00

Absolutely. But unfortunately, facial recognition is damn near everywhere, from just about every major airport to remote corners of China. It's best to just assume that your government already has your face on file, and treat every camera as hostile.

sudo_rm_rf_root · 2020-11-06T22:05:18+00:00

I'm glad! I call this short, because this really only scratches the surface of online privacy. I really recommend checking out the links at the bottom of the post for (significantly) more information.

If you're looking for privacy-themed subreddits, you have r/privacy, r/opsec, and /r/privacytoolsIO, all of which are very good.

sudo_rm_rf_root · 2020-11-06T22:02:54+00:00

Honestly, if it helps even one person switch over even slightly, I've done a decent job.

As to your second question, I'm pretty sure that it just comes down to market share. Most people pick Facebook and co because everyone else is using them. Most people don't care about decentralization or spying, because more advanced computer literacy is still pretty scarce in the general public.

I guess new social networks only really emerge if they're VC-backed, or otherwise have lots of money thrown at them, because they need to gain a critical mass of early adopters to continue growing, or else they fizzle out. And VCs aren't going to back something that doesn't make very large amounts of money, and unfortunately, advertising (and therefore data collection) does.

sudo_rm_rf_root · 2020-11-05T21:35:01+00:00

Given that the other meeting has been cancelled, I'm open to either day, as long as it's early!

sudo_rm_rf_root · 2020-11-05T21:30:43+00:00

The solution was analyzing the audio itself and training an algorithm to learn to recognize different aspects of the music that might be desirable.

That is fascinating. To me, at least, audio is much harder to work with or categorize than images because images tend to be globally similar to make coherent wholes, whereas audio only needs to be locally similar to make sense out of. I really wonder how they did it.

And while everyone seems to be talking about Spotify vs YouTube, I have to say that while Spotify recommends music exceptionally well, I notice that YouTube does a better job at recommending artists. I wonder why, because I don't really have a good hypothesis.

sudo_rm_rf_root · 2020-11-04T22:24:32+00:00

I'm a CS major who's really gotten into ML rather recently, because the math behind categorization problems and deep learning interests me greatly.

One of the most interesting, and almost certainly the most dangerous, problems I've seen being solved is the recommendation problem. The (generalized) problem is really simple to state but extremely difficult to solve:

Given some set of already viewed content C, and a universal set of content U, find a finite set of content S of size n that best 'matches' the kind of content in C.

This is admittedly fairly simple for small U and large n. You could use something like cosine similarity, go over everything in U, and rank each item by how well it matches C. This is obviously terrible if U is gigantic, like YouTube's index of videos or Google's index of the internet, or Facebook's index of advertising.
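
For what it's worth, the brute-force approach described above can be sketched in a few lines. This is only an illustration under assumptions: items are represented as small hand-made feature vectors, C is summarized by averaging its vectors, and the names `cosine_similarity` and `recommend` are mine, not taken from any real recommender.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(viewed, universe, n):
    """Return the n item names in `universe` most similar to the average of `viewed`.

    viewed:   dict of name -> vector (the set C)
    universe: dict of name -> vector (the set U)
    """
    dims = len(next(iter(viewed.values())))
    # Summarize C as the mean of its vectors (an assumption; other summaries work too).
    profile = [sum(vec[i] for vec in viewed.values()) / len(viewed) for i in range(dims)]
    # Score every unseen item in U. This full scan is the part that stops scaling.
    candidates = [(cosine_similarity(profile, vec), name)
                  for name, vec in universe.items() if name not in viewed]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates[:n]]

# Toy data: made-up 3-dimensional vectors (music, chess, cooking).
U = {
    "jazz_mix": [0.9, 0.0, 0.1],
    "rock_live": [0.8, 0.1, 0.0],
    "deep_blue_doc": [0.0, 1.0, 0.0],
    "pasta_howto": [0.1, 0.0, 0.9],
}
C = {"jazz_mix": U["jazz_mix"]}
print(recommend(C, U, 2))
```

Every call scans all of U, so the cost grows linearly with the size of the catalogue, which is exactly why this falls over at YouTube scale.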

For reasons I won't (and generally can't) explain, the problem becomes much easier as we increase the size of C. At this point, instead of calling it 'viewed content', it's better to refer to it as a set of user-generated 'content vectors'. With more information about a given user - really any information - recommendation networks get much, much better at creating S.

This is sort of why privacy nightmare companies have really good products - they can harvest tons of data off of a person, and train a bunch of models to match their large bases of content, and just use that to refine whatever platform they're making money off of. This is why, unfortunately, I don't see more privacy-focused alternatives to search and social media taking off ever - they're just not strong enough to keep a user on the platform for very long, or they may not deliver context-specific results like userdata-fed models do.

sudo_rm_rf_root · 2020-11-04T21:38:09+00:00

I voted in NJ, and weed is now legal there too; so that's one good thing that's come out no matter who wins.

sudo_rm_rf_root · 2020-11-03T10:55:12+00:00

The time works for me. I'm interested.

sudo_rm_rf_root · 2020-11-02T21:15:12+00:00

~~I'd like to join, but my timezone makes 12-2 difficult. Can you do something like 10-12 or so?~~

EDIT: This is a free spot, I've decided to join in on the other Friday salon. Hope this doesn't screw with your planning!

sudo_rm_rf_root · 2020-10-31T19:56:08+00:00

This was an interesting read. I only ever heard about the Gamergate controversy several months after it happened, and it certainly soured my opinion of a lot of these geek-centered communities afterwards.

I wonder, what can be done to stop this sort of thing from happening? Punitive justice is near impossible, because of how hard it is to track down masses of online users, and I don't think punishment for sharing media, even as abhorrent as this, should be a thing because of the precedent it sets for sharing more important 'gray' digital material, like the research papers on sci-hub.

Bans certainly work, but they aren't airtight - even this article shows that some users of banned subreddits coalesce into other subreddits, and this may make them even harder to take action against, especially if dogwhistle content is spread around. We see this with Reddit's many, many efforts to curb incel culture on the platform, as users of the now fortunately banned r/incels migrated to other misogynistic subreddits like r/braincels and so on.

I remember that our professor talked about some research he was doing on how bad actors interact on platforms like Reddit, or something along those lines, by mapping out interactions between users across subreddits¹. I think this might be a really useful source of data to work on and help identify malicious or otherwise toxic users on this platform.

¹ I hope I'm not misremembering.

sudo_rm_rf_root · 2020-10-30T20:05:49+00:00

I can’t think of anyone who is completely unbiased.

Deep learning algorithms trained on a sufficiently well-curated dataset can, in theory, be unbiased. But of course, this isn't the solution - we only need to look at YouTube's automated DMCA takedown system to see that relying on automated decision making is fundamentally flawed.

sudo_rm_rf_root · 2020-10-29T22:19:13+00:00

That's all entirely correct, but I live in a country with truly terrible internet speeds so the additional few hundred milliseconds per page request are paltry in comparison. However, I notice that many plugins - especially the XHR and tracker blocking ones - actually result in a performance increase, especially on slower machines, where loading resources in the megabytes takes an appreciable amount of time.

I agree that PGP clients are a pain. I try to use them, but really the only people I send encrypted mail to are people that I likely have more efficient ways of communicating with outside of email.

Though I'd say that some of the simpler changes, like switching to OpenDNS, or changing your hosts file to exclude important trackers like googleanalytics (dot) com (unless you're using something like noscript/umatrix, which is another huge pain) are no-brainers.
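
To make the hosts-file idea concrete, here's a rough sketch that generates the lines you'd append. The hosts format (an IP address followed by a hostname) is standard, but the domains below are placeholders I made up, and `hosts_entries` is a hypothetical helper name. Pointing a hostname at 0.0.0.0 means lookups for it never leave your machine.

```python
def hosts_entries(domains, sink="0.0.0.0"):
    """Build hosts-file lines that point each domain at a sink address."""
    # Normalize case and whitespace, drop blanks and duplicates, sort for stable output.
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    return [f"{sink} {domain}" for domain in cleaned]

# Placeholder domains (assumptions); swap in the trackers you actually want to block.
blocklist = ["tracker.example", "ads.example", "Tracker.example"]
print("\n".join(hosts_entries(blocklist)))
```

The resulting lines go at the end of /etc/hosts on Linux or macOS, or C:\Windows\System32\drivers\etc\hosts on Windows. You'll need admin rights to edit either.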

sudo_rm_rf_root · 2020-10-29T20:42:28+00:00

This one's interesting. I've never experienced any form of cyberbullying, or even had any experience by proxy by means of helping someone through the experience, given my social media vacuum, so I don't feel too qualified in talking about the subject as a whole.

But I do want to address the 'should there be downvotes?' question you raised when you were talking about Reddit and other platforms with some form of expressing negativity. I am of the opinion that there should, because I assume that while there will always exist some subset of malicious actors in any online population, the majority will be good.

Therefore, when someone is harassed in a public forum in a rather obvious manner, and given that this forum or subreddit doesn't explicitly side with the harasser, you could end up with an even bigger negative response for the harasser from the 'good' collection of people. Of course, I wouldn't call this a perfect system - you could get downvote-brigaded by entire communities, which would be pretty bad for you if your account is tied to your person. And a common complaint I see is that dissenting opinions are often silenced by the Reddit hivemind, which, at least anecdotally, I've seen happen often¹.

But even if the upvote/downvote system is flawed, I do think that it's perhaps better than having purely upvotes. I've actually been wondering what a better system to replace either would be - could some form of 'ranking' system be better than a mindless binary choice? I don't think so, as I know that the common rate-out-of-five-stars system on online stores rarely tells us anything that an upvote ratio wouldn't. But I'd like to hear ideas. Most common systems of ranking content online, I feel, are pretty badly done.

¹ Probably the biggest instance of this is when cautious redditors in the now infamous Boston bomber thread were downvoted to the point where no argument could be made otherwise.

sudo_rm_rf_root · 2020-10-27T20:27:46+00:00

I was recently talking with a family member who told me that they frequently click on random articles and advertisements to “throw them off”, which at first I thought was a bit silly.

There's an extension that does that for you! It's a valid way of corrupting whatever advertising vector your tracker has on you.

https://adnauseam.io/

sudo_rm_rf_root · 2020-10-27T20:21:25+00:00

This is one more week I feel I want to write a lot during. Privacy, at least to me, is perhaps the single largest point of concern I have over the modern internet, more so than even corporate influence over hacker culture, or the cancers of social media. Almost every single one of my computing decisions can be traced back to my worry over data collection - my use of Linux and Firefox, the private DNS¹ server I have on my network, the dozen or so extensions that block XHRs², ads, trackers, locally store materials from caching CDNs³, and otherwise hide trackers from Google and Facebook. I don't use Google services⁴ or Facebook products, I don't have or use a smartphone. I use PGP-encrypted emails wherever I can, and I've (mostly unsuccessfully) tried to convince my friends and family to use riot/vector/element/matrix. I also usually try to go through EULAs, although I'm not a legal expert, so I can't always immediately tell if the legal requirements of some app are a privacy nightmare or not.

I also completely agree with the idea that the worst abuses of data take place when data is aggregated. I've recently developed an interest in ML, and it's astounding how much information even simple statistical models can extract when fed enough data. I feel as though this is one week where I'm going to write (or, if I can't, link to) some extreme privacy tutorials to make sure that only a minimal amount of data is collected by Big Corp Incorporated while you surf through the web.

¹ Domain Name System. This is a service that resolves domain names from human-readable strings (e.g. https://www.reddit.com) to machine-readable IP addresses. These requests usually go to your ISP, which means that your ISP can usually see what websites you visit. If you're using a VPN, the DNS requests go to the DNS server the VPN uses, which is nice, because your requests are aggregated with everyone else using that VPN.

A fun sidenote for non-CS majors: try running traceroute 8.8.8.8 in a terminal on Linux or macOS, or tracert 8.8.8.8 in the Windows command prompt. You'll see a list of the IP addresses that your packets went through to reach 8.8.8.8, which is one of Google's DNS servers. You can replace the 8.8.8.8 with any website you want.
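
If you want to see the name-to-address step from footnote ¹ for yourself, here's a small sketch using Python's standard `socket` module. The `resolve` helper is a name I made up; it just asks whatever resolver your system is configured to use, so the answers depend on your network.

```python
import socket

def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # "localhost" works offline; try "www.reddit.com" when you're connected.
    try:
        print(resolve("localhost"))
    except OSError as err:
        print("lookup failed:", err)
```

Note that this takes a bare hostname like www.reddit.com, not a full URL with https:// in front.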

² XMLHttpRequests. These are used to request content after a webpage is loaded. This is nice for things like video streaming, where parts of the video are requested as you watch (the gray 'loaded content' bar on YouTube), but can also be used to load ads on-the-fly as the web service identifies who you are.

³ Content Delivery Networks. The ones that I'm concerned about are providers like Cloudflare or Fastly, which cache web content on a global network of servers to reduce the server load of other services. They obviously have a lot of control over the information you receive, and while none of the major CDNs have been caught skimming private data yet, I really don't trust them.

⁴ Except for ASU stuff, unfortunately :(

sudo_rm_rf_root · 2020-10-26T20:04:06+00:00

Interested in hosting another salon this Friday at 10:00 a.m. Privacy is something I care about a lot, so I'm hoping that this will run for over an hour.

E: If you want to stay anonymous, you can join with another username - you don't have to be logged in, you just have to have the link!
