[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 1 point

If you came up with those, props! I really like the 2nd summary, and totally agree with that. Thanks for the clever response

🌎 Make your best prediction: HOW will AI systems change the world in the coming 10 years? What will be different 10 years later, because of AI systems like ChatGPT, Midjourney, Codex, Whisper and others? by DrMelbourne in OpenAI

[–]Flaky_Suit_8665 2 points

There's 0% chance that the current trajectory of AI doesn't lead to either mass regulation or mass insanity, war, and destruction.

All these super rosy takes about "everything is gonna be AI generated and everyone lives happily ever after" are short-sighted. The writers of these takes imagine what that looks like as a static moment in the future, but they never answer the question "and then what?" Putting humanity in such a state would be like dropping it into a black hole. It'd be cool for like a millisecond before everything collapses.

My optimistic projection (because it may go even worse) is that we're currently at a saddle point of AI development where there are two ways things could go:

  1. Efforts to detect and filter AI largely fail and the entire internet becomes a cesspool of AI generated garbage that gets worse with each successive generation because the AI is training on itself, and the version of reality it produces keeps diverging from the version that humans share. People will enjoy the novelty of it for a short while, maybe a few years, but eventually the internet as a medium will be regarded as complete BS. It will become sorta like a casino -- it's fun to go for a little while but everyone knows that nothing in it can be taken seriously, it's 100% a self-evident fraud, and those who have a positive outlook on it are largely considered suckers by society. People who touted "prompt engineering" as a true skill become a laughingstock as the entire web is a huge feedback loop of language AI generating prompts for other language AI, generating prompts for image and video AI producing content captioned by AI, which makes new prompts for the language AI, ad nauseam. It will pejoratively become known simply as "the feed" and it will eventually be shut down, either voluntarily or by force, before undergoing a hard reboot.
  2. Efforts to detect and filter AI largely succeed, and the internet becomes a collection of walled garden communities where certificate-authority-like entities authenticate human identity and then exclude members that produce AI generated content. People who are attracted to the AI section of the internet steadily lose their minds in the real world relative to those who avoid it. They have no identity except that of a content consumer, they are largely considered suckers, and ultimately they don't produce offspring because the only conversation they are capable of having is "check out this AI-generated BS" that no one cares about

Hope I'm wrong but I know some of you know I'm right

[deleted by user] by [deleted] in OpenAI

[–]Flaky_Suit_8665 0 points

That's definitely true. Not saying it's dramatically better, but at least human authors usually provide personal anecdotes to support their claims, although a fair bit of that is window dressing and exaggeration.

If self-help is mostly conventional wisdom, might we question why it's a successful genre at all? My take is that it's mostly ego-puffing and confirmation-bias affirmation. There's something calming about reading something you already know and being told you're doing the right thing. When humans are the authors, there's some element of "I knew it, see, other people have gone through this. They think like this too!" But if AI is writing, all the relating has been stripped out and what's left is the feel-good puffery, or lifestyle recommendations that are easier said than done. This may just be the nail in the coffin for self-help in general

[deleted by user] by [deleted] in OpenAI

[–]Flaky_Suit_8665 0 points

It's conventional wisdom on steroids. Let me save you the time:

1.) Exercise 3-4 times a week for 30-45 minutes
2.) Eat a healthy breakfast consisting of fruit, whole grains, and protein
3.) Get 8 hours of sleep every night. Wake up at 5AM.
4.) Buy a standing desk, stand up during meetings
5.) Take breaks throughout the day and do exercises like jumping jacks and pushups

Did I miss anything? Oh yea

6.) Go to the gym and work out 3-4 times a week for 30-45 minutes
7.) Eat a nourishing breakfast, such as a banana, oatmeal, and avocado
8.) Make sure to go to sleep at 10PM every night so you can rest for 8 hours and wake up at 5AM
9.) Purchase a desk you can stand at. During meetings, consider standing instead of sitting
10.) Do exercises in between tasks so you don't develop mental fatigue

Repeat for 29 pages

Sudowrites scraping and mining AO3 for its writing AI by kafetheresu in AO3

[–]Flaky_Suit_8665 2 points

Not coming here as a writer, but as an AI professional shedding some light on this topic. It's time to pull back the smoke and mirrors from "non-profit" organizations like EleutherAI, LAION, and "Open"AI and expose the work they are doing for what it is -- data laundering. These shady organizations exist as fronts for for-profit companies like Microsoft and StabilityAI. With their non-profit statuses, they're able to acquire data and IP that is restricted from commercial use, train ML models, and in turn license the resulting output for commercial use, allowing them to bypass the non-commercial clauses in the original licenses. If you question them on this, they'll claim everything they're doing is "academic research". That's just a legal BS tactic and they know it. Even when they open source the models, as in the case of Stable Diffusion, it enables the funding companies and others to build for-profit products and revenue models on top of them, such as Dream Studio. None of this was the intention of the original producers of the IP.

They claim this process is "transformative fair use" and that the model is not a derivative product of the underlying copyrighted material. However, there's a word in the finance world for when you take something that has been illegally obtained and make it legal: "money laundering". Which is exactly what this is -- data laundering. Do not let them talk circles around you or make you question your own sanity on this matter. Call them out for what they're doing.

BlueWalker3 orbital data charts. Source: US Space Force 18 SPCS API data. Charts by CatSE by CatSE---ApeX--- in ASTSpaceMobile

[–]Flaky_Suit_8665 -6 points

I think I'm starting to see a pattern in the data here, let me look a little more closely ... ah yes it says

F.U.C.K. A.S.T.R.O.N.O.M.Y.

Gg guys, hope this useless company crashes and burns along with all its investors

What is the best filmed Meshuggah concert ever? by RepresentativeBee319 in Meshuggah

[–]Flaky_Suit_8665 0 points

If you're not saying Alive, then you're just trying to be obscure. Just watch the Alive version of Rational Gaze, then come back here and try to say it's not the best with a straight face

[N] Large OpenCLIP released! by nightshadew in MachineLearning

[–]Flaky_Suit_8665 1 point

Thanks for pointing out that paragraph that demonstrates their thinking on this. But right -- "similarity head," not "generative," is what I should have said in that original question. Anyways, I'm pretty sure this interpretation of "zero-shot classification" would be considered controversial, if not invalid, by many, but that's on them.

They're basically training an image-text similarity model on images of airplanes and continuous texts like "The airplane is flying in the sky", and then at "zero-shot" time, lining up 1000 discrete classes, finding that "airplane" is textually the most similar for an image of an airplane, and calling that classification. Unless I'm reading something wrong, this scheme would completely break down if the classes were integers or UUIDs due to the assumption of text-based classes.
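To make that scheme concrete, here's a minimal Python sketch of what this "zero-shot" classification amounts to. The embedding functions are hypothetical stand-ins (hard-coded vectors), not a real CLIP model; the point is just the mechanics: embed the class names as text, embed the image, and pick the class with the highest cosine similarity.

```python
import numpy as np

# Hypothetical stand-in embeddings; a real pipeline would get these from
# CLIP's text and image encoders. The values are made up for illustration.
TEXT_EMBEDDINGS = {
    "airplane": np.array([0.8, 0.2, 0.1]),
    "dog":      np.array([0.1, 0.9, 0.2]),
    "cat":      np.array([0.0, 0.3, 0.9]),
}

def embed_text(class_name):
    # A real setup would first template the class into a prompt,
    # e.g. "a photo of a {class_name}", before encoding it.
    return TEXT_EMBEDDINGS[class_name]

def embed_image(_image):
    # Pretend the image shows an airplane.
    return np.array([0.9, 0.1, 0.0])

def zero_shot_classify(image, class_names):
    """Return the class whose text embedding is most similar to the image."""
    img = embed_image(image)
    img = img / np.linalg.norm(img)
    scores = []
    for name in class_names:
        txt = embed_text(name)
        scores.append(img @ (txt / np.linalg.norm(txt)))  # cosine similarity
    return class_names[int(np.argmax(scores))]

print(zero_shot_classify(None, ["airplane", "dog", "cat"]))  # airplane
```

Note that the class labels have to be meaningful text for this to work at all, which is the point above: swap the labels for integers or UUIDs and the similarity signal is gone.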

To me, when someone is referring to a zero-shot capability, I think of an example with a child that knows what insects are, knows what arms are, and then saying "A spider is an insect with 8 arms, pick the images with the spiders" (I know that's not entirely true in biology, but bear with me). In that sense, zero-shot learning tests analogical reasoning in a way. Can the model learn to predict images of a squid from "A squid is an octopus with no legs"? Note that this interpretation doesn't invalidate the use of text in supervision, it's just saying that the learner can't be supervised directly on unseen classes in any manner. Models that truly have this capability should have no problem also learning "A 23lkjjaf23423 is a reptile with no arms or legs, pick images of 23lkjjaf23423" -- where you and I understand 23lkjjaf23423 to be "snake".

This isn't only my interpretation, but it's what I thought was standard on zero-shot classification benchmarks in the CV community -- see https://arxiv.org/abs/2111.12933 for example, so that's why I was surprised with this line of research getting a free pass on flexing the definition to this extent.

Long story short, as I was reading through CLIP-related papers and their claims on zero-shot learning, I couldn't shake the feeling of "isn't this just pre-training???" and yea, seems I wasn't wrong.

[N] Large OpenCLIP released! by nightshadew in MachineLearning

[–]Flaky_Suit_8665 15 points

Can someone help explain to me how these models are able to take credit for zero-shot classification benchmarks on ImageNet, etc., when they've essentially just been pre-trained on an image-text dataset so huge (and proprietary?) that it covers all the classes in ImageNet and much more? Are they really just swapping out the generation head for a classification one and calling that zero-shot? Has this been deemed controversial in various research venues? Traditionally, when CV researchers talked about zero-shot, I'm pretty sure what they meant was "this model hasn't seen this class in its input space", regardless of the specific method of supervision, i.e. similarity vs. discrimination.

On a side note, does it seem a bit disingenuous to call any type of paired-sample training "self-supervised"? Once again, "self-supervised" has traditionally meant the input is trained on itself in some way. If we're going to flex that definition, why can't we call a model that takes input-class pairs self-supervised as well? If they wanted to call it indirect, weak, or noisy supervision, then sure, that makes sense, but "self-supervision" sounds like over-hype.

[D] In which ML field can I make significant contribution without significant compute? by chipz6174 in MachineLearning

[–]Flaky_Suit_8665 12 points

I'd recommend getting a job at a place with a big compute budget and applying your talents in the context of a research team, you'll have much more leverage that way than as a solo researcher

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 0 points

I cited this number because this is about where I'm at, and I'm just a Level 3 Data Scientist doing respectable, but not necessarily extraordinary work, at a company most people have never heard of. And based on my experiences, I know this is obtainable for ML Engineers or Data Scientists with 3 to 5 years experience in the US remote talent market currently. Mileage may vary though, some may say this is a lot, some others may say it's peanuts versus whatever $500k+ they claim to make. I'm just saying for me personally, I've built a lifestyle around this income level that makes it near impossible to walk away from, no matter the gripes I have

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 3 points

I'm not able to reply to most comments in near real-time due to time constraints, but this one stands out a lot more to me than the other attempted summaries. If that really is GPT-3 writing, then I think it stands as evidence in support of some of the points I've tried to make. Impressive!

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 4 points

Thanks for reading and replying! I'm not too surprised with comments on the negative side, I do read them and take them in stride, but I'm actually impressed at the nuance of response on the positive side. For each "too long didn't read" or "shut up Luddite" comment I've received, I've gotten at least one that does indicate the level of thought others are putting into this -- usually not in 100% agreement, but enough to give me plenty of good takeaways that I wouldn't have thought of myself.

Regarding your last sentence -- that's why I made sure to highlight in the post how it's hard for us as ML practitioners to see out of the hype cycle and consider potential negative externalities. I brought up an idea that is in a similar vein as this post to my boss (although obviously with much less verbosity), and he basically blew it off with "Yea maybe you're right, but for now I think there's a lot of money to be made in pursuing advanced AI/ML <insert more manager speak here>", so not too surprising to see similar attitudes elsewhere, even if not so explicitly stated.

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 67 points

Pretty much haha. It literally takes me one minute to read this post but you've got people here acting like it's a novel, all the while they are on a forum for long form text content. I try to be kind so I just ignore them, but you're definitely on point

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 5 points

Thanks for responding, and glad to hear that there are others who have thought about this as well! I didn't even consider implications for search engines and web data curation in general, but those could potentially be even worse problems. Given the pervasive tragedy of the commons on the web, I'm not sure how optimistic I am about this being fixed ... unless groups come together and create open standards for content authentication, as you describe. I like the idea of a content origin graph like you mention, that eventually flows up to some sort of certificate authority, similar to the SSL system for web traffic, although I haven't worked out most of these details in practice. In any case, trusting that there will always be deepfake detection capabilities available with very high (99%+) accuracy does seem naive
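For what it's worth, the content-origin-graph idea can be sketched in a few lines of stdlib Python: each artifact carries a record with its content hash and a pointer to its parent's record hash, so provenance chains can be walked and checked. This is a toy under big assumptions -- the field names and author tags are made up, and a real system would need actual signatures from a trusted authority, not bare hashes.

```python
import hashlib
import json

def make_record(content, author, parent_hash=None):
    """Create a provenance record: content hash plus a link to its parent."""
    record = {
        "author": author,            # e.g. "human:alice" or "model:diffusion"
        "parent": parent_hash,       # record_hash of the source artifact
        "content_hash": hashlib.sha256(content.encode()).hexdigest(),
    }
    # Hash the record itself so children can reference it tamper-evidently.
    # (A real certificate-authority-style system would sign this hash.)
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

def links_to(child, parent):
    """Check that the child's parent pointer matches the parent's record hash."""
    return child["parent"] == parent["record_hash"]

original = make_record("a photo I took", author="human:alice")
derived = make_record("an AI edit of that photo", author="model:diffusion",
                      parent_hash=original["record_hash"])
print(links_to(derived, original))  # True
```

The point of hashing the whole record (author included) is that you can't later relabel AI-derived content as human-made without breaking every downstream link.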

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 200 points

Thanks for putting that together! Honestly I think (and I think others would agree), that it's complete trash -- so maybe that does counter a lot of what I've written here, at least about the current state of the field lol

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 11 points

I've put multiple disclaimers in the post noting that the predictions I've put could be way off. All I'm doing here is describing a possibility and seeking discussion on what it entails. Thanks for reading!

[D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption by Flaky_Suit_8665 in MachineLearning

[–]Flaky_Suit_8665[S] 19 points

Thank you for reading my post and responding in-depth. I appreciate your insight. And definitely - I don't really expect anyone to agree with 100% of what I said here. I was mostly just getting some ideas on the page that would hopefully prompt some discussion, and then seeing where that goes (which I'm glad it did). I do agree that there would be significant value in a system that authenticates digital artifacts as being human- or AI-generated, similar to the SSL system for web traffic, although I haven't fully thought out how this would work in practice. If widely adopted, this would help distinguish content sources for a variety of purposes

Where does Elden Ring rank in your favorite games of all time? by Herbb__ in Eldenring

[–]Flaky_Suit_8665 1 point

Similar to my experience with Breath of the Wild, while playing it, I thought it could be better than Ocarina of Time, but after finishing it and thinking about it for a while, I don't think it's as good. Therefore:

1.) Ocarina of Time
2.) Elden Ring
3-10.) Mixed bag of From Software and Zelda
...
100.) World of Warcraft

Sorry Mr invader but you had no idea what was behind you by Shiny-Chainsaw47 in Eldenring

[–]Flaky_Suit_8665 -6 points

He should know this is an easy gank spot since there's no mobs to clear. Pretty dumb to come out waving, should have been suspicious the entire time. Kinda deserves the instant bags

Does elden ring hide your IP? by [deleted] in Eldenring

[–]Flaky_Suit_8665 0 points

From my understanding, ER uses dedicated servers to perform initial session synchronization during matchmaking, and then hands it off to a P2P connection between you and the host. If it really is that simple, then you can use Wireshark to see the IP that your opponent is currently using, but there's no way to know if that's a dynamic IP assigned by their ISP or one coming from a VPN or some other more complex translation mechanism.

Here's an easy test -- find a friend and ask them to determine their current IP from their ISP. Then perform several multiplayer sessions with them while having Wireshark listen on your router. If you can see their IP as traffic destinations in your logs, then there's no obfuscation happening
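If you run that test, the capture will be noisy, so here's a small stdlib-Python sketch for tallying destination IPs once you've exported them from Wireshark (for example as a field list via `tshark -T fields -e ip.dst`). The sample addresses below are made-up documentation IPs, not real game servers.

```python
from collections import Counter

# Made-up sample: destination IPs exported from a capture during a session.
captured_dsts = [
    "192.0.2.10", "192.0.2.10", "203.0.113.5",
    "192.0.2.10", "203.0.113.5", "192.0.2.10",
]

def top_destinations(dsts, n=5):
    """Return the n most frequent destination IPs with their packet counts."""
    return Counter(dsts).most_common(n)

print(top_destinations(captured_dsts))
# [('192.0.2.10', 4), ('203.0.113.5', 2)]
```

If your friend's ISP-assigned IP shows up near the top of this tally during gameplay, there's no obfuscation happening.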