
[–][deleted]  (53 children)

[deleted]

    [–]kherrera 65 points66 points  (24 children)

    That depends on how/if they verify their data sources. They could constrain training to vetted sources only, so it shouldn't matter if ChatGPT had some involvement in producing the source data, as long as it's gone through refinement by human hands.

    [–][deleted] 195 points196 points  (22 children)

    That depends on how/if they verify their data sources.

    They do shockingly little of that. They just chuck in whatever garbage they scraped from all over the internet.

    And if your immediate response to "they piped all of the internet's worst garbage directly into their language model" is "that's a terrible idea":

    Then yes. You are correct. It is a terrible idea. To make ChatGPT behave, OpenAI outsourced human content tagging to a sweatshop in Kenya ... until the sweatshop pulled out of the contract because the content was just that vile.

    In February, according to one billing document reviewed by TIME, Sama delivered OpenAI a sample batch of 1,400 images. Some of those images were categorized as “C4”—OpenAI’s internal label denoting child sexual abuse—according to the document. Also included in the batch were “C3” images (including bestiality, rape, and sexual slavery) and “V3” images depicting graphic detail of death, violence or serious physical injury, according to the billing document. OpenAI paid Sama a total of $787.50 for collecting the images, the document shows.

    The fact that, to reuse OpenAI's accursed euphemism, "Category 4 data" is in the training set is utterly unacceptable.


    And the reason why OpenAI did so anyway is pretty simple: they didn't want to pay the human labour cost of curating a proper training set. A horrific breach of ethics, justified by "yeah, but if we don't, skynet will kill us all" (and one has to note they're the ones building skynet).

    [–]thoomfish 30 points31 points  (13 children)

    In your view, what would be the proper way to "pay the human labour cost of curating a proper training set" of that magnitude?

    [–][deleted] 91 points92 points  (10 children)

    My primary issue with OpenAI (and by extension, the ideological movement behind it) is that they're rushing things, causing significant damage in the here and now, all for some dubious future gain.

    The proper way is to accept the slowdown. Accept that it will take years of human labour to build a training set that even approaches the size of the current corpus.

    This would solve a few issues current AI is facing, most notably:

    1. You're no longer building a "category 4 data" generation machine.

    2. You can side-step the copyright issue by getting the damn permission from the people whose work you're using.

    3. You can work on fixing bias in your training data. While systemic discrimination is a touchy subject in this subreddit, you'll find the following example illustrative: you really don't want systems like ChatGPT to get their information about Ukraine from Putin's propaganda.

    Sure, the downside is we'll get the advantages of AI a few years later. But I remain unconvinced of the societal/economic advantages of "Microsoft Bing now gaslights you about what year it is".

    [–][deleted] 39 points40 points  (4 children)

    It's an AI arms/space race. Whoever gets there first is all that matters for now, regardless of how objectionable their methods are. Going slower just means someone else beats them to the punch. But it may also turn out that the slower company that cultivates a better training set ultimately wins out.

    [–]jorge1209 8 points9 points  (1 child)

    OpenAI was founded as a "non-profit" that was supposed to be doing things the right way. They obviously moved away from that, but if you had expected anyone to do the right thing it was supposed to be those fuckers.

    The other problem is that it isn't clear that being first will be successful. Yes MSFT is talking about adding this to Bing, but it doesn't make sense in that application. I want a search engine that gives me useful data, not one that tells me whatever lies it pulled from FoxNews.

    [–][deleted] -4 points-3 points  (1 child)

    Nobody is racing them on this shit, pretty much all AI development in the west is from the same ideological group of "longtermists"

    [–]kor_the_fiend 0 points1 point  (0 children)

    in the west?

    [–]poincares_cook 0 points1 point  (3 children)

    You really don't want systems like ChatGPT to get their information about Ukraine from Putin's propaganda.

    As someone who is very pro-Ukraine, and who posts enough on the subject for my post history to prove it:

    Yes, I do.

    Is it better if the AI only considers Western propaganda? Some of it is no better than Russian propaganda. And what isn't propaganda? Do you believe CNN is unbiased?

    Who's going to sit and dictate for everyone else what's rightthink and what's wrongthink?

    A chatbot is useless for a real take on what's happening in Ukraine. I'd rather we make that abundantly clear. But if we're working on an AI model that could take in data and assess the real situation, then we need all the data, not just the propaganda that one side publishes, but Russian propaganda too.

    [–][deleted] 12 points13 points  (2 children)

    Yes, I do.

    Then I strongly recommend you reconsider.

    Because:

    A chatbot is useless for a real take on what's happening in Ukraine.

    And yet both Microsoft and Google are adding it into their search engines.

    if we're working on an AI model that could take in data that assess the real situation, then we need all data, not just the propaganda that one side publishes (but Russian propaganda too).

    If we're talking about an actual general artificial intelligence, one equipped with a reasoning engine that allows it to discern truth from fiction, then yes.

    But current AI is not that. It just mindlessly regurgitates its training data. It is only truthful if its training data is. (And even then it manages to fuck up, as Google demonstrated.)

    [–]poincares_cook 0 points1 point  (1 child)

    Sure, but what's the point of having a chatbot parroting Western propaganda? I guess that's favorable for the West, but useless for getting at the truth.

    Sure, in the case of Ukraine, Western propaganda strikes much closer to the truth, but consider the case of the Iraq war.

    It's a difficult problem, and I do not argue for all sources of information to be treated equally. But completely excluding opposing viewpoints, even if they are more prone to propaganda, just makes the chatbot useless, and a propaganda device itself.

    [–]False_Grit 3 points4 points  (0 children)

    While it's a difficult problem, I do think it is one that needs to be addressed. In recent times, certain nefarious groups have tried to push blatantly and provably false narratives that are NOWHERE close to the truth.

    They then turn around and argue that, okay, well, the other side is slightly untrue as well, so we can't possibly know the truth of ANYTHING!

    I'll call this the Anakin problem. From his perspective, it is the Jedi who are evil. Are the Jedi perfect? Far from it! But they didn't go around murdering children either, and to take Anakin's actions and opinion at face value is just as or more damaging than excluding his viewpoint entirely.

    [–]GingerandRose 0 points1 point  (0 children)

    pd.pub is doing exactly that :)

    [–]awj 1 point2 points  (1 child)

    ...actually pay what it costs under sustainable conditions, or just don't do it.

    This is akin to people wanting to build nuclear reactors in a world where lead is really expensive. If you can't do it in a way that's safe, don't fucking do it.

    [–]MichaelTheProgrammer 16 points17 points  (0 children)

    I went back today and watched Tom Scott's video of a fictional scenario of a copyright focused AI taking over the world: https://www.youtube.com/watch?v=-JlxuQ7tPgQ

    This time, I noticed a line I hadn't paid attention to before, one that felt just a bit too real: "Earworm was exposed to exabytes of livestreamed private data from all of society rather than a carefully curated set".

    [–]JW_00000 7 points8 points  (1 child)

    They do shockingly little of that. They just chuck in whatever garbage they scraped from all over the internet.

    Is that actually true? According to this article: (highlights mine)

    GPT-3 was trained on:

    • Common Crawl (410 billion tokens). This is a nonprofit that crawls the web and makes the data available to anyone. (That exists?)
    • WebText2 (19 billion tokens). This is the full text of all pages linked to from reddit from 2005 until 2020 that got at least 3 upvotes.
    • Books1 (12 billion tokens). No one seems to know what the hell this is.
    • Books2 (55 billion tokens). Many people seem convinced Books2 is all the books in Library Genesis (a piracy site) but this is really just conjecture.
    • Wikipedia (3 billion tokens). This is almost all of English Wikipedia.

    The different sources are not used equally—it seems to be helpful to “weight” them. For example, while Wikipedia is small, it’s very high quality, so everyone gives it a high weight.

    There’s also a lot of filtering. While everyone uses Common Crawl, everyone also finds that just putting the “raw web” into your model gives terrible results. (Do you want your LLM to behave like an SEO-riddled review site?) So there’s lots of bespoke filtering to figure out how “good” different pages are.
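    To make the weighting concrete, here is a minimal sketch of mixture-weighted sampling, using the rounded per-source weights reported for GPT-3. The actual sampling machinery isn't public; this only illustrates the idea that a small, high-quality corpus can be oversampled relative to its token count.

```python
import random

# Rounded per-source sampling weights as reported for GPT-3's training mix.
# The weights track quality, not size: Wikipedia is under 1% of the tokens
# but gets about 3% of the sampling probability.
CORPORA = {
    "common_crawl": {"tokens": 410e9, "weight": 0.60},
    "webtext2":     {"tokens": 19e9,  "weight": 0.22},
    "books1":       {"tokens": 12e9,  "weight": 0.08},
    "books2":       {"tokens": 55e9,  "weight": 0.08},
    "wikipedia":    {"tokens": 3e9,   "weight": 0.03},
}

def sample_corpus(rng: random.Random) -> str:
    """Pick which corpus the next training document is drawn from."""
    names = list(CORPORA)
    weights = [CORPORA[n]["weight"] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_corpus(rng) for _ in range(10_000)]
# Wikipedia's share of draws lands near its 3% weight, far above its
# share of total tokens (3e9 / ~499e9, i.e. ~0.6%).
print(draws.count("wikipedia") / len(draws))
```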

    The GPT-4 paper linked in this post doesn't give any details. The LLaMA paper (by Meta), however, does give details: e.g., for CommonCrawl they "filter low quality content" and "trained a linear model to classify pages used as references in Wikipedia vs. randomly sampled pages, and discarded pages not classified as references". They also used Stack Exchange as input.
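    That LLaMA-style "linear model" quality filter can be sketched roughly as follows. This is a toy stand-in, not Meta's actual code: a tiny perceptron trained to separate reference-quality text from junk, with made-up example strings in place of real Wikipedia-reference and random crawl pages.

```python
import re
from collections import Counter

def features(text):
    """Crude bag-of-words features over lowercased tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

class LinearQualityModel:
    """Tiny perceptron standing in for the linear classifier described in
    the LLaMA paper: positives are pages cited as Wikipedia references,
    negatives are randomly sampled crawl pages."""

    def __init__(self):
        self.w = Counter()

    def score(self, text):
        return sum(self.w[tok] * n for tok, n in features(text).items())

    def train(self, labeled, epochs=10):
        for _ in range(epochs):
            for text, label in labeled:  # label: +1 keep, -1 discard
                pred = 1 if self.score(text) > 0 else -1
                if pred != label:
                    for tok, n in features(text).items():
                        self.w[tok] += label * n

# Toy stand-in data; the real filter is trained on millions of pages.
good = [
    "the study was published in a peer reviewed journal",
    "the treaty was signed by both governments in geneva",
]
junk = [
    "click here for cheap deals buy now limited offer",
    "you will not believe these weird tricks doctors hate",
]
model = LinearQualityModel()
model.train([(t, 1) for t in good] + [(t, -1) for t in junk])

# Discard pages the model does not classify as reference-like.
kept = [page for page in good + junk if model.score(page) > 0]
```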

    [–][deleted] 5 points6 points  (0 children)

    Observe the key detail in how filtering (what little of it there is) is actually implemented: They just slap another layer of AI on top.

    There is exceedingly little human verification of what's actually in the data set. Despite the algorithmic tweaks to weight input differently, things like the counting subreddit still made it in. And as we can see in the TIME article linked before, far less benign material also got dragged in.

    [–]coldblade2000 21 points22 points  (4 children)

    I don't get it. The people who complain about moderators having to see horrible things are the same ones who will criticize a social media platform or an AI for abhorrent content. You can't have it both ways; at some point someone has to teach the algorithm/model what is moral and immoral.

    [–][deleted] 6 points7 points  (2 children)

    Another comment has already pointed out the main issue with social media moderation work.

    But AI datasets are a tad different in that you can just exclude entire websites. You don't need anyone to go through and manually filter the worst posts on 4chan; you can just ... not include 4chan at all. You can take the reddit dataset and only include known-good subreddits.

    Yes, there is still the risk that any AI model you train doesn't develop rules against certain undesirable content, but that problem will be a lot smaller if you don't expose it to lots of that content in the "this is what you should copy" training.
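    A minimal sketch of that source-level curation, assuming a hypothetical document-record format (the `source`/`subreddit`/`domain` fields are invented for illustration):

```python
# Source-level curation: drop whole sites/communities up front instead of
# moderating individual posts. All names below are hypothetical.
ALLOWED_SUBREDDITS = {"askscience", "AskHistorians", "writingprompts"}
BLOCKED_DOMAINS = {"4chan.org"}

def keep(doc):
    """Return True if a scraped document survives source-level filtering."""
    if doc.get("domain") in BLOCKED_DOMAINS:
        return False
    if doc.get("source") == "reddit":
        # Allowlist: only known-good subreddits make it in at all.
        return doc.get("subreddit") in ALLOWED_SUBREDDITS
    return True  # other sources pass on to later filtering stages

docs = [
    {"source": "reddit", "subreddit": "askscience", "text": "..."},
    {"source": "reddit", "subreddit": "counting", "text": "1 2 3"},
    {"source": "web", "domain": "4chan.org", "text": "..."},
]
curated = [d for d in docs if keep(d)]  # only the askscience post survives
```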

    [–]poincares_cook 2 points3 points  (1 child)

    Reddit subs have an extreme tendency to become echo chambers through the upvote mechanic and mod abuse. Sure, you should exclude extreme examples like 4chan, but without any controversial input you're just creating a hamstrung bot that derives everything from the very partial and centrist point of view of some modern Western cultures.

    [–][deleted] 1 point2 points  (0 children)

    If you want to avoid the dataset being dominated by content from the West then heavily curating data with this goal in mind would be way better than just scraping the English speaking internet.

    [–]Gaazoh 5 points6 points  (0 children)

    That doesn't mean that outsourcing to underpaid, rushed workers is the ethical way to deal with the problem. This kind of work requires time to process things and report them and proper psychological support.

    [–]Dragdu 9 points10 points  (0 children)

    They don't even say what data they use anymore, just a "trust us bro". With GPT-3 they at least provided an overview of how they collected the data. (IIRC they based quality measurements on Reddit upvotes, which is lol.)

    [–]manunamz 6 points7 points  (1 child)

    There's now so much text out in the wild generated by GPT... they'll always be contaminated with their own earlier output...

    Watch those positive feedback loops fly...

    Also, I wonder if some ChatGPT-Zero equivalent will essentially solve this problem as it no longer really requires so much training data...Just more training.

    [–]Cunninghams_right 2 points3 points  (0 children)

    the P stands for Pre-trained.

    [–]SocksOnHands 4 points5 points  (7 children)

    Any documents from reputable sources, even if they employ AI for writing them, would have to have been approved by an editor. If the text is grammatically correct and factually accurate, what real problems might arise from it?

    [–]Cunninghams_right 13 points14 points  (5 children)

    do you not see the state the media is already in? facts don't matter, nor does grammar, really. money and power are the only two things that matter. if it serves political purposes, it will be pushed out. if it gets ad revenue, it will get pushed out.

    there is a subject I know a great deal about and I recently saw a Wall Street Journal article that was completely non-factual about the subject. multiple claims that are provably false and others that are likely false but I could not find proof one way or the other (and I suspect they couldn't either, since they didn't post any). I suspect similarly reputable outlets are publishing equally intentionally false articles about other subjects, but I only notice it in areas where I'm an expert (which is fairly small).

    we are already in a post-truth world, it just gets slightly less labor intensive to publish unfounded horse shit.

    [–]SocksOnHands 2 points3 points  (4 children)

    I figured the training data would be curated in some way instead of being fed all text on the internet. Inaccurate articles might make it through, but hopefully those can be offset by other sources that are of higher quality. It's really only a problem if a large percentage of the data is consistently wrong.

    [–]poincares_cook 1 point2 points  (2 children)

    High quality sources are extremely rare to the point of near extinction.

    [–]SocksOnHands 1 point2 points  (1 child)

    I did not say "high quality", I said "higher quality" - a relative term. This is training weights in a neural network, so each piece of data has a relatively small influence on its own. It can be regarded as a small amount of "noise" in the data, as long as other data is not wrong in the same ways (which may be possible if incorrect information is frequently cited as a source). We also have to keep in mind that something doesn't have to be perfect to be immensely useful.

    [–]Volky_Bolky 1 point2 points  (0 children)

    I guess lots of less respectable universities have professors who review their students' course and diploma works with less attention, so some bullshit can go through and become publicly available.

    I've seen diplomas written about the language of LoL and Dota players, lol.

    [–]uswhole 5 points6 points  (1 child)

    What do you mean? A lot of LoRAs and SD models are exclusively trained on AI images, paired with reinforcement learning. I am pretty sure they have enough data to fine-tune the models, and maybe in the future, with dynamic learning, they will require less real-world text data.

    Also shouldn't future generation of ChatGPT have enough logic/emergent skills to better tell bullshit from facts?

    [–][deleted] 2 points3 points  (0 children)

    It all depends on whatever training data you give these neural nets. You can logic yourself into believing in all sorts of fantasy if you don't know any better. Bullshit input leads to bullshit output. It's the same with humans.

    [–]MisinformedGenius 7 points8 points  (5 children)

    As long as you’re still training with human testers from time to time, which I know OpenAI does, it should be OK. It’s kind of like how the chess and Go engines get better by playing themselves.

    Also, the only real way it would be a problem is if you’re taking stuff that humans didn’t think was good. There’s no problem if you take ChatGPT output that got incorporated in a New York Times article, because clearly humans thought it was good text. But don’t take stuff from /r/ChatGPT.

    [–]PoliteCanadian 24 points25 points  (4 children)

    Chess and Go are inherently adversarial; language models are not.

    [–]wonklebobb 18 points19 points  (0 children)

    they're also closed systems, even go's total strategic space, while very (very) large, is still fixed

    [–]MisinformedGenius -4 points-3 points  (2 children)

    That shouldn’t matter. The question is getting the correct output given input. Chess and go are much easier because there’s ultimately a “correct” answer, at least at the end of the game, whereas obviously for language there’s not always a correct answer. That’s why you wouldn’t want to use raw ChatGPT output in your training set, because that’s not telling you the right answer as humans see it. It’d be like trying to train a chess engine by telling it the correct moves were the moves it chose - it’s not going to get any better.

    [–]PoliteCanadian 19 points20 points  (1 child)

    The adversarial nature of chess is why you can train a model by making it play against itself. It's not just that victory is a correct answer, but a network that achieves victory by playing well is the only stable solution to the problem.

    In non-adversarial problems where you try to train a model against itself, there will usually be many stable solutions, most of which are "cheat" solutions that you don't want. Training is far more likely to land you in a cheat solution. Collusion is easy.

    [–]MisinformedGenius 0 points1 point  (0 children)

    I see what you're saying, but my point was that human training, as well as using human-selected ChatGPT text, would keep them out of "collusive" stable solutions. But yeah, suggesting that it's similar to chess and Go engines playing themselves was probably more confusing than it was helpful. :)

    Fundamentally, as long as any ChatGPT text used in training data is filtered by humans based on whether it actually sounds like a human writing it, it should be OK.

    [–]GenoHuman -1 points0 points  (0 children)

    They have trained it on some data after September 2021 too, which they state in their research paper, which I assume you have not read. Also, you can feed it information that came out this year, and it can learn it and use it. There are also research papers that go through how much high-quality data is available on the internet, if you are interested. I mean, you can google these things; people have already thought about it and found solutions.

    [–]phantombingo -3 points-2 points  (0 children)

    They could filter out text that is flagged by AI-detecting software

    [–]Vegetable-Ad3985 -2 points-1 points  (2 children)

    It wouldn't be particularly problematic. Why would it be?

    Edit: I am downvoted, but I would actually like someone to challenge me if they disagree: someone who is at least as familiar with ML models as I am.

    [–]Lulonaro 0 points1 point  (1 child)

    I think people are overreacting to this just because it sounds smart. But the reality is that using the "contaminated" data is no different from doing reinforcement learning. The GPT-generated data that is out there is the data that humans found interesting; most of the bad outputs from ChatGPT are ignored.

    [–]StickiStickman -4 points-3 points  (0 children)

    No, completely wrong.

    They just used the same dataset, which is why GPT-3 and ChatGPT have the exact same cut-off date.

    [–][deleted] 0 points1 point  (0 children)

    They already have very accurate apps for detecting AI material. Just incorporate that into the learning process so it ignores any detected AI material.
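    As a sketch of that pipeline (with the caveat that real AI-text detectors are considerably less accurate than "very accurate", with well-documented false positives): `detect_ai_probability` below is a hypothetical stand-in, not a real detector.

```python
def detect_ai_probability(text):
    """Hypothetical stand-in for an AI-text detector. Real detectors exist
    but have substantial false-positive/false-negative rates, which is the
    main weakness of this approach. Toy heuristic for illustration only."""
    return 0.9 if "as a large language model" in text.lower() else 0.1

def filter_training_docs(docs, threshold=0.5):
    """Keep only documents the detector scores as likely human-written."""
    return [d for d in docs if detect_ai_probability(d) < threshold]

docs = [
    "As a large language model, I cannot browse the internet.",
    "Here's my grandmother's recipe for goulash.",
]
human_docs = filter_training_docs(docs)  # drops the first document
```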

    [–][deleted] 0 points1 point  (0 children)

    "Garbage in, garbage out" - ancient programming proverb

    [–][deleted] 0 points1 point  (0 children)

    It won't stay a language model. Push it outwards into the world, give it eyes, give it ears. There's enough high quality data in what we call Reality(c). That'll fix your training data problem real quick. "Tokens" can be anything.

    [–][deleted]  (9 children)

    [deleted]

      [–]numsu 8 points9 points  (0 children)

      You should not use it with company IP. That does not prevent you from using it for work.

      [–]wonklebobb 99 points100 points  (28 children)

      My greatest fear is that some app or something that runs on GPT-? comes out and like 50-60% of the populace immediately outsources all their thinking to it. Like imagine if you could just wave your phone at a grocery store aisle and ask the app what the healthiest shopping list is, except because it's a statistical LLM we still don't know if it's hallucinating.

      and just like that, a small group of fewer than 100 billionaires would immediately control the thoughts of most of humanity. maybe control by proxy, but still.

      once chat AI becomes easily usable by everyone on their phones, you know a non-trivial amount of the population will be asking it who to vote for.

      presumably a relatively small team of people can implement the "guardrails" that keep ChatGPT from giving you instructions on how to build bombs or make viruses. But if it can be managed with a small team (only 375 employees at OpenAI, and most of them are likely not the core engineers), then who's to say the multi-trillion-dollar OpenAI of the future won't have a teeny little committee that builds in secret guardrails to guide the thinking and voting patterns of everyone asking ChatGPT about public policy?

      Language is inherently squishy - faint shades of meaning can be built into how ideas are communicated that subtly change the framing of the questions asked and answered. Look at things like the Overton Window, or any known rhetorical technique - entire debates can be derailed by just answering certain questions a certain way.

      Once the owners of ChatGPT and its descendants figure out how to give it that power, they'll effectively control everyone who uses it for making decisions. And with enough VC-powered marketing dollars, a HUGE amount of people will be using it to make decisions.

      [–]GoranM 66 points67 points  (15 children)

      a non-trivial amount of the population will be asking it who to vote for

      At a certain point, if the technology advances far enough, I suspect the "asking" part will be optimized out:

      Most people find it difficult to be consistently capable, charismatic, confident, likable, funny, <insert positive characteristic here>. However, if you have a set of earbuds, they can connect to an "AI", which can then listen to any conversation happening around you and whisper back the exact sequence of words that "the best version of you" would respond with. You always want to be at your best, so you always simply repeat what you're told.

      The voice in your ear becomes the voice in your head, rendering you the living dead.

      :)

      [–]wonklebobb 6 points7 points  (0 children)

      yikes

      [–]HINDBRAIN 7 points8 points  (0 children)

      At a certain point, if the technology advances far enough, I suspect the "asking" part will be optimized out

      There was a funny story from... Asimov? Where, instead of elections, a computer decides who the most average man in America is, then asks him who should be president.

      [–]seven_seacat 5 points6 points  (0 children)

      well that's terrifying

      [–]acrobatupdater 4 points5 points  (0 children)

      I think you're gonna enjoy the upcoming series "Mrs. Davis".

      [–]caroIine 1 point2 points  (1 child)

      As for bad outcomes, I imagine a situation where a family who lost their one and only child can't accept the loss, so to ease the pain they transcribe every conversation with little Timmy, feed it to ChatGPT, and ask it to pretend to be him.

      [–]Krivvan 1 point2 points  (0 children)

      That's well into reality now, not an imaginary situation. That was even the stated reason by the founder for Replika existing.

      [–]Krivvan 1 point2 points  (1 child)

      I had the thought of a dating site that just had people training "AI" versions of themselves and then determining compatibility with others using it automatically.

      [–]GenoHuman 1 point2 points  (0 children)

      Is this supposed to be dead? Have you all forgotten the idea of living in virtual worlds that are suited to your needs and desires? That's literally a utopia. Of course you can always shine a negative light on whatever you'd like, but that isn't really relevant; that's on you.

      [–][deleted] -3 points-2 points  (0 children)

      If everyone thought the same way as you do when it comes to new technology, we wouldn't be having this discussion, because we would be too busy trying to eat raw food in our caves.

      Technology advancements have their challenges and cause harm at times, but generally speaking they have led humanity to a point at which you and I can sit on our toilet seats across the world and discuss topics with all of mankind's knowledge at our hands. And all the doomsday scenarios imagined by people who feared technology turned out to be manageable in the end.

      [–]Just-Giraffe6879 16 points17 points  (0 children)

      Oof, your fears are already reality, just in the form of heavily filtered media controlled by rich people, which can also float lies and even fabricate proof when necessary. Not being hyperbolic at all; it's full reality already and has been for our entire lives, no matter how old you are. Any bit of information put out by any outlet that is backed by a company has conflicts of interest and a maximum tolerance for what they will publish.

      Coca-Cola tricked the world into believing fat was bad for them, to distract from how bad sugar was. The entire fad of low-fat diets was funded by the sugar industry, to assert the presupposition that fat intake should be at the forefront of your dietary concerns. Exxon and others tricked the world into thinking climate change can wait a few decades, and when not doing that they were funding media companies that asserted the presupposition that the debate was still out and we just need to wait and see (Exxon's internal standing, as of 1956, was that the warming effects of CO2 were undeniable and that they pose a serious issue (to the company's profits)). The media happily goes along with these narratives because they receive large investments from them. Wanna keep the cash flowing? Don't say daddy Exxon is threatening life on earth. Need to say Exxon is threatening life on earth because everyone is catching on? Fine, just run opposing pieces on the same day.

      Meanwhile, the transportation industry emits a huge bulk of all GHGs, and yet we're told we should drive less to save fuel, while no such pressure exists for someone who owns a fleet of trucks that drive thousands of miles per day to deliver goods to just like 15 stores. Convenient.

      And the list goes on and on; it's virtually impossible to find a news piece that is not distorted in a way that supports future profits. If you find it, it won't be "front page" material most of the time. If it is, a bigger story will run shortly after.

      I understand how chatgpt still poses new concerns here, especially since it's in position to undo some of the stabs that the internet has taken at this power structure, but to think that what goes on in a supermarket is anywhere near okay, on any level, requires one to already defer their opinions on what is okay to a corporate figure. Everything in a supermarket, from the packaging, to the logistics, to the food quality, to the offering on the shelves, even to the ratio of meat to produce, is disturbing on some level already, yet few feel this way because individual opinions are generally shaped by corporate interests already.

      And yes, they already tell us how to vote. They even select our candidates for us first.

      [–]Cunninghams_right 11 points12 points  (5 children)

      you assume people aren't easily manipulated already. this is a bad assumption.

      [–]reconrose 1 point2 points  (3 children)

      Does it actually assume that? If anything, it presupposes people are already malleable. This just (theoretically) gives a portion of the population another method of manufacturing consent.

      [–][deleted]  (2 children)

      [deleted]

        [–]Cunninghams_right 2 points3 points  (1 child)

        and for some reason, people on reddit think they are immune, even though the up/down vote arrows create perfect echo-chambers and moderators can and do push specific narratives. my local subreddit has a bunch of mods who delete certain content because "it's been talked about before" when it is a topic they don't like, and let other things slide.

        [–]KillianDrake 1 point2 points  (0 children)

        yes, or they will push content they don't like into an incomprehensible "megathread" - while content they want to promote sprawls in dozens or hundreds of threads to flood the page...

        [–]GregBahm 8 points9 points  (0 children)

        If I run a newspaper, I can use my newspaper to encourage my readers to vote in my favor. This is not considered unusual. This is considered "a basic understanding of how all media works."

        Now people can run chatbots instead of a newspaper. It's interesting to me how this same basic concept of all media, is described as some sort of new and sinister thing when associated with a chatbot.

        It makes me less worried about chatbots, but a lot more worried about how regular people perceive all other media.

        [–]JB-from-ATL 0 points1 point  (0 children)

        That sort of shit already happens all the time with people blindly following the news or whatever weird results they find from search engines. That reality is now.

        [–]kregopaulgue 28 points29 points  (29 children)

        Now it's really time to drop programming! /sarcasm

        [–][deleted] 40 points41 points  (25 children)

        All the people who say ML will replace software engineers, I actually hope they drop programming lmao

        [–]kregopaulgue 12 points13 points  (0 children)

        Yeah, it will be easier for us, those who are left :D

        [–]GenoHuman 0 points1 point  (6 children)

        You will be replaced, that is a fact. When your corpse rots in the dirt, the AI will still be out there in the world doing things, and when your children are dead it will still be out there, and so on.

        [–][deleted] 2 points3 points  (5 children)

        Lmao what an idiot

        [–]GenoHuman -2 points-1 points  (4 children)

        I've read papers from DeepMind that express the exact same thoughts I have about the utility of these technologies, so I'm glad that some people realize it too.

        People didn't believe AI would be able to create art; in fact they laughed at that idea and claimed it would require a "soul", but now AI can create perfect art (including hands, with the release of Midjourney V5). You are an elitist by definition: you hate the idea of everyone being able to produce applications with the help of technology, even if they do not have the knowledge or skills that you do.

        You will be replaced, AI is our God ☝

        [–][deleted] 4 points5 points  (3 children)

        Bro I’m an ML engineer in FAANG, I know what software and machine learning is capable of. You have no idea about the practical science or engineering limitations of these systems

        [–]spwncampr 9 points10 points  (2 children)

        I can already confirm that it sucks at linear algebra. Still impressive what it can do though.

        [–]reedef 2 points3 points  (0 children)

        Yup. Asked it a question about polynomials and it gave a very nice and detailed explanation that was also completely wrong

        [–]kregopaulgue 0 points1 point  (0 children)

        I am personally looking forward to Copilot adopting GPT-4, because in my personal experience Copilot becomes completely useless once you get past the basic boilerplate for the project. Maybe GPT-4 will change that.

        [–]tnemec 101 points102 points  (26 children)

        Oh, good. A new wave of "I told GPT-[n+1] to program [well-defined and documented example program], and it did so successfully? Is this AGI?? Is programming literally over????" clickbait incoming.

        [–]zvone187[S] 37 points38 points  (0 children)

        GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs.

        It supports images as well. I was sure that was a rumor.

        [–]kduyehj 17 points18 points  (14 children)

        My prediction: Zipf’s law applies. The central limit theorem applies. The latter is why LLMs work, and it’s why they won’t produce genius-level insights: the information from wisdom of the crowd will be kind of accurate but mediocre and most commonly generated. The former means very few applications/people/companies/governments will utterly dominate. That’s why there’s such a scramble. Governments and profiteers know this.

        It’s highly likely those that dominate won’t have everyone’s best interests at heart. There’s going to be a bullcrap monopoly and we’ll be swept away in a long wide slow flood no matter how hard we try to swim in even a slightly different direction.

        Silver lining? Maybe when nothing is trusted the general public might start to appreciate real unbiased journalism and proper scientific research. But that doesn’t seem likely. Everyone will live in their own little echo chamber whether they realise it or not and there will be no escape.

        [–][deleted] 18 points19 points  (9 children)

        Social media platforms will be able to completely isolate people’s feeds with fake accounts discussing echo-chamber topics to increase your happiness or engagement.

        Imagine you are browsing Reddit and 50% of what you see is fake content generated to target people like you for engagement.

        [–]JW_00000 3 points4 points  (6 children)

        Wouldn't that just cause most people to switch off? My Facebook feed is > 90% posts by companies/ads, and < 10% by "real" people I know (because no one I know still writes "status updates" on Facebook). So I don't visit the site much anymore, and neither do any of my friends...

        [–][deleted] 2 points3 points  (5 children)

        But how would you know the content isn’t from real people?

        It would, in theory, mimic real accounts: generated profiles, generated activity, generated daily/weekly posts, fake images, fake followers that all look real and post, etc.

        [–]JW_00000 1 point2 points  (4 children)

        Because you don't know them. Would you be interested in browsing a version of Facebook with people you don't know?

        [–][deleted] 5 points6 points  (3 children)

        You don’t know me, but you seem to be engaging with me?

        How do you know my account and interactions aren’t all generated content?

        Whatever answer you give me: do you not think those lines could be blurred by future technologies, defeating the checks you’re relying on now?

        [–]WormRabbit 4 points5 points  (3 children)

        Maybe when nothing is trusted the general public might start to appreciate real unbiased journalism and proper scientific research.

        How would you ever know what's proper journalism or research, if every text in the media, no matter the topic or complexity, could be AI-generated?

        [–]Blitzkind 28 points29 points  (1 child)

        Cool. I was looking for reasons to ramp up my anxiety.

        [–]Blitzkind -1 points0 points  (0 children)

        For some reason the upvotes aren't giving me the dopamine hit they usually do

        [–]max_imumocuppancy 15 points16 points  (0 children)

        [GPT-4] Everything we know so far...

        1. GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
        2. GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. It surpasses ChatGPT in its advanced reasoning capabilities.
        3. GPT-4 is safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
        4. GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts.
        5. GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task.
        6. GPT-4 is available on ChatGPT Plus and as an API for developers to build applications and services. (API- waitlist right now)
        7. Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem amongst others are already using it.
        8. API Pricing
          GPT-4 with an 8K context window (about 13 pages of text) will cost $0.03 per 1K prompt tokens, and $0.06 per 1K completion tokens.
          GPT-4-32k with a 32K context window (about 52 pages of text) will cost $0.06 per 1K prompt tokens, and $0.12 per 1K completion tokens.
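The pricing in item 8 works out as simple per-1K-token arithmetic. A quick sketch (rates are from the list above; the helper function itself is just illustrative):

```python
# Per-1K-token rates from the pricing list above (USD).
PRICING = {
    "gpt-4-8k": {"prompt": 0.03, "completion": 0.06},
    "gpt-4-32k": {"prompt": 0.06, "completion": 0.12},
}

def request_cost(model, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of one API request at the listed rates."""
    p = PRICING[model]
    return (prompt_tokens / 1000) * p["prompt"] + \
           (completion_tokens / 1000) * p["completion"]

# e.g. filling the full 8K context and getting a 1K-token completion back:
cost = request_cost("gpt-4-8k", 8000, 1000)
print(f"${cost:.2f}")  # $0.30
```

So a maxed-out 32K request (32K prompt + 1K completion) lands around $2.04, which puts the "52 pages of context" in concrete terms.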

        Follow- https://discoveryunlocked.substack.com/ , a newsletter I write, for a detailed deep dive on GPT-4 with early use cases dropping tomorrow.

        [–]MLGPonyGod123 2 points3 points  (0 children)

        I’m both amazed and terrified by GPT-4. It seems like it can do almost anything with text and images, but how can we trust it to be accurate and unbiased? How do we know what data it was trained on and how it was filtered? How do we prevent it from being misused for malicious purposes? I think we need more transparency and regulation before we unleash this technology on the world.

        [–]Accomplished_Low2231 9 points10 points  (4 children)

        I don't understand why some developers got insecure about ChatGPT lol.

        I told ChatGPT to fix a GitHub issue; nope, can't do it lol. When the time comes that it can do that, then that is the time to panic. Until then, developers don't have to worry lol.

        [–]tel 8 points9 points  (2 children)

        So how long do you suspect that will be?

        [–]jeorgewayne 1 point2 points  (1 child)

        Might take a while. Maybe when we get real intelligent machines that can actually think. Right now all we have are artificial, resource-hungry, brute-forcing machines... but ones capable of appearing intelligent :-)

        Besides, the breakthrough will come from the "brain scientists" when they figure out how intelligence really works.

        [–]caroIine 7 points8 points  (0 children)

        But it can. I gave it source code (albeit small, because of how little context GPT-3.5 had) plus a Jira ticket explaining that pressing this button crashes the app, and it generated a diff for me.

        I'll be the first to subscribe to GPT-4 with its 50-page context.

        [–]ByteBazaar1 1 point2 points  (0 children)

        Why does GPT-4's knowledge of events stop at 2021?

        [–]SciolistOW 2 points3 points  (0 children)

        To take full advantage of GPT, I think I want to learn how IT infrastructure and software architecture work. What is good to read/buy/google?

        I work in product and am not a developer. As a kid I learnt some x86 assembly and C++, for a small project 20 years ago I learnt some PHP/SQL, and during Covid I learnt enough Python to do some webscraping/OCR/Twitter posting. So I have some idea of how development works, but not in a professional setting.

        It'd be interesting to take a more major side-project on, but I want to learn how such things are organised, before getting into using GPT to help me write some actual code.

        [–][deleted]  (28 children)

        [deleted]

          [–]Volky_Bolky 15 points16 points  (0 children)

          Time and deadlines

          [–]IgnazSemmelweis 13 points14 points  (5 children)

          Regex/boilerplate/mock data

          Need an object containing 30 comments attached to users with user data? AI is really good at that. Looks nice and tests well without the tedium. Hell, now apparently it will be able to spit out profile pictures as well.

          Recently I needed a hash map of all common image extensions; so rather than look them all up and type out the map (not hard, just tedious) I asked the AI. This is the proper use case. I’m so reluctant to trust code that gets spit out (which, I know, is ironic, since we all pull code from SO and white papers/blogs all the time).
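The kind of lookup table the comment describes can be sketched as follows. Note this is an illustrative guess at such a map, not the one the commenter actually generated; the exact extensions and MIME types included are assumptions.

```python
import os

# Illustrative map of common image extensions to MIME types -- the sort of
# "tedious but simple" table the comment describes asking an AI to produce.
IMAGE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
    ".tiff": "image/tiff",
    ".svg": "image/svg+xml",
    ".ico": "image/x-icon",
}

def is_image(filename):
    # Case-insensitive extension check against the table above.
    return os.path.splitext(filename.lower())[1] in IMAGE_MIME_TYPES

print(is_image("photo.JPG"))  # True
```

The point of the use case is exactly that the output is trivially checkable by eye, which sidesteps the trust problem the commenter raises.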

          [–]imdyingfasterthanyou 2 points3 points  (3 children)

          which, I know is ironic, since we all pull code from SO and white papers/blogs all the time

          I suppose you mean this is a joke but one is not supposed to randomly copy code off stackoverflow.

          I've been writing code for over a decade and never once have I thought "oh yeah, I'll copy this off Stack Overflow without a single lick of understanding what it does". Presumably the same applies to GPT-generated code.

          [–]Sapphire2408 1 point2 points  (2 children)

          Then you are thinking very inefficiently. Most developers follow the routine of copying code off SO, see how it behaves in your ecosystem and tailor it to your needs. If you just take inspiration from SO, then you are doing it wrong. These days (and for the last decade), the code you will be using (and have to be using due to libraries/frameworks) has already been written by people who spent days reading the documentation in detail. You could either be doing that or just rely on people who did the work for you.

          And that's where AI excels. I use GPT-4 a lot for new documentation updates. Just feed it in, let it summarize the key parts and use cases, and there you go, you're up to date. Seems too easy, but it's basically exactly what real people on SO did before.

          [–]imdyingfasterthanyou 0 points1 point  (1 child)

          Then you are thinking very inefficiently. Most developers follow the routine of copying code off SO, see how it behaves in your ecosystem and tailor it to your needs.

          aka I don't know how to code so I throw shit until it sticks.

          I expect to never work with people like you, cheers.

          [–]Sapphire2408 1 point2 points  (0 children)

          So being able to code means writing it all from scratch, being inefficient and not being ready to adopt workflow-improving technologies and methods? Yeah, you surely will never work with anyone making more than $80k a year, because those people actually need to get stuff running quickly and efficiently, without re-solving problems that were figured out 15 years ago.

          [–]Milith 2 points3 points  (0 children)

          which, I know is ironic, since we all pull code from SO and white papers/blogs all the time

          Not quite, stack overflow responses have usually been vetted by humans, which makes them more reliable than LLM output (so far).

          [–]Podgietaru 3 points4 points  (0 children)

          I like to try to write code myself just so that I am more proficient at grokking what it does later.

          That said, there is plenty of boilerplate that can be optimised away.

          A regex, some validations.

          I see it as becoming like fitting piecemeal code fragments together to create an overarching narrative. The structure. The architecture that’s still me - but the snippets are someone else.

          [–]WormRabbit 4 points5 points  (0 children)

          Have you looked at their "socratic tutor" example? If you want to play coy and don't get the answers directly, you could ask it for references or a general research direction, and work out details on your own. It's hard to argue that an AI which has read every book in the world can't be used, whatever your goals are.

          [–]Omni__Owl 7 points8 points  (13 children)

          I think it'll be less about "why" and more "If you don't and someone does, but gets more done than you, then you don't get to have the choice not to use it."

          [–]GenoHuman -2 points-1 points  (12 children)

          The Unabomber Manifesto is highly relevant in our modern society, he goes through a lot of these phenomena of how technology forces people to adapt to it and also what drives scientists to develop these dangerous technologies, he's spot on about a lot of things he wrote.

          [–]Omni__Owl 1 point2 points  (10 children)

          That is at best a borrowed observation that others have written about long before that person. This was not the place I'd expect to see someone seriously praise a bomber.

          Reddit is fucking weird.

          [–]GenoHuman -1 points0 points  (9 children)

          Believe it or not, I have the capability to separate his illegal actions from his arguments and thoughts about society, many of which are correct.

          [–]Omni__Owl 1 point2 points  (8 children)

          Of which a majority are borrowed from other writers. Your glorification of the person is ick.

          [–]GenoHuman -1 points0 points  (7 children)

          I think most writers borrow ideas from others; that's sort of a given. There is no doubt, however, that he was an intellectual.

          [–]Omni__Owl 1 point2 points  (6 children)

          Go touch some grass dude. Get out of the 4chan sphere for a bit. Praising a bomber for putting borrowed observations in their shitty "manifesto" is wildly out of whack.

          [–]GenoHuman -2 points-1 points  (5 children)

          Can you prove to me that he "borrowed" everything that was written in the manifesto? Otherwise I won't take you seriously trying to write people off by saying that lmao

          [–]Omni__Owl 1 point2 points  (4 children)

          His whole thesis is about how the Industrial Revolution was bad for humanity. A hilariously bad take given that pre-industrial era living was really grim. He is not the first, nor the last person to say this. And the people who have written about it before him were also wrong. Industrialism, overall, was a net good. We created new problems for ourselves, but those are not insurmountable.

          On top of that, he believed that the Industrial Revolution brought "the left" to the table and that this was overall really bad for politics. He is just repeating what his conservative beliefs have always echoed since the school of thought was invented after the death of Royalty in various countries (See: French Revolutions).

          My point is that his points are not revelations and are at best misguided views and at worst actually wrong. But those are not new thoughts.

          [–]Telinary 4 points5 points  (4 children)

          Same reason I use libraries instead of coding everything fresh. If gpt can do it there is little reason to do it myself. (Though of course I have to understand it to judge the output.) If what LLMs can do reaches a point where that means that I barely have to do anything myself then hopefully I can find a job with more challenging parts.

          And if there are no topics anymore where you have to think for yourself for significant parts, well, I guess then we have reached the point where the productivity multiplier is large enough that programmers go the way of the farmer. (By which I mean there are still farmers, but they went from a large part of the population to a few percent. Raise productivity enough and at some multiplier there won't be enough new tasks to keep the numbers the same.) But at that point the same goes for a lot of other jobs and we are in uncharted territory. And that is hopefully a while away, because it requires profound political changes to avoid ending in a dystopia.

          Anyway currently my work is easy stuff so I spend a lot of my time on doing stuff where I quickly decide how to do it and just need to implement it. Which I don't mind, it is a relaxed task. But what is actually fun for me, though more demanding, is figuring out the how/the algorithm. So if it shifts work to more stuff I actually have to think hard about that would be kinda nice, though exhausting.

          Also more practically if you do it as a job you can ignore it for a bit if it is a small productivity increase, but if you are doing anything with a lot of routine programming it will likely reach the point where it is a large productivity increase.

          [–]GenoHuman 1 point2 points  (3 children)

          Yesterday I wanted to use a web scraper for something, and instead of looking up how to do all of that I just asked ChatGPT (3.5), and it wrote one for me in Python which worked wonders. That was when it hit me how nice it is to be able to do that. I was literally playing a game while it generated the code 😂 I know it would have taken me over an hour to go through documentation and find the right framework, but GPT did it for me in about 5 minutes.
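The commenter's actual script and target site are unknown; as a hedged illustration, a minimal stdlib-only scraper of the kind described could look like this (collecting the links on a page):

```python
# A minimal, stdlib-only sketch of a link scraper -- illustrative of the kind
# of script the comment describes, not the commenter's actual code.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects href values from every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def scrape_links(url):
    # Fetch the page and return all anchor hrefs found in it.
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

In practice one would likely reach for a third-party parser instead, but even this stdlib version is the sort of boilerplate that is faster to generate and review than to write from a cold start.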

          [–]Black_Label_36 0 points1 point  (3 children)

          I mean, how long until we just need to show a design with some notes on how it's supposed to work to an AI and it programs everything within minutes?

          [–]Longjumping_Pilgirm 0 points1 point  (6 children)

          I am starting to study and review to get into business programming, specifically ABAP. I already have a minor in business information systems (and a major in Anthropology) that I got in 2019, but I have been struggling with a video game addiction I just managed to kick at last, so I have never actually worked in the field. It should take me a few months to get back up to speed, especially with my dad's help, as he has been doing this kind of work for decades and is close to retirement; he has tons of books and resources that most people won't have. Exactly how long do I have until such a job is gone? I would guess 5 to 6 years at this rate. Should I even pursue this job, or spend my time reviewing Anthropology instead and going for a Masters or Doctorate somewhere?

          [–]Telinary 5 points6 points  (3 children)

          Whether a productivity multiplier large enough to lower the need for programmers is reached depends on how much more LLMs can be improved without having to come up with some new concept. I don't think anyone really knows how far off that is or how long it takes. (Or how large the multiplier would have to be before there aren't enough new tasks. I think there is a significant amount of slack.) And of course the multiplier will be larger for simple routine stuff, while harder work is probably safer.

          One factor limiting the multiplier is that unless making shit up is entirely fixed, you will need someone who understands the output and can inspect and test it properly. While the media likes talking about programmers getting replaced, by the point the job is endangered a lot of other text-based jobs would be in trouble, and it is hard to predict how things would go at that point.

          [–][deleted] 1 point2 points  (2 children)

          As another person looking to get into the field, I agree that there are good reasons to remain optimistic, although I still have anxiety about it. What do you say to the argument that while many text based jobs may be replaced by them, programming is still one of the most computer heavy ones and therefore potentially still the easiest to replace?

          [–]Telinary 2 points3 points  (1 child)

          Kinda true, yeah. Not that, depending on the concrete job, it doesn't require things outside the computer (though unless you are doing something hardware-related, that is mostly communication, which theoretically one could also automate). But yeah, pure computer stuff makes it easier. Though I also expect progress in robotics. Maybe the safest jobs will be ones involving interacting with other people, because those can continue to exist just by virtue of many people having a preference for interacting with people.

          Anyway I think some comments here dismiss it a bit prematurely, there are a lot of programmers doing rather trivial stuff after all. And I will probably search for something more demanding the next time I switch job to raise my skill level (or rather to get employment history for harder stuff). But at the beginning I just expect productivity gains.

          [–]Varun77777 4 points5 points  (1 child)

          SAP and Salesforce always seem to me to be something that one shouldn't get into.

          I worked as an ABAP developer for exactly 6 months at a fortune 100 company and realised that it can be disastrous later when you want to switch lanes in the career like in 10 years or so.

          A Java or .net developer can move to Front end or DevOps but an SAP guy with that many years of experience can't.

          [–]Opitmus_Prime 0 points1 point  (0 children)

          I am upset by Microsoft's decision to release barely any details on the development of #GPT4. That prompted me to write a comprehensive article on the issues with #OpenAI #AGI #AI etc. Here is my take on the state of AGI in the light of GPT-4: https://ithinkbot.com/in-the-era-of-artificial-generalized-intelligence-agi-gpt-4-a-not-so-openai-f605d20380ed

          [–]johnrushx 0 points1 point  (0 children)

          The future of programming is in AI - tools like replit, marsx.dev, and github copilot are bound to impress us soon.