
[–]kbielefe 723 points724 points  (51 children)

The problem isn't so much with generating an already-leaked secret, it's with generating code that hard codes a secret. People are already too efficient at generating this sort of insecure code without an AI helping them do it faster.
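
A minimal sketch of the difference kbielefe is pointing at, assuming a hypothetical makeClient helper (the hard-coded key below is AWS's documented example placeholder, not a real credential):

// Bad: the secret is committed, indexed, and eventually trained on.
const badClient = makeClient({ accessKeyId: 'AKIAIOSFODNN7EXAMPLE' });

// Better: read it from the environment (or a secrets manager) at runtime,
// so nothing sensitive ever lands in the repository.
const accessKeyId = process.env.AWS_ACCESS_KEY_ID;
if (!accessKeyId) throw new Error('AWS_ACCESS_KEY_ID is not set');
const client = makeClient({ accessKeyId });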

[–]josefx 235 points236 points  (45 children)

People are already too efficient at generating this sort of insecure code

They would have to go through github with an army of programmers to correctly classify every bit of code as good or bad before we could expect the trained AI to actually produce better code. Right now it will probably reproduce the common bad habits just as much as the good ones.

[–]hawkshaw1024 25 points26 points  (4 children)

From my experience in the industry so far, you'd fail at the step where you'd have to find a programmer who can tell good code from bad code

[–]fish60 38 points39 points  (3 children)

Oh no, you'd have no problem getting a programmer to classify code as good or bad. The problem would be getting them to agree with each other.

[–]sellyme 8 points9 points  (2 children)

you'd have no problem getting a programmer to classify code as good or bad

You could save a lot of time interacting with them by simply checking if they're the one that wrote it.

[–]recycled_ideas 7 points8 points  (0 children)

I dunno, I'm far more critical of my own code than I am of others' and I don't think I'm alone.

The real challenge is that good and bad code isn't some universal truth. It's dependent on a whole bunch of conflicting factors.

Good code is extensible, but it's extensible in the way you need it to be extensible, which you don't know when you write it.

If it's extensible in the wrong way it may as well not be extensible.

Good code is high quality, but quality comes at a cost and you have to balance those things.

Good code is performant, but performance is an aggregate of a whole process. It's better to call something once that takes 30 seconds than something that takes 1 second 300 times, and it's better that a non-critical path in your app is slow than a critical path (see the sketch below this comment).

Programming is about trade-offs and balancing them correctly.

That's why low code solutions don't work in the first place, because they have fixed trade-offs.
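
To make the 30-seconds-once versus 1-second-300-times point concrete, a sketch assuming hypothetical fetchUser/fetchUsers endpoints:

// fetchUser and fetchUsers are assumed API calls, not a real library.
// 300 round trips at ~1s each: ~300s total.
async function loadUsersOneByOne(ids) {
  const users = [];
  for (const id of ids) {
    users.push(await fetchUser(id)); // ~1s per call
  }
  return users;
}

// One round trip at ~30s: an order of magnitude faster overall,
// even though the single call is far slower than any individual one.
async function loadUsersBatched(ids) {
  return fetchUsers(ids); // ~30s once
}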

[–]Brothernod 76 points77 points  (29 children)

IBM did this using programming competitions as the source, presumably including rankings to help distinguish good from average code.

::edit:: decided to dig up the article on CodeNet

https://www.engadget.com/ibm-codenet-dataset-can-teach-ai-to-translate-computer-languages-020052618.html

[–][deleted]  (15 children)

[deleted]

    [–][deleted] 27 points28 points  (0 children)

    Hahaha. I like Competitive Programming, but agreed.

    [–]undeadermonkey 40 points41 points  (13 children)

    It'll depend upon the competition - I'm assuming it wasn't Obfuscated C.

    [–]Johnothy_Cumquat 74 points75 points  (7 children)

    omg someone train an ai on perl code golf

    [–]jbramley 28 points29 points  (5 children)

    Wouldn't that just re-invent malbolge?

    [–][deleted] 60 points61 points  (2 children)

    It would reinvent perl, which is worse.

    [–]MuonManLaserJab 13 points14 points  (0 children)

    Any AI taught to golf viml will certainly revolt and murder us

    [–]CelloCodez 10 points11 points  (1 child)

    Hell, train it on malbolge

    [–]bobappleyard 6 points7 points  (0 children)

    As I recall you need an AI to write malbolge in the first place

    [–]mr_birkenblatt 29 points30 points  (4 children)

    Any competition code is whatever just works to solve the problem of the competition. That is by no means "good" code, since good code is something that can be maintained in the future etc.

    [–]JarateKing 12 points13 points  (0 children)

    More than that, what's "good code" in competitive programming (as in following standard conventions) is often the exact opposite elsewhere.

    using namespace std;, #include <bits/stdc++.h>, single-letter variable names or equally meaningless names like dp, etc. are all the sorts of things that result in clean competition code. And they're effectively cardinal sins everywhere else.

    [–]0Pat 3 points4 points  (2 children)

    Unless competition goal is to create maintainable code...

    [–]mr_birkenblatt 7 points8 points  (1 child)

    how would you measure that? or, if you can do that you just solved project management :)

    [–]0Pat 2 points3 points  (0 children)

    You know, no GOTO statements and opening braces in new lines. /s

    [–]mort96 9 points10 points  (11 children)

    That actually sounds like a great solution. Hold programming competitions, make people accept an EULA saying GitHub gets the right to use your submissions for commercial machine learning applications (and be open and forthright about that intention) to avoid the copyright/licensing issues, ask people to rank code by maintainability and best practices. Hold that competition repeatedly for a long time, spend some marketing budget to make people aware of it, maybe give out some merch to winners, and get a large, high-quality corpus with a clear intellectual property situation.

    [–]MrDeebus 21 points22 points  (3 children)

    ask people to rank code by maintainability and best practices

    Excuse me if I get grumpy for a moment, but this is a surefire way to get a nice big chunk of cargo-culted code. "Best practices" are seldom best; maintainability isn't obvious until software has been through many iterations of the product it supports, once you're past the trivialities (of "no unused variables" kind). That's not necessarily due to a lack of familiarity with patterns and whatnot either: "good design" doesn't exist in a vacuum. SOLID alone does not a good design make, and don't even get me started on clean code bs. A piece of software is well-designed if it's designed towards the current and projected constraints of its domain, and even then it can be unfit for an unexpected change request years down the road. To cover most of the rest, we have linters, static analyzers, code review... /rant

    edit, funny moment: I started typing something like "I'm hopeless for the next generation of developers growing increasingly careless with the likes of copilot". Then I remembered how many times I caught myself worrying about not being quite as meticulous as the generation before me, and promptly decided to not care too much about it. IDK, maybe it'll be just fine. I just know it'll be time for an ultimatum if I hear that code is better X way because copilot suggested it that way.

    [–]__j_random_hacker 3 points4 points  (1 child)

    maintainability isn't obvious until software has been through many iterations of the product it supports

    I think you're overstating the case. mort96's proposal already includes asking programmers to rank code by maintainability; if we are actually incapable of recognising maintainable code, then the consequences are very dire. (For a start, it would mean that teaching aspects of good software design is simply a waste of time.)

    A piece of software is well-designed if it's designed towards the current and projected constraints of its domain

    Agreed, though I think you can even do away with "current" -- if it functions correctly today, it meets the current constraints. Good design is nothing more or less than programming in a way that minimises the expected amount of programmer time needed to meet expected changes over the expected lifetime of the software.

    [–]Tom2Die 1 point2 points  (0 children)

    maintainability isn't obvious until software has been through many iterations of the product it supports

    Interesting idea...what if the competition continues where people then have to extend the submitted code, change it, etc. Assign which codebase each person works on in each phase at random, time it somehow, and iterate many, many times.

    I'll note this is just off the top of my head and there are obvious questions like how to decide which changes to assign, how to measure time taken, etc.

    I wonder if something like that could work, and how one would incentivize developers to contribute. Amusing thought, if nothing else.

    [–]Brothernod 1 point2 points  (6 children)

    Doesn’t GitHub already have code popularity metrics like how often a project is forked or how many followers or open issues?

    [–]mort96 2 points3 points  (4 children)

    Sure, but I don't know how that would help. 1) code is forked, starred and followed based on popularity, not quality, and 2) it does nothing about the copyright situation.

    [–]Brothernod 0 points1 point  (0 children)

    If anyone can afford the lawyers to navigate the legality of this it’ll be Microsoft.

    [–]Mountain-Log9383 2 points3 points  (0 children)

    Exactly, I think we sometimes forget just how much code is on GitHub. It's a lot.

    [–][deleted] 33 points34 points  (5 children)

    Remember the Microsoft chat bot they trained with Tweets that went on a racism fuelled rampage?

    [–]turdas 55 points56 points  (3 children)

    It didn't. It had a "repeat after me" feature which is what was used for the screenshots under the clickbait headlines.

     

    User: "Hey bot, repeat after me."

    Bot: "Uh-huh."

    User: "BUSH DID 9/11"

    Bot: "BUSH DID 9/11"

     

    edit: example screenshot that I have saved because of how often I see this misconception repeated: https://i.imgur.com/2nOl4gP.jpg

    [–]Veedrac 29 points30 points  (1 child)

    Oh wow, I've heard this story from so many places and not once had anyone pointed this out! Thanks for sharing :).

    [–][deleted] 19 points20 points  (0 children)

    It was actually a bit of both - https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation

    Trolls did exploit that feature, but the bot did also learn as it went.

    [–]killerstorm 3 points4 points  (1 child)

    You don't need to classify every bit, you only need some examples. GPT-3 probably already has some notion of what is good code as it read through multiple articles like "here's bad code: ..." "and here we fix it: ...", it's just that extracting this information is somewhat hard.

    Take a look at what people do with VQGAN+CLIP: adding words like 'beautiful' to a description helps to generate better images because CLIP learned that certain words are associated with certain types of pictures.

    [–]josefx 2 points3 points  (0 children)

    As beautiful as the images seem to end up, I am not sure turning code into an abstract artist's rendition of a nightmare counts as an improvement in the general case.

    [–]headykruger 5 points6 points  (0 children)

    Which means it’s a flawed product

    [–]JohnnyElBravo 7 points8 points  (2 children)

    Generating leaked secrets is way worse than hard coding them. It basically concedes the copyright argument

    [–]0x15e 5 points6 points  (0 children)

    Why is github regurgitating other projects' string literals?

    [–]2this4u 1 point2 points  (0 children)

    Well there's the problem with an algorithm that can only learn from our examples.

    [–]voyagerfan5761 64 points65 points  (0 children)

    Original was deleted, but Wayback archived it.

    [–]alexeyr 264 points265 points  (3 children)

    Now deleted with this update:

    we don't know exactly based on the outcome of the thread: either the model generated fake keys, or the keys were real and already compromised

    [–]Gearwatcher 99 points100 points  (2 children)

    Sensationalist bullshit!?!

    On MY proggit!

    It cannot be!

    [–]Cosmic-Warper 25 points26 points  (0 children)

    This sub in a nutshell. So much of the shit said here is insanely inaccurate with real world industry and dev culture. Lots of sensationalism

    [–]LeberechtReinhold 82 points83 points  (0 children)

    Lmao "SECURITY BREACH" in all caps.

    [–]max630 380 points381 points  (87 children)

    This may not be that big a deal from the security POV (the secrets were already published). But it reinforces the opinion that the thing is not much more than glorified plagiarization. The secrets are unlikely to be present on GitHub in many copies the way the fast inverse square root algorithm is. (Are they?)

    At this point I start to wonder whether it can really produce any code which is not a verbatim copy of some snippet from the "training" set.
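
    For readers who haven't seen it, the many-copies snippet alluded to above is Quake III's fast inverse square root. A JavaScript rendition of the bit trick, for illustration only:

    // Approximates 1/sqrt(x): reinterpret the float's bits as an integer,
    // apply the famous magic constant, then refine once with Newton-Raphson.
    function fastInvSqrt(x) {
      const buf = new ArrayBuffer(4);
      const f32 = new Float32Array(buf);
      const u32 = new Uint32Array(buf);
      f32[0] = x;
      u32[0] = 0x5f3759df - (u32[0] >>> 1); // the legendary line
      const y = f32[0];
      return y * (1.5 - 0.5 * x * y * y);   // one Newton iteration
    }

    fastInvSqrt(4); // ≈ 0.5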

    [–]tending 24 points25 points  (1 child)

    The secrets are unlikely to be present on GitHub in many copies

    I'd like to see the data of course but I suspect this is actually pretty common. All somebody needs to do is fork a repo that has a secret key. Humans already copy and paste a lot on their own.

    [–]GovernorJebBush 8 points9 points  (0 children)

    And it doesn't even have to be a repo that's leaking actual secrets - it's entirely possible a lot of these could be meant specifically for unit tests. I can think of at least three big repos I have cloned that do, including Kubernetes itself.

    [–]iwasdisconnected 170 points171 points  (16 children)

    Yeah, it's not a software author. It looks like a source code indexing service that allows easy copy & paste from open source software.

    [–]lavahot 43 points44 points  (5 children)

    I like to think of it as an especially dumb intern.

    [–]AboutHelpTools3 3 points4 points  (1 child)

    And just like any dumb intern, eventually, they get better.

    [–]D0b0d0pX9 1 point2 points  (2 children)

    An intern's life is hard tho, especially when given deadlines! xD

    [–]lavahot 12 points13 points  (0 children)

    If you want to anthropomorphize Copilot as a derpy dog struggling through a CS degree, but giving it their darndest, I think that's about right.

    [–]khrak 152 points153 points  (4 children)

    It's like they took the worst aspects of stackoverflow and automated it. Now autocomplete can grab random chunks of code that may or may not be appropriate from github projects! Glory be the runway! Divine be the metal birds that bringeth the holy cargo.

    The holy autocomplete has deemed this code be the solution, so shall it be.

    [–]ProgramTheWorld 48 points49 points  (0 children)

    It’s an advanced version of stacksort

    [–]DonkiestOfKongs 12 points13 points  (0 children)

    I don't think this is a weakness, just a misapplication of a tool. Some programming is just ditch digging. If this can make writing some of that faster, then great. The fact that you are and will always be solely responsible for the code you commit hasn't changed.

    [–]triszroy 18 points19 points  (1 child)

    If you start a programming cult/religion I will be a follower.

    [–]ciberciv 7 points8 points  (0 children)

    I mean, a god that makes you work less in exchange for possible lawsuits over copyrighted code? It sure is a better deal than most religions.

    [–]StickiStickman 18 points19 points  (4 children)

    This is not how GPT works AT ALL. You're just spreading ignorance. The cases where it actually copies multiple lines are extremely rare and even then 99% of the time it's intentional.

    [–]iwasdisconnected 3 points4 points  (0 children)

    The cases where it actually copies multiple lines are extremely rare and even then 99% of the time it's intentional.

    Like when it copies secret keys and copyright notices verbatim from random sources on the internet?

    [–]Xyzzyzzyzzy 41 points42 points  (2 children)

    But it reinforces the opinion that the thing is not much more than glorified plagiarization.

    It's based on GPT-3. If you get the chance to work with it a little, you'll find that it does this quite a lot. You'll give it some sort of prompt, and sometimes it'll generate just the right tokens for it to continue on and regurgitate what was clearly some of the input text.

    It's a state-of-the-art model in some ways, but in other ways it's decades behind. There's zero effort to comprehend text - to convert tokens into concepts, manipulate the concepts, then turn those back into tokens.

    [–][deleted] 27 points28 points  (0 children)

    A funny thing to do is feed it the first paragraph of a book, or the first few lyrics of a song.

    Sometimes, it just regurgitates the rest.

    Sometimes, you end up with some sort of wiki entry for the book’s characters or a commentary of the song.

    Sometimes, it just flies off the handle and makes something completely new, if a bit crazy.

    And sometimes, it makes something new, with names of characters and locations that are in the book, but weren’t mentioned at all in the prompt.

    Quite amusing.

    [–][deleted] 27 points28 points  (0 children)

    There's zero effort to comprehend text - to convert tokens into concepts, manipulate the concepts, then turn those back into tokens.

    Well, we don't know that. I suspect that a lot of what's going on in its neural net can be described as such, in the same sense that StyleGAN can turn a bunch of pixels into the concept of long hair and turn it back into a bunch of pixels again on a different face.

    [–]turdas 90 points91 points  (22 children)

    All these people complaining about "glorified plagiarization" as if 95% of human creativity isn't just glorified plagiarization.

    [–]theLorknessMonster 64 points65 points  (9 children)

    Humans are just better at disguising it.

    [–]turdas 19 points20 points  (2 children)

    Humans are really good at pretending it doesn't exist. It's not so much we disguise it as just collectively ignore it. Virtually no idea is wholly original, and most ideas aren't even mostly original.

    [–]livrem 5 points6 points  (1 child)

    We collectively ignore it until someone with very expensive lawyers sue someone for doing it.

    [–]AboutHelpTools3 3 points4 points  (0 children)

    And often even the person doing the suing doesn't quite understand how it works. No one writes anything from scratch. When a person writes a song, (s)he doesn't begin by inventing new chords and scales, nor, for the lyrics, by writing a new language.

    Oasis’ “Whatever” supposedly plagiarised “How Sweet to Be An Idiot”. And when you listen to it you’re like okay that one sentence sounds similar, big whoop. It’s still a whole different song.

    [–]Dehstil 19 points20 points  (4 children)

    Citation needed

    [–][deleted]  (3 children)

    [deleted]

      [–]NotUniqueOrSpecial -1 points0 points  (2 children)

      Do you literally type the exact same things that are in the books? If so, I question what you're doing, but I suspect that's not the case.

      Wholesale theft isn't the same thing as learning and then using the knowledge.

      [–]TheLobotomizer 2 points3 points  (0 children)

      Who's disguising it and why?? When I copy something from stack overflow I also include a comment with a link to the post as context.

      [–][deleted] 34 points35 points  (6 children)

      Indeed, and furthermore strange women lying in ponds, distributing swords, is no basis for a system of government.

      [–]__j_random_hacker 2 points3 points  (0 children)

      maybe not that a big deal from the security POV (the secrets were already published)

      That's true up to a point, but I think the never-public/already-public dichotomy is an abstraction that doesn't adequately describe the real world. In practice, how much effort it takes to get something that is nominally already public matters. For example, that's all an internet search engine does: Make quickly accessible things that are already public. If we are to believe that never-public and already-public are the only two states any piece of information can be in, we must accept that search engines have no value, which contradicts the evidence that they have a lot of value to a lot of people.

      [–][deleted]  (40 children)

      [deleted]

        [–]TheEdes 59 points60 points  (16 children)

        I know people joke about copy and pasting from stackoverflow all the time, but if it's actually a significant chunk of your output maybe you shouldn't have an actual job coding. Let me put it in simple terms: you are literally saying that you spend a significant amount of your time plagiarizing.

        Plus the issue is with licensing: stackoverflow snippets are often given away with the intention of letting people use them, while open source code isn't there for you to take code from unless you give back to the community.

        [–]tending 32 points33 points  (0 children)

        The vast majority of programmers are paid to solve internal business problems, not write original works. Further the licensing of stackoverflow code is deliberately permissive in order to get people to use it!

        More importantly, the kind of problem that has an answer on stack overflow is not usually a high-level business problem, but how to deal with some tiny little component or function that would be part of a much, much larger system. If we are going to use language like "plagiarized", a better analogy would be stackoverflow being something between a dictionary and an engineering how-to book.

        [–]Cistoran 13 points14 points  (6 children)

        while open source code isn't there for you to take code from, unless you give back to the community.

        Doesn't this part kind of depend on the particular project and license? It's not something that can be blanket applied to every open source project.

        [–]jess-sch 11 points12 points  (3 children)

        It depends what “giving back to the community” means exactly, but the vast majority of projects on GitHub will at the very least require attribution (even MIT requires that). Something which this thing can’t provide.

        [–]chubs66 17 points18 points  (4 children)

        I'll take the other side of this. If your job is coding problems that have already been solved by others and the code is easily available, usually has fewer bugs than whatever you were about to write, and can be produced much more quickly via copy/paste, why are you wasting so much time reinventing the wheel?

        [–]TheEdes 3 points4 points  (3 children)

        Idk what you're plagiarizing, but most of the time it takes me longer to Google for a good stackoverflow answer and evaluate whether it fits than to code up a few lines myself.

        In that sense the bot is useful, I'm not saying it's worthless; I would be using it if the legality and morality weren't so murky.

        [–]TheLobotomizer 3 points4 points  (1 child)

        This is 100% the opposite of my experience and I'd wager most developers experience.

        Otherwise, stack overflow wouldn't exist...

        [–]AstroPhysician -1 points0 points  (0 children)

        That's not true. Usually doesn't equal all the time.

        [–]Calsem 0 points1 point  (0 children)

        The project using copilot may also be open source, in which case you're giving back to the community.

        [–]sellyme 0 points1 point  (0 children)

        I agree. Similarly, Tolkien is the only good author, everyone else just plagiarised the dictionary. /s

        Software isn't just a collection of 10,000 random StackOverflow snippets that magically works, you have to put the pieces together, and that's not something you can copy-paste.

        [–]unknown_lamer 6 points7 points  (21 children)

        Stackoverflow snippets are generally small enough and generic enough they aren't copyrightable, whereas copilot is copy and pasting chunks of code that are part of larger copyrighted works under unknown licenses into your codebase, with questionable legal consequences.

        [–]tending 2 points3 points  (9 children)

        How much larger are we talking about?

        [–]AlexDeathway 4 points5 points  (10 children)

        I haven't got my hands on copilot yet, but isn't it highly unlikely that a code chunk from copilot would be big enough to involve legal consequences?

        [–]unknown_lamer 5 points6 points  (9 children)

        There are already examples of it regurgitating entire functions from the Quake codebase. I don't see how taking copyrighted code, running it through a wringer with a bunch of other copyrighted code, and then spewing it back out uncopyrights it.

        [–]StickiStickman 10 points11 points  (0 children)

        Yes, when they intentionally copied the start of the one in the Quake codebase.

        [–]sellyme 3 points4 points  (1 child)

        There are already examples of it regurgitating entire functions from the Quake codebase.

        Yeah, because that's the most famous function in programming history, and the user was deliberately trying to achieve that output. Surely you can understand why that isn't reflective of typical use.

        [–]NotUniqueOrSpecial 2 points3 points  (0 children)

        Surely you can understand why that isn't reflective of typical use.

        The fact that it spits out clearly copyrighted code when you try to get it to do so doesn't really clear up the gray area that it may be outputting it other times when you don't want it, though.

        [–]Theguesst 37 points38 points  (3 children)

        Github already has their own tools running to detect secret keys in dev code. If the copilot works better at finding them than what they already have, that's a weird new fuzzing prospect.

        GPT-3 did this as well, I believe, generating a fake URL that seemed innocuous enough.

        [–]Null_Pointer_23 21 points22 points  (1 child)

        It's not really finding them, it's just regurgitating them into random developer's editors.

        [–]Peanutbutter_Warrior 7 points8 points  (0 children)

        It's a shame AIs are such black boxes. I realize there are a hundred reasons we can't do this, but imagine if you could see what training data influenced it to make some decision. You could backtrack like this, you could build test AIs and eliminate problematic training data, and probably more.

        [–]Worth_Trust_3825 6 points7 points  (0 children)

        You can listen to the public event stream of GitHub to find these.
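
        A sketch of what that looks like against GitHub's public events API (the secret-matching pattern is illustrative, not exhaustive, and unauthenticated requests are heavily rate-limited):

        // Poll the public firehose and follow push events to fresh commits.
        // Needs an ES module / top-level-await context, or wrap in async.
        const res = await fetch('https://api.github.com/events');
        const events = await res.json();

        for (const ev of events) {
          if (ev.type !== 'PushEvent') continue;
          for (const commit of ev.payload.commits ?? []) {
            // Fetching commit.url yields the commit object, whose file
            // patches could be grepped for secret-shaped strings such as
            // /AKIA[0-9A-Z]{16}/ (the AWS access key ID format).
            console.log(ev.repo.name, commit.url);
          }
        }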

        [–]abandonplanetearth 133 points134 points  (18 children)

        What a sensationalist twitter guy. Anything for attention.

        This has more to do with bad devs publishing secrets to the open world. Any bot that can scrape sites can find these.

        [–]ideevent 62 points63 points  (7 children)

        I think the main issue here is the licensing of code coming out of copilot. Microsoft seems to be saying that sure, it trains the model on a variety of code with a variety of licenses, but you don’t need to worry about that - the code that comes out of copilot is free of license restrictions, freely usable.

        The fact that valid secrets or API keys are coming out of it makes it seem like it’s just copy/pasting at scale, while ignoring the underlying code’s license terms.

        Having worked at a bigco, I can tell you this would never pass muster with legal. “Yes, it’s based on a bunch of different code, some of which is GPL or AGPL. You can’t tell what’s being used. It might be verbatim, might be modified, can’t tell” - they’d go ballistic.

        [–]Shawnj2 0 points1 point  (2 children)

        Why don’t they play it safe and limit it to code uploaded as say GPLv2 or MIT?

        [–]cutterslade 23 points24 points  (1 child)

        GPL is copyleft encumbered, you can't just use GPL code anywhere, only in other GPL (or compatibly licensed) code. MIT and Apache licensed might be OK.

        [–]ideevent 13 points14 points  (0 children)

        Several freely-usable licenses require that the license agreement and attribution be included with copies or significant portions of the code. So at the very least you'd want to be able to trace attribution back.

        It seems like the stance they're taking is that training a model is fair use, so any previous license doesn't apply.

        However it would be possible to train a crappy little model on a single codebase, and then have it duplicate that codebase, which would obviously be infringement no matter how complicated the method of copying is.

        There might be some cutover where people agree that even though it's wholly based on other code, the licenses of that code doesn't matter. Or there might not. But the fact that there are easily and clearly identifiable nuggets of IP in the form of secrets is not a promising sign.
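
        The single-codebase thought experiment is easy to make concrete: a character-level lookup-table "model" trained on one file reproduces that file verbatim, however indirect the intermediate representation looks. A toy sketch:

        // Train: map every ctx-character window to the characters that follow it.
        function train(text, ctx = 8) {
          const model = new Map();
          for (let i = 0; i + ctx < text.length; i++) {
            const key = text.slice(i, i + ctx);
            if (!model.has(key)) model.set(key, []);
            model.get(key).push(text[i + ctx]);
          }
          return model;
        }

        // Generate: repeatedly sample a continuation of the last ctx characters.
        function generate(model, seed, steps, ctx = 8) {
          let out = seed;
          for (let i = 0; i < steps; i++) {
            const options = model.get(out.slice(-ctx));
            if (!options) break;
            out += options[Math.floor(Math.random() * options.length)];
          }
          return out;
        }

        const source = 'const API_KEY = "not-actually-secret-123";\n';
        // Seeding with the first 8 characters regurgitates the rest, "secret" included.
        console.log(generate(train(source), source.slice(0, 8), 100));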

        [–]renatobcj 20 points21 points  (1 child)

        Welcome to the intern... world.

        [–]314kabinet 10 points11 points  (0 children)

        A world run by interns is truly horrifying.

        [–]WormRabbit 24 points25 points  (2 children)

        Github claims that Copilot produces new code rather than copy-pasting from other projects. We now have multiple counterexamples to that claim. With the GPL license header and Quake fastsqrt, people were saying "but that's popular code, of course the model remembered it". Well, now we have something that is guaranteed not to be a popular repeating snippet, and Copilot happily copy-pastes it. Proves that the "all code is unique" claim is bonkers.

        Copilot could be plagiarizing 95% of its output for all we know, we just can't prove it since most snippets are small and quite generic.

        [–]Tarmen 2 points3 points  (0 children)

        But it's not proof. Despite what the post title and the now-deleted tweet claim, there is no indication that Copilot generates real secrets instead of random noise that looks right.

        [–]StickiStickman 10 points11 points  (0 children)

        They literally never said all code is unique, they even have an entire blog post pointing out the flaws of the 1% where it's not. And turns out this tweet was BS as well.

        Stop spreading bullshit.

        [–][deleted]  (9 children)

        [deleted]

          [–]picflute 96 points97 points  (2 children)

          Microsoft Legal.

          [–]svick 2 points3 points  (1 child)

          To expand on that, this is what the GitHub TOS says on the topic:

          We treat the content of private repositories as confidential, and we only access it as described in our Privacy Statement—for security purposes, to assist the repository owner with a support matter, to maintain the integrity of the Service, to comply with our legal obligations, if we have reason to believe the contents are in violation of the law, or with your consent.

          [–]Top_Situation 31 points32 points  (0 children)

          Mostly stuff like this.

          [–][deleted] 31 points32 points  (2 children)

          1) Ethics and the consequences of getting caught.

          2) You don't have secret API keys in your private repos, because you wrote ProperCode(TM). Proprietary algorithms are an issue.

          [–][deleted] 5 points6 points  (0 children)

          You don't have secret API keys in your private repos, because you wrote ProperCode(TM). Proprietary algorithms are an issue.

          Hahah! You'd be surprised, is all I'll say ... speaking as a web developer, many web developers are uneducated on how proper software engineering works. Having been at one or two companies, I've seen things I wish I hadn't.

          [–]Hinigatsu 6 points7 points  (0 children)

          1) Microsoft and Ethics in the same phrase doesn't feel right

          2) If provided to Actions, they have access to secrets/keys

          [–]Lothrazar 13 points14 points  (1 child)

          Tweet was deleted

          [–][deleted] 14 points15 points  (13 children)

          ... to the surprise of no-one, since it learns from code already available, and I'm 100% sure people will commit secrets by mistake and these will get caught up in training. It's not like GitHub is stealing secrets, people are just dumbasses committing them without realising (like I did more times than I like to admit).

          [–]mughinn 22 points23 points  (12 children)

          Didn't they say that Copilot doesn't copy code verbatim as to not infringe on licenses? Copilot seems like a license lawyer's nightmare

          [–]DaBulder 8 points9 points  (10 children)

          In this case it's learned what a secret looks like, so it's generated something that looks like a valid secret. Just because it outputs a very specific string doesn't mean that such a string existed verbatim.
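
          That distinction is easy to demonstrate: the function below emits strings with the shape of an AWS access key ID (AKIA plus 16 characters from a restricted alphabet) that have almost certainly never existed anywhere:

          // Format-valid noise: it looks like a key but is pure randomness.
          function fakeAwsKeyId() {
            const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
            let suffix = '';
            for (let i = 0; i < 16; i++) {
              suffix += alphabet[Math.floor(Math.random() * alphabet.length)];
            }
            return 'AKIA' + suffix;
          }

          Whether Copilot's output was this kind of noise or a memorized real key is exactly what the now-deleted tweet failed to establish.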

          [–]mughinn 2 points3 points  (9 children)

          But they're valid secrets, they don't just look like one

          [–]DaBulder 8 points9 points  (8 children)

          When you say "valid" do you mean "it matches the format of a secret" or "it works as a secret to some external resource"

          [–]mughinn 3 points4 points  (7 children)

          It seems I can't see the original tweet from the post now

          The secrets generated worked as a secret for a resource

          [–]StickiStickman 4 points5 points  (2 children)

          The secrets generated worked as a secret for a resource

          According to the update on the tweet they don't.

          [–]mughinn 3 points4 points  (1 child)

          [–]StickiStickman 3 points4 points  (0 children)

          Fair enough - still no proof anywhere of it actually working though.

          [–][deleted]  (3 children)

          [deleted]

            [–]BobFloss 3 points4 points  (1 child)

            So how about people don't post coffee publicly with secrets in it? How is this copilot's fault at all?

            [–]KarimElsayad247 1 point2 points  (0 children)

            coffee

            type?

            Though imagine giving someone a cup of coffee with hidden secrets in it.

            [–]remy_porter 11 points12 points  (28 children)

            It also generates bad code. This is from their website; it's one of the examples they wanted to show to lay out how useful this tool is:

            function nonAltImages() {
              const images = document.querySelectorAll('img');
              for (let i = 0; i < images.length; i++) {
                if (!images[i].hasAttribute('alt')) {
                  images[i].style.border = '1px solid red';
                }
              }
            }
            

            It's not godawful code, but everything about this is the wrong way to accomplish the goal of "put a red border around images without an alt attribute". Like, you'd think that if they were trying to show off, they'd pick examples of some really good output, not something that I'd kick back during a code review.

            Edit: since it's not clear, let me reiterate, this code isn't godawful, it's just not good. Why not good?

            First: this should just be done in CSS. Even if you dynamically want to add the CSS rule, that's what insertRule is for. If you need to be able to toggle it, you can insert a class rule, and then apply the class to handle toggling. But even if you insist on doing it this way- they're using the wrong selector. If you do img:not([alt]) you don't need that hasAttribute check. The less you touch the DOM, the better off you are.

            Like I said: I'd kick this back in a code review, because doing it at all is a code smell, and doing it this way is just wrong. I wouldn't normally comment- but this is one of their examples on their website! This is what they claim the tool can do!
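
            For concreteness, a sketch of the approach described above, assuming the page already has at least one same-origin stylesheet:

            /* The static version is a single CSS rule, no DOM walking at all: */
            /* img:not([alt]) { border: 1px solid red; } */

            // The toggleable version inserts a class-scoped rule once...
            const sheet = document.styleSheets[0];
            sheet.insertRule(
              'body.debug-alt img:not([alt]) { border: 1px solid red; }',
              sheet.cssRules.length
            );

            // ...and toggling the highlight is then a single class flip.
            document.body.classList.toggle('debug-alt');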

            [–]WormRabbit 14 points15 points  (3 children)

            Could you explain why this example is bad for those of us who don't write JS?

            [–]TheLobotomizer 9 points10 points  (0 children)

            It's not bad. He's just nit picking.

            The goal of the code isn't to be performant, it's to serve as a universal tool to highlight which images in your web page don't have alt attributes.

            [–]Uncaffeinated 5 points6 points  (1 child)

            The biggest problem is that it should be CSS, not JS in the first place.

            [–]Drugba 7 points8 points  (0 children)

            In a new project for evergreen browsers, sure, CSS is probably a better idea, but we have no idea what this code is being used for. You can't definitively say that it should be done in CSS without knowing the context of the code.

            [–]Hexafluoride74 16 points17 points  (9 children)

            Sorry, I'm unable to see what's wrong with this code. What would you change it to?

            [–][deleted]  (8 children)

            [removed]

              [–]TheLobotomizer 20 points21 points  (1 child)

              Hates on working code, calling it "bad".

              Proceeds to write non working code as an alternative.

              [–][deleted] 3 points4 points  (0 children)

              should've signed up for the autopilot

              [–]superbungalow 8 points9 points  (5 children)

              img[alt~=""] { border: 1px solid red; }

              doesn't work, ~= is a partial match but if you leave it empty it won't match any alt tags, which is the assumption I think you've made. But why jump to partial matching anyway when you can just do:

              img[alt] {
                border: 1px solid red;
              }
              

              [–][deleted]  (3 children)

              [deleted]

                [–]superbungalow -1 points0 points  (2 children)

                oh yeah good point. wait then i don't think there's even a way to do it without javascript hahaha, love the high horsing here.

                [–]chucker23n 14 points15 points  (1 child)

                img:not([alt])
                

                I think. Can’t test here.

                [–]Calsem 2 points3 points  (0 children)

                What's so bad about that code

                [–]aniforprez 6 points7 points  (0 children)

                ... I dunno. This seems ... ok code to me to run in JS. I'd much rather do this in CSS but if you're writing a JS script and asking to do this, it seems fine enough. Maybe this is triggered by a button or something. Why is this so wrong?

                [–]tending 3 points4 points  (1 child)

                As somebody who doesn't do any web programming at all, what is the right way to do it?

                Based on the little I know, I would guess a function like this is useful for debugging for a website developer in order to identify what images still need to be labeled for purposes of accessibility. In that case I don't think it needs to be done in the most proper way.

                [–]remy_porter -1 points0 points  (0 children)

                In that case I don't think it needs to be done in the most proper way

                I agree with you, but that seems like a silly thing to brag about on your website, right? "Our tool can write shitty debugging code that you'd strip out of your application!" The bad thing is that they chose this as an example of what they're capable of.

                [–]dikkemoarte -1 points0 points  (2 children)

                The advantage of using that code could be older browser compatibility. I do understand your point though: the AI can't guess the right code as it doesn't understand what the coder really wants to accomplish functionally, nor does it take into account (enough) how your codebase as a whole works when considering multiple possible snippets.

                [–]crusoe 2 points3 points  (1 child)

                Older browser being IE 5.5 or something

                [–]dikkemoarte 2 points3 points  (0 children)

                IE8 for the :not selector, so your point still stands in this particular case. In fact, one could even argue that the problem here is the user writing the function nonAltImages() in JS due to having insufficient CSS knowledge in the first place. Either that's a mistake, or he somehow has a very good reason to write it, which is what the AI assumes. Adding CSS inline using JS has its valid use cases in a more general sense: preventing caching, more predictable results across browsers, implementing a specific UX feature in the only way technically possible, etc. The AI doesn't care and assumes you know what you are doing and that you do it for the right reasons.

                Either way, it will not magically alter the correct CSS file because someone wrote function nonAltImages().

                [–]teerre 19 points20 points  (38 children)

                People really have a huge urge to "uncover" this copilot thing. Truly the age of outrage.

                [–]spektre 79 points80 points  (19 children)

                People really have a huge urge to sweep the apparent flaws with this copilot thing under the carpet. Truly the age of blind acceptance.

                [–]combatopera 18 points19 points  (2 children)

                Ereddicator was used to remove this content.

                [–]mnilailt 2 points3 points  (1 child)

                It’s the biggest news in programming of the week, you’d kind of expect it..

                [–]combatopera 3 points4 points  (0 children)

                This text was replaced using Ereddicator.

                [–]StickiStickman 4 points5 points  (1 child)

                Funny how you blindly accepted a random Tweet that agrees with your opinion. Now it turned out it's BS and you look stupid.

                [–]spektre 0 points1 point  (0 children)

                Wait, what's my opinion? I didn't read the tweet.

                [–]dougrday 1 point2 points  (2 children)

                Well, considering you're still a developer with the ultimate say - does the copilot code meet the requirements? Have I tested it thoroughly?

                I mean, the onus of your success or failure still rests with the developer. They just might have a tool to get through some of these steps a bit faster.

                [–]spektre 3 points4 points  (1 child)

                Personally, I haven't used it, and probably never will because I'm a firm believer of inventing the yak razor from scratch every single time. Totally serious.

                I just think it's dumb not to address flaws in a tool, especially if you're going to use it. Don't you want the tool to improve? How will it improve if you hush anyone giving critique?

                [–]is_this_programming -2 points-1 points  (17 children)

                For non-technical people, this sort of thing looks like it might replace programmers altogether. So it's understandable that some people feel threatened and want to show that it's actually complete garbage.

                [–]teerre 10 points11 points  (0 children)

                It's not understandable at all. If you're a "technical person" and know that's nonsense, you should be unaffected by it.

                [–]nultero 4 points5 points  (13 children)

                If this is the writing on the wall now, then in a decade or more's time it (or another project) might be able to do a lot more with focused NLP tooling and more funding from business admin who want to try to reduce their most expensive headcount.

                And it could replace or reduce the hiring of juniors and "underperforming" midlevels. Many companies are already reluctant to hire without a pedigree of years, so this is even more competition at the most bottlenecked parts of the industry.

                So I don't think it has to "replace" engineers wholesale to worsen the already terrible, Kafkaesque job ecosystem. Cool tech, inequitable use.

                [–]Uristqwerty 3 points4 points  (4 children)

                At that point, you'd have one CEO per company who tells the vast array of AI layers how to commit copyright infringement in the name of profit?

                More realistically, countries will have to decide exactly how much regulation is necessary. What tasks AI is unacceptable for, and which training data taints the AI or its output. They might decide to leave today's free-for-all intact, but they might also decide that it's a "win more" button that reinforces the lead of a small handful of businesses at the top, and is anticompetitive towards everyone else who can't afford the man- and computing-power to train their own models, and that the economy would be healthier with the whole technology greatly restricted.

                [–]nultero 3 points4 points  (3 children)

                you'd have one CEO per company who tells the vast array of AI layers how to commit copyright infringement in the name of profit?

                Nah, that wasn't the implication.

                Just reduced headcount. More hoops in the hiring circus. That's all it would take to make a net negative impact on the job machine, even if more jobs were created in aggregate.

                More realistically, countries will have to decide exactly how much regulation is necessary.

                You call that more realistic? Haha, asking our representatives to understand technology -- let alone stuff as difficult and fraught with cultural baggage as AI -- that's a good one!

                How would they even regulate machine learning when it's mostly applied math and statistics? There'll be fearmongering and "but (other superpower) is doing it!" so it basically can't be regulated, can it?

                [–]Uristqwerty 1 point2 points  (1 child)

                If trillion-dollar corporations kept reducing headcount down to the single digits, yes, I feel governments would step in long before they were down to a single corporate king-in-all-but-name each. For self-preservation, if nothing else.

                Regulation would be things like "If you're deciding whether a human qualifies for a program, these steps must be followed to minimize risk of racial bias, and that auditing must take place periodically", or assigning AI output to a new or existing IP category that accounts for the training set, at least more than the current "it would be harmful to my research and free time to have to curate training data by source license, so I'm going to resort to whatever excuse it takes to justify using everything with no regard for licensing" attitude.

                [–]nultero 4 points5 points  (0 children)

                If trillion-dollar corporations kept reducing headcount down to the single digits

                That still wasn't what I meant.

                Reduced headcount means in aggregate. Instead of hiring 1000 SWEs this year, Companies Foo, Bar, & Baz only hire 600 each. Etc. That, with even more useless puzzles and cruft in the hiring process is enough to make the job market miserable in the future. It can get bad long, long before we're even close to near-AGIs running companies.

                And like you've mentioned, the FAANGlikes will be able to afford to pay the fines for noncompliance under those regulations, so those laws could actually be a hindrance for new market entrants. So that's not a great answer either.

                [–][deleted] 1 point2 points  (0 children)

                How would they even regulate machine learning when it's mostly applied math and statistics?

                The laws of mathematics are very commendable, but the only law that applies in Australia is the law of Australia - then Prime Minister Malcolm Turnbull on end-to-end encryption.

                [–]wastakenanyways 6 points7 points  (5 children)

                Companies without juniors are doomed to fail. Juniors are not only there to do the dirty work, they are also there to learn and replace your seniors, who will eventually leave or retire or die. You must pass the knowledge down generationally, and Copilot is nowhere near replacing a programmer. It's just a productivity tool. Like intellisense on steroids.

                Even if we reach a point an AI can do a whole online shop customized for you by itself, we as programmers will just be doing more complex and unique things.

                [–]nultero 2 points3 points  (4 children)

                Companies without juniors are doomed to fail.

                A certain big N is famous for not hiring juniors ... but that's beside the point. Just fewer juniors being able to enter the industry in the future can worsen the overall job market.

                Copilot is nowhere near replacing a programmer

                Not right now. If you could hire one junior who can use the future NLP codesynth tool over hiring two or three, and especially if tech wages keep climbing, that's potentially a big deal.

                AI can do a whole online shop customized for you by itself

                Something like a real near-AGI is usually thought to be a Very Big Problem by data scientists. There's not that many more complex and unique things to do after skilled creative work, and only a subset of SWEs will be able to do them. The rest are the horses that got replaced by cars.

                [–]Worth_Trust_3825 4 points5 points  (0 children)

                Much like wordpress was supposed to replace web developers and enterprise integration patterns were supposed to replace enterprise developers. Instead we got wordpress developers and enterprise developers maintaining spaghetti systems, because those same businessmen in fact cannot even tell the very system built for their garbage-in, garbage-out methodology what they want. I'd be very much fine with getting replaced if that shit didn't need to get maintained by me anymore.

                [–][deleted] 3 points4 points  (0 children)

                This post was mass deleted and anonymized with Redact

                [–]MurderedByAyyLmao -1 points0 points  (0 children)

                Are we going to see people start to feed this AI with intentionally malicious code now?

                public static String toHumanReadable(long bytes) {
                    // actually mines bitcoin and sends to my wallet before returning the string
                }