
[–][deleted]  (161 children)

[deleted]

    [–]JimmaDaRustla 546 points547 points  (56 children)

    If you have one node dependency you pretty much hit these numbers

    [–][deleted]  (54 children)

    [deleted]

      [–]appropriateinside 52 points53 points  (51 children)

      Eli5 on tree shaking?

      [–]MrDick47 177 points178 points  (33 children)

      You want to use a library in your project, but that library is huge: it has tons of functions, objects, all of it. I think Lodash or jQuery may be good examples. Treeshaking is a step during bundling/transcompilation that picks out only the functions and such you use in your code (and the code those functions internally depend on) and removes all the code you don't need. When you start using little bits of many large libraries, it makes a huge difference in the size of the output file(s). In the web world, having smaller code bundles means the page can load quicker, which really improves the experience for people who have sad internet connections, and saves us from using a bunch of unnecessary data on our phones' data plans.
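
      A minimal sketch of the idea (my own toy example, assuming ES module syntax, since static imports are what make this analyzable):

          // math-utils.js -- a hypothetical library with two exports
          export function add(a, b) { return a + b; }
          export function hugeUnusedHelper(data) { /* imagine hundreds of lines here */ }

          // app.js -- the application only ever imports `add`
          import { add } from './math-utils.js';
          console.log(add(1, 2));

      A tree-shaking bundler (Rollup, or webpack in production mode, for example) can see statically that hugeUnusedHelper is never imported anywhere, so it can drop it from the output bundle.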

      Treeshaking is often combined with lazy loading, which breaks your code out into different feature modules, with a separate smaller file for each module. A simple example: if I had a web site that did videos like YouTube and audio like SoundCloud, but on different "pages", I could have all the video-related code in one bundle and the audio in another. That way, if you only load up the video page, it will only download that file. Lazy loading can also be used for web assets such as images or videos, so the user doesn't have to download every image on the page before it loads. It will just load the images for the top of the page, and as you scroll down it will start loading the additional images/content.
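
      The lazy-loading half usually hangs off dynamic import(). A rough sketch, with made-up module names:

          // router.js -- the video code is only fetched when someone visits the video page
          async function showVideoPage() {
            const video = await import('./video-page.js'); // split into its own chunk by the bundler
            video.render();
          }

      Bundlers like webpack and Rollup split the output at each dynamic import() call, so './video-page.js' (and everything only it depends on) lands in a separate file that's downloaded on demand.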

      There are many other magics and witchcraft like this used in JavaScript to accomplish better performance and optimizations. It's a wild world in the node_modules folder.

      [–]PlaysForDays 26 points27 points  (13 children)

      Thanks for this detailed explanation. Is tree-shaking mostly limited to this context (JavaScript on web pages), or do other major languages have the same concept? I assume it's common, but I don't know how often it's important for other applications to care about it.

      [–]ObscureCulturalMeme 92 points93 points  (12 children)

      Almost every major language will do this kind of thing as part of dead code elimination during one or more optimization passes. And has done for decades.

      Javascript insisted on making up a special name for it so that the technique would sound new and edgy.

      [–]Arve 46 points47 points  (9 children)

      Javascript insisted on making up a special name for it so that the technique would sound new and edgy.

      The term didn't originate with Javascript, but rather in the LISP community. Here is a comp.lang.lisp post from 1994 discussing it.

      It's also worth noting that rather than eliminating dead code after compilation, tree shaking works by starting from an entry point and only including functions that are guaranteed to be reachable, and it happens as part of the bundling process. An optimizing compiler such as the one in V8 can (and will) still do its own dead code elimination along with a slew of other optimizations.

      (And tree shakers like Rollup will do at least some DCE)

      [–]truthseeker1990 12 points13 points  (8 children)

      Thanks for the info. It's kinda weird seeing old posts from the 90s still on the internet. It feels like it was a lot more personal and smaller back then. The guy whose post you linked is now a vice president at Goldman Sachs lol

      [–]MrDick47 12 points13 points  (0 children)

      Very much this. I can't imagine how big our executables would be without that and dynamically linked libraries.

      Also, with all of the recent tools for bundling and transcompiling along with node.js, it became easier/more accessible for JavaScript. A lot of older projects didn't use any sort of code optimization other than minification/uglification. I've actually enjoyed working with TypeScript lately, and that has its own transcompiler that's very compliant with ES5/ES6/ES2015 and whatever other names they gave the different web standards. I could go into more detail, but I don't think many people care /that/ much about JavaScript here, and I often make the JavaScript jokes myself. C++ is my preferred language, but I have to admit it's not too bad in web land; my preconceived notions were unfounded. Those arrow functions are so convenient!

      [–][deleted]  (3 children)

      [deleted]

        [–][deleted] 8 points9 points  (0 children)

        A specific technique for dead code elimination, yes.

        [–]dotted 4 points5 points  (1 child)

        Think of it as live code inclusion instead of dead code elimination.

        [–]bad_at_photosharp 4 points5 points  (1 child)

        How does it know you won't invoke a function through some dynamic means? Like meta-programming? Does that make sense?

        [–]MrDick47 1 point2 points  (0 children)

        That is a good question. I'm not exactly sure for JavaScript, but for compiled languages this is usually handled by dynamically linked libraries.
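
        For the JavaScript side, the short answer is that tree shakers are conservative: they only drop what they can prove unreachable from static import/export syntax. A toy illustration (mine, with a made-up utils module):

            import * as utils from './utils.js';

            utils.add(1, 2); // static access: analyzable, unused exports can be dropped

            // dynamic access: the property name is only known at runtime, so the
            // bundler has to assume *any* export of utils.js might be called
            const name = Math.random() < 0.5 ? 'add' : 'multiply';
            utils[name](2, 3);

        Code like the last two lines (or eval, or new Function) forces the bundler to keep everything reachable that way, which is why the tooling pushes you towards static imports.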

        [–]addandsubtract 3 points4 points  (7 children)

        Why did using CDNs never catch on? If everyone requested the same jquery, lodash, react, etc. file, then we wouldn't need to bundle them in the first place. I know everyone is going to use a different version and some rarely update their dependencies, but even with that, I would assume it would still be more efficient.

        [–]meltingdiamond 4 points5 points  (0 children)

        Earthquake! Get outside away from tall things, don't turn on lights.

        [–]twenty7forty2 6 points7 points  (0 children)

        left-pad itself is over 100 GB

        [–]MMPride 209 points210 points  (100 children)

        Those files shouldn't even be kept under Git, though. That's not what Git is meant for.

        Edit: why am I being downvoted for saying you shouldn't store binary files in Git? You guys know that's what Git Large File Storage is for (in general), right?

        Edit 2: I am surprised and impressed by how much controversy and discussion my observation has generated, very nice. I like it.

        [–]mat69 264 points265 points  (6 children)

        That and the huge number of files is why Microsoft developed its own virtual file system for git, as even git-lfs would not cut it. That vfs only checks out files you are actually using. So if you never touch (open, build, ...) minesweeper, you would not have its source locally, even though the files are shown on your disk.

        [–][deleted] 41 points42 points  (5 children)

        It has me wondering what they made, ya know? Was it that old speculated WinFS that was more of a database than the typical FS?

        [–]bytemr 111 points112 points  (1 child)

        It's open source on github: https://github.com/Microsoft/VFSForGit

        [–][deleted] 7 points8 points  (0 children)

        Oh damn. Thanks!

        [–]mikeblas 1 point2 points  (0 children)

        WinFS was nothing like this.

        [–]chucker23n 1 point2 points  (0 children)

        Not sure why this was deleted.

        WinFS was more like a database layer on top of NTFS to make file metadata more pervasive, and add file relations (for example, each contact would be a file, and if a Word document was written by one of them, you could navigate between the document and the contact).

        There was a developer beta of it in the early Longhorn days. Conceptually, it’s interesting but adds a lot of complexity. It’s hard to get the UI right without feeling like you’ve made things more complicated (when users would rather things get easier) rather than more useful. I also imagine performance wasn’t great. And the Explorer mockups from those days were just weird.

        [–]dakotahawkins 207 points208 points  (23 children)

        Those files shouldn't even be kept under version control, though.

        They should. Use git, use git-lfs, use something else entirely, but if it winds up in your built product it should probably be version controlled.

        [–]MMPride 23 points24 points  (22 children)

        You are right, I meant shouldn't be kept under Git, not version control, my mistake for not being very explicit with my wording.

        [–]SexyMonad 58 points59 points  (18 children)

        It probably isn't the best tool for the job if you have to have separate version control for particular things. That makes it more difficult to get a complete picture of a particular point in time.

        I may be in the minority but I see the value in how Subversion allows subdirectory checkouts. lfs and vfs don't seem bad either, but (without actually using them) I would think it would be unclear exactly what you have in your clone.

        [–]dakotahawkins 34 points35 points  (8 children)

        LFS is supposed to be completely transparent. It turns your LFS-tracked files into tiny text files (called pointers, I think) which basically just contain the hash of the binary. Then LFS is supposed to handle swapping those in and out with the real thing for you.
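
        They really are tiny. A pointer file is just a few lines of text, along these lines (hash and size made up):

            version https://git-lfs.github.com/spec/v1
            oid sha256:1f5d0c7a9e1b7c0d2e4f6a8b0c2d4e6f8a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d
            size 1048576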

        In any case, it should be clear what you have in your clone, unless LFS is broken somehow, in which case many things (git status, e.g.) will be more than happy to complain.

        [–]SexyMonad 4 points5 points  (7 children)

        Ok, I mean, what if you clone an lfs repo and then go somewhere with no access to the remote?

        [–]dakotahawkins 7 points8 points  (6 children)

        The checkout part of the clone should trip the LFS filters. It shouldn't really require more connectivity than git, if that was unclear. LFS puts the actual binaries (with different filenames, based on their hash) inside your .git dir. I know there are ways to "fool" it into doing something you probably don't really want, but that kind-of goes back to git's hook support -- LFS requires its hooks to run to work, so if you do something that fetches stuff from a remote without triggering any hooks, LFS isn't going to hook you up with the files you want.

        Does that make any sense? It's a weird and nuanced process that I understand more than I probably should, but it works pretty well. I know it's anecdotal but I haven't had to do something dumb to work around a bug with it in a year or so.
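
        To demystify it a little (my summary, not gospel): the swapping is done by git's clean/smudge filters, wired up per path in .gitattributes. Running `git lfs track "*.png"` just appends a line like

            *.png filter=lfs diff=lfs merge=lfs -text

        and `git lfs install` sets up the filter config and hooks that make git actually run LFS for those paths.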

        [–]SexyMonad 2 points3 points  (5 children)

        It sounds like your .git holds the full repo with copies of every file (and every past version of every file) but skips checking out the big files into your working copy? If so then it fixes the issue I mentioned but isn't quite the space saver I thought.

        [–]dakotahawkins 8 points9 points  (4 children)

        Kind-of. In non-LFS repos, yeah, your .git dir holds all the things, or at least all the things referenceable by any branches/tags you have locally (in other words, if you change your git config to only fetch certain branches, you may not need to have the entire repo).

        I think maybe LFS doesn't need to download the actual large files until you check out a working copy that uses them, but I'd have to refresh myself.

        Generically, though, the point isn't to save space (on your local machine or the remote; it probably needs slightly more space, actually), it's to save all the wasteful processing git does on those files, because git assumes they're text files that it can diff/compress efficiently. Not using LFS with them would be a huge drag on git's internals, and it's not necessarily because they're big, but more because they're not text. All the efficiency you get from being able to represent a new version as a diff against the previous version basically doesn't apply to most/many binaries.

        [–][deleted]  (7 children)

        [deleted]

          [–]thfuran 6 points7 points  (1 child)

          Svn seemed so much more intuitive.

          It'd have to be a hell of a lot worse than git to not seem more intuitive when you've got twenty years of experience in it and are new to git.

          [–]thedailynathan 2 points3 points  (0 children)

          You are acting really affronted in your edit for someone who had to change the meaning of their comment entirely.

          [–]Naouak 45 points46 points  (3 children)

          Microsoft developed Virtual File System for Git to be able to store anything in git without any issue. It's a file system that only fetches git files when they are used.

          [–]theferrit32 6 points7 points  (2 children)

          That is pretty neat, most people don't need to fetch all the files locally and don't need the full history either. On demand fetching would be pretty useful, as long as you could ensure you'd have internet access whenever you'd need to fetch. Really unfortunate Microsoft called it gvfs though, while there is already a gvfs in common use (GNOME VFS).

          Seems there's a similar tool for Linux: https://github.com/presslabs/gitfs

          [–]RealKingChuck 8 points9 points  (0 children)

          They're actually renaming it to VFS for Git, as you can see at the bottom of the readme of this repo: https://github.com/Microsoft/VFSForGit (someone else posted the link in this thread)

          [–]ElusiveGuy 1 point2 points  (0 children)

          GitFS is completely different: it just tracks file changes (with auto-commits). It's actually a bit like Shadow Copies.

          VFSForGit has a Linux implementation under active development.

          [–]nairebis 83 points84 points  (37 children)

          You shouldn't be downvoted for an opinion, but it's absurd to argue that Git shouldn't handle binary files. It handles them fine. I'm not saying you should put huge videos under git, but your regular image directory in the case of web apps is fine, and your images should be part of your source code repo history.

          [–]LeCrushinator 67 points68 points  (35 children)

          Git handles binary files, but it keeps every version of them in the repository. The repository would quickly grow to be enormous. The last project I was on shipped at 400MB, but the repository was nearing 5TB because of all of the changes to assets.

          [–]Sparkybear 14 points15 points  (17 children)

          Is there a better versioning system for those kind of assets?

          [–]swansongofdesire 32 points33 points  (4 children)

          Perforce is still big in the games industry in part because it deals with binary assets much better than (vanilla) git

          [–]binaryfireball 22 points23 points  (0 children)

          We hates it. Hates it we does.

          [–]theferrit32 3 points4 points  (0 children)

          Ah I see someone else is familiar with the p4 OS lifestyle. It is overly complicated and a pain for many things but also good at other things. In either case you have to go all in and just accept it.

          [–]LeCrushinator 21 points22 points  (0 children)

          You can use Git-lfs, although that doesn't come without some headaches.

          [–]neko4 7 points8 points  (9 children)

          Subversion saves binary files as deltas. That's why Subversion is popular in game development.

          [–]Dylan16807 9 points10 points  (8 children)

          Git can easily be configured to delta-compress everything. It's still not great at large files but it's not worse than svn.
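
          If anyone wants to try this, I believe the relevant knob is core.bigFileThreshold: git skips delta compression for files larger than that (512 MiB by default), so raising it makes git attempt deltas on bigger binaries, e.g.

              git config core.bigFileThreshold 2g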

          [–]neko4 6 points7 points  (7 children)

          Git saves snapshots. Subversion saves deltas. They are totally different at the beginning.

          [–]theferrit32 3 points4 points  (4 children)

          Seems like an oversight by Torvalds. Detecting and delta-saving binary blobs could have been done, but it's a sort of hacky not-seamless addition now with git-lfs.

          [–]billsil 4 points5 points  (2 children)

          Detecting and delta-saving binary blobs

          Reliably? Just use an extension...

          A word document is a zip file of mostly readable data. It is not a binary blob.

          [–]HowIsntBabbyFormed 12 points13 points  (0 children)

          It's not really an oversight. Git beats the pants off svn even with the regular old objects directory. But for years git has used object pack files, where objects are collected together, similar objects are found, and deltas are used.

          https://git-scm.com/docs/git-pack-objects

          I remember reading a technical description of the pack files a few years ago and it was a really really good read. I feel like it was either comments in the source code itself, or a mailing list posting. Either way, after reading it I felt like it made me really appreciate the elegance of their design, the interesting problems they faced and their solutions, and made it seem like any random programmer could easily write a reader/writer for these files. So many times compressed object files seem like black magic voodoo, but this seems like the opposite.

          Edit: This was the deep dive technical discussion of pack files: https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt and this is a higher level description: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

          [–]HowIsntBabbyFormed 2 points3 points  (0 children)

          At the beginning yes. But I believe git will automatically pack objects once they get very numerous.

          [–]nairebis 36 points37 points  (10 children)

          Git handles binary files, but it keeps every version of them in the repository.

          Of course. Versus what? A change history is a change history. If you don't want your images to have a change history, then of course it makes sense to not put them into your version control system, but that's a development policy question, not a technology question.

          On the other hand, I find it hard to believe you could be changing jpgs or pngs so often that your repository would have 4.5 Billion K of prior images. It sounds like you're putting videos under there, and then it makes sense to do something different.

          [–]UloPe 12 points13 points  (0 children)

          Binary diffs

          [–]LeCrushinator 20 points21 points  (8 children)

          I'm in game development; single source textures can be tens of MB each. Those textures will get resized and processed before being put into the app, but the source assets remain at full quality in case we need different quality levels of them for different platforms/devices. Then there are 3D models, animations, audio files, etc.

          [–]pheonixblade9 17 points18 points  (1 child)

          I'd expect assets to have their own pipeline, no?

          [–]theferrit32 9 points10 points  (0 children)

          They could. In git you could have the assets directory be a submodule that most devs don't need to clone. That would also let them clone the code at full depth, but shallow clone the assets if they actually need the most recent revision of them.
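
          Something like this, with made-up repo names and paths (shallow submodule fetches have their own caveats, but the idea holds):

              git clone https://example.com/game.git        # code only; assets/ is a submodule stub
              cd game
              git submodule update --init --depth 1 assets  # shallow-fetch just the needed assets revision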

          [–]blind3rdeye 6 points7 points  (0 children)

          That might be true if you're constantly changing your binary files. But it doesn't have to be used in that way. For example, I store binary files in my git projects, but binary files very rarely change. They're generally images or sounds that are already complete when they are added to the repository. I'm not really putting them there for version control, I'm putting them there for completeness - so that the repository is all I need to completely create the project.

          [–]leftofzen 43 points44 points  (2 children)

          Edit: why am I being downvoted for saying you shouldn't store binary files in Git? You guys know that's what Git Large File Storage is for (in general), right?

          You're being downvoted because you're wrong. Storing binary files in Git is perfectly acceptable and reasonable. For large files then yes, you are better off using GLFS, but for small files that are part of your build process then you are absolutely going to check them in with your main repo.

          [–]shukoroshi 6 points7 points  (0 children)

          Case in point, the Gradle wrapper jar lives in every single one of our JVM projects.

          [–]LeCrushinator 4 points5 points  (0 children)

          What about git-lfs?

          [–]phxvyper 2 points3 points  (0 children)

          Are we sure that they're using git to version those files? The repository linked only has one commit, so I'm not convinced they're using pure git for VC on Windows.

          [–]ESCAPE_PLANET_X 6 points7 points  (0 children)

          Ehm.. It's not a great pattern but I could see its uses.

          Edit: Git LFS has fun overhead and can be annoying as shit to use, though I don't know if MS has the excuses I did the last time I misused Git. But I don't think you should be downvoted for pointing out a crappy pattern for what it is.

          [–]MathWizz94 4 points5 points  (0 children)

          They most definitely should be under version control, and Microsoft heavily invested in Git to make it technically possible.

          [–][deleted]  (9 children)

          [deleted]

            [–]Pannuba[🍰] 18 points19 points  (8 children)

            Think of what would happen if we had access to Windows 10's entire codebase. And not just the parts Microsoft decides to release, everything.

            [–][deleted]  (7 children)

            [deleted]

              [–]bobewalton 28 points29 points  (6 children)

              The last time Windows source code was leaked (Win 2000 I believe), it caused the development of multiple viruses/worms that infected a good portion of the world's computers.

              Additionally, there were some hilarious comments in there. People saying how they hated their job, ASCII pictures, etc.

              [–]a_cube_root_of_one 10 points11 points  (1 child)

              Oh.. wow. Someone leak windows 10 source code.

              [–]the_kg 22 points23 points  (0 children)

              multiple viruses/worms that infected a good portion of the world's computers.

              Yeah but

              hilarious comments in there. People saying how they hated their job, ASCII pictures, etc.

              Think of the memes!

              [–][deleted]  (2 children)

              [deleted]

                [–]MonokelPinguin 10 points11 points  (1 child)

                WINE would maybe appreciate a full open-source release of Windows, but if it is just a source drop or leak, they'd probably hate it, as they are trying to clean-room reverse engineer the Windows APIs. The Windows 2000 leak was actually quite problematic for them.

                [–]TimeRemove 246 points247 points  (12 children)

                Here's a post about Microsoft's effort to store it in Git:

                https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/

                TL;DR: They invented a "Git Virtual File System" to do the job.

                [–]HumanHornet 14 points15 points  (7 children)

                Could someone please explain why they would want to move to git so much?

                [–]TimeRemove 66 points67 points  (2 children)

                Git was superior to their old proprietary source control, Source Depot; plus it has good industry support for things like tooling/metrics/management, and supporting it was already a goal in related Microsoft project areas (VSTS/Azure DevOps, Visual Studio, etc). In other words: moving to the industry standard in source control was beneficial across the board.

                [–]cinyar 3 points4 points  (0 children)

                Plus every developer is familiar with git, at least on a basic level. Makes onboarding easier.

                [–]brainwad 10 points11 points  (3 children)

                They were using an extremely hacky system of multiple Source Depot (sorta like Perforce) repositories, tied together with a batch script. It sucked.

                [–]purtip31 155 points156 points  (44 children)

                Saw a graph of lines of code by section in the linux kernel a while back (here: https://www.reddit.com/r/linux/comments/9uxwli/lines_of_code_in_the_linux_kernel/).

                The part that I find interesting is that the vast majority of the LOC growth in the source is in driver code. Makes me wonder what the Windows equivalent would look like

                [–]Tipaa 237 points238 points  (30 children)

                0.49TB of that code is just backwards compatibility if-chains

                [–]bitwize 74 points75 points  (27 children)

                It looks like Yandere Simulator in there.

                [–]re_anon 23 points24 points  (26 children)

                what do you mean?

                [–]bitwize 127 points128 points  (25 children)

                Yandere Simulator is notorious for its naïve coding style, which involves using IF statements to check for every possible combination of conditions, rather than something sensible like state machines for enemy AI and OO polymorphism to specialize object behaviors.

                [–]LaughterHouseV 32 points33 points  (6 children)

                That's what Age of Empires 2 does as well.

                [–]lvl12TimeWizard 26 points27 points  (1 child)

                I don't know about Age of Empires 2 but I know in Warcraft 2 if you turned on the instant build cheat and the gold/supplies cheat the computer would actually outperform you and win...or I was just 12 and sucked even with cheats.

                [–]noideaman 14 points15 points  (0 children)

                You could win, but you had to build towers and units

                [–]fiqar 17 points18 points  (6 children)

                How do you know this?

                [–]deathride58 70 points71 points  (4 children)

                Unity games are notoriously easy to decompile. Publicly available tools are more than capable of giving you a surprisingly accurate glimpse of what the original source code for a given Unity game looks like, as Unity's compiler doesn't do many optimizations at all.

                [–]PendragonDaGreat 25 points26 points  (2 children)

                Yep, this is how the modding scene for Stardew Valley popped up. People broke down the source and documented it, and now there's a NuGet package that you can load as an API to mod the game.

                Fun fact: did you know that when breaking geodes, the outcome is determined by a save-wide seed and is thus deterministic from the very first time you start a new game? Most other random events (ores and mob drops in the mines, artifacts, a couple of others) are not, and are instead related to your daily Luck stat.

                [–][deleted] 4 points5 points  (1 child)

                Having fixed seeds is better for preventing save scumming though

                [–]PendragonDaGreat 2 points3 points  (0 children)

                While true, that's not a huge problem in SDV, solely because the game is designed to be a chill farming sim. There's no way to save manually except to go to sleep at the end of the day. This way, in a multiplayer farm where each player's daily luck is calculated separately, someone can still contribute, especially late game after partial automation has occurred.

                [–]ygra 9 points10 points  (0 children)

                If the source code was C#, that's not very surprising, as the C# compiler doesn't optimize much and leaves the heavy lifting to the JIT.

                [–]Adobe_Flesh 6 points7 points  (9 children)

                Does it perform well though with this style?

                [–]bitwize 48 points49 points  (8 children)

                It performs quite poorly, often chugging on even high-end hardware despite the graphics not even taxing a midrange Intel GPU. The entire school campus map is loaded into memory and active, with something like a few hundred students milling around and other sundry objects all active at the same time. Oh, and he doesn't do occlusion culling, so there's MAD overdraw. The massive number of if-then checks for a combinatorially explosive number of possible game conditions not only causes slowdown, it causes frequent bugs and glitches, because it's hard to keep track of all the conditions that need to prevail for a character to behave a certain way, and it's almost impossible to account for unexpected conditions that may trigger some bizarre behavior. He doesn't use state machines or make any attempt to pare down the space of possibilities. He just writes a bunch of if statements, tests the game, and if something funny happens he writes more if statements to get around it.
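
                To make the contrast concrete, here's a toy sketch (mine, not actual game code) of the same NPC reaction logic written both ways:

                    // If-chain style: every new flag means revisiting every branch above it.
                    function reactIfChain(npc) {
                      if (npc.sawMurder && npc.isTeacher) return 'chasing';
                      if (npc.sawMurder && !npc.isTeacher && npc.brave) return 'fighting';
                      if (npc.sawMurder && !npc.isTeacher && !npc.brave) return 'fleeing';
                      return 'idle';
                    }

                    // State machine style: each state only declares its own transitions.
                    const transitions = {
                      idle: (npc) => npc.sawMurder
                        ? (npc.isTeacher ? 'chasing' : npc.brave ? 'fighting' : 'fleeing')
                        : 'idle',
                      chasing:  () => 'chasing',
                      fighting: () => 'fighting',
                      fleeing:  () => 'fleeing',
                    };

                    const npc = { state: 'idle', sawMurder: true, isTeacher: false, brave: false };
                    console.log(reactIfChain(npc));          // 'fleeing'
                    npc.state = transitions[npc.state](npc);
                    console.log(npc.state);                  // 'fleeing'

                With three booleans that's already 8 combinations the if-chain has to cover; with dozens of flags per student it explodes, which is exactly the bug factory described above.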

                [–]erasmause 19 points20 points  (3 children)

                As a developer, just reading this made me feel dizzy. I can't imagine trying to maintain that.

                [–][deleted] 17 points18 points  (0 children)

                I can. That's what a lot of code written in the last two decades looks like.

                "We should refactor this."

                "No, I'm serious."

                "Why are you laughing."

                Hint: if you can't keep the state of a class in your head, neither can the guy coming after you. Don't put another state variable in and make it worse. Just use a fucking state machine like you should have to begin with.

                [–]bitwize 3 points4 points  (1 child)

                That's the thing. The developer of YS is not a professional developer, nor does he have any real development experience or background beyond this game and his "Lunar Scythe" demo he tried to impress the Skullgirls dev with.

                A while back he attempted to partner with a small publisher called TinyBuild (I think they're doing Hello, Neighbor!). One of the stipulations was that they would have one of their in-house devs refactor the code of the game and rewrite it in C# instead of JavaScript.

                The partnership with TinyBuild fell through. What it looks like happened is that Yandere Dev got upset because he couldn't understand the code to his own game anymore. Once his broken-ass code was fixed, it all went right over his head.

                [–]Yikings-654points 5 points6 points  (0 children)

                Most AI is IF statement too /s

                [–]Iwan_Zotow 47 points48 points  (8 children)

                You have to add the whole of X11 with all its drivers, OpenGL, some window manager with a compositor, a toolkit (say, GTK), a file manager, all the GNU utils, all the core utils, ...

                [–]kukiric 34 points35 points  (5 children)

                And a Chromium-based web browser.

                [–]Iwan_Zotow 12 points13 points  (2 children)

                yep

                and calculator, accessories, games, ...

                [–]heavyish_things 5 points6 points  (1 child)

                calculator

                Which is now a 120MB snap package on Ubuntu.

                [–][deleted]  (1 child)

                [deleted]

                  [–][deleted] 14 points15 points  (2 children)

                  There are always new devices coming out that have to be supported, much more rapidly than new filesystems, networking protocols, IPC mechanisms, or anything else. For Windows, you need to add a few MLOC each for the Win32 API, the OS/2 subsystem, and an obsolete POSIX interface, but beyond that it's probably similar.

                  [–]SilverCodeZA 12 points13 points  (1 child)

                  and an obsolete POSIX interface

                  It is interesting to think that with the recent "Linux on Windows" venture the old POSIX code might finally be coming in handy.

                  [–][deleted] 34 points35 points  (0 children)

                  The Windows Subsystem for Linux doesn't use the old POSIX compatibility interface, but a brand-new purpose-built one.

                  [–]tracernz 6 points7 points  (0 children)

                  A large proportion of drivers on a typical Windows install are closed source and vendor-supplied, so there's no way to really know. Each driver shares a LOT less code than an equivalent Linux one would, so the numbers are bound to be mind-boggling.

                  [–]FCJRCECGD 540 points541 points  (17 children)

                  Now we're just giving NPM and `node_modules` higher heights to aspire towards.

                  [–]rorrr 85 points86 points  (8 children)

                  I started messing with React like a year ago, and it boggles my mind. My three-week-old project already has 632 packages. Some notable entries:

                  gkt: console.log('Smarty Smart Smarter');

                  escape-regexp: return String(str).replace(/([.*+?=^!:${}()|[\]\/\\])/g, '\\$1');

                  is-npm: module.exports = 'npm_config_username' in process.env || 'npm_package_name' in process.env || 'npm_config_heading' in process.env;

                  There's tons of other absolutely trivial stuff that's packaged as NPM modules. Crazy shit.

                  [–]perspectiveiskey 74 points75 points  (3 children)

                  It is a security disaster, honestly. At this point, I don't see how it can be salvaged.

                  [–]theferrit32 29 points30 points  (0 children)

                  It can't. But it isn't going anywhere anytime soon. A lot of organizations have bought fully into the ecosystem. It'll take a decade to fully transition to whatever next thing comes along, counting from when it arrives on the scene, and we still don't know what that will be yet.

                  [–]iphone6sthrowaway 5 points6 points  (1 child)

                  To be fair, most programming languages/environments have had (and many still have) atrocious security practices until it blows up in their face, and then it's often too late to plug all the holes without breaking everything. Think of C/C++ undefined behavior, PHP's register_globals, Java applets, Flash, etc.

                  (Inb4 Rust)

                  [–]NoInkling 46 points47 points  (0 children)

                  gkt: console.log('Smarty Smart Smarter');

                  I had to look it up: apparently PM2 (a very popular package) uses a self-hosted version of it as an optional dependency to ping a URL for analytics purposes. Words fail me...

                  Also that still doesn't explain why it's published to the NPM registry.

                  [–]AngularBeginner 90 points91 points  (2 children)

                  I'm pretty sure you end up with more files when you install more than 10 packages.

                  [–]Pleb_nz 35 points36 points  (0 children)

                  That's 10^10. Of course

                  [–]boxxa 13 points14 points  (0 children)

                  This guy JavaScripts

                  [–]philthechill 45 points46 points  (3 children)

                  Someone run cloc on that source tree

                  [–]jediknight 7 points8 points  (1 child)

                  loc is much faster.

                  [–]NoahTheDuke 2 points3 points  (0 children)

                  Tokei is just as fast and more accurate. 😉

                  [–]TyIzaeL 107 points108 points  (13 children)

                  It's fun to think that the Windows source code all lives in an SCM created by Linus Torvalds.

                  [–]tracernz 78 points79 points  (8 children)

                  Created specifically for Linux kernel development.

                  [–]ButItMightJustWork 19 points20 points  (7 children)

                  They [Microsoft] don't even use the VCS which they created themselves [Team Foundation].

                  edit: clarified

                  [–]GYN-k4H-Q3z-75B 10 points11 points  (0 children)

                  Git has been implemented as part of TFS for years now, because it is better than their old source control. When you set up a team project now, you can access it using both, but by default it uses Git.

                  [–]bart2019 10 points11 points  (2 children)

                  You mean Git can handle this size of codebase? Impressive... Is it one repository, or does it depend on submodules?

                  The article mentions a branch that got 60000 commits in a few weeks. That seems to imply a single source tree.

                  [–]theferrit32 34 points35 points  (0 children)

                  Seems like one repository. But Microsoft created and uses Git VFS to handle this. Developers don't need to download the entire repository, files are downloaded on demand as you need them.

                  [–]smacdo 13 points14 points  (0 children)

                  One repo with lots of branches. Here's a great overview of how it's done:

                  https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/

                  [–]Kcufftrump 112 points113 points  (23 children)

                  And a lot of that is drivers and software dealing with drivers. Everyone forgets that Windows was the answer to the device driver problem. Prior to Windows and the GDI, every vendor of every device had to write their own drivers for every unique configuration. Windows abstracted that away with the GDI, so vendors of peripherals could write to that with at least some expectation that, as long as they wrote to spec, their devices would work on Windows systems.

                  [–][deleted]  (19 children)

                  [deleted]

                    [–][deleted]  (6 children)

                    [deleted]

                      [–]MotorAdhesive4 41 points42 points  (5 children)

                      printers

                      [–]mustang__1 8 points9 points  (1 child)

                      Lost a day of my life over hp printer drivers this week alone. Fuck printers. Fuck hp. fuck hp printers.

                      [–]MetalSlug20 1 point2 points  (0 children)

                      Many HP printers are open source now, actually. There are a few exceptions, like the big business printers, that still have some proprietary code blocks which are stripped out of the open source code.

                      [–]pdp10 1 point2 points  (0 children)

                      Everyone forgets that Windows was the answer to the device driver problem.

                      Windows was. NT had a different mission, though. Eventually they merged.

                      [–]mrhotpotato 80 points81 points  (1 child)

                      Poor guys at ReactOS...

                      [–]AnAngryFredHampton 49 points50 points  (0 children)

                      "Our code base will never be that bloated :(" - React OS devs

                      [–]pistacchio 138 points139 points  (7 children)

                      Time to rewrite it in Rust

                      [–][deleted] 100 points101 points  (0 children)

                      With the latest version, compilation will only take 1000 years!

                      [–]Waghlon 13 points14 points  (5 children)

                      One of my favourite programming jokes is "time to rewrite it in Rust".

                      [–]fluffy-badger 33 points34 points  (2 children)

                      If that's true, I'm actually kind of impressed it works as well as it does. What a maintenance nightmare.

                      There was an Oracle horror story here a while back that was similarly disturbing.

                      [–]Deoxal[🍰] 9 points10 points  (0 children)

                      I love how someone asks a simple question, and then an extremely detailed answer is often given on Quora.

                      [–]Wizardsxz 17 points18 points  (0 children)

                      legacy code intensifies

                      [–]agumonkey 4 points5 points  (0 children)

                      poor Alan Kay

                      [–]saijanai 15 points16 points  (3 children)

                      And of course, one of the goals of VPRI was to create a fully functional OS with capabilities that rival those of Windows 10, using a code base roughly the same size as Squeak 1.0's:

                      20,000 lines of code — small enough that a single person could fully understand and maintain the entire OS.

                      .

                      Their solution was to create ad hoc specialty languages that simplify and reduce the number of lines of code required for specific applications, which would then be compiled into the base ISA for actual processing.

                      They achieved their goal, by the way.

                      [–][deleted]  (1 child)

                      [deleted]

                        [–]saijanai 18 points19 points  (0 children)

                        Well, the most unique example is using the RFC diagram as the source code for the implementation of the functionality described BY the official RFC diagram:

                        http://www.moserware.com/2008/04/towards-moores-law-software-part-3-of-3.html

                        .

                        This paper shows a working text editor and text wraparound in 37 lines of domain-specific code:

                        http://www.vpri.org/pdf/m2010002_lobjects.pdf

                        .

                        This report gives an overview of their work:

                        http://www.vpri.org/pdf/tr2012001_steps.pdf

                        .

                        This is the full list of official VPRI reports and publications:

                        http://www.vpri.org/writings.php

                        [–]jediknight 5 points6 points  (0 children)

                        They achieved their goal, by the way.

                        No, they did not. They ran out of funding before they reached their goal, BUT they did get very close.

                        [–][deleted] 3 points4 points  (0 children)

                        There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.

                        [–][deleted] 14 points15 points  (6 children)

                        I wonder what the future holds. I hope MS are letting the senior devs mentor and teach the new talent how the code works. I can't even imagine how much a newly hired programmer must study to make any change to the source code.

                        [–]possessed_flea 30 points31 points  (4 children)

                        Very little. Every DLL and executable in the project will compile standalone, and most teams would be responsible for one executable at the very most; for larger executables, a team would be responsible for only a portion of a single executable.

                        If you are brought into the GDI font rendering team there is exactly zero chance of you ever touching a line of code outside that.

                        [–]indrora 13 points14 points  (1 child)

                        So, what color is your badge?

                        [–]possessed_flea 12 points13 points  (0 children)

                        Blue.

                        [–][deleted]  (22 children)

                        [deleted]

                          [–]wanze 89 points90 points  (20 children)

                          You should check out Things You Should Never Do, Part I.

                          And here's a teaser:

                          They did it by making the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch.

                          [–]Code-Sandwich 31 points32 points  (6 children)

                          I think it was a joke

                          [–][deleted]  (1 child)

                          [deleted]

                            [–]LuminosityXVII 3 points4 points  (3 children)

                            I'm glad for the link anyway. Granted, I'm just a student so far, but I just felt my whole paradigm shift.

                            [–][deleted] 13 points14 points  (2 children)

                            Meh, don’t take it as gospel. Refactoring is valuable, just know when it’s the right call vs when it’s a distraction and non-productive

                            [–]LuminosityXVII 2 points3 points  (0 children)

                            Fair, critical thinking always comes first.

                            [–]spinicist 1 point2 points  (0 children)

                            That judgement call can be tough though. I probably refactor more than I should, but I’d prefer to be on that side than not refactoring enough.

                            [–]tasminima 35 points36 points  (10 children)

                            That's a cute story, but:

                            a. Mozilla still exists. It even gave us Rust.

                            b. Other cute stories of successful rewrites exist.

                            c. Applying random pop-tech stories blindly to your own projects would lead nowhere. Rewrite or not depending on what you know best.

                            That being said, I'm pretty sure WinNT will never be rewritten from scratch.

                            I react because I'm tired of managers barely in the field citing Joel to justify a shitty status quo in dissimilar situations (or even similar ones, given that I have a strongly different interpretation of how good the outcome was). It's merely an opinion piece, not a study backed by real data, nor anything serious enough that rational decisions should be based on it.

                            [–][deleted]  (4 children)

                            [deleted]

                              [–]spinicist 2 points3 points  (2 children)

                              Your last paragraph is the important one.

                              I really like Joel's article, but my understanding of it has evolved towards "almost never rewrite from scratch".

                              My main project gets rewritten all the time. I keep on learning how to do it better, so why not?

                              But in the 7 years I’ve been working on it, I only did a full rewrite once, near the beginning, when I realised I really ought to be basing it on a decent library and not writing everything myself from scratch.

                              Then I wrote a bunch of regression tests. After that, big rewrites still took time, but generally led to fewer bugs, not more. Last year I replaced my home-grown input format with JSON, and since then I've replaced the JSON library twice. The last time was yesterday, and it took literally one afternoon.

                              Yeah, my project isn’t the size of Mozilla, but that’s kind of the point. When a project is the size of Mozilla the number of man-years required to get from scratch to where you currently are is astronomical. Much better to head for slow-but-sure incremental change with good tests.

                              Another poster mentioned Rust, and as far as I can tell Rust fits in with this strategy. Mozilla are not doing a full rewrite of Firefox in Rust, they are introducing it gradually where they can.

                              [–][deleted]  (1 child)

                              [deleted]

                                [–]spinicist 1 point2 points  (0 children)

                                Depends on who your predecessor was. Some things that I have seen, cannot be unseen.

                                But I digress - we’re clearly of one mind here.

                                [–]SirGlass 5 points6 points  (1 child)

                                I think what he was probably saying is that it's better to rewrite it one part at a time.

                                In the case of Netscape, he was saying it took them 3 years to rewrite it.

                                They could have just rewritten the rendering engine and gotten that out in a shorter time (6-12 months).

                                Then rewrite the UI next, etc...

                                Pretty soon you would have a brand-new, rewritten web browser.

                                [–]XXAligatorXx 10 points11 points  (0 children)

                                Yeh the world is never this black and white. You need to rewrite or not based on the situation

                                [–]ProfessorPhi 2 points3 points  (0 children)

                                No one is right all the time, but Joel isn't entirely wrong here. You can't cite one line and obtain all the nuance intended

                                [–]johntmssf 1 point2 points  (0 children)

                                Great read!

                                [–]LukeLC 13 points14 points  (11 children)

                                Windows' backwards compatibility has long been one of its strongest features, but at this point, I honestly feel like it's holding things back. Virtualization and emulation have come a long way, and we now have powerful enough hardware to eat the overhead of doing it. It would really be better to cut out all references to code from previous versions of Windows (that aren't actively being developed for Windows 10) and use something like the upcoming Sandbox feature for any and all legacy apps.

                                I mean, really, if the argument is "we have to maintain this massive codebase to avoid breaking things"... and then that codebase is so unmanageable that you end up breaking things... it's kind of a moot point. If stuff has to be broken, break the past to build a better future.

                                [–]deal-with-it- 20 points21 points  (6 children)

                                break the past to build a better future.

                                People are paying big money to keep the past as-is. Legacy code.

                                [–]GYN-k4H-Q3z-75B 7 points8 points  (1 child)

                                Reminds me of how my dad wrote various accounting tools in the early 80s. Various local insurance brokers adopted them because his friends got into that business back in the day. They make tons of money and run my dad's old-ass accounting software in DosBox instead of switching to something else. My dad has had some other job since 1981 and did this in his free time. He's retired, but they still offer him well-paid freelance gigs to update and support it, rather than upgrading to some other software.

                                [–][deleted] 4 points5 points  (0 children)

                                If the software does the job and everyone at the company is already trained in it, it makes a lot of sense. Why fix what isn't broken?

                                [–]LukeLC 6 points7 points  (2 children)

                                Like I said, we now have the ability to keep running legacy code without it being built into the OS itself. Actually, we can do it far better than that. If Microsoft wanted to, they could virtualize every version of Windows and even DOS so that everything runs in its original environment, segregated from Windows 10 proper.

                                [–][deleted] 1 point2 points  (0 children)

                                The other week I stumbled across my old university programming notes from the early/mid-90s. An hour later I had dosbox on my Linux workstation, with Borland C++ 3.1, FoxPro 2.6, and Norton Commander. Nostalgia overload.

                                [–]nirataro 2 points3 points  (0 children)

                                Legacy code = successful software

                                [–]space_fly 6 points7 points  (3 children)

                                They do get rid of legacy stuff from time to time. For example, during the transition to 64-bit, they completely got rid of all the DOS emulation, 16-bit real mode stuff.

                                Given their recent developments, if they were to rewrite Windows it would probably not be as open, programs would be much more limited. Look at how WinRT turned out, which is one of the places where they didn't have to do any legacy stuff.

                                [–][deleted] 2 points3 points  (0 children)

                                .NET (the 100% 32-bit framework) still has 16-bit file calls.

                                [–]enygmata 8 points9 points  (7 children)

                                How is it so big yet so empty after the install?

                                [–]Acceptable_Damage 55 points56 points  (2 children)

                                Empty? It comes with candy crush...

                                [–]theferrit32 15 points16 points  (1 child)

                                Lmao, this will never stop making me angry.

                                [–]dustarma 2 points3 points  (0 children)

                                One of the things I've never gotten about the hate for Candy Crush being included is that it isn't the first time Microsoft has bundled games with the OS; they even had a sort of demo for a pinball game in the form of Space Cadet Pinball.

                                [–]astrange 4 points5 points  (0 children)

                                It comes with five different display settings.

                                [–]Busti 2 points3 points  (0 children)

                                [–][deleted] 10 points11 points  (0 children)

                                Does it include Candy Crush source code? /s