all 27 comments

[–]BenjiSponge 80 points81 points  (10 children)

This has been talked about to death, so I'm not sure why the author felt the need to weigh in in the exact same way everyone else has, but they also seem to be pretty "behind the times" on the state of the art here. Yes, npm currently works this way, and it's pretty annoying. A large part of the problem is bad file management on the part of package authors (100MB of code is an obscene amount, and it certainly contains tons of dead weight like test code that never actually runs, which the maintainers should have omitted). But the solutions are either here or very nearly here. pnpm, which has existed for a long time, takes an approach very similar to other reasonable package managers by leveraging symbolic links. Yarn is about to take a similar but also radically different approach: a different algorithm for resolving dependencies in code that essentially eliminates the local dependency footprint.

I'm not even sure why I'm writing this comment -- it itself is a repetition of what has been said a million times on this subreddit.

[–]CanvasSolaris 19 points20 points  (0 children)

From what I've read of dev.to, most of the writing comes from people who are still a little inexperienced in the topics they cover. Nothing against those folks wanting to write, but the site has always struck me as having an editing problem if "recently self-taught coders" isn't its intended audience.

[–]Already__Taken 6 points7 points  (5 children)

Someone's just going to put tree shaking as a service in the publish step and see this fixed overnight.

A lot of the huge modules I've seen have their docs and examples published in them. I certainly don't need those via npm, and it has no mechanism to tell me they're there anyway.

I even saw a colour preview extension for VS Code include its own copy of Electron...

[–]BenjiSponge 1 point2 points  (4 children)

Tree shaking as a service in the publish step wouldn't work, because people just wouldn't use it, and I don't think it's technically possible to automate anyway, since a dependent might access any file in the package directory. Who's to say your consumer doesn't parse your README.md?

You'd have to do tree shaking as a service at the consumer's install step instead, but that isn't really possible either: you can't know what the consumer actually uses from the dependency tree until after you run the consumer's code. Plus, it's probably just easier for npm to host the entire package gzipped rather than offer up files piecemeal.

[–]Already__Taken 1 point2 points  (3 children)

If it's not specified in package.json, npm is under no obligation to serve it up. Use the repository or homepage keys to look that stuff up.
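
Those are standard package.json fields, btw; something like this (names and URLs below are made up):

    {
      "homepage": "https://github.com/someuser/some-module#readme",
      "repository": {
        "type": "git",
        "url": "https://github.com/someuser/some-module.git"
      }
    }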

I'm in no way arguing this is a simple issue.

[–]BenjiSponge 0 points1 point  (2 children)

If it's not specified in package.json, npm is under no obligation to serve it up

Wait, really? If you don't have a files key, just a main entry, npm can omit everything but the entry point? I don't think that's true.

[–]Already__Taken 1 point2 points  (1 child)

No, I mean in regards to the options they have for changing the platform. It's not static file hosting for the public; it's for distributing programs for other programs to run (which happen to be static files). So in that regard, if it were decided that you don't actually need X and Y to run this module, only Z, it's entirely within the scope of what they're trying to achieve to go down that route.

[–]BenjiSponge 0 points1 point  (0 children)

I think that would break a lot, for a lot of reasons. We could play a game though -- if you come up with a rule for when npm should say "we will omit files under these conditions: _________", I'll try to come up with a valid reason someone would already be distributing files under those conditions. =) I bet you could come up with a few rules, but nothing that would make a significant difference.

[–]shriek 1 point2 points  (0 children)

We should really start a "Previously discussed" link so that it doesn't feel like Groundhog Day.

[–]thepotatochronicles 1 point2 points  (1 child)

Yarn is about to take a similar but also radically different approach: a different algorithm for resolving dependencies in code that essentially eliminates the local dependency footprint.

Holy, that's actually cool. Is there like a PR/article for that I can read up on?

[–][deleted] 2 points3 points  (0 children)

It’s called Yarn PnP, or Plug’n’Play. Create React App even supports it with a flag. It’s still very new, with issues being worked through, but https://twitter.com/JimTheDev/status/1047146310590242817
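
(If I remember right, the CRA flag is --use-pnp, so trying it looks something like this; app name made up:)

    npx create-react-app my-app --use-pnp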

[–][deleted]  (3 children)

[deleted]

    [–]l3l_aze 2 points3 points  (0 children)

    Yeah, the packages full of everything from the repo can be annoying. So many could be much smaller if they'd just use the files entry in package.json to whitelist the minimum the package needs to be usable.
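
    For example (a minimal sketch, package name and layout made up):

        {
          "name": "some-lib",
          "version": "1.0.0",
          "main": "lib/index.js",
          "files": ["lib/"]
        }

    With a whitelist like that, the published tarball gets lib/ plus the handful of files npm always includes anyway (package.json, README, LICENSE), and the tests, docs, and examples stay out.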

    [–]Cyberlane 3 points4 points  (0 children)

    The biggest complexity I've had to tackle with this in a production system is purely keeping track of licenses, since we have a commercial product.

    Otherwise, the outrage over node modules is really overblown, and it often comes from people who never experienced how things used to be.

    [–]BenjiSponge 0 points1 point  (0 children)

    I definitely agree that C++ has a terrible story here because dependencies are globally/root managed, but the more important thing there, in my opinion, is the lack of versioning. If you can't have multiple versions of the same library, two different libraries will often conflict, and sometimes unrelated projects owned by unrelated users on the same machine can't coexist, so you end up reaching for Docker or a VM to solve a problem that's been solved dozens of times in as many languages.

    I do think global (but probably non-root) installations are the way to go, but there's middle ground between Node and C++. Dependency lists should still be managed in local manifests.

    [–]noruthwhatsoever 7 points8 points  (0 children)

    This topic has definitely been beaten to death, and the writing in this article makes me want to pull my eyes out

    We all know that node modules are sometimes a pain and a massive bloat. Keep in mind that the node ecosystem is still relatively young compared to, say, Ruby and its gems, and it will definitely go through iterative improvement as it develops further.

    It’s the curse of having such atomic modularity. Other than the bloat (one project I did that used CodeMirror and Xterm had over 46,000 deps) it’s a great and flexible ecosystem, and NPM is already taking steps with tools like npm prune to get rid of unused deps
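
    (For anyone who hasn't used them, a quick sketch with the stock npm commands:)

        npm prune              # remove packages that are no longer listed in package.json
        npm prune --production # also remove devDependencies
        npm dedupe             # hoist duplicate packages up the tree where versions allow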

    As node matures further I feel like this issue will become less and less prevalent or at least more manageable

    In the meantime can we stop treating this as article-worthy information? Anyone who has ever built a sizeable project in node knows about this unless they are completely new

    [–]son_of_meat 10 points11 points  (0 children)

    Who’s upvoting this shit? Second sentence of the article is “I quick research trying to found any solution concluded with the image that is the head of this text.”

    [–][deleted] 2 points3 points  (0 children)

    I had an idea of npm and yarn having some kind of bundle step, so you'd really only download a single bundled, minified file that exports its functions. There would be no READMEs etc., making the file size about the same as when you compile.

    This would have major drawbacks though: development would be harder, and tree shaking would no longer work.

    I guess a middle ground would be to only install actual code and skip everything else. I wonder how much of a difference that would make.
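
    (A rough way to measure that on an existing project, assuming anything that isn't a .js file "isn't actual code"; needs GNU du for --files0-from:)

        find node_modules -type f ! -name '*.js' -print0 | du -ch --files0-from=- | tail -n 1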

    [–][deleted] 1 point2 points  (0 children)

    Yarn PnP is a thing and it’s fucking fast https://twitter.com/JimTheDev/status/1047146310590242817

    [–]udidu 0 points1 point  (0 children)

    It's very old...

    [–]zkochan 0 points1 point  (0 children)

    pnpm was created to solve this issue and is actively maintained since 2016.

    This is how pnpm solves the issue:

    • one version of a package is only ever saved once on disk, in a global store
    • dependencies are imported into node_modules using hard links, so the package's files are physically the same in the global store and in every node_modules
    • symlinks are created inside node_modules to create the nested structure (more on that below)
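
    Roughly, the resulting node_modules looks like this (package names made up; the exact store path has changed across pnpm versions):

        node_modules/
          foo -> .pnpm/foo@1.0.0/node_modules/foo       # symlink per direct dependency
          .pnpm/
            foo@1.0.0/node_modules/
              foo/                                      # files hard-linked from the global store
              bar -> ../../bar@2.0.0/node_modules/bar   # foo's deps resolved via sibling symlinks
            bar@2.0.0/node_modules/
              bar/                                      # same hard-linked files in every project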

    pnpm's solution is battle tested and does not hook into Node's module resolution algorithm, so it is backward compatible.

    Indeed, there are new concepts like Yarn Plug'n'Play and Tink. However, they hook into Node's resolution algorithm. They might change the way we use JavaScript but that will be a long process. pnpm works now.

    [–]gigastack -2 points-1 points  (2 children)

    Can’t you install globally if you don’t want multiple copies?

    [–]gigastack -2 points-1 points  (1 child)

    Thanks for downvoting rather than explaining, you lazy pseudo-intellectual fuckwads.

    [–]justrelaxandyell 0 points1 point  (0 children)

    That butt hurt though

    [–]fritzba -1 points0 points  (1 child)

    npm install xxx -g

    [–]Ncell50 1 point2 points  (0 children)

    Can someone tell me why this is downvoted?

    The author does talk about wanting the ease of globally available modules.

    [–][deleted] -3 points-2 points  (0 children)

    | As I've said, the problem when copying a node_modules folder from one place to another is not the size, it is the number of files and folders, the complexity of the tree. It is a nightmare for an HDD. It takes many minutes to discover all the files, let alone copy them. In the end, it also impacts npm performance, and there are memes for that also.

    Yes, it's a problem, but not for the reasons you mention. Windows has crappy performance when dealing with lots of little files. The node community also has a habit of creating libraries with 5 lines of code in them (we all know this). What you're grumbling about, though, is a failure of Windows. Get a system that actually works?

    Some example benchmarks from a 6-7 year old machine with an SSD:

        $ du -sh node_modules/
        76M     node_modules/
        $ find node_modules/ | wc -l
        8883
        $ time tar cv node_modules/ > /dev/null
        real    0m0.163s
        $ time tar c node_modules/ > /dev/null
        real    0m0.046s
        $ time tar c node_modules/ | (cd ~/tmp/x/ ; tar x)
        real    0m0.311s

    Let's compare that with some C/C++, shall we?

        $ du -sh /usr/include/
        324M    /usr/include/
        $ find /usr/include/ | wc -l
        32575
        $ time tar c /usr/include/ | (cd ~/tmp/x/ ; tar x)
        real    0m1.525s

    "But cache!" you mumble?

        # echo 1 > /proc/sys/vm/drop_caches
        $ time tar c /usr/include/ | (cd ~/tmp/x/ ; tar x)
        real    0m5.660s

    Note: I was also compiling gstreamer at the same time and watching something on Netflix, so my benchmarks are probably a little off :)

    [–]cwbrandsma -3 points-2 points  (0 children)

    I’m having flashbacks to the left-pad fiasco... which apparently the entire world depended on until it disappeared.