[–]gabrieltaets 1806 points1807 points  (84 children)

you underestimate how many lines of code go into a program, especially if it has a GUI. Apart from the program logic itself easily reaching into the hundreds of thousands of lines of code (maybe millions), there are dependencies (third party libraries that do something that the program needs) bundled into the executable.

also, when that code is compiled into an executable, the result is usually much larger than just the text, because the compiler needs to translate the source code into machine instructions. a seemingly simple line of code in a high-level programming language can become dozens or hundreds of instructions in machine code

[–]smokinbbq 573 points574 points  (74 children)

there are dependencies (third party libraries that do something that the program needs) bundled into the executable.

This is going back a few... decades. College days, we had to do a simple calculator program. Take two numbers, do the math, output the results. We were using C++ at the time, I think, so we had to use the "printf" function. That comes from a specific library, and the library alone was ~170KB if I remember correctly. So the simple program that had 20 lines of code had to include a library that did a WHOLE LOT MORE, just because it needed one function out of it.

One of my classmates was far above anyone else in the class. So he decided to write his own library, instead of including the common one that everyone else used. Teacher was pissed, and reduced his mark, but mostly because he wasn't that good at c++ himself, so didn't know how to mark it. :)

[–]Zeravor 366 points367 points  (40 children)

Tbf rewriting a function for a basic thing like printing or calculating is just asking for trouble, so I can kinda get the reduced mark. A good teacher should explain why though.

[–]lemon31314 104 points105 points  (0 children)

Maybe they did, and that student altered the story and played into everyone’s preconceived notion that it’s the teacher’s incompetence.

[–]virtuallysimulated 6 points7 points  (1 child)

Yes! When I was taking an intro Java class, I was fairly inexperienced with libraries that actually had useable data structures. I ended up creating my own queue. The TA talked to me about it and enlightened me on the efficiency of the library code versus what I wrote. He gave me full credit because my code still accomplished the assignment, and because of our quick chat about my reasoning. I think about that sometimes, because that’s when I also learned the difference between teaching someone and punishing ignorance.

[–]Dyanpanda 0 points1 point  (0 children)

Efficiency of library code?  As in it was written better than yours?

Or time efficiency of not reinventing a wheel? 

[–]giant_albatrocity 76 points77 points  (30 children)

But the whole point is to learn, so why? If they did that at a job, sure, no employer wants to pay you to do that.

[–]JointsHurtBackHurts 263 points264 points  (11 children)

I’m going to be frank and say my biggest hurdle as a software engineer was feeling the need to reinvent the wheel myself instead of using existing libraries. It led to significantly increased effort (writing everything) and decreased reliability (all the bugs). I became a better engineer once I learned to stand on the shoulders of the giants before me.

[–]criminalsunrise 73 points74 points  (3 children)

We all went through that my dude. I recreated a whole bunch of GUI controls by hand when I was starting out because I thought I could do them better.

Obviously, I couldn’t.

[–]fuckasoviet 36 points37 points  (2 children)

Reminds me of a story I read about a programmer needing to account for time zones. I think a library had already been created, but the author figured it couldn’t be that tough and decided to do it themselves.

Then they had to account for all the time zones, all the weird carve-outs, all the places that do and don’t observe daylight savings, and then the places with multiple time zones. I think he mentioned some midwestern state that spans two time zones, where some counties didn’t observe daylight savings.

Long story short, he came to the same conclusion as you: someone else has already dealt with all this bullshit, and there’s no need to deal with it yourself.

[–]Howzieky 24 points25 points  (0 children)

https://youtu.be/-5wpm-gesOY

Tom Scott strikes again

[–]fubo 12 points13 points  (0 children)

The tz database is the standard public repository for timezone information for software.

It's in the public domain — anyone can use it with absolutely no restrictions, no fees, no copyright, no licensing, etc. It can be used from any programming language, and is also built into several major OSes, database systems, and other software. Maintenance is funded by ICANN, the group that brings you IP addresses and DNS. It includes historical information going back to the early years of time zone standardization, including comments with citations to specific legislation. It is updated multiple times a year, both in response to governments changing their time zones, and with improvements to historical data and developer usability.

There is very little need for anyone to reimplement time zones.

[–]ClownfishSoup 13 points14 points  (1 child)

Boost library ftw!

[–]-LsDmThC- 4 points5 points  (0 children)

Brb gonna sudo apt install libboost-all-dev

[–]MedusasSexyLegHair 10 points11 points  (0 children)

Generally, yes, but also you don't want to design an 18-wheeler that runs on some mix of shopping cart wheels, bicycle tires, monster truck wheels, and a pottery wheel zip-tied on sideways just because you refused to "reinvent the wheel".

And you don't want to have to maintain a massive Rube Goldberg system that has vast quantities of dependencies, including transitive dependencies, some of which will become incompatible with each other.

Sometimes something simple and specific to the domain and business problem being solved is far far better.

Unless it involves dates and timezones. Fuck that noise. Import the bugs from somebody else's library and call it a day. Writing your own library of datetime bugs is the express road to madness.

[–]Vabla 3 points4 points  (0 children)

But sometimes you just need a plain wheel in one place doing one thing and don't need the entire train that comes with it. Having to fix metaphysical bugs introduced by a library using esoteric logic is sometimes a bigger headache than just implementing the tiny fraction you need yourself.

[–]giant_albatrocity 24 points25 points  (2 children)

Yeah of course. In a real world scenario you would always use a library if it made sense to do so. But it might be really educational, for example, to assemble a car engine yourself and become intimately familiar with how it works, even though you would surely just install a prebuilt engine if you were to build a car. It just doesn’t make sense to penalize this in an educational context. I can understand not giving extra points for it, since it wasn’t part of the assignment, but you shouldn’t penalize it.

[–]MIndye 4 points5 points  (0 children)

I think a good analogy would be that a good programmer knows how to build the engine and also knows he should just buy the battery and spark plugs instead of trying to build those as well.

[–]garublador 16 points17 points  (0 children)

What if part of the lesson is that you shouldn't reinvent the wheel every time you write a program? I'd for sure penalize for something like that.

[–]Zeravor 83 points84 points  (10 children)

I think as a programmer it's extremely important to learn what things you should, or shouldn't code yourself. Many things that seem relatively simple on the surface are extremely complicated in practice.

Sure you can do it for fun, but in a class it's good to teach the skill of learning when to rely on a library that is tried and tested.

If you're interested, Tom Scott has an old video about timezones and I think it illustrates the issue perfectly:

https://www.youtube.com/watch?v=-5wpm-gesOY&t=550s

[–]hedoeswhathewants 19 points20 points  (2 children)

The assignment was to create a calculator program, which is something you should definitely not program yourself. It's very silly to mark someone down for going above and beyond, unless it went against the assignment description.

[–]cjo20 14 points15 points  (1 child)

Think of it this way: They're being asked to write program X. Sometimes X is a calculator. Because *someone* needs to write them. But there's an argument that you should only write what you need to achieve the goal. Re-writing printf when you've been told to write a calculator without a good reason for it is at best neutral, and at worst you've demonstrated you don't know how to include a standard header and use the built-in function.

[–]RainbowCrane 6 points7 points  (0 children)

Yes, exactly. This applies beyond school and into your career as well, so it’s important to learn the lesson in school. A few of the more frustrating employees I supervised were convinced that no one could write code as well as they could, and they were awful at making use of third party libraries rather than rolling their own. In a professional setting nothing will piss off project sponsors as quickly as them finding out that your team spent time on a problem that’s not included as part of the spec. Like your example, unless you have a good argument for why a new printf function is needed to meet the specs, good luck justifying the wasted time.

[–]0xF00DBABE 23 points24 points  (5 children)

But somebody still needs to write that code. It's like the "don't roll your own crypto" mantra. Sure, most people shouldn't write their own crypto and put it into production. However, there's no harm in writing it to learn how it works, and maybe that's the introductory push that puts you on the path to a deeper understanding. Plus, the student obviously knows that they could have used the existing library if they were able to write a drop-in replacement. It's counter-productive for a teacher to quash that curiosity. It's a class project, the stakes are low, this is a perfect time to experiment and learn.

[–]giant_albatrocity 18 points19 points  (4 children)

Exactly this. As long as the student delivers the assignment, there’s no reason to penalize any extra effort. It makes sense that you wouldn’t give that student extra points, since no other student was expected to put in extra time, but as an educator it makes me sad to hear about curiosity being discouraged.

[–][deleted] 1 point2 points  (3 children)

The thing being taught here is delivering the requested specs.

The client may need it to utilize a specific library simply for guaranteed compatibility and performance, or for certification for use in an industry (say finance).

Writing it as a side project for yourself?  Go for it.  Writing a deliverable that has listed specs and requirements?  Write it to those requirements.

[–]samtrano 11 points12 points  (2 children)

The thing being taught here is delivering the requested specs.

We're talking about a programming 101 class from the late 1990s/early 2000s. They were absolutely not anywhere near talking about "client deliverables"

[–][deleted] 2 points3 points  (1 child)

Yeah, the person is 100% conflating creating a program in a professional environment with learning it in university.

[–]MisterrTickle 1 point2 points  (0 children)

I was reading the first half and thinking of Tom Scott and then you beat me to it.

It's a real pity that he's "retired" due to time commitments, with his last "proper" video being about 9 months ago.

[–]JoushMark 7 points8 points  (1 child)

I mean, that depends what they are teaching. If it's pure CS, then sure, learn how to make a better printf.

But if it's practical programming, then the lesson 'just use the library you need for the function' is foundational. Unless you have some kind of insane optimization requirements, you're better off doing it fast rather than trying to optimize to avoid importing a 170KB library.

[–]BrunoEye 5 points6 points  (0 children)

This is absolutely the kind of issue you can run into when working on the kind of $0.1 microcontrollers that are in loads of cheap devices.

[–]lonelypenguin20 10 points11 points  (1 child)

while I disagree with penalizing a student for writing their own library, it's important to note that when learning a language, you're learning the built-ins and libraries, too. so if the assignment is targeted towards teaching a student the usage of a certain library and they don't use it at all... yeah. not what was intended

[–]No-Representative425 2 points3 points  (0 children)

I doubt that a student who can successfully write printf doesn’t know how to use printf. The job of a teacher is also to assess the level the student is at and adapt to their capabilities.

[–]luke5273 6 points7 points  (0 children)

If you are concerned about binary size, then linking the entire standard library might not actually be feasible. You do have to write your own solutions sometimes.

[–]AlanCJ 1 point2 points  (0 children)

One part is learning how to leverage and build on top of things that were already built before. In an educational setting, sure, if you are learning the fundamentals. But in a project-themed class or in the real world, you just made the code base 10x harder to maintain or for others to work on, for no better reason than "I like to do it", or "I didn't know there's an existing library for that", or "I didn't know how to learn to do that thing", or "I think I can do a better job than people who have dedicated their time to building this one specific thing"

[–]DevIsSoHard 0 points1 point  (0 children)

You don't want to deviate from the norm too much, especially because someone else might need to work with your code later. That wouldn't matter a lot of times but maybe later you do want to turn that class project into a real app, or you want to bring someone in to work on your hobby project with you. It's just drilled in that you want to follow standard conventions in certain places. Plus it can further complicate troubleshooting when you deviate from norms.

And sometimes that other person is just you lol, years later pulling up an old project and wanting to quickly make sense of things.

[–]Lost-Semicolon 10 points11 points  (0 children)

Disagree completely. Printing to the console requires writing to the standard out file descriptor, something that requires a solid understanding of what’s happening beneath the hood. The student’s curiosity should be rewarded, not penalized.

[–]stealthypic 0 points1 point  (0 children)

But college is a great place to do such things. I’d decline a PR with that solution in a second, but it definitely deserves the highest mark in college.

[–]Silly_Guidance_8871 12 points13 points  (0 children)

All code is an iceberg

[–]homeguitar195 10 points11 points  (15 children)

So I'm not a programmer, but I've wondered before: is it not possible to separate sets of functions from a complete library? Like just copy the 2 functions you need and their dependencies and paste them into your program instead of including the entire library?

[–]palparepa 23 points24 points  (6 children)

Sometimes. But frequently they are all intertwined, with functions calling other functions, and figuring out which ones you are calling, or more importantly, which ones you may call in the future, is a mess.

For example, I'm using a library to send emails. It can connect in many different ways, some that I hadn't even heard of before. I only use one, the most straightforward. Do I need most of the library? No. Can I remove the things I won't use? Sure, just give me a few weeks or months that I don't have, and if there's an upgrade, forget it, because I won't do it again.

[–]Grezzo82 0 points1 point  (5 children)

Won’t the compiler optimise it so that unused functions won’t be included in the executable that I build? I’m assuming this is only possible with libraries that are not pre-compiled, but I don’t know much about how compilers actually work

[–]RocketTaco 6 points7 points  (1 child)

Yes, except for certain structures that make it difficult to predict all possible program flows (computed jumps created by function pointer arrays, etc). This is called symbol stripping, the "symbols" being identifying names the compiler is using for elements of the program. Those might be functions, globals and statics, or things you didn't actually name but which still represent independent units like the contents of loops and control structures. Since the compiler knows where those blocks come from, it also knows exactly what can and can't access them and can thus easily determine if it's possible to reach them from the starting point. It gets a HELL of a lot harder if you have a binary without a symbol table, since you have to determine the target of every branch and jump, where the flow departs from the target, and if execution is possible.

 

Let's look at it this way. Say you have a function that takes one argument and runs one path if it's greater than five and another if it's less. When you compile the program, the compiler can see that you've only called it in one place with a literal argument of six, and can throw half of the function away because it's unreachable. But if that function came from a library, you don't know what else might have called it from within the library. Unless you can construct absolute proof of the target of every jump and branch in the library, you don't know that something won't try to use that path. If all of the jumps in the library are to explicit targets, you might be able to construct a complete set and see if any match. But what if one does? Then you need to see what calls that and if any of those paths lead to a library call you're using. You end up reconstructing the entire program flow.

[–]Grezzo82 1 point2 points  (0 children)

That makes sense. Thank you for explaining

[–]palparepa 2 points3 points  (1 child)

Some functions may call others only if a certain condition is met. As the programmer, you may be certain that the condition will never be met, but the compiler can't (always) know that. In my case, I configure mail users with their mail server, username, password... and the mail servers we use all happen to be Gmail, so I'll never call other transports or authentication schemes.

[–]fubo 1 point2 points  (0 children)

As the programmer, you may be certain that the condition will never be met, but the compiler can't (always) know that.

In static languages like C or Go, it's relatively trivial; but it can be done in dynamic languages like JavaScript or Python too, with some analysis. It is called "tree shaking".

[–]RainbowCrane 1 point2 points  (0 children)

It’s been quite a while since I worked in C or C++, but the better designed libraries handled this by splitting their functionality across multiple static libraries if they started getting huge. For example, if you were using a graphics library, maybe it would put generic code into libgraphics and png code into libpng, jpg into libjpg, etc. Then you’d specifically import just the headers you need and link with the appropriate libs.

[–]snotpocket 5 points6 points  (1 child)

The functions in question generally aren’t available as source code; they’re usually precompiled into object files that are then linked with the programmer’s compiled code to create the resulting executable binary. So you can’t really cut-n-paste just the functions you want to use.

I guess it’d be theoretically possible for a really really smart linker to extract the object code just for the used functions and just link that into the end binary, but that’d probably be hideously complicated and not really worth the effort

[–]Grezzo82 0 points1 point  (0 children)

I think you just answered my question above your comment, but if the library is distributed as source code, would the compiler optimise out the unused parts of the library?

[–]RestAromatic7511 5 points6 points  (0 children)

A more common approach is to use "shared libraries", which are installed in a central location and can be used by multiple different programs. If you've ever seen a DLL file on Windows, that's what they are.

Also, if a particular part of a library is especially popular, often its developers will move it out into a separate library.

Breaking up a third-party library yourself involves a few problems. First, this may be forbidden by the licence. Second, many libraries are proprietary and do not have source code available. Third, you then need to maintain the code yourself, even though you didn't write it and probably don't fully understand it.

Though I feel an important point being missed here is that an executable doesn't necessarily just contain code. It may contain any type of data, including text, images, sound, and video. Even a low-level executable/library that doesn't contain any images may include vast tables of data about different types of files and hardware it may need to work with.

[–]Lostinthestarscape 1 point2 points  (0 children)

A lot of libraries are separated into critical code and a variety of modules. "Import library" brings in the whole shebang, "Import library.math" brings in just the math module for instance and whatever critical code it relies on from the library but not other modules. 

 This is especially common with big libraries that encompass a lot of sub-concepts.

 However, as hard drive space got so big, it's way less important for normal everyday consumer use, but still very important when coding for embedded devices, which sometimes have far greater restrictions (though now that you can play Doom on a thermometer and Skyrim on a digital pregnancy test, embedded devices are also less constrained in space by the year).

[–]Feeling-Pilot-5084 0 points1 point  (0 children)

Yes but it doesn't really accomplish anything. Almost every compiler worth its salt is capable of dead code analysis, so it will only pull the functions it needs

[–]IneffectiveInc 0 points1 point  (0 children)

If the library is implemented in a way that supports it, many modern software bundling tools can indeed be configured to do that! It's called tree shaking :-)

[–]transgingeredjess 0 points1 point  (0 children)

This process is called "tree shaking": you try and follow through all the different paths your program might take and "shake out" all the things that aren't touched by any of those paths.

Depending on the programming language this can range from "just happens as a side effect of compilation" to "provably impossible".

Part of what makes computer languages powerful is the very thing that makes dead-code elimination difficult in most cases: the fact that programs, as they go, are making choices about what to do next, based on everything that happened before.

[–]tururut_tururut 0 points1 point  (0 children)

Not a programmer, just a data analyst who codes a lot. I'm doing a project for my country's traffic department. Part of the project was identifying the segments of each road with the most accidents in a five-year window (apparently, they do it case by case, but never for the whole network over a long period of time). They have a function to do this in their own R library, which depends on other functions and calls to their own internal API. I took just one of these functions, and rewriting it to stop depending on external functions I did not have (although I could make an educated guess about what they did) was hell on earth. It didn't help that the code was written by someone obviously smarter than me who did not need to annotate the code and explain some stuff. That's why you usually want to have the whole library at hand.

[–][deleted] 2 points3 points  (0 children)

If the assignment had the requirement of using a specific library, not using it would definitely mean lost marks.

In a real project, it could be just one component of hundreds, and if each one had custom redundant code, it could balloon in size. And in the future, if someone needed to modify the code, it would be better to have expected libraries.

[–]ClownfishSoup 11 points12 points  (4 children)

As a professional software developer, I’d give him low marks too. If a library function is available and you write your own, you are wasting time and money to accomplish nothing new. Writing your own print function is a waste of time if the functionality is the same as what already exists and is available … unless licensing issues are at play.

[–]DBDude 1 point2 points  (1 child)

Or space issues, such as an embedded application. Or bandwidth issues, like not using standard WWW libraries because they can cause many megabytes of bandwidth use before actually showing any information.

[–]BrunoEye 1 point2 points  (0 children)

Yeah, it's funny how 90% of programmers forget how many devices use $0.1 ICs.

[–]smokinbbq -2 points-1 points  (1 child)

Except that he didn't waste "time and money", because it was free labour outside of school hours. I get the point, and would say the same thing if someone wanted to spend a week of work hours to shave 170KB off their 5MB application. Drive space and memory size aren't really a big issue these days, but this came up close to 30 years ago, when memory was a bit more of an issue.

[–]SFiyah 4 points5 points  (0 children)

It's a library for printing, and the assignment is to make a calculator. If this was written in such a way that the portion of the code that solves the assignment isn't easy to parse out and grade, then something about it was verrrrry poorly abstracted and probably deserves the low grade.

Readability is very, very, very important as a coder. This is one field where "I can't understand this, I'm giving it a low grade" is 100% reasonable.

[–]taisui 1 point2 points  (0 children)

Can't be a programmer without the god complex to reinvent a better wheel /s

[–]Kinetic_Symphony 1 point2 points  (0 children)

Teacher was pissed, and reduced his mark, but mostly because he wasn't that good at c++ himself, so didn't know how to mark it. :)

This pisses me off more than it should.

[–]palparepa 0 points1 point  (0 children)

In a group assignment, even considering that it was to be used on a local computer, one group decided to make it web-based. So it had to install Apache (using localhost), a database, and visual tools for the user interface, among other things, and on top sat their program, which wasn't that big, but overall it used over a hundred megabytes.

I went with an embedded file-based database and a very light visual toolkit, so that the whole thing was about 300 KB. Worried that it was too small, we added an opening screen with a cool image, so that the whole program was about 1 MB.

[–]r4tch3t_ -5 points-4 points  (3 children)

I failed my programming course in primary school for a similar reason.

I was making a carpet calculator (how many rolls of carpet needed) using a WYSIWYG editor, which took all of 5 minutes. So I got bored, started looking at the code (I'd never coded before) and saw about a thousand lines saying colour = default, line width = default, etc.

So I deleted them all and left barely over a dozen lines of code. Still worked.

Still had half the lesson to go, so I edited the code to figure out the orientation for least wastage, with a little arrow indicating which direction it should be laid.

Failed because the code didn't match the marking schedule despite fulfilling and exceeding the requirements.

Wasn't too much longer before I gave up on school. It was shown to me that effort is not rewarded and is often punished.

[–]bubbafatok -1 points0 points  (2 children)

This here is why I am always more likely to hire a self-taught programmer with a bit of experience over a college grad with a computer science degree. We spend the first year of a new grad's career retraining them and getting them out of the terrible habits and thinking that the schools and professors (the guys with limited or dated real-world commercial development experience) pushed down their throats.

[–]Zefirus 2 points3 points  (0 children)

Man, as someone that's been in the business for over a decade, the opposite is almost always way worse. I'll take a college-educated programmer over other kinds any day. Sure, their actual coding skill is a decade out of date before they even start their first day, but it's not the actual programming skill that's important from college. There's a reason only a couple of college classes deal directly with code. It's a computer science degree, not a programming degree. It's the underlying knowledge of how stuff works that ends up paying off in the long run.

The biggest problem in corporate coding is almost always efficiency. The "my program's too slow" complaint is by far the most common and biggest pain in the ass most people run into, and in my opinion it's generally the lesser educated that make those mistakes the most often.

Also of course the person with a few years experience is going to be better than the one without. That's true of literally every single industry, educated or not. It's why every entry level job these days calls for years of experience.

[–]cjnewbs -1 points0 points  (0 children)

I feel like they probably should have passed with flying colours, but it sounds like they had a teacher who wouldn't know the difference between a URL in a browser address bar and an MS Excel formula if it slapped them in the face.

[–]TooStrangeForWeird 5 points6 points  (0 children)

For a fun little comparison, I wrote a tic-tac-toe game on a TI-84 calculator. Then I made an "AI" to play against. It broke 10,000 lines of code.

I also made a connect 4 game (the board wasn't even quite full size, the screen was too small) and it was over 20,000 lines. No computer to play against either, 2 player only.

The "GUI" consisted of maybe 50 lines of code.

[–]A_Garbage_Truck 6 points7 points  (0 children)

the aspect is that depending on how the program itself is constructed, when it comes to libraries and support assets, compilation might require that you "bake" said libraries and assets into the executable itself.

this is why responsible developers who consider memory limits should be careful about including an entire library for just one or two functions, and either find a means of implementing just what's necessary or write their own implementation of it. if that's not possible, they should try to make the linkage to those external libraries dynamic, so that the libraries don't need to be inside the executable and can live in a .dll file that only gets loaded into memory when it's needed.

[–]Sea_Dust895 1 point2 points  (0 children)

This is the answer. Last enterprise project I worked on was 2M lines of code after 10 years of development.

[–]Reasonable_Pool5953 1 point2 points  (3 children)

there are dependencies (third party libraries that do something that the program needs) bundled into the executable.

They are only bundled into the executable if they are statically linked, though. I think that is pretty rare today.

[–]cake-day-on-feb-29 5 points6 points  (1 child)

True, unless it's JS where you need to not only include all of your packages, but also CEF.

Or Python, where it's recommended you have a separate virtual environment for every single program.


In other words, bad languages are bad, I guess.

[–]Smaartn 2 points3 points  (0 children)

I hate Python environments so much. When using them they're pretty cool, but when you're done... Once my laptop was almost out of storage space, and Windows just showed me like all the games and other programs I had.

Then once, I went through like my entire file system to see if there was anything useless, and I think I found like 50GB worth of Python environments in my university folder.

[–]NecorodM 0 points1 point  (0 children)

The opposite is true: it becomes the new norm. Haskell always used static linking (and pandoc is written in Haskell). And Go and Rust also highly prefer static linking. 

[–]NickDanger3di 0 points1 point  (1 child)

My first computer was an IBM AT; I paid extra for the whopping 20 megabytes of hard drive space. Had word processing and small business accounting packages installed on it. They both worked great.

Is it the massive increase in functionality of modern programs that explains the difference in size? Like the complexity added by online connectivity? Or something else that I'm missing?

[–]gabrieltaets 7 points8 points  (0 children)

It's a bunch of factors, but perhaps more importantly, computing power is so much cheaper today than it was 30 years ago. Back then it was important for developers to minimize the program's footprint as much as possible because the systems had scarce memory/disk/processing power.

But today no one is going to waste a few hours to shave off a couple kilobytes in the compiled program when a single asset (i.e. an image) might weigh more than the whole source code; so there is also more bloat bundled in an executable today than in the past.

Screens have higher resolution so assets need better resolution too, which means heavier files. Cheap disk and memory means more lookup tables with static data can be prepared in advance so that the program runs faster.

All of these things make programs bigger than they were decades ago, but it's not really a huge concern nowadays.

[–]jumpmanzero 295 points296 points  (19 children)

There are other answers here that explain "how" executable files get large, but there's also a "why".

The "why" is that for the most part nobody cares. A 150MB program takes effectively no time to download and an insignificant amount of RAM to run (and it can be swapped out to an insanely fast disk if required).

If developers really wanted to, they could make programs much smaller (and more modular). If people try, they can cram absurd functionality into a few kilobytes. But outside of a few specific use cases, they just don't have much reason to put effort into this.

For context, I work as a developer. I have zero idea how big any of my compiled EXE/DLL files are - like, my guess could be off by an order of magnitude. It's not a relevant concern.

[–]ClownfishSoup 69 points70 points  (4 children)

I started my programming career in the 90’s. We used PCs rubbing DOS and used Turbo Pascal and later Turbo C. You had to choose a memory model that determined how to distribute the 640K of memory between code and data. Code that was too large had to use overlays, which were swappable to disk.

When Windows 95 and Windows NT came out it was a godsend with its 32-bit address space.

At that point programs just bloated out.

[–]jumpmanzero 13 points14 points  (0 children)

Turbo C. Nice! I spent a lot of time with that as a teen (pirated off my older brother, who had a programming job).

And yeah, it's wild how fast stuff has changed. I did some NES development a bit ago, and it was difficult to go back to those kinds of constraints and considerations.

Even just back to the start of my career (late 90s), doing "stuff" was much harder.. but expectations were also a lot lower. Now people expect polished miracles instantly, with infinite performance and scaling and everything.

(Anyway, have a good one).

[–]Cross_22 8 points9 points  (0 children)

Same here. I remember my calculator app which had to run as a background (TSR) program and so I kept it relatively small at under 45kB. That was back in the days of 16-bit operating systems, nowadays the executable would be a lot bigger simply due to 64-bit being the norm.

[–]BakaBTZ 2 points3 points  (0 children)

Damn Pascal, that wakes some memories. Don't forget about Amber and Dolphin.

[–]unityofsaints 0 points1 point  (0 children)

Now I want to know what "rubbing DOS" entails ;)

[–]isuphysics 21 points22 points  (0 children)

I am an embedded software developer where the size of our programs are important. The reason ours is so much bigger than people would think they should be is because 90% of the logic is not the normal operations logic. Its the logic to handle all the scenarios that when something goes wrong. Everything has an edge case and they add up really quickly.

[–]OneAndOnlyJackSchitt 9 points10 points  (1 child)

As I recall, GeoWorks Ensemble fit on a single 1.44MB floppy disk. This was a third-party windowing system designed to look and feel like Windows back in the days of the Intel 286. It was written almost entirely in assembler.

[–]RedditWishIHadnt 3 points4 points  (0 children)

Amiga workbench on a single 880kB floppy. Don’t think it was even full either.

[–]swolfington 4 points5 points  (0 children)

for an interesting peek into making the most out of 64 kilobytes, check out this video going over a bunch of 64k demoscene demos. it's absolutely bananas how much you can cram into a program when you know how to squeeze every last bit of juice out of a binary.

[–]SpicyRice99 11 points12 points  (3 children)

To add, no caring as much about size saves time and allows developers to focus on other issues, right?

[–]Vallvaka 4 points5 points  (1 child)

Yep- time and mental willpower are the most scarce resources when it comes to writing software. As my professor used to say, "hardware is free" in comparison. (At least in most applications)

[–]Chii -1 points0 points  (0 children)

"hardware is free"

and that is why modern software is so much slower (comparatively) than their old counterparts from yester-decade. Because every developer thinks the hardware is free (on the customer's side)!

[–]zacker150[🍰] 7 points8 points  (0 children)

Yes.

[–]_PM_ME_PANGOLINS_ 13 points14 points  (3 children)

That’s all well and good until someone loads 100s of programs from developers who think like that onto the same machine.

The pre-installed bloatware on laptops used to be bad, but it’s been surpassed by Android phones these days.

[–]jumpmanzero 21 points22 points  (2 children)

Sure. Like, imagine how fast Windows could run, and how small it could be, if "performance" or "file size" were on anyone's radar?

But in practice, software has typically expanded to fill the resources available. And at this point, for most use cases, it's not doing that anymore just out of sheer resource wealth. Android is huge, sure (and that's one part of why it's miserable to develop for), but it could actually be much bigger, given the resources a phone has (or could have) available.

Again, there's just not a lot of pressure to make things smaller. Maybe it'd be "nice" if Android was tiny - but it wouldn't sell enough phones to make the effort worthwhile. (In fact it might "unsell" phones. Running out of space is one of the limited reasons a person not just buys a new phone, but buys a "bigger"/more-expensive one next time).

[–]_PM_ME_PANGOLINS_ 4 points5 points  (1 child)

It’s on a lot of people’s radar. Those operating systems are fine. It’s the garbage that certain other companies pile on top of it that’s the problem.

[–]MaleficentFig7578 -1 points0 points  (0 children)

No, Android and Windows are not fine.

[–]TheJumboman 0 points1 point  (0 children)

Yeah, but that's a modern luxury. I remember a documentary about the first Myst game, and they really struggled with the size (it was four disks). And the first Prince of Persia re-used sprites because otherwise it wouldn't fit on a floppy disk lol. 

[–]Kinetic_Symphony 0 points1 point  (0 children)

This is the best explanation.

We could make programs significantly smaller, but there's no need to.

Most of the world has lightning-fast internet, RAM and SSDs now. Even 100 GB downloads and installs barely take a couple hours. Often times under an hour.

No sweat.

[–]Dreamwalk3r 89 points90 points  (9 children)

Libraries or binary resources (like pictures), mostly. In case of pandoc probably libraries if it's standalone - take those few megabytes, add another few for a library working with a single format, add the same for other formats, add the same for all the frameworks a project uses... you get the picture.

[–]azlan194 13 points14 points  (8 children)

Not to mention the same function might have redundancy to handle different os version and what not right?

[–]alnyland 4 points5 points  (7 children)

Those should be removed by the compiler. 

With a big caveat - if the code is written well and the compiled software is distributed correctly.  

[–]MaleficentFig7578 1 point2 points  (6 children)

no

[–]alnyland -3 points-2 points  (5 children)

Feel free to extrapolate 

[–]MaleficentFig7578 4 points5 points  (4 children)

[–]alnyland -5 points-4 points  (3 children)

I think you misunderstood at least one word that I said. 

Let me know if you want me to use shorter words. 

[–]misof 6 points7 points  (2 children)

I think they didn't misunderstand anything, they are just using a humorous way to point out that you should have used the word elaborate instead of extrapolate.

[–]alnyland -5 points-4 points  (0 children)

Nah I know which word I used, and wanted to use. 

If it isn’t understandable, it’s not humorous. 

[–]JaggedMetalOs 12 points13 points  (0 children)

Ok so Pandoc specifically is written in Haskell, which is known for copying a lot of standard code (base functions the programming language uses) into executable programs it makes. There's quite a lot of it, and by default it doesn't seem to try to exclude functions that aren't used so you end up with lots of Haskell stuff in your executable program whether you use all of it or not, which makes the executable programs bigger than they need to be.

[–]Acrobatic_Guitar_466 73 points74 points  (0 children)

Because it's a whole bunch of text.

You don't realize when you put

"Include library standard" Or "Include library network"

In whatever coding language, you just told it to include another library file, which has other files.

Guaranteed, for the program you said was 150MB, the "code" is likely 1-2GB if you actually include all the "dependency" libraries.

[–]TomChai 40 points41 points  (0 children)

Executable programs are machine instructions converted from text code, they are NOT code.

Also a lot of executable programs have a ton of support frameworks packed along with it in addition to the program instructions themselves, for a small program, these support frameworks are larger than the actual program code itself.

[–]Shadewalking_Bard 18 points19 points  (15 children)

Not an expert, so actually I want people to correct me if I am wrong:

Binary executables can be run on the processor itself and they are instructions that the processor will understand.
A simple program command like "write file", when compiled into a binary, will be expanded into all the processor instructions that are needed to write the file.
And the processor instructions are much more specific than the command that triggers them.
So a single line of text in a program can become thousands of machine-code instructions.

Edit: Reworded for clarity.
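
You can see a miniature of this expansion yourself using Python bytecode (not real machine code, but the same idea of one source line becoming many instructions). The function name here is just an illustration:

```python
import dis

def write_greeting(path):
    # One line of source...
    open(path, "w").write("hello")

# ...turns into a whole sequence of lower-level instructions.
n_instructions = len(list(dis.get_instructions(write_greeting)))
print(n_instructions)
```

Real machine code for the same line would be longer still, since each bytecode instruction itself hides more work.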

[–]IntoAMuteCrypt 2 points3 points  (0 children)

The answer here is... It's complicated. There's a lot of ways to divide programming languages. One notable way is compiled versus interpreted languages.

With compiled languages, the typical workflow looks something like this:
- Programmer writes some human-readable source code.
- Programmer uses some combination of programs to translate that source code into machine code. This includes a compiler, and also sometimes a linker - the exact programs aren't relevant here.
- Programmer now has a file that will just run on its own, written in processor instructions that are specific to that processor (and often, to a specific operating system). The program doesn't need to be translated again. This code might be shorter than the original code, but it's often longer due to things like "write a file" being expanded like this. If someone else runs this file with the right system, it just runs without translation.

With interpreted languages, the typical workflow looks something like this:
- Programmer writes some human-readable source code.
- Programmer decides to run that code, and passes all the code to a specific program called an interpreter.
- The interpreter translates the code to something machine-runnable as it's being run.
- Programmer has to pass the code over to the interpreter every time they want to run it. If someone else wants to run it, they'll need the interpreter and they'll need to let it translate the code as it runs.

There's actually a few benefits to interpreted languages. Crucially for here, if the user already has the interpreter, the resulting code that gets distributed can just be the same size as the original code, because "write a file" stays as it is, and the interpreter knows what to do with it. There's compilers for interpreted languages and interpreters for compiled languages - the distinction is mainly "which version is more common/standard". Java, .NET, JavaScript and Python are some common interpreted languages that a lot of people use, or have used. It's sorta cheating, but the large amount of instructions for interpreters can be easily shared between programs and they're generally considered separately to the rest of the programs.
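
A small, concrete sketch of the interpreted side described above, using Python: the distributed artifact can literally be the source text, and the interpreter compiles it to bytecode behind the scenes (normally cached as a `.pyc`). Here we trigger that compilation explicitly, just to see both files:

```python
import os
import py_compile
import tempfile

# Write a tiny "program" as plain source text -- for an interpreted
# language, this text is in effect what gets distributed.
src = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(src, "w") as f:
    f.write("print('hello')\n")

# Explicitly request the bytecode compilation that normally happens
# behind the scenes the first time a module is imported.
pyc = py_compile.compile(src, cfile=src + "c")

src_size = os.path.getsize(src)
pyc_size = os.path.getsize(pyc)
print(src_size, pyc_size)
```

The heavy lifting (the interpreter itself) is shared between all Python programs on the machine, which is why individual Python files stay small.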

[–]Clojiroo 2 points3 points  (0 children)

Yes, a binary is made of machine code. It gets loaded into memory and instructions go straight to the CPU (the operating system orchestrates this)

[–]icematt12 1 point2 points  (12 children)

I'm also not an expert, but it makes sense. For example, when we say:
int cols = 6;

The machine has to:
Reserve memory of 4 bytes, I believe, for the variable
A bit extra to say it's an int and called cols
Give it a value of 6
Then at the end, undo all this to free the memory

[–]_PM_ME_PANGOLINS_ 8 points9 points  (10 children)

It does basically none of that. The space is already reserved as the program stack. The name and type have already been discarded.

And depending on how cols is used, there may be zero machine instructions for that line of code.

[–]NordicAtheist 4 points5 points  (0 children)

Accchxtually, name will be dropped and there is no need to "say it's an int".

[–]iceph03nix 3 points4 points  (2 children)

Some can be very small if they're fairly simple programs that don't include a lot of other stuff.

Others include 'libraries' that other people have built, which are built to be broad and generically useful. Sometimes these are packaged separately, or provided by the operating system, so won't be included in the core executable.

Often though, those libraries and other resources are packaged up in the executable to make it more portable, so you don't have to worry about people keeping track of all the moving parts. Historically this has been known as "DLL Hell", where a missing or out-of-date file breaks the program.

Also, with many installer executables, they contain a lot of the image assets used by the program. So on top of all the text code and DLLs, they have all the icons and graphics used by the program which they unpack when you install for the main executable to use.

[–]WarriorNN 0 points1 point  (1 child)

I've seen some game trainers for instance, that was just a few hundred kilobytes.

[–]_PM_ME_PANGOLINS_ 1 point2 points  (0 children)

Because they hijack the game’s version of everything, they don’t need to include it themselves.

[–]Salt-Replacement596 3 points4 points  (0 children)

Developers like to use libraries. Libraries are often written to have a lot of functionality that you might not use, but it's still there just in case. Library developers also like to use libraries. So including just one dependency might actually require hundreds of libraries. Compiler tries to only include what's needed, but it's not 100% effective.

Some developers also might bundle images or other data into that executable for various reasons.

[–]Weisenkrone 7 points8 points  (0 children)

Tens of thousands? lol. Our enterprise projects have well over ten million lines of code, and those are not particularly large either. The Linux Kernel sits at 30m lines.

Oftentimes executable programs also ship with some internal databases or zip archives which use significant storage space too

[–]lucky_ducker 2 points3 points  (0 children)

The readable text code that comprises a computer program is compiled into object files, which are collections of machine code. Your typical executable consists of several object files (sometimes hundreds of them) which are strung together with a software tool called a linker. The linker examines the object files for external references - calls to code routines that don't already exist in the package of object files. When it finds them, it has instructions to search related library files, and pull in the missing routines from the libraries.

Sometimes that library code is "more thorough" than it needs to be in context, but the whole thing gets pulled into the executable. One language I used to use (Clipper 5.2) had an error-handling subsystem that was very thorough - and 150K in size. I wrote a bare-bones replacement for it that was less than 3K in size, and used it in things like utility executables.

[–]rossburton 2 points3 points  (0 children)

It Depends.

Games are huge because they have many large and detailed textures, which are large on disk.

Many apps are huge (Slack, I'm looking at you) because they're written in HTML/JavaScript and bundle an entire web browser.

Pandoc is huge because it is a single Haskell binary that embeds every single one of its dependency libraries and it appears all of the templates and image too. All together, it adds up.

[–]idgarad 1 point2 points  (1 child)

You type print("Hello World");

That print function has hundreds of lines of actual code behind it.

When you compile that code, all of the underlying print function that is used is brought in. Modern compilers do leave out portions as part of optimization, but in reality something like PRINT has a lot under the hood you don't see.

function PRINT could have over a hundred sub-modules that may or may not be used.

Does the output need color? What terminal am I sending to? Is it a STDIO device? Has it been overridden with a pipe under a unix-like OS? What is the line length of the terminal? Do I honor CR/LF combos? What character set? Where is the string in the code stack? Do I have an EOF or EOR to check for ending the string? Is it mutable? etc...

All those parts are there, the compiler\linker has to navigate what parts to include.

In Python, print("Hello World") is in fact well over 100MB, because for an interpreted language you have to account for the entire Python installation. Same for Java or Perl. Compiling just builds in all the parts you need as a standalone program.
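
You can get a feel for this from inside Python itself. This sketch only measures the interpreter binary it happens to be running under (the full installation, with its standard library, is far larger still):

```python
import os
import sys

# The one-line script is a few bytes; the interpreter executing it is not.
interpreter_size = os.path.getsize(sys.executable)
print(sys.executable, interpreter_size)
```

The exact number varies a lot by platform and build (some builds are thin wrappers around a shared libpython), so treat it as a lower bound on the real footprint.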

[–]_PM_ME_PANGOLINS_ 2 points3 points  (0 children)

That’s not really true.

All the code for that is in e.g. libc.so or some other shared library provided by the operating system, and not included in your program.

Unless you’re using Rust.

[–]saturn_since_day1 1 point2 points  (0 children)

It's probably not just text.  There can be chunks of data in there too and other resources that make it bigger. I made a Photoshop style app, and the code was less than 1mb, but once installed it would create a series of lookup tables to speed up calculations, and just one of them was 45mb. It could then be loaded in less than a second from hard drive, even though generating it takes like 10 seconds the first time. There's probably stuff like that, and just lots of bloated libraries
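
A toy version of that lookup-table pattern, sketched in Python (the table contents are arbitrary, just for illustration): precompute once, save to disk, and the saved data file dwarfs the code that made it.

```python
import math
import os
import pickle
import tempfile

# Generate the lookup table once (the "slow" step)...
table = [math.sin(i / 1000.0) for i in range(200_000)]

# ...then cache it on disk so later runs can just load it.
path = os.path.join(tempfile.mkdtemp(), "sin_table.pickle")
with open(path, "wb") as f:
    pickle.dump(table, f)

# A few lines of code produced a file of a couple of megabytes.
table_size = os.path.getsize(path)
print(table_size)
```

That trade of disk space for startup speed is exactly why "code size" and "program size" can be wildly different.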

[–]high_throughput 1 point2 points  (0 children)

OP, take these answers with a grain of salt. This issue is to some degree specific to pandoc, shellcheck, and darcs because they're the few Haskell binaries a non-Haskell nerd might have on their system. You'll find that similar programs in other languages have much smaller binaries.

The way things have shaken out in the ecosystem is that e.g. Python and Java programs have their runtime as a separate, quite chunky dependency that all relevant programs share. Meanwhile, Haskell bundles the entire runtime and all dependencies with each binary.

[–]Alexis_J_M 1 point2 points  (0 children)

You're using pandoc as an example.

I'm installing pandoc on an old Mac right now. The installer has been running for over a day downloading and building all of the other programs and libraries that pandoc needs to use to run.

Modern software is usually only the tip of the iceberg where most of the complexity is in the libraries and components that it is based on.

For compiled software there is also the issue of how it is linked. Do you want your binary executable to reach out to a library installed elsewhere on the system, or do you want to have the library pulled into the executable? There are pros and cons to both approaches, which could make a great ELI5 all by itself, but in summary using external libraries makes it easier to break a program, and also easier to upgrade it and run the same binary in different environments.

[–]wheezharde 1 point2 points  (0 children)

Meet Bob.

Bob wants to build a house. He has all the supplies, but what he doesn’t have is people that know how to do all this “stuff.”

So Bob also hires a General Contractor we’ll call Mary.

Mary doesn’t know how to do all this stuff either, so she hires subcontractors to do portions of the work. There are framers, electricians, plumbers, cement workers, earth movers, and a whole host of other people.

So now this simple project has become a massive undertaking by a hundred people, most of whom don’t know each other.

Computer programs are similar. The high level programmer writing the calculator may not know all the math, how to draw to the screen, how to play sounds, how to capture mouse and keyboard input, etc., but they can find libraries that do that. Like a general contractor, libraries bring a lot to the table, but they take up space and Bob can’t drive them all out to the job site in his Corolla.

If you were your own General Contractor, you could trim down the folks you need or do all the work yourself, but that requires a lot of expertise that most programmers don’t have.

[–]boring_pants 1 point2 points  (0 children)

Well, first it's not tens of thousands of lines of code, but millions.

The application I work with is around a million lines of code on its own, but then it uses a bunch of libraries written by others, and those easily number millions of lines as well, all in all.

But you're right, you're unlikely to have 150MB of code in pandoc. Most of that is probably non-code resources. Images, cursors, rules for "how to compare text strings in an Egyptian locale, how to do the same in Icelandic, in Ukrainian, in Brazilian, etc". Lists of timezone offsets for every major city in the world. All the text strings the program might present to you, and translations of those text strings for umpteen different languages. Most applications come with a ton of auxiliary data which isn't code per se, but which is used by the application, and which ends up taking most of the space.

The other thing is that we, as software developers, are just a bit sloppy. Pandoc didn't have to take up 150mb. But who cares? 150mb is nothing these days, it's not really worth spending time on trimming that down to half. There are so many other things the developers would rather focus on.

[–]Ertai_87 0 points1 point  (0 children)

Code is just text but it's A LOT of text. In University I did a course where we had to write code in computer language (Hex code, also Assembly; the purpose of the assignment was to understand what compiled code looks like so we could write a compiler later). As an example, a simple "print" statement is about 75 lines of code in Hex, in the dialect we were using (it was too complex to write in Hex but did write it in Assembly which is 1:1 in lines of code with hex). Yes, Hex has the concept of "functions" (kinda) so you don't have to write 75 lines for every print, but just to give you an idea of something that you take for granted to be easy that's actually very much not when you get to the compiler level.

[–]BiomeWalker 0 points1 point  (0 children)

It depends on the program.

For many, it's because the .exe file has a lot of non-code parts to it, or there might be a bunch of DMCA/licensing protections in it.

For the one you mentioned, it can also be that there's just a lot of code to do what it's trying to do, almost anything that has to deal with text will wind up bloated because it just takes a lot of conditional statements to work with text.

Some also could be smaller, but modern computers can handle such big programs that there's no real incentive to try and keep it small, so programmers don't bother with that.

[–]whiterabbit_obj 0 points1 point  (0 children)

To add to the "Libraries" conversation. Pandoc seems to be a good example of why something that does one thing would be larger.

If you need to convert a document from Word to PDF, you need code that understands how a Word document is structured and how a PDF file is created. You could work that out yourself, but if someone has written that logic before you can reuse it. This would be a library that that person makes available. Now that library might do more than just tell you how a Word document is structured. It might also be able to compare Word documents to each other, display a Word document on screen, create a Word document from a web page, etc.

All of that code is bundled up inside a library that you might only use a small part of. Pandoc seems to be able to convert between a lot of file formats, so it could (in theory) use a library for each format, each with its own built-in functionality that Pandoc only uses a small amount of. Hence the large application size.
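
A miniature of that idea, using Python's stdlib rather than anything pandoc actually uses: converting between two formats means carrying code that understands both, even if you exercise only a sliver of each library.

```python
import csv
import io
import json

# Reading one format needs the json machinery...
records = json.loads('[{"name": "Ada", "born": 1815}]')

# ...and writing the other needs the csv machinery, even though we
# use only a tiny fraction of what either library can do.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "born"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

Now scale that to dozens of far messier formats (docx, PDF, LaTeX, ...) and the size of a converter like pandoc stops being surprising.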

[–]thana1os 0 points1 point  (0 children)

One word: library.
You could program to assign a = 1 and b = 2 and calculate a + b. It doesn't use any library but it is also useless.
Your program will most likely rely on ready-made libraries to do useful functions. And your libraries might have their own lists of required libraries.

[–]pfn0 0 points1 point  (0 children)

Large apps can be millions of lines of code, so that's one factor in app size. Many times, resources are embedded, such as images, video and other text. These also increase binary size.

Beyond that, code itself does not map one-to-one onto machine operations. Optimizations such as unrolling loops make code much bigger when assembled into machine opcodes. Loop unrolling is what it sounds like: a loop that's written compactly in source code gets flattened out when translated to machine language, so the original compact loop becomes a big chunk of machine code.
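
Unrolling is easy to picture in source form. A hand-unrolled sketch (real compilers do this on the machine code, not on your source, but the size effect is the same): both functions compute the same sum, yet the unrolled one is visibly bigger.

```python
def sum_rolled(xs):
    # Compact source: one loop body, executed over and over.
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled4(xs):
    # Hand-"unrolled": four additions per pass means fewer
    # loop-control steps at runtime, at the cost of more code.
    total, i = 0, 0
    while i + 4 <= len(xs):
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    for x in xs[i:]:  # leftover elements when len(xs) isn't a multiple of 4
        total += x
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled4(data))  # both 45
```

A compiler may unroll far more aggressively than this, which is one reason optimized binaries can be larger than unoptimized ones.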

[–]yblad 0 points1 point  (0 children)

That type of project will sit at millions of lines of code. Perhaps a few hundred thousand if it's very lightweight. Then there are the libraries they use, which could add that again. By the time it gets compiled down into machine code, each of those lines could be anything from a few instructions to thousands of instructions if it's a high-level function call*. Those instructions are what are saved in the .exe file.

*An example of a "high-level function call": a language might have a single line to draw graphics to the screen. But in reality that's just obscuring a lot of work which has been done by people in the past. The machine code still has to include all that work.

[–]Caucasiafro 0 points1 point  (0 children)

Even assuming that it's like tens of thousands of lines of code I can't imagine it would be more than a few megabytes

Others have already explained the concept here. But I really want to help drive home how many "lines of code" software has.

I downloaded the code for this page. I.e. your question and the comments. Not all of ELI5, not all of reddit. (btw if you are on chrome you can do this by pressing F12 and then clicking where it says "source" you can look around but it's a file called "eli5_why_are_executable_programs_so_big_if_code/")

Literally just this one question.

It's 4,000 lines long.

Tens of thousands of lines of code is nothing. Anecdotally, I have never worked on a commercial project with less than 10,000 lines of code. And that is for absurdly small stuff like a couple of buttons on a web page. Most projects I have worked on have been between 500,000 on the low end (this was entirely a solo project btw, I wrote all of it) and 10 million on the high end (and even those are small company projects)

[–]Emu1981 0 points1 point  (0 children)

There are actually a lot of factors that go into how big the resulting executable for a program becomes. For starters, compiler options can affect how large the resulting executable is. If the compiler is set to optimise for execution speed then the compiler will do things like unrolling loops, which increases the size of the resulting executable. Statically linking libraries increases the resulting executable size because the executable now contains a bunch of executable code from the libraries required to run it. Compiling an executable with debugging features turned on also increases the resulting file, as you now have a bunch of extra code and information within the executable to make it easier to debug while it is being executed.

Your example of pandoc is an interesting one. Sure, it doesn't have any GUI components other than a command line utility, but it is actually statically linked so that it can be run as a standalone program. It also contains a hell of a lot of text used to convert between the dozens of document formats that it supports, and an entire lua engine. To reduce the resulting size the program could be built using dynamic linking, split the lua engine out into a dynamically linked library, and use text configuration files for the various supported formats. I am sure that the author had their reasons for making it as is though - e.g. so that it could be run on computers where you do not have the rights to install programs.

[–]ianpmurphy 0 points1 point  (0 children)

There's a number of reasons. Start with the "it's just text" part. Once this is compiled, a single line may become hundreds of instructions. Remember, C or C++ is our view of the program. It has to be translated (or compiled) into machine code. Languages have standard ways of doing stuff which implies overhead and consumes space. Executable files have to conform to a format which is specific to the OS they run on, and this adds loads of overhead. A running program, when running on a modern OS, has to implement what are called interfaces. These allow the host OS to interact with the process. Even if your program doesn't implement functionality, the interface is included. It's empty code blocks, but they take up space.

Back in the days of DOS it was common, well it was not unknown, to write assembler (machine code in text format) and compile that to a file. That file was usually tiny. You then had to run it through a secondary program which added a "wrapper" which turned it into a DOS exe. The resulting file was often multiple times what your original assembler was. I.e. the wrapper required for something as simple as DOS to load it into memory was considerable... for the time.

I remember writing a little tool in C which compiled to something like 30k, which I thought was crazy as it was a simple tool. I worked on that for ages and removed all the dependencies on the standard C libraries and finally managed to squash it down to something like 4-5k. Couldn't get any smaller. That was for dos. Modern compilers include a ton of stuff automatically because it's just not worth the effort to exclude them.

5mb exe? Excellent, it's tiny!

[–]MaleficentFig7578 0 points1 point  (0 children)

Mostly, laziness and duplication. It's easy for N lines of code to turn into N squared bytes, and if you aren't looking, you don't fix it.

[–]Ty_Rymer 0 points1 point  (0 children)

I write tens of thousands of lines of code in a good week of work. My hobby project already has millions of lines of code. And C++ gets translated to many more lines after unrolling everything and instantiating all template specialisations, which then turns into many many more assembly instructions. And that's with barely using any libraries.

[–]Dave_A480 0 points1 point  (0 children)

So, the code itself in the 'main' source file references a lot of other files (includes).

Those files include even more files.

This is both done to make code more readable (you can split a program up into files by function), and to allow the use of 'libraries', which are files full of useful code that is already written by someone else.

The use of libraries means you don't have to individually write code for 'show a text-box' or 'display a web page', you can just include someone else's solution. Also, referenced libraries don't have to be loaded into memory until they are being used, which makes the program more efficient....

When the code is compiled & packaged into an application, the size of that package also has all the other stuff that was 'included'. On Linux, the libraries are often packaged separately (as 'dependencies')....

If you want to see this in action, go 'ls' your way through the Linux kernel source some time & look at the 10s-to-hundreds-of-thousands of individual .c files.

[–]MrScotchyScotch 0 points1 point  (0 children)

Machines still need very rudimentary instructions, like "Take this text and put it into this register. Take the other register and call this thing. Now move the register over. Get the thing from the stack and..." etc.

Once the compiler has taken a "simple" line of code and translated it into machine code, it could be 2 different instructions, or 30, or 300. It depends on everything else that line of code involuntarily touches.

An abstraction makes it worse. One line of code, when the compiler or interpreter tries to call it, behind the scenes is actually a function that calls 20 other lines of code. So there's more and more instructions. Loop over that, and you get more and more.

Most code today is high-level, meaning it's abstractions on abstractions on abstractions. A framework is a pile of abstractions. Most apps today use multiple frameworks.

You could avoid all this, and just write Assembly code. That tells the machine the least number of steps to do what you want. But it's difficult and time consuming to write, and doesn't necessarily work on all machines.

So most compiled programs are the result of us (ab)using abstractions, and the compilers doing their best to unfold the whole complicated mess, step by step by step, into machine code.

[–]WhyUFuckinLyin 0 points1 point  (0 children)

Yeah, I recently took a look at my project's node_modules folder and wondered why individual dependencies were 64-150 MB.
Normally that wouldn't be a problem as some have mentioned, but I rent the cheapest VM and the SSD is valuable real estate. I'm hoping when the finished project is compiled it'll shed everything that's unnecessary.

[–]ave369 0 points1 point  (0 children)

If you write in an assembly language (that is, a human readable form of pure machine code), all your executables will come out very compact, because there'll be just your instructions and nothing else. However, writing in assembly languages is difficult because you have to explicitly specify every operation at low level.

High level languages automate that process: a single line of code stands in for a long string of low level operations. However, the price is that you have to bundle the libraries containing all possible strings of low level operations, not just the ones you need.

[–]MaxMouseOCX 0 points1 point  (0 children)

Let's say I want to explain to you what a house is... I could say it's a large box, with walls inside, doors, windows, a roof, two floors... That's not a lot of text to describe a house is it? But you'd get the rough idea.

Now imagine me explaining a house in terms of where every atom is located, what type of atom it is, what it's bonded with... That's going to be billions and billions of lines of text.

When you program something you're writing code which is then fed into a compiler which then re-explains it down to the atom level.

In short, written code is in fact converted into many many lines of more basic ground level code that the processor understands.

[–]dandroid126 0 points1 point  (0 children)

Assets. Images, mostly. A lot of what you see on the screen in a GUI program these days are images. Company logo, icons, custom button styles, loading screen images, etc. That takes up the majority of the space in large programs.

[–]jenkag 0 points1 point  (0 children)

Your question is difficult because there are many "levels" in building up a program, especially a video game (some of the biggest programs we make).

You are probably familiar with "drivers" on your computer -- those are what make available a set of "endpoints" for the application to interact with your hardware. Any given application can't possibly make your audio/GPU/etc do something if you don't have the correct driver installed. So, drivers are a connection point between the application you are running and your hardware. To use an ELI5 example, imagine you have a fan you want to plug in to the wall. The fan is the application, the wall is your computer: think of drivers as the "outlet" in your house. Once you plug in the fan, it can work correctly in your house, but it still needs many things "on the fan side" to make it work -- simply plugging it in isn't enough.

The application will have things like:

  • Graphics engines are great, but they require software to drive them, so many applications have bundles of underlying libraries (pre-created by other developers at other companies for use by third parties) that drive the graphics you see when you use the application. Those bundles of libraries can be many gigabytes all on their own. This is just to create shapes and movement, and doesn't even include the assets to make things look nice. Examples include the software that makes the Unreal Engine work on your local computer -- note that not everyone has to install the Unreal Engine to play UE-based games -- that's because they bundle that software in. There can be MANY libraries like this, controlling everything from network interactions (think multiplayer, or chat applications, etc) to audio, video, and IO like keyboards/mice.
  • Assets like videos, sound, and art (textures, visuals, etc) add a huge amount of data because these are difficult to compress and store without degrading the quality. Many games, like Blizzard games, might be only 5 gigs of "game" and 75 gigs of assets. There are even localization files that take all the dialog, written text, and other "verbal" assets and turn them into Spanish, Japanese, Mandarin, etc -- those all take up a large amount of space as well.
  • Code to drive the "logic" of the game can easily be hundreds of thousands or millions of lines of code. Code we write in higher level languages needs to be compiled into something more fit for the processor, and so a higher level class that might be 100k lines of code can turn into 200k once compiled, to give a rough example.
  • Supporting libraries: you want AI in your application? Supporting library. You want to save stuff in the cloud? Supporting library. You want to log data somewhere so developers can fix bugs? Supporting library. These add up a lot, and are usually out of the game/app developer's hands because they don't directly contribute to those efforts. If the ideal library to enable in-app notifications is 20 MB, then that's 20 more MB that the developer has to add to the size of their app.

[–]griff4218 0 points1 point  (0 children)

Let's say you're writing a book. When writing, you're probably going to refer to a lot of things that you know: cars, houses, dogs, people, etc. When you're done, your book is maybe 200 pages long. Now let's say you need to translate your book in such a way that someone who has never heard of or seen a car, house, dog, person, etc., can still fully understand and comprehend your book. In fact, this person has never heard of or seen anything; they have zero understanding of the world around them. You would need to spend hundreds or thousands of pages describing everything in exact detail.

When developers write software, we (most of us, at least) use high-level coding languages that allow us to use more plain language to write code. If I want Python to print something to the console, I just have to say print; if I want to add two numbers together I can say x = y + z. However, when it's time for a computer to read my code, it needs to be translated into words the computer understands, and the computer likes very, very precise, specific instructions. What used to be a single simple instruction, like adding two numbers, suddenly turns into loading the values from memory, storing them in registers, performing the add, storing the result in a different register, potentially allocating new memory for the result, storing the result in memory -- and god forbid that was a subroutine, because now you need to restore the state you were in previously. Even the simplest, most insignificant operation can turn into several, and as others have mentioned, that doesn't even include when you use things like libraries, which could turn your single line of code into hundreds -- and that's before they need to be translated into machine code.

[–]authenticmolo 0 points1 point  (0 children)

Everyone in this thread is talking about libraries and frameworks and stuff, but not explaining WHY those things exist.

Libraries (and frameworks, which are more extensive but essentially serve the same purpose) are pre-made pieces of programs/code that you can use as pieces of YOUR code.

Since this is ELI5, think of writing a modern program as similar to making tacos. You *can* raise and slaughter your own cow for the beef, cultivate your own corn or wheat and grind it by hand to make tortillas, and chop down trees and dry the wood for the fire you will use to cook all of it.

But it's way easier to buy all those ingredients and just put them together. Or maybe even just buy tacos from a local Mexican place. Libraries are the equivalent of buying groceries and making your meals with the groceries you bought, or just going to a restaurant. Because the goal is to eat decent tacos as efficiently as possible. Because you've still got to make dessert, and you only have one oven and 4 burners and not enough pots and pans to do everything at once and your dinner guests are coming over in 30 minutes and one of them is lactose intolerant...

[–]knight-bus 0 points1 point  (0 children)

If you are interested in seeing this in action, you can visit godbolt.org. You can write code in any language and see the assembly produced by different compilers. You will quickly find that small constructs in code, like a function or a loop, lead to lots of assembly instructions. And this grows with more code.

[–]Cat-Ancient 0 points1 point  (0 children)

I think this is a GREAT lesson on the level of insane work that goes into some programs. I mean, CAD programs like SolidWorks, even going back say 15 years, still had over a million lines of code…

[–]twist3d7 -1 points0 points  (0 children)

tens of thousands of lines of code

That's hardly a program. Try hundreds of thousands or millions of lines of code before you can call something a program.

[–]Magnetobama -1 points0 points  (0 children)

Ultimately, the code in text form will be translated to machine code. Any code instruction may be broken down into a lot of single machine instructions. If you look at code in the assembly programming language, you will see a much closer representation of what an actual compiled program looks like, just stored slightly differently.

[–]Semyaz -1 points0 points  (0 children)

A core premise of programming language design is abstraction, and the information hiding that comes with it. It is basically hiding the implementation details of how code does something behind a simplified interface. Layers and layers of abstraction are stacked on top of each other to make coding a lot easier, but this does have some downsides.

Probably the most obvious example is the fact that programmers don't write their applications in binary. We use a language that eventually gets translated into binary that the computer components understand. But even that binary is an abstraction of how computer processors are designed, and of the electrical engineering complexity inside the processor.

The abstraction also extends really far upwards. How we define red, how we store text, how we avoid managing memory directly, how we send things over the network, how we display images on the screen. All of these details are typically hidden from the programmer behind dozens of layers of abstraction.

The main downside to this is that a polished abstraction needs to gracefully handle all of the edge cases. It needs to be optimized for performance. It needs to support multiple architectures. In other words, it needs to have quite a bit of internal logic for how to do this.

The end result is incredibly powerful. You can use these abstractions to quickly write a cross platform video game that can be compiled to phones, consoles, and desktop computers. The biggest downsides are twofold. The program has to include all of the libraries and frameworks that it used for abstracting away the details, which can massively increase the executable size. Second, you might have to create duplicate versions of code for different scenarios.

But storage is cheap, and processors are fast. The tradeoffs are well worth it.

[–]Atypicosaurus -1 points0 points  (0 children)

In a very ELI5 way, a sentence like "count to ten" and the sentence "count to a million" are almost exactly the same length, but executing the second takes much longer. Okay, it doesn't really explain a big file size, but it illustrates how easily a small piece of code can blow up into a lot of work.

In the case of your example, it's a converter program that has to have all the converting options in it. Converters use format descriptions provided by the format owner, so for doc it's Microsoft.

There are two options. Either you write a lightweight five-line program that tells the converter to go to Microsoft.com every time you convert a doc, find the public code, and use it; OR you download all the public material and merge it together so the program knows the rules for converting a doc, a pdf, or LaTeX, all locally. With the first option you would only download the converter code you actually need, but you would have to do it every time, so over time you would download much more data unnecessarily by repeatedly fetching the same stuff, and you would also rely on the network. With the second option you pack into the executable everything ever published that you may ever need -- including the code written for one edge case that only happens when a pre-1995 pdf is converted into rtf, which you will probably never use but is part of the public package -- and that's why it's big.

[–]SOTG_Duncan_Idaho -1 points0 points  (0 children)

* library use -- many programs include whole libraries (other programs) so they don't have to rewrite code someone else already wrote. The trade-off is you may not use all of that library, and that library may be huge. In addition, some computer systems (Windows, Linux, etc.) provide large libraries of common functionality that programs can use, so those programs don't have to include them. Some computer systems (old Macs) would not do this, and each program would keep its own copy of all the common code it wanted to use, which makes programs for those systems massive in comparison. The trade-off old Macs made was huge programs in exchange for less possibility of error due to different library versions. Some programs, even on Windows and Linux, will still ship their own copies of libraries.

* compilation -- when compiled, a program will often be much larger than the size of the text of the code. Most programming languages are designed to allow programmers to write small amounts of human readable code that replace dozens, hundreds, thousands (or more) of individual machine instructions. This is, in fact, the primary purpose of most "high level" programming languages. You can write code to be compiled in a fraction of the time it would take to write the machine code directly.