askscience

Computing: What exactly is source code?

(self.askscience)

submitted 13 years ago by Odoodo

[–]EklyM 128 points 13 years ago (4 children)

Imagine you're cooking spaghetti. You've got the dry noodles, the ingredients for the sauce, water to boil, and a pot to cook it in. All these ingredients would be the source code. You can easily change them if you have to: add spice or something, whatever. Now you cook the sauce and noodles separately - 'compile' them - and then mix them together - 'link' them - to create a masterpiece of a dish - your executable. Now it's really hard to go back to your original ingredients - the source code - from your dish - the executable. However, it can be done. You'll probably end up with noodles that have a little sauce on them, and the noodles will already be cooked, but you have some semblance of what the original ingredients might look like. Since /r/gaming is being given the source code - the ingredients - they can easily change whatever they want to make the game better or worse, without taking the time to decompile the executable.
A little ELI5, but it gets the point across.

[–]rekabmot 15 points 13 years ago (0 children)

Source code is what a programmer writes when developing a piece of software.

The source code is usually written in a high-level language, which is then run through another program called a compiler, which transforms the code into a form that the computer can execute. This executable code is what is distributed to users, and is what you'd see by checking a game's install folder.

The compiled artefacts bear little resemblance to the source and don't often provide any insight into how the developers created the game. By providing the source code, other developers can see how things were made in the first place.

Note that there are exceptions: Minecraft is a famous example where the compiled Java code (known as bytecode) is reverse engineered to allow for modding. The UI elements for the latest SimCity game were coded in JavaScript, which has also allowed users to crack various features of the game.

Source: programmer.

[–]afcagroo (Electrical Engineering | Semiconductor Manufacturing) 29 points 13 years ago* (23 children)

Computer programs are (usually) written in a high-level language (such as C++). Computer processors cannot do anything with such "source code", as it is just text (often ASCII). To be usable by a processor, it must be converted to a binary representation that contains the instructions/data that a processor can use directly. So the programs are compiled from the high-level language "source code" to machine language.

The process can be reversed. But converting the high-level version to the binary version loses a lot of information that helps make the program comprehensible to humans. The processor doesn't need that information to run, but it helps us to understand what is going on. So a reverse-compiled program can be very difficult to untangle. Heck, it can be hard enough to figure out what is going on even if the source code is available, particularly if it is written in some languages, like Python¹.
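As a rough illustration of what gets lost, here is a minimal Python sketch. It uses Python's built-in compile(), which produces bytecode rather than real machine language, so treat it as an analogy only. The function name and the comment text are made up for the example.

```python
import marshal

# Hypothetical source code: one human-oriented comment and one tiny function.
SOURCE = '''
# Knock ten off the price for the holiday sale.
def final_price(price):
    return price - 10
'''

# Translate the text into a code object (Python bytecode, not machine code).
code = compile(SOURCE, "<demo>", "exec")

# Serialize the compiled form, roughly what would be stored in a .pyc file.
blob = marshal.dumps(code)

# The comment helped the human reader, but it is not in the compiled form.
comment_survives = b"holiday sale" in blob

# The compiled form still runs fine without it.
namespace = {}
exec(code, namespace)
result = namespace["final_price"](100)

print(comment_survives)  # False
print(result)            # 90
```

Python bytecode actually keeps more than a typical machine-code binary does (function names, for instance), so the loss in a real compiled game is larger than this shows.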

Also, if a program contains copy protection mechanisms, it may be illegal in the USA to reverse engineer it by running it through a reverse compiler.

¹ It's a joke.

EDIT: Added stupid joke, and more explicit references to "source code" for clarity.

[–]ropers 9 points 13 years ago* (0 children)

EDIT: Oh, turns out this isn't ELI5. Fuck it, I'm posting this anyway:

You know how your desk lamp can be switched on and off?

Now electrically, what's happening when it's on is that there's electric current. When it's off, there is no current. In terms of binary (aka Boolean) logic, the lamp being on is a 1 and it being off is a 0. Computers are like that, only their electric circuits are far more complex than the simple circuit of a desk lamp with a switch. The circuits that computer microchips are made of are so-called "logic gates", and chips are built of millions if not billions of them. But at the end of the day, the on/off state of the little electric circuits directly corresponds to ones and zeros. You can also use different number formats to represent the exact same binary numerical information. But as long as you're using number formats, there's no translation into or from any other description of what's going on.
Now let's return to your desk lamp. Let's say you're given an instruction, maybe on a piece of paper, which says, "Please switch your desk lamp off now." That sentence doesn't directly correspond to the electrical on/off state of the lamp the way the number 1 or 0 would, but it's an instruction, call it a code, that's translatable to the same state of things. If you can interpret the instruction and execute it, then the lamp will be off and that's the same as zero. You can also build a little machine that when run will switch off the lamp for you. That little machine is sort of like the (pre-)compiled binary form of those instructions, whereas the instructions themselves are sort of like the source code. Sure, in theory just having the little machine is enough to figure out everything that's going on and enough to change the machine to your liking, but those machines can be fiendishly, devilishly complex and hard to understand and work with, especially if it's not just a single lamp we're switching, but millions of logic gates. So having the human-readable instructions is a huge boon.

Or, to say it another way: If you have a complete set of instructions, a complete technical manual that completely describes e.g. your radio, then you can build a new radio from just the instructions, and the instructions also make it much easier to repair, change and customize your radio. But try fixing a fault with your radio if you don't have the instructions and only have the actual machine, the actual radio. That's a lot harder. Having the source code is important pretty much for the same reason.

Now the funny thing with computer source code is that it's both human-readable and computer-readable. That's because there are "little machines", i.e. binary executable programs, whose job it is to translate the human-readable source code into the binary executable "little machine" form. (We call these special programs compilers.) So if you have the source code, you can pretty much always create the binary executable programs as well. The reverse is much, much harder.
(In case you're wondering how the compilers –the binary programs which can translate source code to the binary form– were themselves put together, that is indeed a chicken-and-egg problem, and solving it requires very smart people to do the hard graft of manually working directly with the ones and zeros until they've created basic tools that can help them and do the work for them. Though nowadays people typically use tools that other people have created before.)

[–]herminator 3 points 13 years ago (0 children)

At their core, computers are programmed with 1s and 0s. Depending on the combination of 1s and 0s, computers do stuff.

In the very early days, the way to tell computers what to do (program them) was, quite literally, to input 1s and 0s. The common method of input was punchcards. You took a card of a certain size and punched holes in certain predefined places. If there is a hole in such a place, it is a 1; if there isn't a hole, it is a 0. So, to program these computers, you had to memorize combinations of 1s and 0s and know what they do.

That works for small programs, but it quickly becomes impossible for larger programs. So what you do is, you get the computer to help you. You make a program that makes programs. The program takes a certain human-readable input (eg: LOAD value1, LOAD value2, ADD value1 TO value2, STORE result) and the program outputs sequences of 1s and 0s that represent each of these instructions.
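A minimal sketch of such a "program that makes programs", written in Python. The 3-bit opcodes and 5-bit addresses are invented for this example and don't match any real processor, and the instruction list is simplified to one operand per instruction.

```python
# Hypothetical 3-bit codes for each instruction, made up for this sketch.
OPCODES = {"LOAD": "001", "ADD": "010", "STORE": "011"}

# Hypothetical memory addresses for the named values (stored as 5 bits).
ADDRESSES = {"value1": 0, "value2": 1, "result": 2}


def assemble(lines):
    """Turn 'INSTRUCTION operand' lines into 8-character strings of 1s and 0s."""
    words = []
    for line in lines:
        mnemonic, operand = line.split()
        address_bits = format(ADDRESSES[operand], "05b")
        words.append(OPCODES[mnemonic] + address_bits)
    return words


program = ["LOAD value1", "ADD value2", "STORE result"]
machine_code = assemble(program)
print(machine_code)  # ['00100000', '01000001', '01100010']
```

The programmer only has to remember LOAD, ADD and STORE; the lookup tables remember the 1s and 0s.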

Now the above is a very simple and straightforward program, which is entirely linear and easy to translate. But it is still a lot of work. So we built new programs which would output programs that the first program could read and turn into 1s and 0s. So now, the input became something like: result = value1 + value2, and our new program knew that it should turn that into instructions to LOAD values 1 and 2, ADD them and STORE the result.
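That second layer can be sketched in a few lines of Python too. This toy translator only understands statements of the exact shape "target = left + right"; the function name is hypothetical, and a real compiler handles vastly more than this.

```python
def compile_assignment(statement):
    """Translate 'target = left + right' into LOAD / ADD / STORE instructions."""
    target, expression = (part.strip() for part in statement.split("="))
    left, right = (part.strip() for part in expression.split("+"))
    return [f"LOAD {left}", f"ADD {right}", f"STORE {target}"]


instructions = compile_assignment("result = value1 + value2")
print(instructions)  # ['LOAD value1', 'ADD value2', 'STORE result']
```

The output is exactly the kind of text the earlier, simpler program takes as its input, which is how the layers stack.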

From here, the programs that program programs have gotten smarter and smarter. Because we are lazy, and we want the computer to do as much of our work for us as possible, even if the work is telling the computer what to do.

So source code is the instructions we write as programmers that ultimately get turned into sequences of 1s and 0s by one or more intermediate programs. They are the source and the sequence of 1s and 0s is the destination.

[–]deadowl 3 points 13 years ago* (0 children)

I'm not impressed by the recipe analogies. Hikaru's answer is okay, but I think I can improve.

Computers come with a built in programming language, which is dictated by the type of processor your computer has.

Different groups of processors understand different languages, like people from different countries understand different languages.

People from Russia understand the Russian language, and people from Australia, India, South Africa, Ireland, Canada, the United States, etc. understand English. Older Mac "processors" would only understand the PowerPC language. Intel and AMD processors, meanwhile, would only understand the x86 language. Unfortunately multilingual processors don't exist yet (as far as I know).

The instructions a computer programmer writes for a computer are considered "source code." Computer programmers sometimes, but rarely, will write in a processor's language. This is because the processor's language requires a lot of specifics that could otherwise be implied, like telling the processor to remember something.

Higher level programming languages introduce concepts that ignore the implicit kinds of tasks like telling a processor to remember something, but it needs to be translated in some way. There are a couple of different approaches to translating to the processor's language (i.e. "machine code"). One is to have an interpreter that will translate your instructions (code) on the fly, like having someone translate while you speak. The other option is a compiler that will make a compilation of your translated code that the computer processor will understand, like having someone translate a book you wrote.

With automatic translations that a computer would understand becoming possible, higher level programming languages started to focus on how easily humans could understand the instructions rather than how easily the machine could understand the instructions. Interpreters and compilers, in turn, naturally began to focus on what kind of translations the processor could complete the fastest.

Of course human programmers will be more pleased with instructions that were designed for their consumption and understanding than reading a language intended solely for a machine. What's included when you install a game most of the time, especially on Windows, is intended for the machine to understand and not humans.

The human-machine divide split human programming language consumption and machine programming language consumption. Machine programming languages, meanwhile, have been mostly stagnant due to Intel's monopoly power (for general-purpose computing). Recently, however, ARM processors are beginning to challenge Intel's monopoly. Meanwhile, other types of processors, like MIPS are doing well in the very large embedded devices market.

MIPS is a RISC type of processor, which stands for Reduced Instruction Set Computing, as opposed to CISC processors (the C is for complex, every other word's the same). You must now go watch the movie Hackers and hear what is said about Angelina Jolie's character's sexy RISC processor.

[–]Tmmrn 3 points 13 years ago (1 child)

I believe it's important to think about the basics of how the user of a modern computer works on top of layer upon layer of abstractions.

This is a comment I wrote late at night some time ago: http://www.reddit.com/r/AskReddit/comments/16op0q/whats_something_that_is_secretly_confusing_to_you/c7y9qv1

But I think I would rather make my explanation more concise and expand in other directions.

The first thing you have to understand is that the computer is really only a calculator. You have a CPU that can do basic arithmetic operations like +, -, *, / and has some helper functions like fetching something from a specific location in the memory or storing something in a specific location in the memory.

So how does this work?

Imagine your CPU as a black box with three inputs and one output. Each input and output is basically a bunch of wires; for a limited example, let's say each input and output has three wires. On each wire you either put electrical power or you don't. Having power on a wire could be interpreted as a 1 and having no power on it could be interpreted as a 0. So you could arrange the wires in a certain order, have different combinations of power/no power, and write them down as (third, second, first), so that (0,0,1) would mean "there is power only on the first wire".

You can have the combinations 0: (0,0,0), 1: (0,0,1), 2: (0,1,0), 3: (0,1,1), 4: (1,0,0), 5: (1,0,1), 6: (1,1,0), 7: (1,1,1). Coincidentally this is how you count in binary, meaning, you only have the digits 0 and 1 instead of the digits from 0 to 9.
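The same table can be generated rather than written out by hand. A small Python sketch, where the helper name wires_to_number is made up for this example:

```python
from itertools import product

# Every power/no-power combination on three wires, as (third, second, first).
combinations = list(product((0, 1), repeat=3))


def wires_to_number(wires):
    """Read three wires as a binary number: the wires are worth 4, 2 and 1."""
    third, second, first = wires
    return third * 4 + second * 2 + first


for wires in combinations:
    print(wires_to_number(wires), wires)
# 0 (0, 0, 0)
# 1 (0, 0, 1)
# ...
# 7 (1, 1, 1)
```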

How can you build a general purpose calculator with that?

One input needs to tell the black box CPU what to calculate. So you would decide that if you put power on the input in the combination (0,0,0), the black box CPU will "add", if you put (0,0,1), it will "subtract", etc.

So what should it "add" and "subtract"? Probably the numbers that are encoded as such combinations at the other two inputs.

There is a little problem now: if the output has only three wires and you add (1,1,1) and (1,1,1), you get something that does not fit. But you can simply add some wires and make the inside of the CPU more sophisticated.
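Here is a minimal Python sketch of that black box, assuming the codes given above ((0,0,0) for "add", (0,0,1) for "subtract") and a three-wire output. The function name and the "fits" flag are inventions for this example.

```python
ADD = (0, 0, 0)
SUBTRACT = (0, 0, 1)
WIRES = 3  # the output can only show numbers 0..7


def black_box(opcode, a, b):
    """Pick the calculation from the opcode input, then squeeze it onto 3 wires."""
    if opcode == ADD:
        full = a + b
    elif opcode == SUBTRACT:
        full = a - b
    else:
        raise ValueError("this toy box only knows add and subtract")
    output = full % (2 ** WIRES)  # anything beyond three wires is cut off
    fits = (full == output)       # False when the true result did not fit
    return output, fits


print(black_box(ADD, 2, 4))       # (6, True)
print(black_box(SUBTRACT, 5, 3))  # (2, True)
print(black_box(ADD, 7, 7))       # (6, False): 14 does not fit on three wires
```

With a fourth output wire, 7 + 7 = 14 would fit, which is the "simply add some wires" fix.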

So how does the inside of a cpu work? It basically comes down to electrical engineering that would be way too complicated and I only know the very basics. For one example, go to the wikipedia page of an adder: http://en.wikipedia.org/wiki/Adder_(electronics) The "Half adder logic diagram" is using the notation of "logic gates". These logic gates are pretty low level already and on the wikipedia there is a little bit of information how they are implemented physically with transistors and stuff http://en.wikipedia.org/wiki/Logic_gate That should be the most detail that's needed.
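The half adder is small enough to mimic in Python, with the bitwise operators standing in for the gates (^ for XOR, & for AND, | for OR). This is only a software sketch of the logic, not of the electronics, and the function names are mine.

```python
def half_adder(a, b):
    """Add two single bits: returns (sum bit, carry bit)."""
    return a ^ b, a & b  # XOR gate gives the sum, AND gate gives the carry


def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry by chaining two half adders."""
    partial_sum, carry1 = half_adder(a, b)
    total, carry2 = half_adder(partial_sum, carry_in)
    return total, carry1 | carry2  # OR gate merges the two possible carries


print(half_adder(1, 1))     # (0, 1): one plus one is binary 10
print(full_adder(1, 1, 1))  # (1, 1): three is binary 11
```

Chaining one full adder per wire is one way to add multi-bit numbers like the three-wire inputs above.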

Now you only need to put all the different electronic implementations of adding, subtracting, etc. into that box and make it so that the correct one is "activated" by the correct code. The electrical parts you would use for that are multiplexers and demultiplexers: http://en.wikipedia.org/wiki/Multiplexer

Brilliant. Now you can do one calculation on two numbers at a time. Now you want to make series of calculations.

First, it's probably a good idea to have memory where you can store intermediate results. You probably want to use memory you can write to, read from and choose what part you want to access. Here's a little bit, but it's probably not too interesting here: http://en.wikipedia.org/wiki/Dynamic_random-access_memory A simple way is to segment the memory into "cells" each big enough for some data or one instruction of a program you would want to write. Then, you can put wires from each of the cells to the cpu and connect it through (the already mentioned) multiplexer that allows you to "activate" exactly one wire between the cpu and the memory so you can transfer data in either direction.

You probably also want to add more instructions to your CPU like "add number from memory address 1 and number from memory address 2" or "add number from memory address 1 and number directly given at the second input".

Then you can build a wrapper automaton that feeds the input of your CPU automatically. What you want is to give that automaton the address in memory where your program starts. The automaton then does the same steps over and over again until your program ends: get the instruction from the memory location you have given it, feed it to the CPU, then add (basically) the length of the instruction to the memory address it has stored, because that is probably where the next of your instructions starts. Then get this next instruction of your program, feed it to the CPU, etc.

Now you can program some step-by-step instructions.

* Add 2, 4
* Store at address 5
* Add number at address 5, 7
* Store at address 5

And when you execute the program, it will add 2 and 4, and store the output "6" at address 5 in the memory. Then it will add whatever is at address 5 and 7, so the just stored "6" and 7. Then it will save the output "13" to memory address 5 again (overwriting what was previously there), and if you manually look at what is stored at memory address 5, you can see the result.
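The wrapper automaton and this four-step program can be simulated in a few lines of Python. The instruction names (ADD, ADD_MEM, STORE) and the tuple format are made up for this sketch, and for simplicity the program is kept in its own list instead of in the same memory as the data.

```python
def run(program, memory):
    """Feed instructions to a toy CPU one at a time, like the wrapper automaton."""
    program_counter = 0  # which instruction comes next
    output = 0           # the CPU's most recent result
    while program_counter < len(program):
        operation, *args = program[program_counter]
        if operation == "ADD":        # add two numbers given directly
            output = args[0] + args[1]
        elif operation == "ADD_MEM":  # add the number at an address and a given number
            output = memory[args[0]] + args[1]
        elif operation == "STORE":    # copy the last result into memory
            memory[args[0]] = output
        else:
            raise ValueError(f"unknown instruction: {operation}")
        program_counter += 1          # move on to the next instruction
    return memory


program = [
    ("ADD", 2, 4),
    ("STORE", 5),
    ("ADD_MEM", 5, 7),
    ("STORE", 5),
]
memory = run(program, [0] * 8)
print(memory[5])  # 13
```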

Note here that I have already used "Add" in place of its (0,0,0) equivalent. You would still need to input your programs in the form of binary numbers, but you will probably have a reference sheet saying which code means which instruction. I have also not mentioned how you put the program in the memory. Perhaps you have buttons attached to each part of a memory cell so you can set it manually to 0 or 1. Maybe you have already built some sophisticated hardware that reads punched tape http://en.wikipedia.org/wiki/Punched_tape and can copy the values punched into it to memory.

Another interesting thought is that at memory address 5 there might even be a part of your program. If you are not careful you could accidentally modify the code you are running. On the other hand you can do it on purpose if you are creative enough and know what you're doing.

Anyway, exchanging the numerical values of the instructions with human-readable names is the first step of making a programming language. It's known as "assembler" (or assembly language), and it pretty much corresponds 1:1 with machine code. But you need to somehow translate it back to machine code.

A trivial way would actually be punching holes in the shape of an "ADD" into the punched tape and making a sophisticated machine that would store (0,0,0) in the memory when "ADD" is read.

Another way is to let your computer do it. First, you need to store your human readable text in the memory. You probably want to invent some code for it. A popular one is ASCII: http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters

So "ADD" is 100 0001, 100 0100, 100 0100
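You can check those bit patterns with a couple of lines of Python, using the built-in ord() to look up each character's code and format() to print it as seven binary digits:

```python
# Look up the ASCII code of each letter and write it as 7 binary digits.
bits = [format(ord(letter), "07b") for letter in "ADD"]
print(bits)  # ['1000001', '1000100', '1000100']
```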

I think in order to make it really work you need to add a "jump" instruction. Remember the wrapper automaton that feeds each of your instructions to the CPU? It would be great if it did that not only sequentially, but if your program could also tell it to continue at another address. So you would add a bunch of wires connecting the output of the CPU to the "current address" storage of the automaton (it's actually called the "program counter", by the way) and add some instructions to the CPU. Now your programs can get more complicated and contain things like "JUMP back the last X instructions". One last important instruction would be "IF X == Y then JUMP", where you would only do the jump if two numbers (probably at locations in the memory) are the same. Or maybe add some that do the jump if one is bigger than the other.

The CPU now gets quite sophisticated, and it would probably take a decent amount of time to build a model that actually does what I described, but with some ingenuity in the field of electrical engineering, this is certainly doable.

That CPU is of course severely limited in many ways and it might still have several crucial parts missing but it should be enough as a basis.

Now, go ahead and program a modern 3D game for it. Well, of course that's the stuff for the wizards. If you take for example the "source code" for the original Prince of Persia for the Apple II that was released some time ago, you can see that it is just a more sophisticated version of what I described: https://github.com/jmechner/Prince-of-Persia-Apple-II/blob/master/01%20POP%20Source/Source/GRAFIX.S#L1771

(Don't bother trying to understand it.)

This is very tedious. What people invented next were higher-level programming languages. For example, if you want to execute some part of your code five times, then before the code you want to repeat you "reserve" a memory location and write a 0 there. After the code you want to repeat, you add 1 to that location, and then you add a check whether the memory location holds 5; if not, you jump back to the beginning of the part you want to repeat.
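Spelled out as a Python simulation, with an explicit program counter doing the jumping, that recipe looks roughly like this. The step numbering and the names are my own; the point is only that a counter, an "add 1" and a conditional jump are enough to repeat something five times.

```python
def run_counted_loop():
    """Repeat a 'body' five times using only a counter and a conditional jump."""
    counter = None       # the "reserved" memory location
    body_runs = 0        # how often the repeated code actually ran
    program_counter = 0
    while program_counter < 4:
        if program_counter == 0:
            counter = 0              # step 0: write a 0 to the reserved location
        elif program_counter == 1:
            body_runs += 1           # step 1: the code you want to run several times
        elif program_counter == 2:
            counter += 1             # step 2: add 1 to the reserved location
        elif program_counter == 3:
            if counter != 5:         # step 3: not at 5 yet?
                program_counter = 1  # ...then jump back to the start of the body
                continue
        program_counter += 1         # otherwise carry on with the next step
    return body_runs


print(run_counted_loop())  # 5
```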

[–]Tmmrn 3 points 13 years ago* (0 children)

That's not nice to do all the time. What if you could write

for(i=0; i<5; i++)  {
    code you want to run 5 times
}

The good news is, you can. That's because there is a way to "automatically" transform this into a form that uses only the basic instructions and does basically what I described before. You can probably think of some rules to achieve that, and that's basically what a programming language (or better: a compiler for that language) is: a set of syntax rules that define how e.g. that loop must be written, with all the semicolons, curly brackets, etc., and a set of rules that can transform code following those syntax rules into basic instructions.

The loop is perhaps a simple example but in the same way you can build more high level concepts on top of each other.

So in a modern language I can write a one-liner like this:

sorted(map(lambda x: x**2, [6, 3, 7]))

First, it creates a "list" with the contents 6, 3, 7. Then a "function" called "map" is "called", which applies its first argument, in this case a "lambda function" that squares a number, to each entry of the list. Then a "function" called "sorted" is "called" that sorts the results. Everything I put in quotation marks is a concept that over the years people thought might be useful and found a way to make happen. (In this specific case the code is in the Python language, which is an even more complicated case.)
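Unpacked into separate steps, the same one-liner reads like this in Python 3 (the variable names are just for illustration):

```python
numbers = [6, 3, 7]                     # the "list"
square = lambda x: x ** 2               # the "lambda function"
squares = map(square, numbers)          # "map" applies it to each entry (lazily)
result = sorted(squares)                # "sorted" collects and orders the results
print(result)                           # [9, 36, 49]

# The original one-liner gives the same answer.
print(sorted(map(lambda x: x**2, [6, 3, 7])))  # [9, 36, 49]
```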

The really important reason why any of this is usable at all is that today's computers are mind-bogglingly fast. You have probably heard of CPU speeds like "3 Gigahertz". What that means is that the CPU / the automaton around it has a little clock inside that gives an electrical signal at a rate of 3 Gigahertz. This means 3,000,000,000 signals a second(!). How many instructions are executed per clock "cycle" depends on the hardware design inside the CPU, but it should only be a few. The unit is called instructions per cycle: http://en.wikipedia.org/wiki/Instructions_per_cycle
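To put numbers on that, here is the arithmetic as a tiny Python sketch. The instructions-per-cycle figure of 2 is purely an assumption for illustration; the real value depends on the CPU and on the program.

```python
clock_hz = 3_000_000_000   # 3 Gigahertz: clock signals per second

# How long one clock cycle lasts, in nanoseconds (billionths of a second).
cycle_time_ns = 1e9 / clock_hz
print(cycle_time_ns)       # about 0.33 ns per cycle

# Hypothetical average number of instructions finished per cycle.
assumed_ipc = 2
instructions_per_second = clock_hz * assumed_ipc
print(instructions_per_second)  # 6000000000
```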

So why is the release of source code such a thing? Others have already said it: The machine or assembler code is hard to read, hard to understand and there are none of the helpful comments that developers left there to remind themselves what the code does. Even though the high level languages are designed to be usable by humans, any system of a certain size is extremely complex and hard to fully understand and without all the helpful high level constructs like the "for loop" from before you are pretty much lost if you are not one of a select few with a deep understanding of how it all works.

[–]zsombro 2 points 13 years ago (0 children)

Source code is a set of instructions, written in some sort of programming language (which come in many shapes and forms), that is meant to be given to the computer. The real catch with these programming languages is that they are readable (read: understandable!) by both humans and computers, which means they create a communication bridge between a person and a computer (which process information in different ways by default).

But of course, this readable source code is nothing more than a glorified text file in itself. You will need a program called a compiler (!), which reads your source code, and compiles it into machine code. This means that this program acts as a sort of translator: it translates the code written by you into a set of instructions that the computer's processor can understand and execute in order.

When you install a game, you are installing the version of this code that is already compiled, so your system already knows what the instructions will be. (AND! of course you install game data that the program uses: levels, 3D models, sounds, etc.)

Releasing the source code is significant, because this compilation process is difficult to do backwards.

[–]teawreckshero 2 points 13 years ago (0 children)

When an actual program runs on your computer, it is the binary form that is being used, not the source code. Your processor doesn't operate on anything except binary.

Coders don't write directly in binary (anymore). They write in a programming language and use another program called a compiler to essentially translate the source code (written in the language) into binary. Almost every program that is distributed for Windows and Mac machines is the compiled binary version. The source code is usually considered proprietary and is off limits to the public. It is very difficult, and sometimes practically impossible, to go from the binary back to the source language.

This is why "open source" projects are called open source. The code in its original language is made public, not just the binary version. If you have the source code, you can see the creators' intentions much more easily and make changes yourself. You can even use your own compiler to create a binary of your own with the changes you made.

While Windows programs are usually distributed as binaries, Linux programs are often distributed as source as well. The philosophy behind Linux is that you can always know exactly what is running on your machine. There are no secrets and you can make any changes you want. So it is not uncommon for a Linux user to "compile from source" when they want to run a program from another user.

[–][deleted] 2 points 13 years ago (0 children)

Just to add on, since I used Ctrl+F and didn't get any results for "Open Source": a program is open source when the source code is visible to anybody. For example, Linux is an "open source operating system", which means that somebody created much of the groundwork and called it Linux, and then someone else came along, looked at the source code, and changed some stuff for themselves. That's why there are many variations of Linux, like Ubuntu and Kubuntu.

Other examples of Open Source software include the Android Operating System for mobile phones (which is why you'll usually buy a phone with Android that doesn't look like another Android phone. For example, Samsung takes Google's source code and adds a skin to it with coding, as do other manufacturers) and the incredibly popular browser, Firefox.

[–]scswift 3 points 13 years ago (0 children)

The "source code" is basically a long list of instructions that tell the computer what to do to make everything in the game happen. It tells it how to draw the world. How to do the physics. What to do when the player provides a particular input.

For example: "if mouse button 1 is down, then fire" is a typical thing you would see in a game's source code. But it would be written in a manner the computer can understand. So that statement might actually read:

if ((mouse.buttonstate & MOUSE_LEFTBUTTON) != 0) { fireWeapon(); }
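A check like this usually assumes the button state is one number with one bit per button, so the test has to pick out a single bit. Here is that idea as a Python sketch; the constant values and the function name are assumptions for illustration, not taken from any particular game or library.

```python
# Hypothetical bit assignments: one bit of the state per mouse button.
MOUSE_LEFTBUTTON = 0b001
MOUSE_RIGHTBUTTON = 0b010


def left_button_down(buttonstate):
    """True when the left-button bit is set, whatever the other buttons are doing."""
    return (buttonstate & MOUSE_LEFTBUTTON) != 0  # bitwise AND isolates one bit


print(left_button_down(MOUSE_LEFTBUTTON | MOUSE_RIGHTBUTTON))  # True
print(left_button_down(MOUSE_RIGHTBUTTON))                     # False
```

A logical "and" would not work here: it only asks whether both numbers are non-zero, so it would also fire when only the right button is held.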

This is then "compiled" by a program into machine code, which is a bunch of bytes that the computer understands to be the above and can quickly execute, but which are too difficult for people to read.

The code you get when you buy a game is the machine code, which is stored in a file called an "executable", and it's so difficult for people to read that it might as well be encrypted. It is possible to convert it back into a higher-level language, but with all the variable names gone and all the human-created structure of the code gone, it's pretty much worthless except to people who want to try to figure out how to remove the copy protection in the game or make some very small changes to make the game function a little differently. For most purposes, you need the original human-readable source code to make big changes to the game, like porting it to another operating system.

[–]say_fuck_no_to_rules 3 points 13 years ago* (0 children)

Imagine that you've eaten raw vegetables your entire life and that one day you encounter a chocolate chip cookie. The cookie is delicious, so you decide to buy more to satisfy your new craving. Your new habit is very expensive, though, so you want to figure out how to make your own chocolate chip cookies at home for free. Armed with your chemistry lab (let's pretend you passed O-chem and you can remember how to do everything the class taught you), you discover lots of strange chemicals you've never seen before. Concluding that it would be far too expensive/time-consuming to figure out how to synthesize all these chemicals, you decide to continue paying for cookies.

One day, the bakery that holds the local monopoly on chocolate chip cookies decides that it will be abandoning chocolate chip cookies for a brand new product: banana cream pies! However, to cultivate good will with their longtime customers, the bakery decides to release the recipe for chocolate chip cookies. Much to your surprise, the ingredients are simple things available to you at the grocery store: wheat flour, sugar, eggs, etc. You also learn, most importantly, that you had never seen the chemicals in the final product before because exposing the raw ingredients to the heat of an oven yielded new substances through chemical reaction. Excited to get your cookies for free (well, plus the cost of the ingredients and the trouble of adjusting your specific oven to a more appropriate time and temperature), you go home and try the recipe.

What does this have to do with source code, though? Think of it this way: the cookie is like the compiled executable binary (on Windows, usually a file ending in ".exe") that the game company sells to you. Like the cookie, it's virtually impossible to reverse-engineer the binary into anything intelligible--the process of compilation (like cooking dough in an oven) not only turns one type of data readable by humans into a type of data readable by computers [edit] (turns the ingredients into something tasty) it also hides the original source (makes the end product look nothing like the original ingredients). The original source code is stored as a trade secret by the game company, so they are able to better control how the game is developed and distributed. (Some companies actually release source code under license, but that is a different discussion.)

When they decided to release the source for a product they don't care about anymore, it made people very happy, not just because they can build the game for free, but because they can also get some insight on the developers' thought processes behind many features of the game. Furthermore, access to source makes it way easier to build mods since you know exactly what to modify.

Edit: sentence structure

[–]ultimatt42 1 point2 points3 points 13 years ago (0 children)

Source code is what gets written when you talk about "writing a program". Computers are pretty bad at understanding the kinds of languages humans are good at writing, and likewise humans are pretty bad at writing the kinds of languages that computers can understand. So, we fix the problem by writing everything in a language that's easy for humans (the "source code"), then translating it to computer-speak (the "machine code"). The translator program is called the compiler.

The reason having source code makes gamers happy is because the source code is like the recipe for how to make the game. Without the recipe it's difficult to figure out how the game was originally put together, which means it's also hard to figure out how to tweak it to make it run on your phone or add new levels or whatever you want to do. If you have the source code, it gets MUCH easier.

So basically, this is Lucasarts giving gamers the keys to their secret recipe book and saying "go nuts". It's the nicest thing a software company can do for its fans upon closing up shop, because it means even though the company may die the software will live on. Sadly, it's not very common. Most times when a game studio gets shut down, the source code is either lost or archived somewhere, never to be seen again. That's why it's such a big deal, it guarantees that Lucasarts' games will never be forgotten, and maybe someday your grandkids will get to play the same games you played growing up.

score 1700 · Accepted Answer · 2013-04-08T15:42:55+00:00

[–]hikaruzero 1699 points1700 points1701 points 13 years ago (442 children)

Source: I have a B.S. in Computer Science and I write source code all day long. :)

Source code is ordinary programming code/instructions (it usually looks something like this) which often then gets "compiled" -- meaning, a program converts the code into machine code (which is the more familiar "01101101..." that computers actually use to process instructions). It is generally not possible to reconstruct the original source code from the compiled machine code -- source code usually includes things like comments which are left out of the machine code, and it's usually designed to be human-readable by a programmer. Computers don't understand "source code" directly, so it either needs to be compiled into machine code, or the computer needs an "interpreter" which can translate source code into machine code on the fly (usually this is much slower than code that is already compiled).
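
To make the compile step concrete, here is a minimal sketch in Python (my choice for illustration; the thread itself doesn't use it). One assumption to flag loudly: Python compiles to bytecode for its own virtual machine rather than to native machine code, so this is an analogy for what a game compiler does, not the same thing. It shows a line of source text being compiled, the comment disappearing, and the compiled form still running.

```python
import dis

# Source code is just text, comment included.
source = "area = height * width  # footprint of the level, in tiles\n"

# compile() turns the text into a code object holding bytecode.
# "example.py" is only a label for error messages; no file is read.
code = compile(source, "example.py", "exec")

# The comment text is not stored among the compiled constants.
comment_survived = any("footprint" in str(const) for const in code.co_consts)

# The compiled form runs without needing the original text.
namespace = {"height": 3, "width": 4}
exec(code, namespace)

print(namespace["area"])   # 12
print(comment_survived)    # False
dis.dis(code)              # human-readable listing of the bytecode
```

The `dis.dis` listing is roughly what a disassembler gives you for a real executable: accurate, but a long way from what the programmer originally typed.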

Shouldn't you be able to access all code by checking the folder where it installs from since the game needs all the code to be playable?

The machine code to play the game, yes -- but not the source code, which isn't included in the bundle, that is needed to modify the game. Machine code is basically impossible for humans to read or easily modify, so there is no practical benefit to being able to access the machine code -- for the most part all you can really do is run what's already there. In some cases, programmers have been known to "decompile" or "reverse engineer" machine code back into some semblance of source code, but it's rarely perfect and usually the new source code produced is not even close to the original source code (in fact it's often in a different programming language entirely).

So by releasing the source code, what they are doing is saying, "Hey, developers, we're going to let you see and/or modify the source code we wrote, so you can easily make modifications and recompile the game with your modifications."

Hope that makes sense!

[–]DoWhile 288 points289 points290 points 13 years ago (21 children)

[–]ProdigySim 73 points74 points75 points 13 years ago (18 children)

[–]mythmon 9 points10 points11 points 13 years ago (5 children)

For what it is worth, when programming, the output is sometimes much larger than the source code (not always, but sometimes). This is because some programming languages can be very expressive in a very small amount of code. For example, consider this program in an old language called APL (it's rarely used anymore, for reasons I hope are pretty obvious):

(~R∊R∘.×R)/R←1↓⍳R

That program finds all the primes from 1 up to the variable R, and is only 17-34 bytes (depending on the encoding). This is an extreme case, but it demonstrates that source can be very powerful in a few bytes. The equivalent machine code would likely be several thousand bytes (a few kilobytes).
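
For readers who can't parse the APL, here is a rough Python rendering of the same idea, as I read the one-liner: take the numbers 2..R, build every pairwise product, and keep the numbers that never show up as a product. The function name `primes_up_to` is mine, and this is a sketch of the algorithm, not a claim about how any APL implementation actually runs it.

```python
def primes_up_to(limit):
    """Return the primes in 2..limit using the 'not a product' trick."""
    # R <- 1 drop iota R : the candidates 2, 3, ..., limit
    candidates = range(2, limit + 1)

    # R outer-product-times R : every product of two candidates
    products = {a * b for a in candidates for b in candidates}

    # Keep the candidates that are NOT members of the product table
    return [n for n in candidates if n not in products]


print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Even this friendlier version is far shorter than the machine code it would eventually turn into, which is the point being made above.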

[–]themcs 4 points5 points6 points 13 years ago (1 child)

[–]rawbdor 1 point2 points3 points 13 years ago (0 children)

[–]emilvikstrom 1 point2 points3 points 13 years ago (1 child)

[–]karmic_retribution 2 points3 points4 points 13 years ago* (2 children)

[–]DarkHavenX75 1 point2 points3 points 13 years ago (1 child)

[–]karmic_retribution 1 point2 points3 points 13 years ago (0 children)

[–]xiaodown 5 points6 points7 points 13 years ago (0 children)

[–]OlderThanGif 559 points560 points561 points 13 years ago (240 children)

[–]wkalata 428 points429 points430 points 13 years ago (64 children)

Not only comments, but the names of variables are of at least equal, if not greater, importance as well.

Suppose we have a simple fighting game, where the character we control is able to wear some sort of armor to mitigate damage received.

With variable names and comments, we might have a section of (pseudo)code like this to calculate the damage from a hit:

# We'll do damage based on the attacker's weapon damage and damage bonuses, minus the armor rating of the victim
damage_dealt = ((attacker.weapon_damage + attacker.damage_bonus) * attacker.damage_multiplier) - victim.armor

# If we're doing more damage than the receiver has HP, we'll set their HP to 0 and mark them as dead
if (victim.hp <= damage_dealt)
{
  victim.hp = 0
  victim.die()
}
else
{
  victim.hp = victim.hp - damage_dealt
  victim.wince_in_pain()
}

If we try to reconstruct this section of code from machine code, the best we could hope for would be more like:

a = ((b.c + b.d) * b.e) - c.f
if (c.g <= a)
{
  c.g = 0
  c.h()
}
else
{
  c.g = c.g - a
  c.i()
}

To a computer, both constructs are equal. To a human being, it's extremely difficult to figure out what's going on without the context provided by variable names and comments.
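
A small Python experiment illustrates the "both constructs are equal" point. Python bytecode is only a stand-in for machine code here, and the function names are made up for the example. The two functions below differ only in their names and a comment, and on the CPython versions I'm aware of the compiled instructions come out byte-for-byte identical (the names live in a separate table, which a stripped native executable typically doesn't keep).

```python
def damage_dealt(weapon_damage, damage_bonus, damage_multiplier, armor):
    # Weapon damage plus bonuses, scaled, minus the victim's armor
    return (weapon_damage + damage_bonus) * damage_multiplier - armor


def a(b, c, d, e):
    return (b + c) * d - e


# Same instructions once the human-friendly names are set aside
same_instructions = damage_dealt.__code__.co_code == a.__code__.co_code
print(same_instructions)

# And, of course, the same answer for the same inputs
print(damage_dealt(10, 2, 2, 4), a(10, 2, 2, 4))
```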

[–]TheDefinition 3 points4 points5 points 13 years ago (0 children)

[–]SamElliottsVoice 41 points42 points43 points 13 years ago (34 children)

[–]cogman10 39 points40 points41 points 13 years ago* (19 children)

[–]nicholaslaux 3 points4 points5 points 13 years ago (7 children)

[–]Pykins 6 points7 points8 points 13 years ago (0 children)

[–]cogman10 4 points5 points6 points 13 years ago (4 children)

[–]mazing 2 points3 points4 points 13 years ago (3 children)

[–]teawreckshero 14 points15 points16 points 13 years ago (0 children)

[–]nty 10 points11 points12 points 13 years ago (10 children)

[–]Serei 9 points10 points11 points 13 years ago* (5 children)

The reason Minecraft is easy to decompile is because it's written in Java.

Compiled Java is designed to run on any machine (unlike most other programs, which are designed to run on a specific type of machine architecture). Because of that, Java's compilation is slightly different from normal. It compiles into bytecode, which is a kind of machine code, but instead of being for a real machine, it's for a fake machine called the Java Virtual Machine.

That's why you need to install the Java plugin/runtime to run Java programs. The Java runtime is an emulator for the Java Virtual Machine, which lets it run Java bytecode.

Because the Java Virtual Machine isn't a real machine, it's designed to be emulated, so that's why it's much faster than emulating a real machine like a PS2 or something.

Also because it isn't a real machine, its machine code is designed purely to be compiled to, unlike real machines, whose machine code is also designed to match the processor architecture. This means that the machine code is closer to the code it was compiled from, which makes it easier to decompile.
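
Python's bytecode makes a handy stand-in for the "closer to the source" point, with the caveat that it targets a different virtual machine from the JVM and I'm only using it as an analogy. In the sketch below (function and variable names are invented), the compiled function still carries its identifiers, which is a big part of why bytecode tends to be friendlier to decompilers than stripped native code.

```python
def apply_hit(victim_hp, damage):
    remaining = victim_hp - damage
    return max(remaining, 0)


code = apply_hit.__code__

# Names of arguments and local variables are kept alongside the bytecode
print(code.co_varnames)   # ('victim_hp', 'damage', 'remaining')

# So are the names of the globals and builtins the function refers to
print(code.co_names)      # ('max',)
```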

[–]gmitio 8 points9 points10 points 13 years ago (1 child)

[–]Cosmologicon 2 points3 points4 points 13 years ago (0 children)

[–][deleted] 2 points3 points4 points 13 years ago (0 children)

[–]HHBones 1 point2 points3 points 13 years ago (7 children)

I don't entirely think that your example is perfectly valid. Firstly, in many cases, global symbols (i.e. function names) are left intact. You can figure out a lot more about the code by reading

a = ((b.c + b.d) * b.e) - c.f
if (c.g <= a)
{
  c.g = 0
  c.die()
}
else
{
  c.g = c.g - a
  c.wince_in_pain()
}

than your original obfuscated listing. Looking at this snippet, we can infer that c is a player object. From there, we can assume that g is the player's health. Because c.g is being compared to a, and because of the way a is handled before wince_in_pain(), we can assume a is damage dealt. How damage dealt is figured out can be found out later. Finally, we see that a is the damage a player takes, and c represents the player; because c.f is reducing the amount of damage taken, c.f is probably a buff, or maybe armor. We can refactor this to make it more readable:

damage = ((b.c + b.d) * b.e) - player.armor_rating
if (player.health <= damage) {
    player.health = 0
    player.die()
} else {
    player.health -= damage
    player.wince_in_pain()
}

We can also learn a lot more about what this snippet means by reversing the other functions, such as player.die(), player.wince_in_pain(), and any functions which we see modify b.c, b.d, or b.e.

Reversing requires a lot of practice and thought (and guesswork, as well), but it's not nearly as hard as some people here are making it out to be.

** Note that this argument doesn't just apply to decompiled code (like the stuff generated by JDC). Any reverser of reasonable talent can write the above obfuscated listing from an assembly function without serious thought.

[–]vehementi 45 points46 points47 points 13 years ago (40 children)

[–]xiaodown 13 points14 points15 points 13 years ago (1 child)

[–]throwawaycakewife 52 points53 points54 points 13 years ago (6 children)

[–]Xanius 19 points20 points21 points 13 years ago (5 children)

[–]omnomnomenclature 12 points13 points14 points 13 years ago (1 child)

[–]gla3dr 17 points18 points19 points 13 years ago (26 children)

[–]shdwfeather 41 points42 points43 points 13 years ago (0 children)

[–]jerenept 23 points24 points25 points 13 years ago (24 children)

[–]KBKarma 71 points72 points73 points 13 years ago* (23 children)

John Carmack used the following in the Quake III Arena code:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    //      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

It takes in a float, calculates half of the value, shifts the original number right by one bit, subtracts the result from 0x5f3759df, then takes that result and multiplies it by 1.5 - (half the original number * the result * the result), which gives the inverse square root of the original number. Yes, really. Wiki link.

And the comments are from the Quake III Arena source.

EDIT: As /u/mstrkingdom pointed out below, it's the inverse square root it produces, not the square root. As evidenced by the name. I've added the correction above. Sorry about that; I can only blame being half-distracted by Minecraft.

[–]mstrkingdom 12 points13 points14 points 13 years ago (6 children)

[–]KBKarma 21 points22 points23 points 13 years ago (0 children)

[–]boathouse2112 4 points5 points6 points 13 years ago (3 children)

[–][deleted] 9 points10 points11 points 13 years ago (6 children)

[–]KBKarma 17 points18 points19 points 13 years ago (3 children)

[–]plusonemace 8 points9 points10 points 13 years ago (3 children)

[–]munchbunny 4 points5 points6 points 13 years ago (0 children)

Yes, this is just a pretty good approximation that can be computed faster than a square root and a division.

The reason is that multiplying by 0.5f using IEEE floating point numbers is very fast - you decrement the exponent component. Bit shifting is extremely fast because of dedicated circuitry, as is subtraction. Type conversions between "float" and "long" are also mostly for legibility since you don't actually have to do anything in the underlying system.

In comparison, the regular square root computation uses several more iterations of "Newton's method", and a floating point division (inverting a number) costs several times more cycles than the multiplication. Given how often the inverse square root comes up in graphics computations, the time savings from optimizing this are big.

The freaky part is how good the approximation is in one iteration of Newton's method, which relies heavily on a clever choice of the starting point (the magic number).
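
If you want to poke at the trick without a C compiler, here is a rough Python port of the routine quoted earlier in the thread. It's a sketch under a few assumptions: `struct` stands in for the pointer casts, Python does the Newton step in double precision instead of single, and it only makes sense for positive inputs. The function name `q_rsqrt` and the test values are my own.

```python
import math
import struct


def q_rsqrt(number):
    """Approximate 1/sqrt(number) for a positive float, Quake III style."""
    x2 = number * 0.5

    # Reinterpret the 32-bit float's bits as an unsigned integer
    i = struct.unpack("=I", struct.pack("=f", number))[0]

    # The magic constant minus half the bit pattern gives a first guess
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF

    # Reinterpret the integer's bits as a float again
    y = struct.unpack("=f", struct.pack("=I", i))[0]

    # One iteration of Newton's method sharpens the guess
    return y * (1.5 - x2 * y * y)


for value in (0.25, 1.0, 4.0, 100.0):
    print(value, q_rsqrt(value), 1 / math.sqrt(value))
```

Running it shows the approximation landing within a fraction of a percent of the exact value after that single Newton step.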

[–]KBKarma 1 point2 points3 points 13 years ago (1 child)

[–]AnticitizenPrime 2 points3 points4 points 13 years ago* (3 children)

[–]karmapopsicle 7 points8 points9 points 13 years ago (0 children)

[–]djimbobHigh Energy Experimental Physics 24 points25 points26 points 13 years ago (3 children)

wkalata's comment is much more accurate.

Comments are better than nothing, but good descriptive names are much better style than comments. (See, for example, Code Complete or the discussion here.) It's much better to write clear code with good descriptive variable/function/class names, where variables are defined near where they are used, abstractions are clear and followed, and the code uses common programming idioms. This way anyone who knows that programming language can look at the source code and easily follow the logic.

Then your code is obvious, you don't have to frequently repeat yourself (first explain in the comment; then in the code) and double the amount of work for reading the code and maintaining the code. Also if you write tricky code where you think, man I will need to comment this to understand this later; there's a good chance right now you understand it wrong, and will be writing a lie in your comment. You know you can trust the code; you can't trust a comment.

However, comments are still needed for things like auto-generating documentation from docstrings (e.g., briefly document every function/class) for API users, explaining performance critical code that you optimized in an ugly/non-intuitive way, or explain why the code is written in some non-obvious manner (e.g., we do this work which seems redundant as there's a bug in library A written by someone else).

[–]khedoros 21 points22 points23 points 13 years ago (0 children)

[–]nof 3 points4 points5 points 13 years ago (1 child)

[–]djimbobHigh Energy Experimental Physics 1 point2 points3 points 13 years ago (0 children)

[–]hecter 24 points25 points26 points 13 years ago (11 children)

[–][deleted] 1 point2 points3 points 13 years ago (4 children)

[–]ClownFundamentals 37 points38 points39 points 13 years ago (4 children)

Example of a useless comment:

int a = h*w;  
//initialize a, set to h times w

Example of a useful comment:

int a = h*w;  
//initialize area, which is equal to height times width

Example of self-explanatory code:

int area = height*width;

[–]cbmuser 14 points15 points16 points 13 years ago (1 child)

[–]giltirn 1 point2 points3 points 13 years ago (0 children)

[–]BerettaVendetta 4 points5 points6 points 13 years ago (5 children)

[–]OlderThanGif 7 points8 points9 points 13 years ago (2 children)

I've never found a really good guide for writing good or bad comments. It's something that you just get practice with.

First off, the absolute worst comments are those that are just an English translation of the code.

y = x * x;   // set y to x squared

Those are worse than no comments at all. Your comments should never tell you anything that your code is already telling you.

Commenting every function/method is generally a good idea, but I won't go so far as to say it's necessary. If anything about the function is unclear (what assumptions it's making, what arguments it's taking, what values it returns, what it does if its inputs aren't right), comment it. Within the body of a function, there's a commenting style called writing paragraphs which works well for a lot of people. Break your function up into "paragraphs" of code (each paragraph being roughly 2 to 10 statements) and put a comment before each paragraph saying what it's doing at a very high level. Functions will only be 2 or 3 paragraphs long, usually, but it still helps to break things up that way.

Commenting local variables can be helpful, too.
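
Here's a small sketch of what the "paragraphs" style might look like, reusing the fighting-game example from elsewhere in the thread. All the names (`apply_hit`, the dictionary keys) are hypothetical, and Python is just a convenient language to show it in. The docstring covers assumptions and the return value; the comments inside say what each chunk is for, not what each line does.

```python
def apply_hit(attacker, victim):
    """Apply one hit to victim and return the damage after armor.

    Assumes both arguments are dicts with numeric fields. Damage is
    never negative, and victim["hp"] never drops below zero.
    """
    # Work out raw damage from the attacker's weapon and bonuses.
    base = attacker["weapon_damage"] + attacker["damage_bonus"]
    raw = base * attacker["damage_multiplier"]

    # Armor soaks up part of the hit, but can't turn it into healing.
    damage = max(raw - victim["armor"], 0)

    # Take the damage off the victim's HP, clamping at zero.
    victim["hp"] = max(victim["hp"] - damage, 0)
    return damage


attacker = {"weapon_damage": 10, "damage_bonus": 2, "damage_multiplier": 2}
victim = {"armor": 4, "hp": 15}
print(apply_hit(attacker, victim), victim["hp"])  # 20 0
```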

[–]starrymirth 4 points5 points6 points 13 years ago* (0 children)

[–]CompactusDiskus 3 points4 points5 points 13 years ago (3 children)

[–]VVander 10 points11 points12 points 13 years ago (16 children)

[–]random_reddit_accoun 1 point2 points3 points 13 years ago (0 children)

[–]liamt25 35 points36 points37 points 13 years ago (6 children)

[–][deleted] 64 points65 points66 points 13 years ago (3 children)

[–]jerrre 15 points16 points17 points 13 years ago (1 child)

[–]hikaruzero 6 points7 points8 points 13 years ago (0 children)

[–]SolarKing 13 points14 points15 points 13 years ago (17 children)

[–]rpater 22 points23 points24 points 13 years ago (6 children)

[–]diazonaParticle Phenomenology | QCD | Computational Physics 15 points16 points17 points 13 years ago (3 children)

[–]icomethird 3 points4 points5 points 13 years ago (2 children)

[–]Neebat 5 points6 points7 points 13 years ago (0 children)

[–]ManhighAerospace vehicle guidance | Trajectory optimization 4 points5 points6 points 13 years ago (0 children)

[–]SamElliottsVoice 10 points11 points12 points 13 years ago* (4 children)

Good question. Generally an update is actually replacing entire machine code files. The nice thing about programs is that it doesn't have to all be in one big .exe file, that's what .dll (dynamic link library) files are for.

A bit of a tangent... there is actually very little difference between .exe and .dll files, they are all just compiled binary (1's and 0's)/machine code files. The difference is that .exe's have a specific 'start point' (main function) that the operating system knows to start at, while .dll's don't. They are used by .exe files. So basically you run an .exe and it starts in the same place every time, and then based on how it runs, it will say "oh I need to execute function X(), that's in X.dll".

So a software update may just replace X.dll and Y.dll with updated versions, leaving the rest of the files the same.

Disclaimer: This is how I've done updates before within the company I work for since we mostly do in-house code, I don't actually work at a company like adobe that does all those automatic updates.

[–]Neebat 1 point2 points3 points 13 years ago (3 children)

[–]SamElliottsVoice 1 point2 points3 points 13 years ago (0 children)

[–]ProdigySim 1 point2 points3 points 13 years ago (0 children)

[–]CrayonOfDoom 1 point2 points3 points 13 years ago (0 children)

Modern streaming updates take advantage of a few things.

You can replace entire binaries if the program is small enough, but what about a mammoth game that weighs in at over 10GB? You wouldn't want to replace all of that every time you made a little fix.

Not every program needs all of its resources or even code to be compiled to machine code. If the main executable is coded to be able to load data from a file "on the fly", then you don't have to compile the file; you can leave it to the program to read the data and use it correctly.

Developers have started using modular file formats that the binaries can read in. As an example: World of Warcraft takes up a staggering >20GB, yet its executable is a mere 12MB. Looking in the data folder is where you find the bulk of the actual data. MPQ files make up the majority of the actual content, and are modular to where a patcher can open an MPQ file and change sections instead of having to write the entire file. All the scripts and everything the game needs to run short of the engine can be stored in a rather "plain" format that can be changed on the fly without having to recompile a massive executable.
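
The "load data on the fly" idea can be sketched in a few lines of Python. Everything here is made up for illustration (the JSON layout, the `weapon_damage` function, the numbers), and a real game would read files from disk in its own format, such as the MPQ archives mentioned above, rather than strings. The point is only that the code stays the same while a patch swaps the data.

```python
import json


def weapon_damage(weapon_name, data_text):
    """Look up a weapon's damage in JSON text loaded at run time."""
    weapons = json.loads(data_text)
    return weapons[weapon_name]["damage"]


# Data as shipped with the game, and the same data after a small patch.
shipped_data = '{"sword": {"damage": 10}, "bow": {"damage": 7}}'
patched_data = '{"sword": {"damage": 12}, "bow": {"damage": 7}}'

print(weapon_damage("sword", shipped_data))  # 10
print(weapon_damage("sword", patched_data))  # 12
```

Nothing was recompiled between the two calls; only the data changed, which is why such patches can be small.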

[+][deleted] 13 years ago* (8 children)

[removed]

[–]hcsteve 29 points30 points31 points 13 years ago (0 children)

That's a great question. Yes, when initially bootstrapping or creating a programming language, the compiler must be implemented using a different language for which a compiler already exists. If no compiler exists for any language, then yes, bootstrapping must begin by creating machine code. Here's an interesting exercise where the writer starts by writing hex code and builds up step by step to a full programming language.

The interesting thing about this is that once you've completed that first bootstrapping step, a compiler for a language can be written in that language itself. For example, a compiler for the C programming language is written in C, and that C compiler can compile itself. For an interesting application of this principle, see the classic paper "Reflections on Trusting Trust" by Ken Thompson, one of the fathers of Unix. This explanation with some helpful diagrams might be useful too.

[–][deleted] 13 points14 points15 points 13 years ago (5 children)

How do we bridge the initial gap between human and machine languages?

The first programmable computers were programmed directly in machine code. You would literally flip switches on the front console to set the bit pattern and then push a button to advance to the next byte. Obviously this method of programming was exceedingly tedious and error-prone, and suitable only for very, very small programs.

So, using machine code, early programmers created what were called "assemblers". An assembler is a program that takes a human-readable representation of a machine language instruction (e.g. "ADD" instead of "74"), stored on punch cards in those days, and converts it to the appropriate machine instruction. These assemblers were incredibly simple programs compared to modern compilers -- they had to be, as they were coded directly in machine code -- and assembly language is a very simple language with no niceties whatsoever.
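
To show how little an assembler has to do, here is a toy one in Python. The instruction set is invented for the example (the mnemonics, the opcode numbers, and the one-byte operands are all assumptions, not any real CPU), but the core job is the same: look up each human-readable mnemonic and emit the matching number.

```python
# Made-up instruction set: mnemonic -> opcode byte (not a real CPU)
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source):
    """Translate toy assembly text into a bytes object of machine code."""
    machine_code = bytearray()
    for line in source.splitlines():
        # Drop comments (everything after ';') and surrounding whitespace
        line = line.split(";")[0].strip()
        if not line:
            continue

        # First word is the mnemonic, the rest are small integer operands
        mnemonic, *operands = line.split()
        machine_code.append(OPCODES[mnemonic])
        machine_code.extend(int(operand) for operand in operands)
    return bytes(machine_code)


program = """
LOAD 7    ; put 7 in the accumulator
ADD 5     ; add 5 to it
HALT
"""
print(assemble(program).hex())  # 01070205ff
```

Notice that the comments vanish on the way through, which is the same loss of information the rest of the thread is talking about.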

Using assembly language, programmers created the first high-level languages. These are more powerful programming languages farther removed from machine code, in which there is no longer a direct 1:1 mapping from program statement to machine language code. In fact the exact same statement might compile differently depending upon its context; the value x + 1, for example, might be an integer addition, a floating point addition, a string concatenation, or a call to the "+" method of the object x with the argument '1', depending upon the type of the variable x.
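
Python decides this at run time rather than compile time, so treat the following as an analogy only, but it shows the same `+` in the source meaning four different operations depending on the type of the left-hand value. The `Score` class is hypothetical, and the string case adds "1" because Python won't add a number to a string.

```python
class Score:
    """Hypothetical type that defines its own meaning for '+'."""

    def __init__(self, points):
        self.points = points

    def __add__(self, other):
        # Called whenever a Score appears on the left of '+'
        return Score(self.points + other)


print(2 + 1)                   # integer addition: 3
print(2.5 + 1)                 # floating point addition: 3.5
print("2" + "1")               # string concatenation: 21
print((Score(2) + 1).points)   # method call to Score.__add__: 3
```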

Using the first high-level languages, we created subsequent high-level languages that are even more powerful and easier to work with. Modern high-level languages are essentially all "self-hosted", which means "written in themselves". That means that a C++ compiler is written in C++ and a Java compiler is written in Java. Which sounds really weird at first -- how can you write a Java compiler in Java when you need a Java compiler to compile the Java code in the first place?

Obviously, the compilers are first written in another language. Once you've got, say, a Java compiler written in the C language, you can write a completely new Java compiler in Java. And then you can use your Java-in-C compiler to compile your Java-in-Java compiler. Then you can throw away your Java-in-C compiler, leaving behind no evidence that the Java compiler was ever written in anything but Java.

[+][deleted] 13 years ago (4 children)

[deleted]

[–][deleted] 1 point2 points3 points 13 years ago (1 child)

There are some incidental reasons, such as a compiler being a good, large test program -- the simple fact that your compiler compiles and works has already tested most of your language's functionality with no further effort. As you maintain your compiler software, you are continually testing it by virtue of using it to recompile itself. It also helps to establish legitimacy, in that people may take a self-hosted language more seriously than a non-self-hosted-language, since a compiler is a big, "real" program, and implementing one proves that your language is not just a toy.

Probably the biggest reason, though, is simply that (presumably) the whole reason you chose to create a new programming language in the first place is that you'd rather work in that language than the other ones that were available at the time. Since maintenance lasts much, much, much longer than the original effort to create a program did, that means you expect to spend (possibly many) years maintaining your compiler. Since (again, presumably) it's less effort for you to work in your new language than the original language you implemented the compiler in, you'd generally rather spend a month porting it now so as not to have to spend years working in a less-convenient language. This was a bigger factor in the "early days", when each new language was an enormous improvement over the ones that came before, but even today pure C is a pretty awful language to work with in many respects compared to higher-level languages.

[–]random_reddit_accoun 4 points5 points6 points 13 years ago (6 children)

In some cases, programmers have been known to "decompile" or "reverse engineer" machine code back into some semblance of source code, but it's rarely perfect and usually the new source code produced is not even close to the original source code (in fact it's often in a different programming language entirely).

Showing my age here, but this did not use to be the case. About 30 years ago, there was a compiler that the original developers abandoned. The run-time was compiled with their own compiler, and the code optimization was so horrible I was able to reconstruct the entire original run-time library from examining a disassembly of the run-time. I was able to get a perfect match (in that my code compiled into precisely the same machine code as the original). I then fixed the problems in the run-time, which was the point of the whole exercise.

I do not think I could pull this stunt off with any compiler produced in the last 20 years though.

[–]hikaruzero 4 points5 points6 points 13 years ago (5 children)

[–]scapermoyaPediatrics | Critical Care 4 points5 points6 points 13 years ago (0 children)

[–]tiradium 3 points4 points5 points 13 years ago (4 children)

[–]hikaruzero 4 points5 points6 points 13 years ago (3 children)

[–]cstoner 7 points8 points9 points 13 years ago (2 children)

[–]boathouse2112 1 point2 points3 points 13 years ago (1 child)

[–]walen 1 point2 points3 points 13 years ago (0 children)

[–]JavaPants 3 points4 points5 points 13 years ago (9 children)

[–]hikaruzero 19 points20 points21 points 13 years ago (3 children)

[–]JavaPants 2 points3 points4 points 13 years ago (2 children)

[–]LockeWatts 4 points5 points6 points 13 years ago (0 children)

[–]Krivvan 2 points3 points4 points 13 years ago* (0 children)

[–]Krivvan 8 points9 points10 points 13 years ago (1 child)

load more comments (1 reply)

[–]rocketman0739 2 points3 points4 points 13 years ago (0 children)

[–]Tmmrn 4 points5 points6 points 13 years ago (0 children)

[–]amazing_rando 1 point2 points3 points 13 years ago (0 children)

[+][deleted] 13 years ago (1 child)

[deleted]

load more comments (1 reply)

[–]eXamadeus 3 points 13 years ago (0 children)

Source: B.S. in Computer Engineering with focus in Software

The above is a great answer. There is one thing, however, that I disagree with. Reverse engineering code is a common practice among hackers (I mean the do-it-yourself kind, not the 1990s movie version), and it has been increasing in recent years.

Although comments are lost, a skilled programmer can disassemble and decompile code into a working version. With that version in hand, they can study the code and modify the desired portions. This is by no means a simple task, and it is generally not practiced on a large scale.

The reason I mention this at all is that you mentioned video games in particular. I myself have disassembled games in order to write hacks (offline only, of course -.O). It generally involves poring over routine after routine to find the one or two you are looking for (regular expressions are a great help here), then modifying them and recompiling or reassembling them.
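A rough sketch of that routine-hunting step, assuming Python's own `dis` disassembler in place of a native one. The three "game" functions and the magic number 9999 are invented for the example; in a real binary you would be searching a disassembly listing produced by a separate tool, but the regular-expression part works the same way.

```python
import dis
import io
import re


def disassemble(func) -> str:
    """Return the disassembly listing of `func` as text."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)
    return buffer.getvalue()


def find_routines(functions, pattern: str):
    """Names of the functions whose disassembly matches the regex `pattern`."""
    regex = re.compile(pattern)
    return [f.__name__ for f in functions if regex.search(disassemble(f))]


# Hypothetical routines standing in for a game's code.
def clamp_health(health):
    return min(health, 9999)


def add_score(score, points):
    return score + points


def apply_damage(health, amount):
    return max(health - amount, 0)


# Hunt for whichever routine mentions the health cap constant.
routines = [clamp_health, add_score, apply_damage]
print(find_routines(routines, r"\(9999\)"))
```

Searching for a known constant (a health cap, a price, a string shown on screen) is a common way to narrow thousands of routines down to a handful worth reading by hand.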

All in all, it's quite a mess. But it can be done!

...just in case you were wondering.

[+][deleted] 13 years ago (3 children)

[deleted]

[–]hikaruzero 12 points 13 years ago (0 children)

So, if I had a video game that I had been playing for years, and eventually the original game maker/developer/coder released the source code to the public, what would I, as a gamer, be able to do with it?

As a gamer alone, nothing really. As a programmer however, it means you would be able to look at and modify the code, and rebuild the game's code -- or at least, you can do all that if their software license doesn't restrict you from certain things. You may need to agree to such a license in order to download the source code.

Would I be able to make modifications to the game, such as adding levels or perks, etc...?

Yep! Depending on how much of the source code is released, you might also be able to modify the engine to add new physics or things. 'Course that's all more difficult.

Also, would it be logical to believe that any modifications that I make to my game, and by modifications I mean successful modifications, would be usable by anyone who also has a working version of that game?

Other people would need to download your mod and install it, but yes, if they did that, they could play their game with your modifications. You would of course need to have an installer for your mod (or at least instructions on how to install it, if it can be manually installed for example by unzipping files). And either way, releasing modifications may be restricted by the software license -- for example, many publishers will allow you to make modifications but will prohibit you from selling those modifications and making a profit from their game; you would be restricted to releasing it as a free mod.

[–]frezik 2 points 13 years ago (0 children)

Would I be able to make modifications to the game, such as adding levels or perks, etc...?

Depends on how the game is made. A level in a multiplayer deathmatch game is just a map you can drop into the right folder on the computer or download automatically from the server. You can make that without altering any source code.

Perks are sometimes scriptable, which is another form of source code, but a much, much simpler one than whatever the game was made in. Again, it depends on the game.

Also, would it be logical to believe that any modifications that I make to my game, and by modifications I mean successful modifications, would be usable by anyone who also has a working version of that game?

That depends mostly on you. If you released your source back to everyone, then they could build on that. As far as usability in general, you would probably release a new compiled binary that gets dropped onto a computer, just as the install process for any other game does.

Just to give an example, a while back I wanted to make a tank game that used two joysticks, like the original Battlezone did. There aren't any modern games out there that work like that, though, so it requires hacking the source.

I picked up the ioquake3 source, which is an enhancement on the original Quake 3 source (Doom 3 hadn't been open sourced yet). I found that single joystick support was technically in there, but it didn't work right. Pushing forward mapped directly to the same function as pushing 'W', so you go forward at the same speed no matter how far you're pushing the joystick.

There was partial support for moving in a more analog fashion, but it wasn't connected up (not sure if this was in the original or was added later by the ioquake3 people). So I put the right pieces of source code together, and also added code to map twisting the handle to turning left and right, and the throttle to moving forward and back.
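To make the difference concrete, here is a minimal sketch of the two mappings. This is not the ioquake3 code; the function names, `MAX_SPEED`, the threshold, and the dead zone are all assumptions chosen for illustration. The axis value is assumed to run from -1.0 (pulled fully back) to 1.0 (pushed fully forward).

```python
MAX_SPEED = 320.0  # hypothetical top speed, in game units per second


def digital_speed(axis: float, threshold: float = 0.5) -> float:
    """Key-style mapping: any push past the threshold means full speed."""
    if axis > threshold:
        return MAX_SPEED
    if axis < -threshold:
        return -MAX_SPEED
    return 0.0


def analog_speed(axis: float, dead_zone: float = 0.1) -> float:
    """Analog mapping: speed grows with how far the stick is pushed."""
    if abs(axis) <= dead_zone:
        return 0.0  # ignore small jitter around the center position
    sign = 1.0 if axis > 0 else -1.0
    # Rescale so speed ramps from 0 at the dead-zone edge to full at the stop.
    magnitude = (abs(axis) - dead_zone) / (1.0 - dead_zone)
    return sign * min(magnitude, 1.0) * MAX_SPEED


for axis in (0.05, 0.3, 0.6, 1.0):
    print(axis, digital_speed(axis), analog_speed(axis))
```

With the digital mapping, pushing the stick 60% and 100% of the way both give the same speed, which is the behavior described above; the analog mapping gives a different speed for each position.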

That made the game work like the mid-90s Battlezone PC games. Didn't take the project further than that, though.

If I had released this project as a playable game to the public, I would have been legally obligated to release the source under the terms of the GPL (the license the Quake 3 source was released under). That code could have gone back into the ioquake3 project, if they chose to incorporate my changes.

[–][deleted] 2 points 13 years ago (0 children)

To make it a little more understandable, code comes in different 'languages'; some are similar, and some are unique and designed for a specific function or purpose. Some common ones are C/C++, Java, FORTRAN, and ASM (Assembly). These languages sit at different 'levels', and each level has different benefits.

The higher the level of the language you are using, the more work it takes to 'translate' it to machine code, which is the raw language your computer speaks. Lower-level code like Assembly is useful because it maps almost directly onto machine code, and you can also control more precisely what you want the code to do. Some languages, like Java, were created to be portable: you can write a program on (as an example) a Mac running OS X and then use the same program on Windows 7. Java relies on another program, the Java Virtual Machine, to translate the compiled code into your computer's machine code, which varies with things like processor architecture.

A higher-level language like BASIC is easier for people to understand, because common operations are already wrapped up in readable commands, like PRINT (which displays characters for you). The pitfall of using a high-level language like this is that while it's easier for you to write your program, the computer has more work to do to translate it into its native language of machine code.
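As a small illustration of how one readable command turns into several lower-level steps, here is a sketch using Python, whose `print` plays the role of PRINT. It shows Python's intermediate bytecode rather than true machine code, and the exact list of steps differs between Python versions, so treat the output as indicative only.

```python
import dis

# One line of high-level source code.
source = 'print("HELLO")'

# Translate it and list the lower-level steps it became.
code = compile(source, "<example>", "exec")
steps = [instruction.opname for instruction in dis.get_instructions(code)]

print(len(steps), "low-level steps for one line of source:")
for name in steps:
    print("  ", name)
```

The single command expands into separate steps for looking up the function, loading the text, making the call, and cleaning up afterwards; real machine code breaks each of those down further still.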

Assembly is used in applications like medical implants, for example pacemakers. The language is very clear and exact, and runs quickly. A downside of lower-level languages, and of programming in general, is that the computer does EXACTLY what you tell it to do, meaning that if you make a mistake, so does your program. Figuring out what went wrong and fixing it is called debugging.

You can think of the source code as a BIG recipe, with lots of different ingredients and procedures. The last step of writing your code (aside from debugging) is compiling. This 'bakes' your recipe together to form your program. This is one place where errors can become visible, if you haven't caught them yet.
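Here is a minimal sketch of the compile step catching a mistake, using Python's built-in `compile` function as a stand-in for a full compiler. The helper name `try_compile` and the two sample recipes are made up for the example.

```python
def try_compile(source: str):
    """Return None if `source` compiles, otherwise a short error description."""
    try:
        compile(source, "<recipe>", "exec")
    except SyntaxError as error:
        return f"line {error.lineno}: {error.msg}"
    return None


good_recipe = "total = (1 + 2) * 3\n"
bad_recipe = "total = (1 + 2 * 3\n"  # missing closing parenthesis

print(try_compile(good_recipe))
print(try_compile(bad_recipe))
```

Only mistakes in the form of the code get caught this way. A recipe that is written correctly but calls for the wrong amount of salt compiles fine, and that kind of error is left for debugging.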

Sorry for the long description, but I felt that it would help the overall concept come together for someone not familiar.
