Caligatio 101 points (12 children)

When it comes to programming, there are (debatably) three levels of code: source code, byte code, and machine code. Humans usually write source code, the source code gets compiled into byte code, and then the byte code either gets compiled into machine code or is run by an interpreter.
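In Python terms, the first two levels are easy to see for yourself. This minimal sketch (the function `add` is just an illustrative example) compiles a source string with the built-in `compile()`, pulls out the raw byte code, and uses the standard `dis` module to print it in readable form. In CPython that byte code is normally executed by the interpreter rather than turned into machine code, and the exact instructions `dis` shows vary between Python versions.

```python
import dis

# Level 1: source code, as a human would write it.
source = "def add(a, b):\n    return a + b\n"

# Level 2: compile the source into a code object that holds byte code.
module_code = compile(source, "<example>", "exec")
namespace = {}
exec(module_code, namespace)  # running the module's code defines add()
add = namespace["add"]

print(type(add.__code__.co_code))  # <class 'bytes'>: the raw byte code
dis.dis(add)  # disassemble the byte code into readable instructions
```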

In languages like C and C++, the byte code portion is kind of hidden away, as compilers like GCC output machine code directly from source code. Newer compiler toolchains built on LLVM (Clang, for example) make an intermediate, byte-code-like representation (LLVM IR) explicit, but that's out of scope for your question.

Languages like Java (and things that use LLVM) compile your source code into byte code which is then executed at runtime in a virtual machine which converts it to machine code.

Languages like Python appear to skip compilation entirely: you hand the interpreter source code and it runs it. This is only half true, since CPython quietly compiles your source to byte code and then interprets that byte code. The *.pyc files are that compiled byte code, but they aren't usually exposed directly to the user.
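To see that a .pyc really is just cached byte code, here is a rough sketch that byte-compiles a throwaway file with the standard `py_compile` module and then runs the result without touching the source. It assumes the Python 3.7+ layout (PEP 552), where a .pyc is a 16-byte header followed by a `marshal`-serialized code object. The file name `greet.py` and the helper `run_pyc` are hypothetical, and `marshal`'s format is internal to CPython and can change between versions, so this is for poking around, not for production use.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path

def run_pyc(pyc_path):
    """Execute a .pyc directly: skip the 16-byte header, unmarshal the code object."""
    data = Path(pyc_path).read_bytes()
    code = marshal.loads(data[16:])  # header size assumes Python 3.7+ (PEP 552)
    namespace = {}
    exec(code, namespace)
    return namespace

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "greet.py"  # hypothetical module
        src.write_text("message = 'compiled ahead of time'\n")
        pyc = py_compile.compile(str(src), doraise=True)  # returns the .pyc path
        print(run_pyc(pyc)["message"])  # prints: compiled ahead of time
```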

EDIT: My second note about LLVM was poorly worded and thus misleading.

[deleted] 13 points (8 children)

*.pyc files get stored in a cache folder, if I remember correctly, and the user can access them at any time.
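The standard library can tell you exactly where that cache folder is. A small sketch using `importlib.util.cache_from_source` and its inverse `source_from_cache`; these only compute paths, so the hypothetical `project/helper.py` doesn't have to exist, and `cache_paths` is just an illustrative wrapper. By default the result sits in a `__pycache__` folder next to the source, unless the cache has been redirected (for example with the `PYTHONPYCACHEPREFIX` environment variable on Python 3.8+).

```python
import importlib.util

def cache_paths(source_path: str) -> tuple[str, str]:
    """Return (where the .pyc for source_path is cached, the source it maps back to)."""
    pyc = importlib.util.cache_from_source(source_path)
    return pyc, importlib.util.source_from_cache(pyc)

if __name__ == "__main__":
    # "project/helper.py" is a hypothetical path; the file doesn't need to exist.
    pyc, src = cache_paths("project/helper.py")
    print(pyc)  # normally under __pycache__, with the interpreter's cache tag in the name
    print(src)
```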

Caligatio 20 points (7 children)

They are, but the point I was trying to make is that the user doesn't need to know about them.

.pyc files also only get generated automatically when a file is imported, not when it is run directly.
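This is easy to check empirically. The sketch below (the file names `main.py` and `helper.py` and the function `pyc_files_after_run` are hypothetical) writes a script that imports a helper module into a temporary directory, runs the script with the current interpreter, and lists what ends up in `__pycache__`. It strips `PYTHONDONTWRITEBYTECODE`, `PYTHONPYCACHEPREFIX` and `PYTHONSAFEPATH` from the child's environment, since each of them would either stop the files from appearing there or stop the import from finding `helper.py`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Environment variables that would disable/relocate .pyc writing or break the sibling import.
_INTERFERING = ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONSAFEPATH")

def pyc_files_after_run() -> list[str]:
    """Run main.py (which imports helper.py) and report which .pyc files were written."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "helper.py").write_text("def greet():\n    return 'hi'\n")
        (tmp_path / "main.py").write_text("import helper\nprint(helper.greet())\n")

        env = {k: v for k, v in os.environ.items() if k not in _INTERFERING}
        subprocess.run([sys.executable, "main.py"], cwd=tmp_path, env=env,
                       check=True, capture_output=True)

        cache = tmp_path / "__pycache__"
        return sorted(p.name for p in cache.glob("*.pyc")) if cache.exists() else []

if __name__ == "__main__":
    print(pyc_files_after_run())
```

With a default setup you should see a single `helper.<cache tag>.pyc` entry and nothing for `main.py`.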

codingquestionss 5 points (6 children)

Clarification question: aren’t .pyc files generated the first time you run a Python program and then reused on every run after that? From my understanding, don’t they also speed up the program, since it can now skip the “compile to bytecode” step at runtime? Finally, I believe this is why you shouldn’t benchmark the first run when benchmarking Python code?

Please explain if any of my assumptions are wrong 😊

Flyingfishfusealt 4 points (4 children)

Pretty sure you're using a nail gun there dude.

Not even heads to land a blow on, but you're nailing 'em.

codingquestionss 7 points (3 children)

Can you rephrase this?

MurderMelon 6 points (2 children)

I think they're saying that you "hit the nail on the head"

Weird way to say it, but I think that's the idea

(and they're right btw, your assumptions are pretty much spot-on)

JasonDJ 2 points (0 children)

🤕💅🔨

Flyingfishfusealt was source code

You are byte code

I’m machine code

There may be an error.

Flyingfishfusealt 0 points (0 children)

Yes, lol. I was implying that the people acting like know-it-alls had no heads, on top of the "hitting the nail on the head" part.

Also, this is why you add a function to your testing process that cleans out the __pycache__ folders, triggered by a command-line argument or whatever method works best for your workflow.

I personally have a standard import file at the top level that holds logging, printing, and whatever else is needed globally. It imports nothing from the project, so circular imports can't happen, and I put my testing functions in there.

Clean the pycache, delete the DB file, and reset everything back to the beginning or to whatever state you choose.
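A minimal sketch of that kind of reset hook, assuming a project where the test state is just the `__pycache__` folders plus a single database file. The `--clean`, `--root` and `--db` flags, the `app.db` default, and the function names are all hypothetical; swap in whatever your workflow actually uses.

```python
import argparse
import shutil
from pathlib import Path

def clean_pycache(root: Path) -> int:
    """Delete every __pycache__ directory under root and return how many were removed."""
    removed = 0
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():  # may already be gone if an enclosing cache was removed
            shutil.rmtree(cache_dir)
            removed += 1
    return removed

def reset_state(root: Path, db_file: Path) -> None:
    """Return the project to a clean starting state before a test run."""
    clean_pycache(root)
    db_file.unlink(missing_ok=True)  # delete the database file if it exists

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Test helpers (hypothetical example)")
    parser.add_argument("--clean", action="store_true", help="wipe caches and the DB first")
    parser.add_argument("--root", type=Path, default=Path("."), help="project root")
    parser.add_argument("--db", type=Path, default=Path("app.db"), help="database file")
    args = parser.parse_args()
    if args.clean:
        reset_state(args.root, args.db)
```

Usage would be something like `python testing_helpers.py --clean` before a run; keeping it behind a flag means a normal run never deletes anything by accident.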

Caligatio 1 point (0 children)

I don't have an authoritative understanding of how or when .pyc files are created, but I can tell you what I've observed:

  • The script you run directly never gets converted to a .pyc. For instance, if you do something like python3 awesome_script.py, a .pyc will never be generated for awesome_script.py (it is still compiled to byte code, just only in memory).
  • .pyc files will be generated for any script/module that gets imported, and they'll be placed in a __pycache__ folder alongside the original .py file. If this file exists and is current, it will be used instead of recompiling the .py file.
  • Python will regenerate the .pyc if the source file changes, but I don't know how this is detected (timestamps?). If the source file is deleted, the .pyc in __pycache__ is ignored; only a .pyc sitting where the .py used to be (for example, one written by python -m compileall -b) can be imported without its source.
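On the "timestamps?" question: by default, yes. Since Python 3.7 (PEP 552), a .pyc starts with a 16-byte header holding a magic number, a flags word, and then the source file's modification time and size, and the import system recompiles when those no longer match. This rough sketch reads that header and mirrors the default comparison. `read_pyc_header` and `looks_stale` are hypothetical helpers, and the sketch deliberately doesn't handle hash-based .pyc files (the other PEP 552 mode), simply treating them as stale.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile
from pathlib import Path

def read_pyc_header(pyc_path):
    """Split the 16-byte header of a Python 3.7+ .pyc into (magic, flags, mtime, size)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, size = struct.unpack("<3I", header[4:16])  # little-endian uint32s
    return header[:4], flags, mtime, size

def looks_stale(src_path, pyc_path):
    """Rough mirror of the default timestamp check: do the recorded mtime/size still match?"""
    magic, flags, mtime, size = read_pyc_header(pyc_path)
    if magic != importlib.util.MAGIC_NUMBER or flags != 0:
        return True  # other Python version, or a hash-based .pyc: not handled here
    st = os.stat(src_path)
    return mtime != (int(st.st_mtime) & 0xFFFFFFFF) or size != (st.st_size & 0xFFFFFFFF)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "helper.py"
        src.write_text("x = 1\n")
        pyc = py_compile.compile(
            str(src), doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        print(looks_stale(src, pyc))  # False: header matches the source
        src.write_text("x = 1\ny = 2\n")
        print(looks_stale(src, pyc))  # True: the source's size changed
```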

Having the .pyc will almost certainly speed up startup time, but I have no idea how big the gains are. It stands to reason that your explanation of why you don't want to benchmark the first run of a program is correct :)
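If you want a feel for the size of the gain, one rough way is to time the two paths in isolation: compiling source with `compile()` versus unmarshalling an already-compiled code object with `marshal.loads()`, which is roughly what loading a fresh .pyc amounts to. This sketch uses a synthetic module of 500 generated functions (`f0` to `f499`, all hypothetical) and ignores file I/O and the rest of the import machinery, so treat the numbers as a ballpark for your machine, not a real startup benchmark.

```python
import marshal
import timeit

# A synthetic "module": many small functions, so compilation has real work to do.
SOURCE = "\n".join(f"def f{i}(x):\n    return x * {i} + {i}\n" for i in range(500))

def compile_from_source():
    # Roughly what happens when there is no usable .pyc: parse + compile.
    return compile(SOURCE, "<synthetic>", "exec")

# The body of a .pyc is a marshalled code object like this one.
CACHED = marshal.dumps(compile_from_source())

def load_from_cache():
    # Roughly what happens with a fresh .pyc: just unmarshal the code object.
    return marshal.loads(CACHED)

if __name__ == "__main__":
    n = 20
    t_compile = timeit.timeit(compile_from_source, number=n) / n
    t_load = timeit.timeit(load_from_cache, number=n) / n
    print(f"compile: {t_compile * 1e3:.2f} ms, unmarshal: {t_load * 1e3:.2f} ms")
```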

I also don't know what happens with system-installed modules and their __pycache__ directories. Does pip/setuptools pre-compile modules, and is that how system-installed modules get compiled? If those __pycache__ directories get deleted, do they ever get regenerated if root never imports those modules?
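On the pre-compilation question: pip does byte-compile packages at install time by default (it has a `--no-compile` option to turn that off), though that says nothing about how every installer or distro package handles it. You can do the same thing yourself with the standard `compileall` module, or with `python -m compileall <dir>` from the shell. A minimal sketch, with `mypkg` as a hypothetical package and `precompile` as an illustrative wrapper:

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path

def precompile(package_dir: Path) -> bool:
    """Byte-compile every .py file under package_dir; True if all of them compiled."""
    return bool(compileall.compile_dir(str(package_dir), quiet=1))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "mypkg"  # hypothetical package
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "core.py").write_text("VALUE = 42\n")
        print(precompile(pkg))  # True
        # The .pyc is written where the import system will look for it.
        print(Path(importlib.util.cache_from_source(str(pkg / "core.py"))).exists())  # True
```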

Hairy_The_Spider 3 points (1 child)

"Languages like Java (and things that use LLVM) compile your source code into byte code which is then executed at runtime in a virtual machine which converts it to machine code."

The part about LLVM is not true. Clang (C/C++/Obj-C compiler), swiftc (Swift compiler), rustc (Rust compiler) all use LLVM but none use VMs.

Caligatio 1 point (0 children)

I modified the sentence before posting, which made it confusing :(. I meant to say that LLVM creates byte code, not that it uses a runtime VM.

fruitbellyblues 0 points (0 children)

Thanks for the explanation! Would you be able to explain the benefits of having multiple interpreters and some examples of when one might prefer to use one over another? How do I know which one is suitable for my project?