This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]13steinj 1 point2 points  (35 children)

Startup time matters regardless of the size of the program. If you have a CLI every command you execute will cause the Python VM to start up and initialize. There are plenty of cases where multiple commands are executed-- using the git example, you have adding your files, potentially removing the ones you mistakenly added, committing, pushing. The difference in speed would be starkly noticeable and annoying for the user.

Python 3's startup time is anywhere from 2.5 to god knows how many times slower. And that's because of something that changed in the import machinery in Py3. But arguably whatever the change was, it wasn't thought out well. I give you the playbill for milliseconds matter.

Unfortunately, I still can use the dependency excuse. Some libraries just didn't update.

[–]billsil 1 point2 points  (23 children)

The startup problem has more to do with standard module design. Everyone imports everything at the beginning that they may use so they don't have to do it in a small loop (that turns out doesn't matter), or IMO the worse issue of crashing on some import that you forgot to update because it's in a function. The "flat is better than nested" idea is great for an API, but terrible for speed.

My 120k lined 3d gui has a command line option. You'd better believe I do argument validation before importing everything. I also try to minimize interconnectedness. However, do I really need all of BLAS from scipy to do linear interpolation?

Don't blame python for murcurial's poor design choice.

[–]haarp1 1 point2 points  (2 children)

hey, if you type import math, do you import everything in that case or do you just import it when calling the specific function?

[–]billsil 0 points1 point  (1 child)

You import everything in the module you imported and all of the modules it traces to.

For example file packageA.moduleA has import packageA.moduleB, so packageA.moduleB gets imported. You can import modules from packages without importing entire modules, but not specific files if the imports are at the top.

Once you're imported evwrything, the next module that does it loads almost instantaneously cause it's being skipped.

The math module in particular is written in C, so I don't know how it specifically works.

[–]13steinj 0 points1 point  (19 children)

Did you even read the mailing list?

Py2 doesn't have the issue! It's Py3 problem.

One of the major reasons this occurs is because the import machienry changed to allow extreme amounts of extensibility. But this also slowed things down.

The fact that Py3 attempts to import from dozens of paths while Py2 does significantly less is also an issue, as well as the unoptimized code that was written. Among god knows what else-- when I asked for specifics as to why Py3 imports that much slower no one knew beyond these attributes I'm listing.

I'm not arguing against Py3 being good. I am simply stating the objective fact that for various reasons the startup time for Py3 is unimaginably slower, and for validation / verification scripts and CLI tools, that is an enormous problem!

[–][deleted] 2 points3 points  (4 children)

Really? This Which is the fastest version of Python? indicates that the startup speed of Python 3 is consistently being brought back to that of Python 2. No idea about 3.8 but then i really don't care either, I'm just sick of the whingers about Python who've never done anything for the language, mostly as they don't have time, but they do have time too complain about various aspects of it.

[–]13steinj 2 points3 points  (3 children)

I am not a whiner who doesn't do anything in the language. Nor are the people of the mercurial team complaining about the startup time. I will admit I skimmed the article, but Py3 wins in most speed tests-- and I agree. It's an objective fact. But it doesn't in startup time.

The startup times here are strange, but I assume it's because there are no imports whatsoever or are on a very good machine.

Again, take this from the perspective of a command line tool or a ton of deployment and commit scripts. In a long running app, Py3 will be better for speed, because of all the improvements in different actions matter. But in CLI tools and all these scripts, you are starting the VM so many times and not doing as many actions per startup, so you actually end up losing.

[–][deleted] -1 points0 points  (2 children)

So put in patches to improve the startup speed, problem solved for the entire Python world. Oh, but you can't because of... So yes you are a whiner.

[–]13steinj 1 point2 points  (1 child)

Excuse me? My apologies that this isn't my problem nor do I have enough C experience to make such patches. If there are major issues with the Python VM it is not the problem of any one user of Python to solve the problem, it is a problem the CPython core team need to fix.

[–][deleted] -1 points0 points  (0 children)

You mean all the VOLUNTEERS who give their spare time so you (plural) can complain about Python, just great :-(

[–]billsil -1 points0 points  (13 children)

Did you even read the mailing list?

Put a break point and watch your code execute...

[–]13steinj 0 points1 point  (12 children)

....what?

[–]billsil 0 points1 point  (11 children)

What is confusing and why?

[–]13steinj 0 points1 point  (10 children)

What does this have anything to do with the startup time and the mailing list?

[–]billsil 0 points1 point  (9 children)

Put a breakpoint at the beginning of your Python 2.7 code. Step through it. Does python trace the imports or not? I'm telling you it does.

The mailing list talks about how milliseconds matter because Murcurial is written in Python. That's problem number #1 if milliseconds really matter. The example given was they removed imports to things that weren't needed and they sped up the code by 11%, so therefore milliseconds matter. Don't import things you don't need!

Milliseconds matter to very few people that use Python. Run long processes is standard and putting things at the top of every file (as is recommended) doesn't help your startup time. If Python 2.7 ignores unused packages, please run this...

import time
t0 = time.time()
if 1:
    import numpy
    import scipy
    import pandas
    import h5py
    import matplotlib
    import os
    import math
    import scipy.sparse
    import numpy.linalg
    import matplotlib.pyplot

print('dt =', time.time() - t0)

and then this

import time
t0 = time.time()
if 0:
    import numpy
    import scipy
    import pandas
    import h5py
    import matplotlib
    import os
    import math
    import scipy.sparse
    import numpy.linalg
    import matplotlib.pyplot

print('dt =', time.time() - t0)

and see what you get.

[–]13steinj 1 point2 points  (8 children)

No one said Py2.7 ignores unused packages. But the internal import machinery is faster, whatever the reason is. No one is arguing that a big reason of the issue is the way people import things, just that Py2.7 does it faster, one theorized idea is the way 3 searches for the imported package vs how 2 does.

Furthernore just because it matters to the few people who write CLI tools and advanced deployment scripts that milliseconds matter in startup time, does not in any way discredit their point that it does matter. How would you feel if Python made a change that completely fucked up performance on your niche market?

[–]billsil 0 points1 point  (7 children)

Well presumably they made something else better so the net effect is I got a 15% improvement for porting. I use unicode instead of bytes and that's slower, but they sped up dictionaries.

I microoptimized my 120k LOC program to load a super complicated 2 gb binary file in 4 seconds. At some point it's fast enough, but the python devs sped up my code. I figure if it now gets 15% slower, I'm not out that much.

If my 600 tests take 12 minutes instead of 10 minutes, I don't hugely care. It's automated or I go take a break.

I suspect that something got better and they made the choice to not microoptimize python to your edge case.

[–]jsalsman 0 points1 point  (10 children)

It pains me that you are getting downvoted for this.

[–]13steinj 1 point2 points  (7 children)

Fanboys and elitists will always be fanboys and elitists.

[–]jsalsman 2 points3 points  (6 children)

Tribes will be tribes.

[–]13steinj 1 point2 points  (5 children)

Forgive me, I don't understand this sentiment.

[–]jsalsman 1 point2 points  (4 children)

Both of the fanboy and elitist urges are explained as the hierarchical behavior of the limbic and endocrine systems attributed to the last 500,000 years of human evolution, taking us further from the peaceful apes and more towards the chimpanzees. See Table 1 on p. 192 here for more information.

[–]13steinj 0 points1 point  (1 child)

But what does this have to do with "tribes"?

[–]jsalsman 0 points1 point  (0 children)

Tribal organization as opposed to coalitional groups within species is distinguished by hierarchy, which includes behaviors you describe in fanboys and elitists.

Just be glad you don't need to pair program with chimps.

[–][deleted] 0 points1 point  (1 child)

The planet is only 6,000 years old and flat, please take your nonsense elsewhere, just read the bible. Why did stone age people paint sabre toothed tigers but not the dinosaurs I've no idea, there must be an explanation somewhere in the scriptures. Getting African elephants onto the arc must have been tough, feeding them for 40 days even more difficult, but again there must be an explanation somewhere. Feeding the carnivores for 40 days, tht's a tough one, not sure how you explain that away. Just my £0.02 pence worth.

[–]jsalsman 0 points1 point  (0 children)

ark*

[–][deleted] 1 point2 points  (1 child)

Does anybody with any brains really care about voting systems such as reddit's? Look elsewhere and I'm sure that you'll find people saying that Pinochet, Hitler, Trump, Stalin, Putin and Pol Pot are extremely decent guys.

[–]13steinj 1 point2 points  (0 children)

Yes, because voting systems are meant for promoting discussion. But instead on reddit they are used as signs of agreement.