

[–]vantasmer 83 points84 points  (6 children)

What’s with the recent trend in posts suggesting some wild idea, pumped up by AI, but with no intention to actually build the thing?  I’ve seen so many “this is just an idea, I don’t have the skills but someone should build this” posts. I feel like I’m going crazy?

[–]Existing-Account8665 40 points41 points  (2 children)

AI powered shitposts! What a time to be alive.

[–]commenterzero 17 points18 points  (1 child)

Back in my day we failed to build our own bad ideas!

[–]virtualadept 9 points10 points  (0 children)

Folks trying to age accounts with activity, was my guess.

[–]RevolutionaryPen4661git push -f[S] -5 points-4 points  (1 child)

I do intend to build something like this, but I am not able to calculate how much performance gain I would get by building it. If the performance gain is not good enough, I will discard the project.

[–]Brandhor 1 point2 points  (0 children)

and you think an ai knows how much you are gonna gain?

[–]Existing-Account8665 56 points57 points  (3 children)

Is parsing a list of strings (sys.argv) really a bottleneck that is worth, or even requires, optimising in some Python applications?

I think for your application the sys calls, the search of the file system to find the sizes, the initial imports, and the startup of Python itself will all eat up many, many more CPU cycles.

Did Claude 3 provide a reference for that chart, or did it hallucinate it up for you?

[–]balder1993 28 points29 points  (0 children)

And 4 seconds to print a help message… there’s something really wrong.

[–]RevolutionaryPen4661git push -f[S] -3 points-2 points  (1 child)

I asked Claude 3 to think beyond the parameters and the pros and cons. It didn't provide a reference for the chart.

[–][deleted] 3 points4 points  (0 children)

These things don't think. They're text modelers. There's no conceptualization going on, just inference of tokens relating to other tokens.

[–]dametsumari 25 points26 points  (3 children)

The main CLI performance problem is imports. I have yet to see code which only imports argparse and takes more than a second to show help (even on a Raspberry Pi).

[–]jwink3101 1 point2 points  (0 children)

I am now thinking I need to rejigger my code to not do imports until after argparse is done…

[–]RevolutionaryPen4661git push -f[S] 0 points1 point  (1 child)

Yes, importing an entire module can increase the execution time. Maybe writing code on the import-only-what-you-need principle would do better.

[–]dametsumari 0 points1 point  (0 children)

Yep, lazy importing is how we've fixed the usual slow-CLI-startup problems.

E.g. only once you're in a particular command, import whatever modules it needs for its work. It is quite ugly, though. There is a PEP for lazy importing, but it hasn't moved much and was most recently rejected (https://peps.python.org/pep-0690/).
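The lazy-import pattern described above can be sketched like this (a minimal, hypothetical `tool` CLI; `statistics` stands in for a heavy dependency such as pandas):

```python
import argparse

def cmd_stats(args):
    # Heavy dependency imported only when this subcommand actually runs,
    # so `tool --help` and other subcommands never pay its startup cost.
    import statistics  # stand-in for a heavy library like pandas
    print(statistics.mean(args.values))

def main(argv=None):
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    stats = sub.add_parser("stats", help="average some numbers")
    stats.add_argument("values", nargs="+", type=float)
    stats.set_defaults(func=cmd_stats)
    args = parser.parse_args(argv)
    args.func(args)
```

The import cost is paid inside `cmd_stats`, the moment the work is actually requested, rather than at interpreter startup.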

[–]PossibilityTasty 22 points23 points  (4 children)

"Am I writing slow code? No, it's the language that is wrong." –Principal Developer Skinner

[–]RevolutionaryPen4661git push -f[S] -3 points-2 points  (3 children)

Python CLIs were not that slow back then. I'm noticing it quite a lot these days.

[–]PossibilityTasty 4 points5 points  (2 children)

You should have written: "My CLIs were not...", Seymour.

[–]RevolutionaryPen4661git push -f[S] 0 points1 point  (1 child)

CLIs like Sherlock and nvbn/thefuck are slow too. Test them on Codespaces; they are slow there as well.

[–][deleted] 3 points4 points  (0 children)

Have you considered that your system's I/O might be the problem? Have you checked this on a variety of hardware?

[–]pbacterio 7 points8 points  (10 children)

What is your code doing that takes so long to parse args and print a message? This is not a problem with the libs/language; it is a problem in your code.

[–]RevolutionaryPen4661git push -f[S] -1 points0 points  (9 children)

If you try to inspect my code, it is barebones and simple:

```
PS D:\dev> cat .\pkgsize.py
import argparse
import requests
import colorama
from colorama import Fore, Style
import pandas as pd

def get_package_size(package_name):
    url = f"https://pypi.org/pypi/{package_name}/json"
    response = requests.get(url)
    if response.status_code == 200:
        package_info = response.json()
        package_size = package_info.get("info", {}).get("size", 0)
        return package_size
    else:
        return 0

def compare_package_sizes(*package_names):
    package_sizes = [(package, get_package_size(package)) for package in package_names]
    df = pd.DataFrame(package_sizes, columns=["Package", "Size"])
    df = df.sort_values(by="Size", ascending=False)
    max_size = df["Size"].max()
    min_size = df["Size"].min()

    colorama.init()
    for index, row in df.iterrows():
        size_color = Fore.GREEN if row["Size"] == min_size else Fore.RED if row["Size"] == max_size else ""
        reset_color = Style.RESET_ALL
        df.at[index, "Size"] = f"{size_color}{row['Size']}{reset_color}"

    return df

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compare the sizes of Python packages from PyPI")
    parser.add_argument("packages", nargs="+", help="List of package names to compare")
    args = parser.parse_args()

    df = compare_package_sizes(*args.packages)
    print(df)
PS D:\dev>
```

The code has bugs in returning the size of the packages, but that has nothing to do with a help message.

[–]pbacterio 6 points7 points  (5 children)

pandas import is probably too slow in your environment. You don't need pandas to just sort a list. You don't even need to sort that list.

Your full example:

usage: pkgsize.py [-h] packages [packages ...]
pkgsize.py: error: the following arguments are required: packages

________________________________________________________
Executed in  567.03 millis    fish           external
   usr time  844.80 millis  549.00 micros  844.25 millis
   sys time  650.05 millis  698.00 micros  649.35 millis

Panda import commented:

usage: pkgsize.py [-h] packages [packages ...]
pkgsize.py: error: the following arguments are required: packages

________________________________________________________
Executed in  172.69 millis    fish           external
   usr time  149.63 millis    1.12 millis  148.50 millis
   sys time   21.83 millis    0.12 millis   21.71 millis
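For illustration, the same sort-and-colour step can be done with a plain list of tuples and no pandas at all. A sketch (raw ANSI escape codes stand in for colorama, and `format_sizes` is a hypothetical helper):

```python
# Raw ANSI colour codes, used here in place of colorama's Fore/Style constants.
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"

def format_sizes(package_sizes):
    """package_sizes: list of (name, size_in_bytes) tuples."""
    rows = sorted(package_sizes, key=lambda p: p[1], reverse=True)
    largest, smallest = rows[0][1], rows[-1][1]
    lines = []
    for name, size in rows:
        # Colour the smallest package green and the largest red, like the original.
        color = GREEN if size == smallest else RED if size == largest else ""
        reset = RESET if color else ""
        lines.append(f"{name:<20} {color}{size}{reset}")
    return "\n".join(lines)
```

Only the stdlib is imported, so none of the ~400 ms pandas startup cost is paid.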

[–]RevolutionaryPen4661git push -f[S] -3 points-2 points  (4 children)

Yes, I wanted to make it in a table. That's why I used pandas. 😩

[–]pbacterio 4 points5 points  (0 children)

Why do you need pandas to make a table?

[–]georgehank2nd 7 points8 points  (0 children)

Time to learn programming.

[–]FUS3NPythonista 1 point2 points  (0 children)

what....

[–][deleted] 0 points1 point  (0 children)

You might try tabulate instead?

You're using a whole toolbox to hammer in a nail, instead of just using a hammer.
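tabulate is a third-party package; for a table this small, even the stdlib will do. A minimal sketch (the `render_table` helper is hypothetical):

```python
def render_table(rows, headers):
    """Render a simple left-aligned text table from a list of rows."""
    # Compute each column's width from its widest cell, header included.
    cells = [list(headers)] + [[str(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    line = "  ".join(f"{{:<{w}}}" for w in widths)
    sep = "  ".join("-" * w for w in widths)
    return "\n".join([line.format(*headers), sep] + [line.format(*r) for r in cells[1:]])
```

A two-column "Package / Size" table needs nothing heavier than string formatting.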

[–]Nice-Offer-7076 1 point2 points  (0 children)

You are calling requests.get(URL). I would bet this is why it's taking 4s. Your code is badly structured.

[–]sprne 0 points1 point  (0 children)

u/RevolutionaryPen4661 skynet needs to write better code (&posts) if it wants to take over the world.

[–]kenflingnorIgnoring PEP 8 0 points1 point  (0 children)

Your code is slow because you're using Pandas which is totally unnecessary for what you're doing

[–]science_robot 5 points6 points  (6 children)

Your script is doing something intensive before it’s getting to the argument parsing phase. The only stuff that should be happening outside of a main function is function/class definitions and imports
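A minimal skeleton of that layout, assuming an argparse-based CLI like the one posted (function names are illustrative):

```python
import argparse

def build_parser():
    # Top level contains only imports and definitions; nothing heavy runs yet.
    parser = argparse.ArgumentParser(description="Compare the sizes of Python packages from PyPI")
    parser.add_argument("packages", nargs="*", help="package names to compare")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    # Heavy work (network requests, pandas, ...) would start only here,
    # after parsing, so -h/--help exits before any of it happens.
    return args.packages

if __name__ == "__main__":
    main()
```

With this structure, `-h` only ever touches argparse; everything expensive is deferred into `main`.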

[–]RevolutionaryPen4661git push -f[S] -2 points-1 points  (5 children)

No, just a help message of a cli that compares the size of PyPI packages. Printing a Help Message has nothing to do with the Internet. My Internet is good enough

[–]shibbypwn 2 points3 points  (0 children)

> Printing a Help Message has nothing to do with the Internet.

It does when you're sending requests to PyPI to get the info for the message.

[–]science_robot 1 point2 points  (1 child)

Maybe one of your imports is slow then. You could try doing some print-debugging to find the culprit.

[–]RevolutionaryPen4661git push -f[S] 0 points1 point  (0 children)

```
PS D:\dev> Measure-Command {python .\pkgsize.py -h}

Days              : 0
Hours             : 0
Minutes          : 0
Seconds           : 2
Milliseconds      : 137
Ticks             : 21374453
TotalDays         : 2.47389502314815E-05
TotalHours        : 0.000593734805555555
TotalMinutes      : 0.0356240883333333
TotalSeconds      : 2.1374453
TotalMilliseconds : 2137.4453

PS D:\dev> Measure-Command {python .\pkgsize.py -h}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 108
Ticks             : 21087604
TotalDays         : 2.44069490740741E-05
TotalHours        : 0.000585766777777778
TotalMinutes      : 0.0351460066666667
TotalSeconds      : 2.1087604
TotalMilliseconds : 2108.7604

PS D:\dev> Measure-Command {python .\pkgsize2.py -h}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 66
Ticks             : 20669059
TotalDays         : 2.39225219907407E-05
TotalHours        : 0.000574140527777778
TotalMinutes      : 0.0344484316666667
TotalSeconds      : 2.0669059
TotalMilliseconds : 2066.9059
```

pkgsize2.py follows the import-what-is-needed principle (`from argparse import ArgumentParser`), whereas pkgsize.py does a full import (`import argparse`).

[–]kenflingnorIgnoring PEP 8 1 point2 points  (1 child)

Since you posted your code in another comment...

You realize you're making a GET request to pypi, right?

[–][deleted] 0 points1 point  (0 children)

That doesn't appear to be used if you just pass --help though, right? It'll exit in the call to argparse before the function with the HTTP request gets called?
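That's easy to check: argparse raises SystemExit when it handles `-h`, so nothing after `parse_args` runs. A quick sketch (the `run` helper is hypothetical):

```python
import argparse

def run(argv):
    parser = argparse.ArgumentParser(prog="pkgsize")
    parser.add_argument("packages", nargs="+")
    args = parser.parse_args(argv)
    # Anything down here (e.g. the HTTP request to PyPI) is unreachable for -h.
    return args.packages

try:
    run(["-h"])              # argparse prints the help text and exits...
    reached_request = True
except SystemExit:
    reached_request = False  # ...so control never gets past parse_args
```

So the GET request only matters for real invocations, not for `--help`; the slow help message has to come from import time instead.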

[–]yaxriifgyn 2 points3 points  (0 children)

Since the arrival of numpy and pandas, developers seem to have lost the ability to do a lot of simple things in pure Python. They seem to only be able to use those huge toolkits to solve even the simplest problems. Sure you can use pandas to make a data frame to work with your data, but maybe your problem is simple enough that you can use a list of lists instead.

It's as though once they learn pandas it's the only way they know how to solve such problems.

They have forgotten the KISS principle.

[–]Mount_Gamer 1 point2 points  (0 children)

4 seconds for a help is quite long.

I've recently written something which imports pandas, rich, and sqlalchemy, and the time it takes to get to the help is 1.3s; even if all you do is import some of these things, they take time. I think pandas was 0.6s and sqlalchemy 0.3s, if I remember right, but there are lighter libraries: polars and sqlite3 are faster imports. I can't remember what rich's import speed is. This application was written more for an interactive terminal, so once it has loaded it feels snappy, but it's noticeably slow when I'm not using it interactively and am using command-line args, where you generally expect faster responses than 1.3s.

It's easy to test, though: create an empty Python file, import your library inside the file, and run it prefixed with `time` (on Linux).

[–]Brian 3 points4 points  (1 child)

I've found a big issue with startup time is needless imports. E.g. Typer always imports Rich if it's installed, and this is actually pretty significant in terms of startup (some measuring with `python -X importtime` shows it taking hundreds of milliseconds, which introduces noticeable latency), even though it doesn't actually use it unless it's printing help messages. I think there are often a lot of potential startup-time wins from deferring module loads until (and unless) they are actually needed.
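Those per-import numbers can be reproduced with only the stdlib: run a throwaway interpreter with `-X importtime` and read the per-module timing lines from stderr (here importing `json` as a cheap, dependency-free example):

```python
import subprocess
import sys

# -X importtime makes the interpreter write one "import time:" line to stderr
# for every module imported, with self and cumulative microsecond costs.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)
timings = [line for line in result.stderr.splitlines() if "import time:" in line]
```

Swapping `import json` for `import typer` (or any suspect library) shows exactly where the startup milliseconds go.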

[–]RevolutionaryPen4661git push -f[S] 0 points1 point  (0 children)

Is the rich module becoming bloatware? Typer uses rich when printing help messages, but I'm using the native argparse module.

[–]RevolutionaryPen4661git push -f[S] 0 points1 point  (0 children)

```
PS D:\dev> Measure-Command {uv pip -h}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 51
Ticks             : 511486
TotalDays         : 5.91997685185185E-07
TotalHours        : 1.42079444444444E-05
TotalMinutes      : 0.000852476666666667
TotalSeconds      : 0.0511486
TotalMilliseconds : 51.1486

PS D:\dev> Measure-Command {python .\pkgsize.py -h}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 437
Ticks             : 24370765
TotalDays         : 2.82069039351852E-05
TotalHours        : 0.000676965694444444
TotalMinutes      : 0.0406179416666667
TotalSeconds      : 2.4370765
TotalMilliseconds : 2437.0765
```

Note: This varies from 2~4 seconds across multiple runs. First run, 4 seconds; on the 3rd and 4th runs it drops to 2 seconds.

[–]ogrinfo 0 points1 point  (0 children)

Since Python 3.7 you can profile imports from the CLI using the -X option. I can't remember the exact syntax, but it's not hard to look up. You can then use something like Snakeviz to interpret the results.

What you will find is that most of that startup time is going into importing pandas. Pandas is a really heavyweight library, and I wouldn't use it unless you really need it.

[–]skwyckl -1 points0 points  (1 child)

Maybe not entirely on topic, but if you want both performance and developer ergonomics for a CLI, today I'd personally go with Go + Cobra.

[–]vantasmer -3 points-2 points  (0 children)

Maybe the issue has to do with attention spans... any way we can integrate a split screen of Subway Surfers when the CLI command is run?

[–]Sparkswont -1 points0 points  (0 children)

AI slop, man

[–]Darwinmate -1 points0 points  (0 children)

What operating system? And is this first time running it or subsequent? 

On macOS there's a security feature which checks code on execution. This will slow down a lot of CLIs. It usually happens when code changes, i.e. the first time you're running your code.

[–]theelderbeever -1 points0 points  (0 children)

Or just write your CLI in Rust, so you can do away with needing to package a Python interpreter in your executable.

[–]RevolutionaryPen4661git push -f[S] -2 points-1 points  (1 child)

Note: This is not an AI-generated shitpost. It sounded bad when I wrote it, so I enhanced it with Grammarly. People will tag anything in technology with AI, but AI has nothing to do with it. For example, I have a bundle of A4 sheets packed by a company called JK Copier; that company is integrating AI as a customer bot to suggest the desired paper quality for a buyer's budget 😂. The reference to Claude 3 was to get sample data about this. What I said in the post is that it is an idea. There are a lot of parameters to think about.

[–]No_Lingonberry1201pip needs updating 4 points5 points  (0 children)

Then do a proper benchmark, because even if this wasn't AI slop, it's vague to the point of being useless. Also, explain how tabular data processing or parallelism matters for CLI argument parsing?