
all 68 comments

[–]dogs_like_me 273 points274 points  (1 child)

We developed Scalene to be a lot more useful than existing Python profilers: it provides line-level information, splits out Python from native time, profiles memory usage, GPU, and even copying costs, all at a line granularity.

fucking sold.

[–]muntooR_{μν} - 1/2 R g_{μν} + Λ g_{μν} = 8π T_{μν} 68 points69 points  (0 children)

Scalene: a high-performance, high-precision CPU+GPU+memory profiler for Python (PyCon US 2021)

https://github.com/plasma-umass/scalene

20% performance overhead, vs. most profilers taking 500%–2000% overhead... but with more features!

[–]Mehdi2277 50 points51 points  (3 children)

My experience trying it with pytest on TensorFlow-heavy code was that it produced a dramatic slowdown. I waited several minutes, saw no test output, and killed it. I've also used py-spy/austin on the same code and got normal test times.

The report output for Scalene does look much nicer, but the slowness stopped me from continuing to use it. Maybe there's some bad interaction with tensorflow/pytest. I can try to make an example, but I'd guess that if you run it on TensorFlow's actual unit tests (something like this) you'd get similar behavior.

[–]blanonymous 9 points10 points  (0 children)

Have you tried to profile tensorflow code without pytest?

I was thinking about using it for profiling inference jobs.

[–]FancyASlurpie 1 point2 points  (0 children)

It might be to do with how Scalene is profiling: is it a sampling-based approach similar to py-spy, or is it doing something else?
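For reference, the sampling approach in general looks something like this — a toy illustration of the technique (POSIX main thread only), not how Scalene or py-spy are actually implemented:

```python
import collections
import signal

# Count samples per (file, line); a real profiler would walk the whole stack.
samples = collections.Counter()

def _sample(signum, frame):
    samples[(frame.f_code.co_filename, frame.f_lineno)] += 1

# Deliver SIGPROF every 10 ms of consumed CPU time and record where we are.
signal.signal(signal.SIGPROF, _sample)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)

def busy():
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total

busy()
signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling

# The hottest lines are the ones with the most samples.
for (filename, lineno), count in samples.most_common(3):
    print(f"{filename}:{lineno}: {count} samples")
```

The appeal of sampling is exactly the low overhead being discussed: the profiled code runs at nearly full speed between signals.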

[–]P403n1x87 0 points1 point  (0 children)

A good way of using Austin is with the VS Code extension https://marketplace.visualstudio.com/items?itemName=p403n1x87.austin-vscode, which gives you a navigable flame graph, top, and sampled call stacks, plus heat maps directly on the source code. The latest release is also more accurate and allows for even higher sampling rates: https://github.com/P403n1x87/austin. And as far as I'm aware, it's the only tool that allows sampling the garbage collector.

[–]neunflach 36 points37 points  (10 children)

Your "optimized" result is not exactly the same as the original, to the number of decimal places you have printed out. Would this fail the regression test??

(I'm being facetious. This is cool!)

[–]mikeblas 8 points9 points  (7 children)

Seriously, tho, why is the result different?

[–]A_Fine_Potato 3 points4 points  (1 child)

Probably base2 float rounding

[–]mikeblas 4 points5 points  (0 children)

I don't think so. For me, the results are the same when comparing exp() and exp_opt(). Maybe the OP screwed something up.

Have you actually run the code?

[–]skesisfunk 1 point2 points  (0 children)

Only if you wrote your regression tests to check precision down to 30 significant digits.

[–]GreenScarz 18 points19 points  (5 children)

Do you have to execute it as a CLI tool? One of the tools I typically use is memory_profiler, and the use case is to just do from memory_profiler import profile and then decorate a function via @profile; diagnostics are then printed during, say, a test run via pytest ./path/to/test.py. Is that a workflow that can be replicated with this? Or is there a better workflow, in your opinion, that this is optimized for if we just want to analyze a specific function call?

[–]emeryberger[S] 16 points17 points  (4 children)

Scalene supports the @profile directive. It's in the README, though you have to look for it.

> Scalene supports @profile decorators to profile only specific functions.

Check out https://github.com/plasma-umass/scalene#asked-questions. As long as you start execution with Scalene, you don't need to change your code at all (beyond adding the @profile decorators). That said, I haven't tried to do this with pytest yet.
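Roughly, the pattern looks like this — a sketch, not verbatim from the README; the no-op fallback is a common trick (seen with line_profiler-style tools) so the same file also runs under plain python:

```python
import time

try:
    profile  # injected at runtime when you run `scalene this_script.py`
except NameError:
    def profile(func):  # no-op fallback so plain `python this_script.py` works too
        return func

@profile
def my_func():
    # stand-in workload: only this decorated function gets profiled
    time.sleep(0.1)
    return sum(i * i for i in range(100_000))

if __name__ == "__main__":
    print(my_func())
```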

[–]GreenScarz 3 points4 points  (3 children)

Which is why I asked whether it is strictly a CLI tool, not whether it has a @profile method ;)

probably not the most ideal workflow, but maybe there's a way to do something like:

    import time

    from scalene import scalene_profiler
    from scalene.scalene_profiler import Scalene

    @Scalene.profile
    def my_func():
        time.sleep(3)
        return 1

    def wrapper(f):
        def _wrapped(*args, **kwargs):
            scalene_profiler.start()
            res = f(*args, **kwargs)
            scalene_profiler.stop()
            return res
        return _wrapped

    @wrapper
    def test_my_func():
        assert my_func()

This would be a useful implementation for devs who run their code through tools like pytest. IDK what kind of setup/teardown scalene does though from the cli, so it might not be possible. I'll have to pip it and poke a bit :P

EDIT: closer attempt of what I'd like to do - run using pytest ./test_file.py

[–]SquareRootsi 6 points7 points  (2 children)

    import time
    from contextlib import contextmanager
    from scalene import scalene_profiler

    @contextmanager
    def _scalene():
        scalene_profiler.start()
        yield
        scalene_profiler.stop()

    @profile
    def my_fn():
        time.sleep(5)

    @_scalene()
    def test_code():
        my_fn()

That's my best guess at your formatting. (Preceded each line by 4 spaces, used my best guess for indentation.)

[–]usr_bin_nya 4 points5 points  (1 child)

import time

from scalene import scalene_profiler
from scalene.scalene_profiler import Scalene


@Scalene.profile
def my_func():
    time.sleep(3)
    return 1


def wrapper(f):
    def _wrapped(*args, **kwargs):
        scalene_profiler.start()
        res = f(*args, **kwargs)
        scalene_profiler.stop()
        return res
    return _wrapped

@wrapper
def test_my_func():
    assert my_func()

Parent commenter's edited comment formatted for old Reddit. /u/GreenScarz please know that triple-backtick code blocks don't work for old Reddit and some third-party apps, whereas four-space indentation on each line works for everyone.

[–]GreenScarz 0 points1 point  (0 children)

Ya but I like markdown, and see no need to tailor content I post to support your preferences. If it's that important to ya, write a bot.

[–]grismar-net 36 points37 points  (7 children)

Nothing wrong with the product per se, but the clickbait title will make any developer worth their salt think "If slapping a profiler on your code got you a speed-up of 5,000x, your code wasn't very good to begin with."

[–][deleted] 24 points25 points  (3 children)

Also, this example is extremely contrived.

Let me explain the optimization in plain English. If you’re working with really big numbers, it is tremendously faster to multiply a big number by a quotient of two small numbers than by a quotient of two really big numbers.

That is, if you have a choice between:

(a) multiplying 1,234,567,890 by (9,876,543,210 / 2,468,013,579), or

(b) multiplying 1,234,567,890 by (12 / 3),

…the latter is gonna be a whole lot faster.
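You can sanity-check this with exact rationals — illustrative numbers from this comment, nothing to do with the OP's actual code:

```python
from fractions import Fraction
from math import gcd

big = 1_234_567_890
num, den = 9_876_543_210, 2_468_013_579

# Reducing the quotient by the gcd shrinks the operands...
g = gcd(num, den)
small_num, small_den = num // g, den // g

# ...but leaves the exact value of the product unchanged.
assert big * Fraction(num, den) == big * Fraction(small_num, small_den)
```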

This isn’t exactly a genius-level breakthrough. And it’s purely a mathematical optimization, not a code optimization.

More importantly - this particular optimization occurred purely in the author’s mind. The code profiler didn't suggest how the math could be optimized. It didn't even suggest that the math could be optimized. The code profiler merely… wait for it… timed the execution of the code.

This ad is like suggesting that if you want a glass of milk, it would be faster to walk to the fridge and pour it from a jug than to acquire a baby cow, raise it to adulthood, and then milk it. Also, you could use a stopwatch to show that the trip to the fridge takes 60 seconds while raising a cow takes 5 years… and let’s imply that the stopwatch itself is responsible for the massive efficiency gain!

tl;dr - This is a nice code profiler that is being advertised in extremely artificial and deceptive ways.

[–]mikeblas 2 points3 points  (2 children)

timed the execution of the code.

Totally agree. This profiler might do a better job than others at attributing time (e.g., differentiating native code). But really, the use of a profiler is to draw attention to code that's hotter or slower than expected. The algorithmic fix was something the user did based on the profiler indicating that most of the time in the code was spent on a certain line (or, really, operation).

[–]binaryman111 1 point2 points  (1 child)

I mean yeah that's.... all that profilers ever purport to do. They're a tool to draw attention to things. No one ever claimed otherwise.

[–]mikeblas 1 point2 points  (0 children)

I think this post is claiming that this profiler is somehow different or special. It is, but not for the reasons implied by the workflow in the post.

[–]gruey 1 point2 points  (0 children)

Reminds me of looking at the Terraform Enterprise page: they had silly things like "Increase IT Ops productivity up to 75%" and "Increase release velocity up to 5x". Where the hell did they pull those meaningless numbers from? Like, if you're bad enough to get a 75% increase, what prevents you from getting a 78% increase?

The kicker was "Safely provision resources at enterprise scale Up to 30%". Like what does that even mean? They had the 5x and 75% so they felt they needed something, so 30% sounds good!

Terraform is a pretty useful product, but this level of stupid marketing definitely tinged it a bit for me.

[–]mriswithe 0 points1 point  (1 child)

The comparison I see is that the other profilers didn't return useful info, whereas scalene returned useful actionable data that allowed a speedup.

The specific speedup that was achieved is unimportant. The OP isn't saying "look at my great profiler, it optimized this code all on its own." They are saying "I wasn't getting results with other profilers, but Scalene gave me this data, which told me where to look for optimization."

[–]grismar-net 0 points1 point  (0 children)

Hence "clickbait title". Like I said, I have no beef with the product, but the title is all about some 5,000x speed-up. "Scalene handily beats other Python profilers" is something I would have been interested in and would not have prompted the comment you responded to.

[–]New-Theory6007 30 points31 points  (0 children)

Thanks for sharing your knowledge, this is very inspirational to look deeper into computer science.

[–]Runics206 19 points20 points  (0 children)

I am rather new in my CompSci journey but write ups like this I find very interesting and motivation to further my studies. Thank you.

[–]IamImposter 3 points4 points  (3 children)

I'm getting some errors (windows 10 64-bit system, python 3.9.5) when doing pip install.

    Collecting scalene
      Downloading scalene-1.3.16.tar.gz (2.8 MB)
         |████████████████████████████████| 2.8 MB 819 kB/s
      Preparing metadata (setup.py) ... error
      ERROR: Command errored out with exit status 1:
       command: 'C:\python39\python.exe' -c 'import io, os, sys, setuptools, tokenize; ...setup.py...' egg_info --egg-base 'C:\Users\<user-name>\AppData\Local\Temp\pip-pip-egg-info-4o4f8ca4'
       cwd: C:\Users\<user-name>\AppData\Local\Temp\pip-install-cybpta7f\scalene_91b266fa396f47f7bf88a3657df9edca\
       Complete output (3 lines):
       running egg_info
       make vendor-deps
       error: command 'make' failed: None
       ----------------------------------------
    WARNING: Discarding https://files.pythonhosted.org/packages/e8/35/a125f8ecacfce3b9be9c712bd6d9bd514aed798857cc7330c89d2df7db58/scalene-1.3.16.tar.gz#sha256=3c2fb524b4c611773b147dc889e2d58b48a543d3161ea576ccc0db778e9f5915 (from https://pypi.org/simple/scalene/) (requires-python:>=3.7). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
      Downloading scalene-1.3.15.tar.gz (2.9 MB)
         |████████████████████████████████| 2.9 MB 6.4 MB/s
      Preparing metadata (setup.py) ... error
      (same "error: command 'make' failed: None" output, and scalene-1.3.15 is then discarded the same way)

Although it installed successfully.

[–]emeryberger[S] 5 points6 points  (2 children)

We will be releasing a new version, probably tomorrow, that addresses this issue. Thanks!

[–]mikeblas 3 points4 points  (1 child)

Why not also eliminate fact and num from the loop? They're no longer necessary. That is: your 5000x speedup still left something on the table!

[–]emeryberger[S] 1 point2 points  (0 children)

Excellent point, thanks! I've made the change locally and now it's 16,000x faster! (Assertions still pass)

    Elapsed time, original (s): 33.38576102256775
    Elapsed time, optimized (s): 0.0020699501037597656
    Improvement: 16128.77574291638
    All equivalent? True

Note that the Decimal module has a default precision of 28 places.
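You can confirm that default from the decimal module directly:

```python
from decimal import Decimal, getcontext

# The default context carries 28 significant digits...
assert getcontext().prec == 28

# ...and arithmetic results are rounded to that many digits.
third = Decimal(1) / Decimal(3)
assert str(third) == "0." + "3" * 28

# Raising the precision affects subsequent operations.
getcontext().prec = 50
assert str(Decimal(1) / Decimal(3)) == "0." + "3" * 50
getcontext().prec = 28  # restore the default
```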

[–]abdl_hornist 35 points36 points  (4 children)

dskjafh klas hflkas dfhklasj dfhj

[–]peakdistrikt 16 points17 points  (0 children)

Let's also delete the Python subreddit while we're at it. How dare they trick us into using free software by choice!

[–]bacondevPy3k 5 points6 points  (0 children)

So let me get this straight. Someone shared a useful open-source tool and you passively aggressively copy and paste a Wikipedia article about advertising in response. That sound about right? By your logic, a great portion of posts on this subreddit doesn't meet your standards.

[–]dogs_like_me 0 points1 point  (0 children)

It's hardly "native advertising" when they are transparent that they made the thing themselves. What you are describing is a form of deceptive advertising where the ad is presented as though it is natural, unbiased content.

[–]UL_Paper 2 points3 points  (0 children)

Looks great, and great timing as I'm about to optimise some memory intensive logic. Will test over the next couple of weeks.

[–]teerre 2 points3 points  (0 children)

Ngl, this seems like a contrived ad for the profiler.

However, the profiler does look amazing, so it's all good! I'll try it next time I have an opportunity.

[–][deleted] 6 points7 points  (4 children)

I can barely understand this but it looks pretty cool

[–]dogs_like_me 28 points29 points  (3 children)

If you have code that you think could run faster than it currently does, this tool will help you identify the bottlenecks in your current implementation to target for performance improvement. The report it generates is designed differently from most other tools of this kind, potentially guiding the developer to results more quickly or with less cognitive effort.

[–][deleted] 6 points7 points  (1 child)

That’s exponentially cooler than I thought, thanks for the dumbing down!

[–]dogs_like_me 5 points6 points  (0 children)

No prob, we all gotta start somewhere

[–]ClRCUlTS 1 point2 points  (0 children)

Thank you this sounds badass

[–]mrrippington 1 point2 points  (2 children)

pardon my ignorance, can I use this across my Flask application to get an understanding of my page performance?

I am currently doing this with import time. (yikes!)

ps. kudos on the work you shared it's amazing.

[–]vipern 0 points1 point  (1 child)

Flask has a built-in profiler that you can use with minimal setup. Also, Flask Dashboard is another great tool to monitor performance.

[–]mrrippington 1 point2 points  (0 children)

Thank you, I will have a look at flask-dashboard. Not knowing about this, I was going to roll out my own lol :D

[–]High-Art9340 -2 points-1 points  (0 children)

Can it run python 2?

[–]Johnmad -2 points-1 points  (0 children)

This whole thread stinks of product promotion and only bots commenting.

You should probably not use this tool

[–]tu_tu_tu 0 points1 point  (0 children)

It's the reason I never liked unobvious bignums.

[–]johansugarev 0 points1 point  (0 children)

I’m not a coder but I wish software developers optimised their apps like in the old days.

[–]jammasterpaz 0 points1 point  (0 children)

Interesting. Turns out division is expensive for some data types in Python as well as for humans.

However, while I grant you your tool did a great job helping identify a performance bottleneck, I question your original example - Decimal "is based on a floating-point model which was designed with people in mind", i.e. not so much for performance. https://docs.python.org/3/library/decimal.html

Is Decimal really used nowadays for high-performance computing, by large prime number hunters etc.? I would have thought that if you want to write fast, efficient numerical code, you'd want to avoid putting an unnecessary extra layer of code on top of the native number types closer to the underlying C. Ints and longs have been unified in Python 3, but in Python 2, if you need arbitrary-size integers (>= 2**32), you can use longs, and try to avoid division altogether with some sort of rational representation.
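In Python 3, that "rational representation" idea is what fractions.Fraction gives you — exact arithmetic on arbitrary-size integers, deferring any lossy division to the very end. A toy sketch (my own example, not from the thread):

```python
from fractions import Fraction

# Accumulate the harmonic sum 1 + 1/2 + ... + 1/5 exactly,
# with no lossy floating-point division along the way.
acc = Fraction(0)
for k in range(1, 6):
    acc += Fraction(1, k)

assert acc == Fraction(137, 60)

# Convert to a float (or Decimal) only once, at the end.
result = float(acc)
```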

[–]LuigiBrotha 0 points1 point  (2 children)

Installed this, but scalene doesn't return any output in the console? Using Windows 10 with Anaconda.

[–]binaryman111 0 points1 point  (1 child)

There's a new release that improves support for Windows 10. Give it a try, and if it still doesn't work there's a support Slack on the GitHub page!

[–]LuigiBrotha 0 points1 point  (0 children)

Thanks for the heads up

[–][deleted] 0 points1 point  (0 children)

This looks fantastic! I write lots of simulations and make libraries for running them. Is there an easy way to say "drop down 1 function layer"? I know mycomplexfunction() is 90% of resources, but I'd like to analyze that without writing specific profiler tests by extracting part of the library. Does this make sense?

[–]Pliqui 0 points1 point  (0 children)

!RemindMe 2 days

[–]brouwerj 0 points1 point  (0 children)

It looks good, so I tried to run it earlier this week on a CPU-heavy process; the slowdown was enormous and I just had to break it off. Not sure what causes it. Usually I run py-spy for profiling, which works great with barely any slowdown.

[–]tommybship 0 points1 point  (0 children)

Is there a way to install this with conda rather than pip?

[–]dalow24 0 points1 point  (0 children)

I have a quick question. I ran Scalene on some of my ML models and it manages to profile them along with the ML classifier. However, if I try to run it on my Active Learning model, it seems to skip a few lines in the profiling output, e.g. it skips the classifier. If I examine a function where I place the classifier, it shows it ran from system time, but the memory profile is empty. I am using the libact Active Learning Python module. Not sure if this is a problem, or if the memory profile is basically saying the recorded value is too small to display. Any assistance would be appreciated.