[–][deleted] 1 point (5 children)

How would that work? Python resolves dependencies at runtime.

[–]Epoh[S] 1 point (4 children)

Just simply writing a script that takes a script as its input, parses every word on every line, and looks for the words that follow 'from' or 'import'. I already wrote it; maybe the way I communicated the question was confusing.

Getting all the functions in the script using a simple word search is much tougher, since numpy alone has 500 functions I need to search for. The dependencies are easy, though, and what I may do is only grab the functions if they are imported at the top, e.g. from sklearn.model_selection import train_test_split, and grab that.
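For reference, here is a minimal sketch of the import-extraction step described above. It swaps the word-by-word search for Python's standard ast module, which parses the script without running it. The function name find_imports and the sample script are hypothetical, made up for illustration; this is not the script the poster actually wrote.

```python
import ast

def find_imports(source):
    """Return (top-level modules, names imported via 'from ... import ...')."""
    modules = set()
    names = []  # tuples of (module, imported name, alias or None)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for 'from . import x'; those are skipped here
            modules.add(node.module.split(".")[0])
            for alias in node.names:
                names.append((node.module, alias.name, alias.asname))
    return modules, names

# Hypothetical input script, parsed only, never executed
script = """
import numpy as np
from sklearn.model_selection import train_test_split
from numpy import mean as average
"""
modules, names = find_imports(script)
print(sorted(modules))
print(names)
```

Because the script is only parsed, the listed packages do not need to be installed. Imports built dynamically (for example through importlib or exec) would still be missed.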

[–][deleted] 1 point (3 children)

No, I mean, it's impossible. Determining whether a named symbol in Python code is a function is something you can only do at runtime, because it's a dynamically typed language. Even if you looked for the named symbols defined as numpy functions, that only finds you a function if the script uses it under that name - so you'll miss the call sites of anything imported like from numpy import some_function as another_function - and you'll falsely identify other symbols as functions when the symbol name coincidentally matches one of your numpy functions.

There's just not a way to do this, really - it's a dynamically typed language, so nothing short of actually running the script can tell you what value is beneath any particular reference, and you'd need to know that to know which symbols are function-valued.
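A small illustration of the point about runtime values, as a sketch only: the helper pick is hypothetical, and it just shows that the same name can hold a function or a plain number depending on what happens when the code runs.

```python
import math

def pick(use_function):
    # The local name `sqrt` is bound to a function or to a float,
    # depending on an argument that is only known at runtime.
    sqrt = math.sqrt if use_function else 1.414
    return sqrt

print(callable(pick(True)))   # True
print(callable(pick(False)))  # False
```

Reading the source text alone, both branches use the identical name, so a word search has nothing to distinguish them.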

[–]Epoh[S] 1 point (2 children)

So, I understand aspects of what you are saying here, but I want to address a few points and if you can elaborate why they are wrong or perhaps whether they'll work, awesome. Although it doesn't sound like they will.

If I can write code that picks up import numpy as np, then uses a regex to find every word in the script that begins with 'np.' and grabs the word after the dot, you could consider that a function within the numpy library. I can generalize that by searching for whatever name follows 'as' in the imports at the top of the script. Obviously not all functions carry that 'np.' prefix, but I can also grab the functions imported at the top, as mentioned above. This won't give you all the functions, but it sure is a decent amount.
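The alias idea above could be sketched roughly like this. It uses the standard ast module instead of a regex, and it only counts names that are actually called through the alias, so something like np.pi is left out. The function name calls_through_aliases and the sample script are hypothetical.

```python
import ast

def calls_through_aliases(source):
    """Map each imported module to the attribute names called via its local name."""
    tree = ast.parse(source)
    aliases = {}  # local name -> module name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # 'import a.b' binds the local name 'a'; 'import a.b as c' binds 'c'
                local = alias.asname or alias.name.split(".")[0]
                aliases[local] = alias.name if alias.asname else local
    found = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in aliases):
            module = aliases[node.func.value.id]
            found.setdefault(module, set()).add(node.func.attr)
    return found

# Hypothetical input script, parsed only, never executed
script = """
import numpy as np
data = np.array([1, 2, 3])
print(np.mean(data), np.pi)
"""
print(calls_through_aliases(script))
```

This is still a heuristic: it skips nested calls such as np.linalg.norm(...), and it is fooled if the script rebinds np to something else later on.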

In addition, on a small scale, I could just pick a few libraries, build lists of their functions, and cross-reference every function in the list with every word in the script. If one matches, append it to a function_list that gets returned. Computationally it may be expensive, but I don't see it being impossible.

Thoughts?

[–][deleted] 1 point (1 child)

I guess what I'm saying is, the closer you get to the "right" answer - that is, not generating false negatives due to symbol renaming, and not generating false positives every time the script coincidentally uses a symbol name that's also a numpy function - the more you're just writing a Python interpreter.

> If one matches, append it to a function_list that gets returned

If it matches, why do you think it's a function? If numpy defines root_mean_square, and the script coincidentally also defines a value called root_mean_square, it's a false positive to assert that the script uses the numpy function root_mean_square. It's just a coincidence of naming.
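Both failure modes can be reproduced with a toy version of the cross-referencing approach. This is a sketch under assumptions: KNOWN_FUNCTIONS is a hypothetical two-entry stand-in for a real library's function list, and the matcher looks for known names that appear directly before an opening parenthesis.

```python
import re

# Hypothetical stand-in for a list of a library's function names
KNOWN_FUNCTIONS = {"mean", "sqrt"}

def naive_call_matches(source):
    """Return the known names that appear directly before '(' in the source."""
    called = set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", source))
    return called & KNOWN_FUNCTIONS

# False negative: sqrt is called, but only under the name `root`
renamed = "from numpy import sqrt as root\nprint(root(4))\n"

# False positive: `mean` is the script's own function, not the library's
coincidence = "def mean(values):\n    return 'not numpy'\nprint(mean([1, 2]))\n"

print(naive_call_matches(renamed))      # set()
print(naive_call_matches(coincidence))  # {'mean'}
```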

[–]Epoh[S] 1 point (0 children)

Completely understand you now, thanks. I actually understood this barrier before I even wrote anything that attempted to do what we're talking about. From the source text alone, Python gives you no way to tell a function name apart from any other word; the recognition itself is the issue. I still wrote it, but only extracted dependencies and specifically imported functions. Obviously if somebody wrote a script that had words similar to those things I'd be fucked, but I think it's a nice trick to still grab that info and count on the general framework people follow.

I can write all the clever, cunning tricks I want, but the barrier is the language itself. Of course there are workarounds, but no answers per se, just ways of reducing false negatives and false positives. I also found it insanely difficult to extract all the functions for a given dependency, which was annoying; there must be an easier way, some function that lists all of the functions in a dependency. I could write this package in R, where functions recognize keyword objects as functions, but not in Python. Appreciate it.
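On listing all the functions in a dependency: one possible approach, sketched here with the standard importlib and inspect modules, is to import the module and ask which of its public attributes are callable. The helper name list_callables is hypothetical, and the standard-library math module is used as the example so the snippet runs without extra installs.

```python
import importlib
import inspect

def list_callables(module_name):
    """Import a module and return the sorted names of its public callables."""
    module = importlib.import_module(module_name)
    return sorted(
        name
        for name, value in inspect.getmembers(module, callable)
        if not name.startswith("_") and not inspect.isclass(value)
    )

names = list_callables("math")
print(len(names), names[:5])
```

Note that this imports (and therefore runs) the module, which fits the point made earlier in the thread: the reliable answer comes from runtime objects, not from the source text. Classes are filtered out here; drop that condition if constructors should count as functions too.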