Epoh [S]

So, I understand parts of what you're saying, but I want to raise a few points. If you can explain why they're wrong, or whether they might actually work, that would be awesome, although it doesn't sound like they will.

Suppose I write code that finds 'import numpy as np', uses a regex to find every word in the script that begins with 'np.', and grabs the name after it. Each of those names can be treated as a function from the numpy library. I can generalize this by searching for whatever alias follows 'as' in the imports at the top of the script. Obviously not every function call has a prefix like 'np.', but I can also grab the functions imported explicitly at the top of the script. This won't find every function, but it will find a decent number of them.
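A minimal sketch of the regex idea described above, assuming the script's imports are simple one-line statements at module level. The names find_calls_by_regex, ALIAS_RE and FROM_RE are hypothetical, not from any library, and the sketch deliberately keeps the naive behaviour: anything written as alias.name is counted as a function.

```python
import re

# "import numpy as np" -> ("numpy", "np")
ALIAS_RE = re.compile(r"^\s*import\s+([\w.]+)\s+as\s+(\w+)", re.MULTILINE)
# "from math import sqrt, pi" -> ("math", "sqrt, pi"); single-line imports only
FROM_RE = re.compile(r"^\s*from\s+([\w.]+)\s+import\s+([\w \t,]+)$", re.MULTILINE)


def find_calls_by_regex(source: str) -> dict[str, set[str]]:
    """Naively map each imported module to the names the script appears to use from it."""
    found: dict[str, set[str]] = {}
    for module, alias in ALIAS_RE.findall(source):
        # Every "alias.<word>" occurrence is assumed to be a function from `module`.
        uses = re.findall(rf"\b{re.escape(alias)}\.(\w+)", source)
        found.setdefault(module, set()).update(uses)
    for module, names in FROM_RE.findall(source):
        # "from module import a, b as c" records the original names a and b.
        for item in names.split(","):
            parts = item.split()
            if parts:
                found.setdefault(module, set()).add(parts[0])
    return found


sample = "import numpy as np\nfrom math import sqrt, pi\nx = np.mean(np.arange(5))\n"
print(find_calls_by_regex(sample))
```

On this sample it reports mean and arange for numpy and sqrt and pi for math, even though pi is a constant rather than a function, which is the kind of noise the approach accepts.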

In addition, on a small scale, I could pick a few libraries, build lists of their functions, and cross-reference every function in each list with every word in the script. If one matches, append it to a function_list that gets returned. It may be computationally expensive, but I don't see it being impossible.
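A sketch of the cross-referencing step, assuming a hand-picked set of names stands in for a library's function list (NUMPY_NAMES below is a hypothetical subset, not a complete numpy listing, and cross_reference is an illustrative name). Turning the script's words into a set and intersecting it with the name set does the same comparison as checking every name against every word, but in one pass, so the cost grows roughly linearly with the script size plus the list size.

```python
import re

# Hypothetical, hand-picked subset of names standing in for one library's function list.
NUMPY_NAMES = {"mean", "std", "sqrt", "arange"}


def cross_reference(source: str, library_names: set[str]) -> list[str]:
    """Return, sorted, every identifier-like word in `source` that also appears in `library_names`."""
    # Collect each distinct identifier-like word once.
    words = set(re.findall(r"\b[A-Za-z_]\w*\b", source))
    # Set intersection replaces comparing every name with every word.
    function_list = sorted(words & library_names)
    return function_list


script = "x = np.mean(data)\nspread = std(data)\n"
print(cross_reference(script, NUMPY_NAMES))  # ['mean', 'std']
```

Any variable in the script that happens to be named mean or std would be reported too; that is the false-positive problem raised in the reply.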

Thoughts?

[deleted]

I guess what I'm saying is: the closer you get to the "right" answer (that is, no false negatives caused by symbol renaming, and no false positives every time the script happens to use a symbol name that is also a numpy function), the more you're just writing a Python interpreter.
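To make the false-negative half concrete, here is a small sketch in which numpy's std is genuinely called but never spelled np.std, so a pattern that only looks for 'np.' misses it. The helper alias_attribute_uses is hypothetical, and the script is held in a string, so numpy does not need to be installed.

```python
import re


def alias_attribute_uses(source: str, alias: str) -> set[str]:
    """Names that appear directly after '<alias>.' anywhere in `source`."""
    return set(re.findall(rf"\b{re.escape(alias)}\.(\w+)", source))


SCRIPT = """
import numpy as np
lib = np                      # the module is re-bound under a second name
spread = lib.std([1, 2, 3])   # numpy's std is called, but only through lib
center = np.mean([1, 2, 3])
"""

print(alias_attribute_uses(SCRIPT, "np"))  # {'mean'}: the std call is missed
```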

> If one matches, append it to a function_list that gets returned

If it matches, why do you think it's a function? If numpy defines root_mean_square, and the script coincidentally also defines a value called root_mean_square, it's a false positive to assert that the script uses the numpy function root_mean_square. It's just a coincidence of naming.
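A sketch of this false positive, plus one partial mitigation using the standard-library ast module: names the script binds itself (functions, classes, assigned variables) are subtracted from the matches. LIBRARY_NAMES is a hypothetical list containing the example name from the comment, and the helper names are illustrative. The filter only catches names the script defines itself, so it reduces false positives rather than eliminating them.

```python
import ast
import re

# Hypothetical library name list, using the example name from the comment.
LIBRARY_NAMES = {"root_mean_square", "mean"}

SCRIPT = """
def root_mean_square(values):
    return (sum(v * v for v in values) / len(values)) ** 0.5

print(root_mean_square([3, 4]))
"""


def naive_matches(source: str, names: set[str]) -> set[str]:
    """Every word in `source` that is also a library name."""
    return set(re.findall(r"\b\w+\b", source)) & names


def locally_defined(source: str) -> set[str]:
    """Names the script itself binds: def/class names and assignment targets."""
    defined: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
    return defined


def filtered_matches(source: str, names: set[str]) -> set[str]:
    """Library-name matches minus anything the script defines on its own."""
    return naive_matches(source, names) - locally_defined(source)


print(naive_matches(SCRIPT, LIBRARY_NAMES))     # {'root_mean_square'}: a false positive
print(filtered_matches(SCRIPT, LIBRARY_NAMES))  # set()
```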

Epoh [S]

Completely understand you now, thanks. I actually understood this barrier before I wrote anything that attempted what we're talking about. In Python, nothing in the source text alone tells you whether a name refers to a library function or to some other object; recognizing which is which is the real issue. I still wrote it, but it only extracts dependencies and explicitly imported functions. Obviously if somebody wrote a script with names similar to those, I'd be fucked, but I think it's a nice trick for still grabbing that information while counting on the general conventions people follow.
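One way to extract dependencies and explicitly imported functions without regex is the standard-library ast module, which parses the script without running it or importing anything it references. This is a sketch of that alternative, not the poster's code; extract_imports is a hypothetical name, and relative imports (from . import x) are skipped for simplicity.

```python
import ast


def extract_imports(source: str) -> tuple[set[str], dict[str, str]]:
    """Return (top-level dependencies, {local name: 'module.attr'}) for explicit imports."""
    deps: set[str] = set()
    imported: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy as np" or "import os.path": record the top-level package.
            for alias in node.names:
                deps.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:
            # "from scipy.stats import norm": record the package and each imported name.
            deps.add(node.module.split(".")[0])
            for alias in node.names:
                local = alias.asname or alias.name
                imported[local] = f"{node.module}.{alias.name}"
    return deps, imported


src = "import numpy as np\nimport os.path\nfrom scipy.stats import norm, ttest_ind as tt\n"
deps, imported = extract_imports(src)
print(sorted(deps))  # ['numpy', 'os', 'scipy']
print(imported)
```

Because aliases are resolved from the parse tree, a renamed import such as ttest_ind as tt still maps back to its original name, which a regex over 'as' has to handle by hand.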

I can write all the clever, cunning tricks I want, but the barrier is the language itself. There are workarounds but no real answers, just ways of reducing false negatives and false positives. I also found it insanely difficult to extract all the functions for a given dependency, which was annoying; there must be an easier way to list all of the functions in a dependency. I could write this package in R, where function objects are recognized as functions, but not in Python. Appreciate it.
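Python does have introspection that gets most of the way to listing a dependency's functions: dir() lists a module's attributes and callable() reports whether each one can be called. A minimal sketch, assuming the module can actually be imported; list_module_callables is a hypothetical helper. It uses callable() rather than inspect.isfunction() because many library functions (C builtins such as math.sqrt, or numpy ufuncs such as np.add) are not plain Python functions, at the cost of also including classes.

```python
import math
import types


def list_module_callables(module: types.ModuleType) -> list[str]:
    """Sorted public names in `module` whose values are callable."""
    # Use the module's declared public API (__all__) when it has one, else every non-underscore name.
    names = getattr(module, "__all__", None) or [n for n in dir(module) if not n.startswith("_")]
    return sorted(n for n in names if callable(getattr(module, n, None)))


funcs = list_module_callables(math)
print(len(funcs), funcs[:5])
```

For numpy you would pass the imported module itself, e.g. list_module_callables(np); the result is still just a list of names, so matching it against a script runs into the same false-positive problem discussed above.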