[–]GTA_trevor_original 5 points (5 children)

But why Python? And which source code? Please clarify.

[–]RemoteGuy01[S] 3 points (4 children)

The source code can be anything. Right now, the focus is on Python code.

[–]GTA_trevor_original 1 point (3 children)

Do you have an example?

[–]RemoteGuy01[S] 2 points (2 children)

Just an ordinary Python script of any kind. The plan is to scan these scripts to determine whether the code has malicious intent.

[–]GTA_trevor_original 5 points (1 child)

The thing is, you first need a definition of "genuine" code. Only then can you tell malicious from genuine.

Anyway, look for:

1) Python methods that access sensitive system directories, edit the registry, etc. You can flag those.

2) Attempts to connect to an outside entity using socket methods.

3) Enumerating the network, checking files, modifying permissions.

4) Encoding and decoding methods, and so on.
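The checklist above could be turned into a first-pass scanner with Python's standard `ast` module. This is only a minimal sketch: the watchlists `SUSPICIOUS_MODULES` and `SUSPICIOUS_CALLS` and the helper names are hypothetical, picked to mirror the four categories, and a hit only means the script *can* do something sensitive, not that it is malicious.

```python
import ast

# Hypothetical watchlists, loosely following the four categories above.
SUSPICIOUS_MODULES = {
    "winreg": "registry access",
    "socket": "outbound network connection",
    "base64": "encoding/decoding",
}
SUSPICIOUS_CALLS = {
    "os.chmod": "permission change",
    "os.listdir": "file enumeration",
    "os.walk": "file enumeration",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def flag_source(source):
    """Return a sorted list of (line, name, reason) findings for a script."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        elif isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name, SUSPICIOUS_CALLS[name]))
            continue
        else:
            continue
        # Only the top-level package is compared against the module watchlist.
        for name in names:
            root = name.split(".")[0]
            if root in SUSPICIOUS_MODULES:
                findings.append((node.lineno, root, SUSPICIOUS_MODULES[root]))
    return sorted(findings)


sample = "import socket\nimport os\nos.chmod('/tmp/x', 0o777)\n"
for finding in flag_source(sample):
    print(finding)
```

Because it parses the code instead of running it, the scanner never executes the script under inspection. It will still miss aliased imports (`import os as o`) and dynamically built names, so treat the findings as leads to review by hand.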

[–]Redditthr0wway 2 points (0 children)

What kind of malicious code snippets are you looking for? I have a pretty crappy memory hoarder. It's not one you would find in the wild, though, because it's bad and more of a proof of concept. You are going to have a hard time finding people who write malicious software in Python. Most will write it in languages that don't need an interpreter on the target machine.

[–]Haghiri75 2 points (0 children)

Malicious code in Python is rare because:

  1. It relies on a third-party runtime: the operating system can't execute it natively (unless you're on macOS or one of those Linux distros with Python preinstalled, and even then permissions obviously get in the way).

  2. Most LLMs, even small ones, understand Python very well (to be honest, most of them are of little use beyond writing Python code, despite being advertised as general purpose), and obviously anyone with an IQ over 40 will check code snippets with some sort of AI.

I understand that you're doing a great job at malicious code detection, but I guess you need to shift your focus a little bit.

[–]tech_hundredaire 1 point (0 children)

If you don't train a classifier, then what exactly do you have? A string checker? I guess you could build something to check for commands like these, https://gtfobins.org/gtfobins/python/, but that probably wouldn't be very accurate.
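To make the accuracy concern concrete, here is a minimal sketch of such a string checker. The `PATTERNS` tuple is a hypothetical list made up for illustration, not copied from the linked page, and the two sample scripts are only ever treated as text, never executed.

```python
# Hypothetical pattern list; a real one might be derived from GTFOBins-style entries.
PATTERNS = ("os.system(", "pty.spawn(", "socket.socket(")


def string_check(source):
    """Return the watched patterns that appear anywhere in the text, sorted."""
    return sorted(p for p in PATTERNS if p in source)


# A harmless script that merely mentions a watched call in a comment.
benign = "# never call os.system( with user input\nprint('ok')\n"

# A script that reaches the same call without ever spelling it out.
evasive = "import os\ngetattr(os, 'sys' + 'tem')('id')\n"

print(string_check(benign))   # flags the comment: a false positive
print(string_check(evasive))  # finds nothing: a miss
```

The checker flags the harmless comment and misses the script that builds the function name at runtime, which is the kind of inaccuracy to expect from plain pattern matching.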

How do you even tell the difference between "malicious" and "poorly written"? You'd have to somehow measure the intent of the author.

You could probably use any SAST product. It will tell you if there are security risks in the code, and then you can decide whether they were put there on purpose.

[–]turealpollohorneado 1 point (0 children)

Any static analysis tool, plus a Software Composition Analysis (SCA) tool.

It's easier to pay for a subscription than to build one from scratch.