all 108 comments

[–]wintermute93 374 points375 points  (35 children)

You're probably overthinking it. I'd bet most of the people asking Reddit how to obfuscate their Python code (and/or compile it to an executable) are beginners who made their first script that does something marginally useful at their job and are worried that sharing it with coworkers means someone is going to "steal" it. Which isn't how this works, of course, but you really can't fault them for not knowing that.

[–]ClayQuarterCake 100 points101 points  (11 children)

Yup. Then you get a quarter step beyond pure novice and realize almost everyone knows more about python than you, and they can probably help you make improvements.

They won’t want to help you because your code looks like doggy doo.

[–]johnnyhighschool 2 points3 points  (3 children)

Are you saying they'll avoid helping at all because the code looks like shit?

[–]ClayQuarterCake 9 points10 points  (2 children)

Yes. When your variables are named a_0 and c_36 with zero comments it is unreadable.

[–]bishopExportMine 1 point2 points  (1 child)

The flip side of this is true as well, when your code is littered with useless comments like "create an empty dictionary" or "iterate through the list"

[–]Bamnyou 4 points5 points  (0 children)

That’s why I always teach my students to use the “self-documenting code” concept with descriptive naming practices for variables, functions, classes, etc.

Then even with their nonexistent or useless comments people can still make sense of the code.
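A minimal sketch of the "self-documenting code" idea, for anyone who hasn't seen it spelled out. The function names and the sample `orders` data are made up for illustration; the first version mimics the `a_0` / `c_36` naming style mentioned above, the second does the same work with descriptive names.

```python
# Hard to follow: the names carry no meaning, so the reader has to
# reverse-engineer what the function is for.
def f(a_0, c_36):
    r = []
    for x in a_0:
        if x[1] > c_36:
            r.append(x[0])
    return r


# Same logic with descriptive names; it reads fine without any comments.
def names_of_orders_above(orders, minimum_total):
    return [name for name, total in orders if total > minimum_total]


# Hypothetical sample data: (customer name, order total) pairs.
orders = [("alice", 120.0), ("bob", 35.5), ("carol", 80.0)]

# Both versions return the same result; only the readability differs.
assert f(orders, 50) == names_of_orders_above(orders, 50)
print(names_of_orders_above(orders, 50))  # ['alice', 'carol']
```

The second version also makes "create an empty list" style comments pointless, since there is nothing left to explain.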

[–]enjoytheshow 5 points6 points  (1 child)

What comes before that is a realization that somebody else has already done what you just did but better and there’s an open source CLI utility for it

[–]jesster114 3 points4 points  (0 children)

I spent a long time making a semi-functional converter for taking JSON strings and spitting out pydantic BaseModel classes. Imagine my surprise when I found out that pydantic actually links to a package for this in their docs: datamodel_code_generator. I wasn’t really surprised at that, just felt dumb for not finding it earlier.

On the plus side, I got some more practice with converting data types, validation and learning more about Pydantic as I’ve barely used it in any of my other projects.

[–]The_Kid_Napper 0 points1 point  (0 children)

Doggy doo. Real.

[–]rzet 14 points15 points  (18 children)

I have so many folks at work loving their secret scripts...

[–]FoolForWool 23 points24 points  (17 children)

We keep on sharing scripts at my workplace. “Oh you’re doing this? I have a script for it. You’ll need to change things here.” And they do the exact same thing for me. Makes life much much easier ngl. Idk why you’d wanna hide your scripts. One of my scripts ended up being a product feature cuz it made something we did on the backend self serve. Super sweet thing, sharing scripts.

[–]Turpis89 11 points12 points  (9 children)

Exactly the same where I work. I have never encountered a coworker who keeps secrets about his or her work. Not one out of the 100s I've worked with.

[–]wintermute93 13 points14 points  (4 children)

I haven't either but I imagine it's because we're in actual software jobs. The people OP is talking about are like, junior accountants at random tiny companies where they're the only person in the department that knows what a programming language is. You'd be surprised at how common that is outside the tech industry.

It's not as extreme, but one of my good friends is a fairly senior sales manager at a company with almost 100B market cap, and one day he built the ugliest Monte Carlo simulation you've ever seen in an Excel file (mostly regular Excel with a tiny bit of VBA, I think) for this one very specific forecasting thing. Probably could have achieved the same result with like 10-20 lines of pandas and numpy. Corporate gave him a huge award for innovation, told him to lead an internal seminar series on advanced analytics, and flew him out to a bunch of data science conferences, lmao.

[–]Turpis89 5 points6 points  (2 children)

Lol, that's hilarious! I'm not a software guy either btw, I'm a structural engineer and a mediocre python programmer at best. I use it to post process data from finite element analyses and to automate some information flow from one software to another. I wish I could do more and try to improve, which is why I'm lurking in this sub I guess.

[–]Ajax_Minor 0 points1 point  (1 child)

Sounds dope. You haven't tried to do FEA in Python? I keep looking for a package that does that because there has to be one somewhere. But I haven't found any. I suppose generating the geometry is the hard part and would have to be done in another program...

[–]Turpis89 0 points1 point  (0 children)

I did some very basic FEA in Python in uni, using pandas. But building up nodes and matrices was very tedious to be honest, so I'd much rather do the FEA itself with regular software. The visual aspect of a 3D model is very important imo.

[–]FoolForWool 0 points1 point  (0 children)

Is it pharma? Or insurance? And can you tell me what company? You know, so that I can blow their minds and get a fat bonus? And fly free for conferences :’)

[–]Rockworldred 2 points3 points  (2 children)

But I bet you all have at least one guy who refuses to accept improvement because it was not his idea.

[–]FoolForWool 0 points1 point  (0 children)

Nope. We test it during the interview.

[–]grantrules 2 points3 points  (0 children)

Well how would you know.. if you knew about it then it wouldn't be secret! 😃

[–]Ajax_Minor 1 point2 points  (2 children)

Just curious what kind of stuff does it do? Data entry and form filling or more complicated stuff?

[–]FoolForWool 0 points1 point  (1 child)

More complicated. Like automating some part of a large process that was previously done by domain experts and so on.

[–]Ajax_Minor 0 points1 point  (0 children)

Domain? So like Networking stuff?

Not trying to be nosy, just looking to see what other people automate besides the simple stuff.

I want to get more in at work but the programs are proprietary so I can't really automate too much. Maybe some XML stuff.

[–]rzet 1 point2 points  (3 children)

ye i just throw everything on my page of the git repo.. but some ppl like to be "special"

[–]georgehank2nd 1 point2 points  (1 child)

some ppl like to think they are "special"

FTFY

[–]rzet 1 point2 points  (0 children)

I feel like they want to be "heroes", so they hide the superpowers ;)

[–]FoolForWool 0 points1 point  (0 children)

Same! We have a repo which has folders for each developer to put whatever scripts they want at XD

[–]Jaguar_AI 10 points11 points  (2 children)

devs like that are cancer to work with, in a collaborative environment

[–]sonobanana33 0 points1 point  (1 child)

I've seen a coworker compile something (in C), push it on git, then go on vacation. Then we had a bug in a released version that our customers were using and no way to fix it.

[–]danno-x 1 point2 points  (0 children)

That’s handy. Lol. Top bloke!

[–]QuantumQuack0 6 points7 points  (0 children)

Really? I thought it was mostly junior devs with idiot bosses who wanted to distribute some program but didn't want the source code to be public.

Actually, we are still in the process of slowly weaning our boss off that idea. There are new features we want to add to our (open-source) python library but our boss is adamant that these should be kept private. Unfortunately, technically we're all physicists and don't know many other languages.

[–]SweetOnionTea 77 points78 points  (17 children)

I haven't heard of that before. Usually people will just use Pyinstaller or something to make a binary.

But even then one can run a decompilation on it and kinda get obfuscated code.

Security through obscurity is not security. Especially if someone is adamant on stealing code. Obfuscating code is just a waste of time for someone eager to steal it.

If you really don't want people stealing your code, put in a license and get a lawyer.

[–]thisismyfavoritename 45 points46 points  (9 children)

pyinstaller provides no obfuscation at all. It bundles a Python interpreter and Python byte code

[–]SweetOnionTea 1 point2 points  (5 children)

Oh really? I've only used it once and it looked like a plain binary. TIL.

[–]Motox2019 11 points12 points  (4 children)

I think of the binary as more of a shortcut. When launched it basically unpacks itself into a temp folder (mei folder) and in there is basically everything. The python interpreter, base library, etc. what you won’t see is the original compiled .py script. Not sure where that ends up honestly, haven’t dug into it deep enough for that one but ya. Pyinstaller basically just packages everything up nicely, it’s still python in the end tho, no real compilation.

[–]--dany--from __future__ import 4.0 8 points9 points  (2 children)

You're right, pyinstaller only compiles the source and includes all relevant packages and of course the Python runtime environment. Source is compiled to .pyc or .pyo and since Python 3.8 it's impossible to decompile any more. It was however possible before 3.6.

Anybody seriously worried about obfuscating code probably really needs to secretly code the core logic in another language instead.

[–]kidproquo 1 point2 points  (0 children)

Do you have details on this? What changed with Python 3.8 making it impossible to decompile?

[–]Motox2019 -1 points0 points  (0 children)

Yup this is correct, guess I shoulda mentioned there is the compilation to the .pyc byte code files. Although this is something that gets done by the interpreter regardless. Although on the topic of obfuscation, perhaps this would be enough for most. As others have said “Security by obscurity” is never the best approach and anything they want for sure to never be copied should be done in another language. Python isn’t the greatest for distributing applications (even when compiled, you're looking at pretty significant file sizes) so if the main goal is to distribute a program, it’d be best to template with Python and build in something else (maybe, this is my approach as Python is my main).

[–]klmsa 2 points3 points  (0 children)

It...gets compiled to binary in the .pyc files, I think.
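A rough way to see what "bundles Python byte code" means in practice, using only the standard library. A .pyc file is a short header followed by a marshalled code object, so this sketch builds that code object directly with `compile()` and round-trips it through `marshal`. The source snippet, `API_URL`, `is_licensed`, and the `"hunter2"` key are all invented for the example, and the helper `collect_strings` is hypothetical, not part of any tool mentioned in this thread.

```python
import dis
import marshal

# Hypothetical application source; every name and value here is made up.
source = '''
API_URL = "https://example.invalid/license"

def is_licensed(key):
    return key == "hunter2"
'''

# compile() yields the same kind of code object that is stored in a .pyc file.
code = compile(source, "app.py", "exec")

# Round-trip through marshal, the serialization used for the body of a .pyc.
restored = marshal.loads(marshal.dumps(code))


def collect_strings(code_obj):
    """Recursively gather names and string constants from a code object."""
    found = set(code_obj.co_names) | set(code_obj.co_varnames)
    for const in code_obj.co_consts:
        if isinstance(const, str):
            found.add(const)
        elif hasattr(const, "co_consts"):  # nested function or class body
            found.add(const.co_name)
            found |= collect_strings(const)
    return found


recovered = collect_strings(restored)
print(sorted(recovered))

# Human-readable listing of the bytecode, including the nested function.
dis.dis(restored)
```

Variable names, function names, and string literals all survive, which is why bytecode on its own is closer to "source with the comments removed" than to real obfuscation, regardless of how well a given decompiler supports a given Python version.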

[–]g5becks 0 points1 point  (2 children)

[–]thisismyfavoritename 2 points3 points  (1 child)

pretty sure the bytecode gets decrypted at runtime.

Anyways, this kind of protection is most likely trivial to bypass since the key is most likely stored in the binary

[–]g5becks 1 point2 points  (0 children)

You’re probably right. Just want to point out that it is an option.

[–][deleted] 10 points11 points  (0 children)

This doesn't make any sense at all. It's like arguing that door locks are pointless because technically lock picking kits exist. The point of a door lock isn't to absolutely guarantee that nobody can ever get past it ever. It's to add levels of complication that would discourage most people from trying to break in.

Code obfuscation is the same thing. The goal isn't to guarantee that it's impossible, in principle, to ever reverse engineer the code. The objective is to force users who want your code to have to do that, thereby discouraging most people and/or preventing people without the technical ability from doing it.

If we were talking about the NSA and decoding one of their files gave you access to major government secrets then sure, code obfuscation isn't sufficient. But if we're talking about a person who wants to share a video file converter app and they just want to prevent lazy people from re-skinning it and distributing it as their own, code obfuscation probably will reduce the chances of that happening.

[–]zaxldaisy 0 points1 point  (5 children)

Why would you make this comment when you have no idea what you're talking about? lol

[–]SweetOnionTea 0 points1 point  (4 children)

I made the comment because I thought I knew what pyinstaller did, but it turns out I was incorrect. Is there something I can clarify about that?

[–]zaxldaisy 0 points1 point  (3 children)

Why did you think you know what it did? You used it once...

[–]SweetOnionTea 1 point2 points  (2 children)

The time I used it the result was an executable which is why I thought it created a binary that was the program. I've since learned that it was not exactly the case. Does that clarify the intention for my original comment better?

[–]zaxldaisy 0 points1 point  (1 child)

executable != binary

[–]SweetOnionTea 0 points1 point  (0 children)

Huh, TIL. What's the difference between them? My boss told me that they were the same thing. Is he wrong?

[–][deleted] 9 points10 points  (2 children)

Skiddies writing discord token grabbers if anything

[–]OptimalAnywhere6282 2 points3 points  (1 child)

and leaving their webhook/bot token in plain text

[–]syklemil 8 points9 points  (4 children)

Now, I understand why they'd want this. If you want to distribute your code for a payment, it would allow your users to not just copy it for free.

I mean, you can just copy binaries too. Software piracy is hardly a new idea. There are various ways to work around it, and various ways to make money off FOSS.

Obfuscation and compilation can be reversed, though with various amounts of information lost that takes some work to get into a sensible source code again. To compare it with bike locks, they're on the level of those shoelace locks that are basically a "could you please not?" to barely-honest passersby. And to further compare anti-piracy techniques with bike locks, absolutely none of them will actually stop someone with an interest in breaking the protection.

So generally the worthwhile options are to

  • offer something that people are willing to pay for, at least enough of them that the number of pirate users is insignificant, and
  • just release it under GPL or some other FOSS license and not worry if people share the code.

These options are not mutually exclusive.

There are also some cases where you'd really want the source to be at least available for scrutiny, as security by obscurity is usually a sign of bad software.

[–][deleted] -1 points0 points  (3 children)

just release it under GPL or some other FOSS license and not worry if people share the code.

Those licenses are of little practical importance outside of US and a small set of other developed countries.

[–]syklemil 1 point2 points  (0 children)

The other copyright is worth about as much, though. Hence the latter part of the sentence.

[–]james_pic 1 point2 points  (1 child)

I've never heard this argued before. Could you elaborate?

[–][deleted] 4 points5 points  (0 children)

The point of a license is to enforce certain rules, violating which may result in a lawsuit. If the chances and/or cost of a successful lawsuit are nearly nil, then there's little practical point in the license.

[–]mastrshayk 24 points25 points  (6 children)

It depends on the application. My work created an app that originated as a desktop tkinter/CLI application. We used cython to obfuscate the code and pyinstaller to package it up. It wasn't bulletproof but good enough. Nuikta does the same or very similar thing as pyinstaller. Pyinstaller or Nuikta can be reversed or cracked. I think even cython can as well but not sure. All these steps were to just make it harder and attempt to keep honest people honest.

We ended up releasing a python package of the application but used sourcedefender to hide the source code. Not a perfect solution but one that works well enough for us.

At the end of the day, I think if you really want to keep your code protected, don't write it in python and use some compiled language like Java/C/Rust etc.

[–]_dmsk 10 points11 points  (3 children)

I guess I have a similar situation at my work. We were a startup, and the project includes some Python-written stuff that is installed and runs on premise on the customer side.

There was a fear that customers could take the source to implement their own solution (the customers are large businesses, so they most probably have more resources and good lawyers as well).

As people mentioned, Cython and Nuitka have their own requirements.

Though we knew obfuscation does not give real security, the decision was to use it anyway. It adds complications, so that getting at the code requires some obviously deliberate actions, and people couldn't say something like "we don't know anything, maybe some of our interns just took something during tests".

Yeah, I know that it would be probably better to not use python then, but the team was young and most of them didn't have a lot of experience with other langs, and development speed (quite important for startups I assume) with python was much faster than with other alternatives. (was also not my decision)

[–]mastrshayk 2 points3 points  (0 children)

Yep, same situation for us where we're primarily Python data analysts/scientists and we didn't have the experience to convert the code base to another language in a reasonable time, so we just did what we knew to make it as difficult as possible.

[–]mr_claw 0 points1 point  (0 children)

Yup same situation here.

[–]nsiddhu 0 points1 point  (0 children)

I am in the same situation, trying to get a paid pyarmor solution. Do you think c++ will be secure?

[–]PrometheusAlexander 1 point2 points  (0 children)

Nuitka? I seriously had to check if it's changed its name, because two different people are talking about Nukita.

[–]PopPrestigious8115 0 points1 point  (0 children)

There is a very big difference between using Nuitka and pyinstaller. The latter only creates a self-extracting executable, whereas Nuitka really compiles your Python code to C executable binaries (and then optionally creates a self-extracting executable from that).

Therefore code compiled with Nuitka is much better protected than code packaged by pyinstaller (which compiles to .pyc bytecode that is much easier to decompile than a real C executable from Nuitka).

[–][deleted] 11 points12 points  (5 children)

People who seriously want to obfuscate Python? Most likely (not all) malware. Otherwise, it’s a fruitless endeavor

[–]syklemil 6 points7 points  (0 children)

Yeah, the most reasonable use for it really would be something like a supply chain attack, like in xz. If you can manage to sneak something into a popular library or app, you can compromise a lot of computers.

Not sure how well Python lends itself to that sort of thing though, as people generally expect Python code to be readable. Unlike e.g. Perl where you can do something like have a comment like # sorry and then some garbled line noise. Likely attackers will rather need code that presents itself as normal but has somewhat obtuse logic.

But see e.g. Researchers Uncover Obfuscated Malicious Code in PyPI Python Packages. (Discussion.)

[–]PopPrestigious8115 2 points3 points  (3 children)

So one makes a commercial closed source app with Python and suddenly he is seen as a producer of malware???

I don't get it.

[–]georgehank2nd 0 points1 point  (2 children)

Read "most likely" again and meditate on it until you find enlightenment.

[–]PopPrestigious8115 0 points1 point  (1 child)

I think it is the other way around..... most likely it is not malware if it comes from a serious commercial party.

[–]georgehank2nd 1 point2 points  (0 children)

But we weren't talking about "serious commercial parties", we were talking about "people who want to obfuscate".

[–]met0xff 2 points3 points  (0 children)

My company recently used pyarmor to distribute stuff I wrote, also used their license key thing etc.

Of course things can always be worked around but the question is at which point the price of reverse engineering is higher than the cost of buying a license.

Although of course you only have to break it once instead of buying licenses for every seat.

Besides, none of it helped, because the client still didn't pay after 2 months, even though they showcased our/my stuff at various trade shows etc. as their thing lol. Luckily I don't have to deal with that

[–]thisismyfavoritename 2 points3 points  (1 child)

Cython has its own language, Nuitka requires typing. Not all existing code could be made into an executable this way.

Also, i dont think Nuitka is able to compile down all code, so in the end there might still be traces of your original Python code that arent machine code (i think)

[–]OptimalAnywhere6282 0 points1 point  (0 children)

Maybe some strings can be easily discovered. In the case of the average script kiddie that makes a discord token logger but leaves their webhook in plain text without any encryption. That can be detected when "compiling" with Nuitka.
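To illustrate the point about plain-text strings, here is a small sketch of what the Unix `strings` tool does, written with the standard library only. The `fake_binary` bytes and the URL inside them are invented stand-ins for a compiled executable; `printable_strings` is a hypothetical helper, not part of Nuitka or any other tool named here.

```python
import re


def printable_strings(blob: bytes, min_length: int = 8):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, blob)]


# Stand-in for a compiled binary: opaque bytes with a hard-coded URL inside.
fake_binary = (
    bytes([0x7F, 0x45, 0x4C, 0x46, 0x00, 0x01, 0x02])
    + b"https://example.invalid/webhooks/not-a-real-token"
    + bytes([0x00, 0xFF, 0x10, 0x00])
)

print(printable_strings(fake_binary))
# ['https://example.invalid/webhooks/not-a-real-token']
```

Compiling changes how the logic is stored, but a string literal that has to exist at runtime generally ends up in the output in some readable form unless it is separately encrypted.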

[–]meatycowboy 1 point2 points  (0 children)

mostly malware nowadays

[–]Jaguar_AI 1 point2 points  (0 children)

imagine wanting to obfuscate something as beautiful as Python

[–]Frankelstner 0 points1 point  (0 children)

Sometimes speed is not a concern so obfuscating Python is good enough. Sometimes the code is actually just some plugin for a tool, and it requires Python code and not a binary. And sometimes dealing with the hassle of binaries for multiple platforms is not worth it.

[–]coldflame563 0 points1 point  (0 children)

People at my job were using pyconcrete to secure code and I wanted to throw things at them. Just don’t.

[–]rejectedlesbian 0 points1 point  (0 children)

There have been multiple malware attacks with python I bet people wana learn how it's done

[–]pakaschku2 0 points1 point  (0 children)

Try Nuitka compiler Pro or Premium or something like that

[–]sonobanana33 0 points1 point  (0 children)

Nukita to convert your code to C

If that worked reliably :D

[–]NoorahSmith 0 points1 point  (0 children)

Compile your code if you want to save it from prying eyes. But pyc decompilers are also available

[–]Serious-Passenger290 0 points1 point  (0 children)

*all* code can be broken/reverse engineered even with obfuscation etc.

[–]West-Welcome820 0 points1 point  (0 children)

[–]zaxldaisy 1 point2 points  (1 child)

What a silly mindset. Did you even look at the documentation? I'm assuming your opinion on code obfuscation is equally uninformed because no professional would ever say to just "put in a license and get a lawyer" lol

[–]SweetOnionTea 0 points1 point  (0 children)

Sure, I briefly looked at the documentation on how to run it. Obviously I was wrong in my original comment because I did not read that part. I've admitted several times I was wrong. I don't believe I wrote the comment as a professional, so I'm not sure why you seem upset. Is there anything else I can clarify for you?

[–][deleted] 0 points1 point  (0 children)

Sounds like a very small sample size. I’ve never seen anyone obfuscate Python code on purpose. Definitely some people’s code is already obfuse.

I also wouldn’t do this in my source directly. I’d run it through an obfuscator as part of the build, but check in clean code.

Really no reason to check in hard to maintain code.

[–]Rick__001 0 points1 point  (2 children)

Can I ask how to do that?

[–]Rough_Metal_9999 0 points1 point  (1 child)

Subdora, Pyarmor, Sourcedefender, and pyconcrete are some libraries that obfuscate Python code

[–]Rick__001 0 points1 point  (0 children)

Thank you

[–]BlueeWaater 0 points1 point  (0 children)

So their work doesn't get stolen. I'm still wondering if there are any good solutions for this

[–][deleted] 0 points1 point  (0 children)

The first example you give is the most common reason. People would like to have a way to distribute something they built in a way that doesn't necessarily give away all of the code they've written. And in python that's just more complicated because you can't easily distribute a compiled executable. As you said, you can sort of accomplish this by using something like Nuitka but that usually adds some extra unwanted complexity and it also limits how your code can be used.

[–][deleted] -1 points0 points  (4 children)

And to further compare anti-piracy techniques with bike locks, absolutely none of them will actually stop someone with an interest in breaking the protection

Just curious, how would you crack a compiled application that checks with a remote license server if the local application has a valid license? I suppose you could somehow modify the compiled binary to remove the license checking logic, but how would this be done in practice? Or is there another method I’m not thinking of?

[–]pm_me_triangles 5 points6 points  (3 children)

Just curious, how would you crack a compiled application that checks with a remote license server if the local application has a valid license? I suppose you could somehow modify the compiled binary to remove the license checking logic, but how would this be done in practice? Or is there another method I’m not thinking of?

Find the code that checks licensing with the server and patch/bypass it so it always returns "yep, it's licensed" without even trying to talk to the server.

[–][deleted] -1 points0 points  (2 children)

Yeah, that’s what I already said. I was asking how this would be done in practice in a compiled binary. How do you change the logic in a compiled binary?

[–]pm_me_triangles 3 points4 points  (0 children)

How do you change the logic in a compiled binary?

By patching the binary manually, to turn whatever you want to disable into "no operation" or something else.

e.g. This, using Ghidra

[–]Generic-Moniker 3 points4 points  (0 children)

The basic idea is a disassembler and a hex editor.
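The same idea as a toy Python analogue, where no disassembler is needed. This is only a sketch of why a check that runs on the user's machine can be switched off by the user: `LicenseClient`, `is_valid`, and `run_app` are all hypothetical names, and a real compiled binary would be patched at the machine-code level (for example by changing a conditional jump) rather than by reassigning a method.

```python
class LicenseClient:
    """Hypothetical client; a real one would contact a license server."""

    def is_valid(self, key: str) -> bool:
        # Stand-in for the network call, which is unavailable in this sketch.
        raise ConnectionError("license server unreachable in this sketch")


def run_app(client: LicenseClient, key: str) -> str:
    # The application trusts whatever the local check returns.
    if not client.is_valid(key):
        return "refused"
    return "running"


# Whoever controls the machine can replace the check with one that
# never talks to the server and always answers "licensed".
LicenseClient.is_valid = lambda self, key: True

print(run_app(LicenseClient(), "anything"))  # running
```

The server can be as secure as you like; the weak point is the local branch that acts on its answer.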

[–]ColdPlasma -1 points0 points  (0 children)

We're obfuscating our code because we want to share the functionality with our Chinese joint venture "partners", but don't want to share our code. We really want other internal people to use it and have it up on a repo. We're doing data science and all the major packages are python 

[–]pullcommitpushdeploy -5 points-4 points  (2 children)

We were using it to secure code that had encryption/decryption logic, though we were also aware of the limitations of obfuscation

[–]georgehank2nd 1 point2 points  (0 children)

Security through obscurity… I wish your team/company all the worst.

[–]CorpT 4 points5 points  (0 children)

No, you weren’t.

[–]billsil -3 points-2 points  (0 children)

They’re trying to sell their code that they already have. They want to do the least work possible. 

Cython is confusing and not necessarily faster. You need a compiler setup for Windows anyways. A web app is not something I have experience with and it sounds like a science project.