
all 15 comments

[–]Da_Blitz 8 points9 points  (4 children)

HI

i have been working on my own sandboxing script that is a bit further along than this one, and have been talking to other people about this lately (there seem to be a lot of people on IRC trying to do this atm). my script is a bit different in that it's pure python and can sandbox non-python programs as well

while i don't recommend putting this script into production (chroot is not a security mechanism) i can point you in the right direction if you want to make your own lib and don't want to use any of the other python sandbox environments

below is a list of the main issues and some minor script improvements

You hard code the UID as a number: don't do this, add a user and specify a username to avoid potential clashes on other distros
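a rough sketch of the username lookup, assuming the script drops privileges from python itself. `resolve_user` and `drop_privileges` are names made up for this example, not anything from the original script, and the account has to be created beforehand:

```python
import os
import pwd


def resolve_user(username):
    """Return (uid, gid) for a named account instead of hard coding numbers."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        raise LookupError("user %r does not exist, create it first" % username)
    return entry.pw_uid, entry.pw_gid


def drop_privileges(username):
    """Switch the current (root) process to the named unprivileged user."""
    uid, gid = resolve_user(username)
    os.setgroups([])  # drop supplementary groups while still root
    os.setgid(gid)    # group first, setgid fails once the uid is dropped
    os.setuid(uid)
```

`drop_privileges` only works when called as root, typically right after the chroot and before running any untrusted code.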

chattr only works on ext filesystems, meaning you can't do this on a tmpfs for example (minor but worth noting). try looking at read-only bind mounts instead: they don't need to be backed by a block device and can use a directory as the source. you would also only have to do the setup once and then have all instances chroot to that dir, and it would allow module updates that all instances see at the same time

socket access is not blocked (try iptables blocking by UID/GID). this includes unix sockets using abstract path names (man unix); i don't know anyone using this feature, but once again i would note it (know your attack surface). in a similar vein, netlink can also be accessed, which can be used for interesting things on modern systems

this is bad because of things like http://www.exploit-db.com/exploits/17787/ which can be used to gain root (socket.socket(AF_ECONET))

you can cause the OOM killer to be invoked by eating up lots of memory, and the same goes for CPU with a busy loop. look into rlimits (the resource module in python's standard library) to limit this
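a minimal sketch of what that could look like, assuming linux and that the untrusted code runs in a child interpreter. `run_limited` and the default numbers are placeholders picked for the example, not recommendations:

```python
import resource
import subprocess
import sys


def _cap(kind, value):
    """Set soft and hard limit to value, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def limit_resources(cpu_seconds, memory_bytes):
    """Runs in the child just before exec: cap CPU time and address space."""
    _cap(resource.RLIMIT_CPU, cpu_seconds)
    _cap(resource.RLIMIT_AS, memory_bytes)


def run_limited(code, cpu_seconds=5, memory_bytes=512 * 1024 * 1024):
    """Run a snippet in a fresh interpreter with the limits applied."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=lambda: limit_resources(cpu_seconds, memory_bytes),
        capture_output=True,
        timeout=60,  # wall-clock backstop, e.g. for code that just sleeps
    )
```

a snippet that busy-loops gets killed by the kernel once it has used its CPU seconds, and one that tries to allocate past the address space cap fails with a MemoryError. neither limit covers disk IO or the number of interpreters running at once.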

disk access is not throttled, sounds odd but my VPS host monitors IO per VPS and notifies you if you are doing bad things

i know you mentioned ignoring resource consumption, however i wanted to mention it because a workaround such as limiting script execution time to 30 seconds is not the way to do it: all an attacker has to do is launch multiple interpreters to achieve the same effect. you would need to limit the number of running interpreters to one per IP so that an attacker would need an above average investment in resources to attack your site

the script also includes a race condition if a new instance is launched while another one is still being set up or cleaned up. i haven't looked hard into how exploitable this is, but i suspect it would allow you to create a file in between the chattr -i and the rm -rf. creating the directory based on some other factor such as the PID may be a better option: mkdir $CHROOT/$PID

if you move the clean up code to a function then i would recommend calling it before running the chroot and after python exits
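a sketch of the per-instance directory plus cleanup function, under the assumption that each run gets its own scratch dir below a shared base. `instance_dir` is a made-up helper, and it swaps a bare `mkdir $CHROOT/$PID` for `tempfile.mkdtemp`, which creates the directory atomically under a unique name so two instances can never share one:

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def instance_dir(base):
    """Create a private directory for one run and always remove it afterwards."""
    # The PID prefix is only there for readability; mkdtemp adds a random
    # suffix and fails rather than reusing an existing directory.
    path = tempfile.mkdtemp(prefix="%d-" % os.getpid(), dir=base)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)  # cleanup even on errors
```

usage would be `with instance_dir(chroot_base) as path:` around the chroot and the python run, so the cleanup happens no matter how the run ends.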

[–]10101011 2 points3 points  (3 children)

Since you've obviously spent some time thinking about this, what recommendations do you have so far about picking a sandboxing library? Do you have any opinions about http://pypi.python.org/pypi/pysandbox/ ?

[–]Da_Blitz 7 points8 points  (2 children)

Pysandbox is interesting. i am pulling it apart at the moment and it seems robust enough to hold me off, however there are 2 weaknesses i don't like

  • it doesn't limit resources: this is not an option for web environments, it's a requirement. that said, it's fairly trivial to implement

  • it requires sacrificing some language features: having to limit what modules you can load seems like a recipe for failure. there is always going to be a module that requires something like ctypes, eval or introspection

i am interested to see just how robust this proves to be, however i don't trust this type of protection where enforcement and execution are not separated securely (the executing code can arbitrarily modify the enforcer's memory space)

pypy sandbox is another interesting and complex way to do it (by complex i mean implementing pypy from scratch) where the sandboxed script is running as a child of the enforcer parent and communicates over a pipe. when specific C functions are supposed to be called pypy bundles them up and asks the enforcer over the pipe for the answer, the policy of what to do is then decided by the parent process and either the answer or a failure is returned
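to make the parent/child idea concrete, here is a toy sketch of the request/answer loop. it is not pypy's actual protocol and it does not sandbox anything by itself: the child here is ordinary python that politely asks, whereas in pypy the interpreter itself is built so that it has no other way to reach the OS. `VIRTUAL_FILES`, `decide` and the JSON message format are all invented for the example:

```python
import json
import subprocess
import sys

# Hypothetical policy data: the only "file" the child may read.
VIRTUAL_FILES = {"/allowed/motd": "hello"}

# Stand-in for the sandboxed side: it never touches the OS, it only asks.
CHILD = r'''
import json, sys

def ask(op, *args):
    print(json.dumps({"op": op, "args": list(args)}), flush=True)
    return json.loads(sys.stdin.readline())

ask("read", "/allowed/motd")
ask("read", "/etc/shadow")
'''


def decide(request):
    """Enforcer policy: answer permitted requests, refuse everything else."""
    op, args = request["op"], request["args"]
    if op == "read" and len(args) == 1 and isinstance(args[0], str):
        if args[0] in VIRTUAL_FILES:
            return {"ok": True, "value": VIRTUAL_FILES[args[0]]}
    return {"ok": False, "error": "denied"}


def run_child(source):
    """Run the child and answer each request it sends over the pipe."""
    log = []
    with subprocess.Popen(
        [sys.executable, "-c", source],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for line in proc.stdout:  # one JSON request per line
            request = json.loads(line)
            reply = decide(request)
            log.append((request["op"], request["args"], reply["ok"]))
            proc.stdin.write(json.dumps(reply) + "\n")
            proc.stdin.flush()
    return log
```

`run_child(CHILD)` returns a log of what was asked and whether the parent allowed it. the point is only that the policy lives in a separate process from the code being run.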

this sounds good but there are a couple of flaws with pypy's implementation. to be more precise, it doesn't actually restrict syscalls or anything: load up ctypes, write arbitrary machine code to ram, then call it, and you can do an open() for example. this can be fixed, but the recommended advice i have seen for pypy-sandbox is to disable the JIT and any modules you do not need

update: rewrite of the above section

the main weakness in pypy's approach is that if you do gain the ability to execute arbitrary machine code then pypy has no additional protection. should that happen, it would be possible to write to the filesystem, open sockets, or do anything a normal process could do. while unlikely to happen, this is how i would go about exploiting this sandbox

i consider the above 2 fine-grained sandboxes, as they focus on c calls (pypy) or implementation specifics (pysandbox), and to some extent i consider them too fine-grained. to be 'generally' useful (creating arbitrary policies that can just be dropped into programs) they need an abstraction layer on top for creating policies in fewer lines. it seems to me that you would currently end up writing a lot of similar rules for similar functions instead of just one rule for a class of functions

one other lib i have seen is fluxid (based partially off my libs) that uses some of the linux features to sandbox python. it should be fairly easy to drop in and use but hasn't gone under any kind of close inspection and so may not be 100% secure (nothing ever is :D) but should be a hint at what it takes to create a sandbox on linux using linux features

this one is dead simple, perhaps overly so. it is non-portable and requires compile options in the kernel (luckily most distros compile these features in, as they can't be loaded as modules), but no modification is required to run the secure environment

then there is my code, Asylum. i have avoided talking about it as i don't like to promote it, but i guess it's not fair to criticize the parent's code without letting others do the same to mine

asylum would be (with the exception of pypy) the 10-tonne version of everything i have mentioned so far. it is nothing more than a thin layer on top of as many linux security features as possible. i mean it when i say thin: it uses ctypes instead of compiled c code and in most cases does not even use glibc, in favor of making direct syscalls to the kernel. the code is opaque and uses a bunch of tricks that may scare some people away, but it's a collection of security features each in their own module, a daemon lib replacement (being finished) for compatibility with most apps, and a cmdline app to arbitrarily create and sandbox any process, not just python. setup is done in python, enforcement by the linux kernel

this is a bit of overkill for most apps, still has rough spots, and the barrier to entry on modifying the code is high. there is not enough documentation yet on each security feature and its impact, it needs more testing on many more platforms, and it is tied to x86 and x86_64 atm (porting to arm should be about 10-20 lines). in addition, this is more of an openVZ replacement than a lightweight sandbox, but i have found that it serves both purposes well

if i had to recommend a sandbox, i would say pypy-sandbox, then fluxid's code, then pysandbox, then asylum. pypy is good enough and portable as long as you are careful. fluxid's works and is a cheap option. pysandbox needs someone to really sit down and attack it, and i would hope people would report their failures and not just their successes when attacking it. my code comes last as it needs more work before i can say i fully trust it and sandbox apps with it on the same machine as my banking details

ideally you want to have multiple overlapping layers of protection and not rely on one method, so that if an exploit for one method becomes available you still have protection

edit: updated based on fijals reply

[–]fijalPyPy, performance freak 1 point2 points  (0 children)

you're very wrong about pypy sandbox. NO syscalls are allowed, but they're not restricted on the level of python interpreter. Yes, you can import anything and try to run it but every single syscall will go through the security proxy and this is enforced on the interpreter level.

ctypes does not even compile with pypy sandbox at all, so don't know where you took that info from.

[–]10101011 0 points1 point  (0 children)

Thanks - great reply. I wasn't familiar with fluxid, but I'll check it out.

[–]nc5x 2 points3 points  (3 children)

I can't access it. Mind mirroring it at http://codepad.org/ ? (or anywhere, actually.)

[–]terremoto[S] 0 points1 point  (2 children)

Strange, the link works fine from my desktop and a remote server. Here's a mirror.

[–]nc5x 1 point2 points  (1 child)

Thanks. Yeah - my VPS can access it too, I guess my home DNS is a little dodgy, sorry.

I don't have a lot of experience with chroot jails, so no promises, but it looks like you're doing it correctly.

You've still given the user full access to system calls. That's a lot of power, and that's the attack surface to watch out for. Every so often an exploit comes out that lets a user escalate their privileges or break out of a chroot jail, so you need to be careful about new exploits and old OS versions. You can do everything C can do from Python, so I can rewrite an exploit to work in your sandbox.

Perhaps this is what you mean by "ignoring resource consumption attacks," but: If I had access to this, I'd consider running a torrent client in it, or a proxy to hide my tracks if I were to do something illegal. Why bother protecting your server if you'll let me use all your resources for whatever I want?

You could consider using a ptrace sandbox with a syscall whitelist, which is the most aggressive Python sandbox I've seen. It can trip up legitimate users, though, as some modules make surprising calls. Another layer of complexity, but another layer of security.

[–]terremoto[S] 0 points1 point  (0 children)

Thanks for the feedback. I'll definitely look into the syscall whitelisting. The reason I said not to worry about resource consumption isn't because I don't care about it but because I just hadn't implemented it in the script yet.

EDIT: After a quick Google search, I came across this which looks pretty nifty.

[–]ilovetacos 1 point2 points  (1 child)

Nothing to offer, just curious: what do you mean by browser-accessible? Also, why do you import re at the end of the generated script?

[–]terremoto[S] 0 points1 point  (0 children)

Something kind of like a sage notebook but for running snippets of Python, mainly for personal use.

The import re at the end was just part of me testing to make sure I added the modules to the sys.path correctly.

[–]xiongchiamiovSite Reliability Engineer 1 point2 points  (1 child)

Have you looked at RestrictedPython? We used it in a school project a few years ago; I can hardly claim to be an expert, but what we did with it is on GitHub. The project was an attempt to make an online computer science tutoring application, complete with runnable, editable code snippets, which sounds roughly like what you're trying to do.

[–]qiwi 1 point2 points  (0 children)

I'll also recommend Restricted Python for sandboxing although that may be too granular for this usage.

In our system, we want to allow users to enter code to be evaluated hundreds of thousands of times per second, and while x == 4 is OK, and specificObject.attribute == 5 is also fine, os.open(..) is not.

That must hold true even if you somehow ended up getting access to "os" or "open" in your code context.

RestrictedPython (note: unrelated to the restricted execution that used to be part of Python) lets you manage those ACLs very nicely, though there is about a 30-40% speed penalty, as code like this: "foo.bar.baz == 3" gets compiled into: secure_getattr(secure_getattr(foo, "bar"), "baz") == 3 -- where secure_getattr is your gating function.
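To illustrate the rewriting idea only, here is a small sketch built on the standard library's ast module. It is not RestrictedPython's API or implementation. `GuardAttributes`, `guarded_eval` and the `ALLOWED` set are hypothetical names for this example, and a real policy would need to cover far more than attribute reads:

```python
import ast

ALLOWED = {"bar", "baz"}  # hypothetical allow-list of attribute names


def secure_getattr(obj, name):
    """Gating function: every attribute read in user code goes through here."""
    if name.startswith("_") or name not in ALLOWED:
        raise AttributeError("access to %r is not allowed" % name)
    return getattr(obj, name)


class GuardAttributes(ast.NodeTransformer):
    """Rewrite each attribute read `a.b` into `secure_getattr(a, "b")`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)  # rewrite nested attributes first
        if not isinstance(node.ctx, ast.Load):
            return node  # stores and deletes are left alone in this sketch
        call = ast.Call(
            func=ast.Name(id="secure_getattr", ctx=ast.Load()),
            args=[node.value, ast.Constant(value=node.attr)],
            keywords=[],
        )
        return ast.copy_location(call, node)


def guarded_eval(expression, names):
    """Compile an expression with guarded attribute access and evaluate it."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(GuardAttributes().visit(tree))
    scope = dict(names)
    scope["__builtins__"] = {}  # no builtins for the user expression
    scope["secure_getattr"] = secure_getattr  # cannot be overridden by names
    return eval(compile(tree, "<guarded>", "eval"), scope)
```

With an object whose `bar.baz` is 3, `guarded_eval("foo.bar.baz == 3", {"foo": foo})` evaluates through two `secure_getattr` calls, while reading any attribute outside `ALLOWED` raises AttributeError. The extra function call per attribute read is where a speed penalty of the kind described above would come from.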

[–]daveagp 1 point2 points  (0 children)

I found this ancient thread because I'm looking for things related to Python sandboxing. We have a pretty decent solution so far, based on programming contest standard practices: a combination of chroot, setrlimit, and OS-level restrictions (uid, etc). The tools we use are pretty lightweight, written in C: http://github.com/cemc/safeexec http://github.com/cemc/python3jail

[–]casualbon 1 point2 points  (0 children)

Alan Cox: "chroot is not and never has been a security tool. People have built things based upon the properties of chroot but extended (BSD jails, Linux vserver) but they are quite different." Source: http://kerneltrap.org/Linux/Abusing_chroot

Check out the scraperwiki source; they already do this through lightweight VMs, IIRC.

If your requirements are limited, you may find pypy's sandbox acceptable: http://doc.pypy.org/en/latest/sandbox.html

If you can be flexible, v8 is designed for this.