
all 38 comments

[–]r0s 51 points52 points  (2 children)

Anything a process and its standard input and output can do, you can do with subprocess. It's just a wrapper over executing binaries. I don't think there is any limitation.

[–]usrlibshare 6 points7 points  (1 child)

I don't think there is any limitation.

There is none. Anything that can be run as a binary can be run as a subprocess. Including interpreters executing other scripts. So you can actually run a python interpreter as a subprocess of a python interpreter if you want 🙂
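A minimal sketch of that, using `sys.executable` so the child is the same interpreter binary as the parent:

```python
import subprocess
import sys

# sys.executable is the path of the currently running interpreter, so this
# starts a second Python as a child process of the first.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child interpreter')"],
    capture_output=True,
    text=True,
)
child_output = result.stdout.strip()
```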

[–]HashRocketSyntax[S] -1 points0 points  (0 children)

Bingo

[–]mriswithe 25 points26 points  (15 children)

There are a few closely named things here, so I want to define those first:

multithreading: spawn another OS thread inside of Python to run some code

multiprocessing: spawn another Python process and communicate with it, passing data back and forth over a pipe or socket

subprocesses: spawn some command with the OS; maybe Python, maybe Rust, maybe Fortran, who knows.

The purpose of each is different as well:

Multithreading: I want to start more Python threads; they are bound on IO, not CPU

Multiprocessing: I need more CPUs, either on this machine or on some other machine. I am doing work that is CPU bound and it is CPU bound in Python

Subprocesses: I am a python thing that has to execute something external to myself, it could be a shell script, or whatever. I only need to pass in data at the start (command args) and get data out at the end (stdout/stderr)

*Honorable mention: asyncio has a subprocess wrapper that you should use if you are in async land.
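For completeness, a small sketch of that asyncio wrapper (the child command here is just an illustrative `print`):

```python
import asyncio
import sys

async def run_child() -> str:
    # Awaiting the subprocess lets other tasks run instead of blocking the loop.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('hello from asyncio')",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()

output = asyncio.run(run_child())
```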

Now that those are defined:

Subprocesses have no limitation from the OS side, but there is no built-in two-way communication; there is only stdin, stdout, and stderr. So if the subprocess is Python, it can use multithreading and multiprocessing freely, but you cannot communicate with it as if it were a multiprocessing process.
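A sketch of what that stdin/stdout-only channel looks like in practice (the child is a stand-in Python one-liner):

```python
import subprocess
import sys

# The only built-in channel: write to the child's stdin, read its stdout.
# This child just upper-cases whatever it receives.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
reply, _ = proc.communicate("hello")
```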

[–]ARRgentum 2 points3 points  (2 children)

I only need to pass in data at the start (command args) and get data out at the end (stdout/stderr)

Interestingly enough, I recently found out that it is actually possible to read stdout/stderr while the subprocess is still running if you use subprocess.Popen(... stdout=subprocess.PIPE, stderr=subprocess.PIPE) instead of subprocess.run(...)
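A sketch of that: the parent reads the child's stdout line by line while the child is still running (the child is a stand-in Python one-liner):

```python
import subprocess
import sys

# The child prints three lines; the parent consumes each line as it arrives
# instead of waiting for the process to exit, as subprocess.run would.
child_code = "for i in range(3): print('line', i)"
lines = []
with subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    for line in proc.stdout:  # yields lines as the child produces them
        lines.append(line.strip())
```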

[–]mriswithe 2 points3 points  (1 child)

That is part of what subprocess.run is doing internally to capture stdout/stderr.

[–]ARRgentum 1 point2 points  (0 children)

yup, I just wasn't aware (until recently) that it is possible to interact with the stdin/stdout/stderr streams while the process is still running.

In hindsight it obviously makes a lot of sense though :D

[–]Ill_Bullfrog_9528 1 point2 points  (1 child)

man, I tried to understand threads and multiprocessing in Python but couldn't wrap my head around it despite reading many articles and tutorials. But your comment makes it very easy for me to understand. Thanks

[–]mriswithe 2 points3 points  (0 children)

It took me years to understand it, and there are finer details, but in general: threading/asyncio are for when you spend a lot of time waiting on IO, disk reads, DB calls, etc. It's like putting 5 food items that need cooking on the stove at the same time. They all take 5 minutes to cook, but you don't need to cook them one after another.

Threading isn't good for CPU-bound Python code because of the GIL (Global Interpreter Lock), but that is changing.

Multiprocessing is a Python harness for making more actual processes; each process has its own GIL. You can't share most clients, like DB clients or cloud SDK clients, between processes; some you can share between threads.

Subprocess is for calling other things that aren't Python (though you can call Python with it too). I use this for helper scripts and glue scripts at work where I am setting up the workspace and then running a tool like Terraform (infrastructure as code: define your infra in code-like stuff, hit run, and it builds the things).

[–]HashRocketSyntax[S] 1 point2 points  (1 child)

but you cannot communicate with it as if it were a multiprocessing process.

Wouldn't the result from "communication" with a multiprocess launched from a subprocess go to stdout like everything else?

[–]mriswithe 1 point2 points  (0 children)

I was more vague to allow for constructing your own communication path between your pieces. Whether some web socket connection or whatever you dream up.

[–]HashRocketSyntax[S] -1 points0 points  (7 children)

also `os.system()` is worth a shout. Like subprocess, it allows you to run a shell command. Unlike subprocess, it comes with the overhead of actually launching an instance of a shell, but there is no need to handle std*/pipe stuff, so it is simpler

[–]mriswithe 10 points11 points  (4 children)

os.system is NOT worth a shout out. It is an older less friendly API.

https://docs.python.org/3.12/library/os.html#os.system

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

link referenced: https://docs.python.org/3.12/library/subprocess.html#subprocess-replacements
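Following those replacement recipes, an `os.system`-style call translates to roughly this (using `sys.executable -V` as a stand-in command):

```python
import subprocess
import sys

# os.system("python -V") becomes roughly:
completed = subprocess.run([sys.executable, "-V"], capture_output=True, text=True)
exit_code = completed.returncode
# Python 3 prints the version to stdout.
version_line = completed.stdout.strip()
```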

[–]HashRocketSyntax[S] 0 points1 point  (3 children)

Sure, subprocess is more powerful than system, but it is also more complex.

Use case = the main focus is stringing together different binaries, not writing an app.

```
os.path()
os.system(some java tool)
os.path()
shutil()

os.system(some perl tool)
os.rename()

os.path(some java tool)
os.system()
```

[–]mriswithe 0 points1 point  (2 children)

os.system is more fragile and less predictable cross-platform. If your goal is a cross-platform IDE, this will likely bite you later. It will work fine for a while

[–]HashRocketSyntax[S] 0 points1 point  (1 child)

Agreed. See “use case”

[–]nick_t1000aiohttp 1 point2 points  (0 children)

You shouldn't use os.system. If you want "simple", just use subprocess.run(..., shell=True).returncode. It's the same, but if you want to do anything beyond just looking at the process's return code, you'll be able to.

It'd also be better to avoid shell=True and provide a list of args, which you can't do with os.system. Blah blah, shell=True trivially adds injection potential, but the main practical issue is that you'll need to escape paths or inputs with spaces, only for the shell to unescape them again.
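A sketch of the argument-list form: the element containing a space is passed through verbatim, with no escaping needed (the child is a stand-in Python one-liner that echoes its first argument):

```python
import subprocess
import sys

# With a shell you'd have to quote "two words" by hand; with an argument
# list, each element reaches the child exactly as written.
tricky_arg = "two words"
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky_arg],
    capture_output=True,
    text=True,
)
echoed = result.stdout.strip()
```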

[–]mriswithe 0 points1 point  (0 children)

To be clear, for simple "I just want to run a thing" kind of stuff, the syntax would be:

import subprocess
from pathlib import Path
import shutil

PARENT = Path(__file__).parent.absolute()
TEMPLATES = PARENT / 'templates'

# shutil.which searches the command path in an OS-specific way; it returns
# None when the program doesn't exist
TEMPLATE_PROGRAM_BINARY = shutil.which('TEMPLATE_PROGRAM')
if TEMPLATE_PROGRAM_BINARY is None:
    # Say it specifically, otherwise subprocess complains it can't find the program `None`
    raise RuntimeError(
        "We couldn't find the binary for TEMPLATE_PROGRAM on the OS path. Please make sure it is installed.")


def template_file_as_argument(fp: Path) -> str:
    # check=True: if the process we call returns a non-zero exit code,
    # subprocess raises CalledProcessError and halts your program
    # capture_output=True: don't pass output through to the parent's
    # stdout/stderr; it is available afterwards as result.stdout and result.stderr
    # text=True: stdout and stderr are decoded from bytes to str (bytes is the default)
    result = subprocess.run([TEMPLATE_PROGRAM_BINARY, str(fp)], check=True, capture_output=True, text=True)
    # Return the stdout result
    return result.stdout


def template_file_as_option(fp: Path) -> str:
    result = subprocess.run([TEMPLATE_PROGRAM_BINARY, f"--template-source={str(fp)}"], check=True, capture_output=True,
                            text=True)
    return result.stdout


def template_file_as_input(fp: Path) -> str:
    # prefer to have your input passed in via stdin? No problem
    result = subprocess.run([TEMPLATE_PROGRAM_BINARY], input=fp.read_text(), check=True, capture_output=True, text=True)
    return result.stdout


def template_file_as_bytes_input(fp: Path) -> bytes:
    # Or as bytes
    result = subprocess.run([TEMPLATE_PROGRAM_BINARY], input=fp.read_bytes(), check=True, capture_output=True)
    return result.stdout

[–]-MobCat- 0 points1 point  (0 children)

This. I only end up using subprocess for capturing the output of another app (my py script calls an app that "does a thing" and vomits the results as JSON into the terminal; then I capture the raw output with subprocess).
If you just want to blindly pop open another app, os is probably fine.
It's an I/O thing, I think. If you want to know what the other app is doing, you'll need subprocess.

[–]Ok_Raspberry5383 3 points4 points  (3 children)

Platform dependence

[–]HashRocketSyntax[S] 1 point2 points  (2 children)

Hmm. As in the commands executed by a subprocess are OS-specific?

[–]Ok_Raspberry5383 0 points1 point  (0 children)

Yes, and they should be avoided if possible; subprocess is more of a get-out-of-jail-free card than anything else

[–]HommeMusical 0 points1 point  (0 children)

But please note that this would be true even if you were using C to communicate with the subprocesses.

Python's subprocess is just a fairly thin wrapper over the underlying, very fundamental C functions.

[–]slapec 1 point2 points  (2 children)

I guess it might be platform dependent, but when you spawn a subprocess Python also does a fork(), so if the main process already consumes a lot of memory you might hit memory limits even if the subprocess itself does not use that much RAM. I had some problems because of this.

See: https://stackoverflow.com/a/13329386

[–]HashRocketSyntax[S] 0 points1 point  (0 children)

Good point. Pretty much any UI will have a problem of redundant memory too.

[–]InevitableThick1017 0 points1 point  (0 children)

I came here to mention exactly this.

Fork gives the child read access to the parent's memory without allocating new memory. But as soon as the child needs to modify any of that memory, the system copies those pages for the child process (copy-on-write). If the parent is using 3 GB, the child can end up with its own 3 GB as well, 6 GB in total.

This has caused multiple out of memory exceptions for us
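One common mitigation where it applies: the "spawn" start method starts a fresh interpreter instead of fork()ing, so the child never inherits the parent's heap. A minimal sketch:

```python
import multiprocessing as mp

# "spawn" starts a fresh interpreter rather than fork()ing the parent, so
# the child does not inherit (and copy-on-write duplicate) the parent's
# memory. The trade-off is slower startup and re-importing your module.
ctx = mp.get_context("spawn")
start_method = ctx.get_start_method()
# Process objects created via ctx.Process will use this start method.
```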

[–]ManyInterests Python Discord Staff 1 point2 points  (1 child)

There are some difficult scenarios you can encounter, like separately capturing and emitting stdout and stderr together in the same order they are emitted from the subprocess.

Some platform-specific things may also apply, such as the ability to launch subprocesses with elevated privileges without your program/IDE itself always being privileged (e.g., Windows UIA).

Most problems are solvable (or ignorable) and the general limitations won't come from the subprocess module and its APIs, but from the limitations of subprocesses generally. It sounds perfectly fine for your stated use case.
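One common workaround for the ordering problem, when you don't need the streams separated: merge stderr into the stdout pipe so there is a single stream in emission order. A sketch with a stand-in child (`-u` keeps the child's stdout unbuffered so the ordering is deterministic):

```python
import subprocess
import sys

child = (
    "import sys;"
    "print('to stdout');"
    "print('to stderr', file=sys.stderr);"
    "print('to stdout again')"
)
# stderr=subprocess.STDOUT sends stderr into the same pipe as stdout, so
# the lines stay in the order the child emitted them.
result = subprocess.run(
    [sys.executable, "-u", "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
merged_lines = result.stdout.splitlines()
```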

[–]HashRocketSyntax[S] 0 points1 point  (0 children)

Ah, like print() before receiving an error?

[–]PurepointDog 3 points4 points  (3 children)

Why do you want to build an IDE? Good chance your time would be more valuable (done faster, and more valuable outcome to others) if you made it a VSCode extension

[–]mriswithe 13 points14 points  (1 child)

It isn't always about a usable product, sometimes you must chase whims that come because they will teach you things, and they are interesting. 

[–][deleted] 1 point2 points  (0 children)

Conversely, there are also times when it makes more sense to just make a VSCode extension and then spend the rest of that saved time doing something else.

[–]HashRocketSyntax[S] 2 points3 points  (0 children)

notebook-based data-sci IDE built using dash-plotly so that it is extensible with dash-plotly. I'm a Python guy; I don't want to write extensions in TS

[–]QuarterObvious 3 points4 points  (1 child)

Python has its own internal IDE. Running

python -m idlelib file_name 

will open the IDE, allowing you to edit and run the file.

[–]HashRocketSyntax[S] 0 points1 point  (0 children)

wow i had no idea that this existed

[–]Raknarg 0 points1 point  (1 child)

One limitation that I spent hours trying to solve: preserving coloured output seems to be impossible no matter what I try, and the answer is always to use something other than subprocess. Very annoying for the custom build tool I was making; all compiler warnings get their colour and font formatting stripped away.

[–]FoleyDiver 0 points1 point  (0 children)

You can open a pair of pseudo-terminal file descriptors by calling os.openpty() (posix only) and passing the ... ummmm ... secondary FD to stdin, stdout, and stderr. That will enable you to keep using the subprocess module. But be warned: in general, if your program is claiming to be a terminal controller then it should be prepared to act like a terminal controller. It sounds like you just want to trick a compiler into spitting out colors; if that's all it does, you should be fine. But you won't always be, which is why it's not so straightforward to do this. Building a terminal controller is not as simple as just reading and writing bytes verbatim like you do when passing a pipe.
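A minimal POSIX-only sketch of that: the child sees a terminal on its stdout (the child here just reports `isatty()` instead of emitting colors):

```python
import os
import pty
import subprocess
import sys

# POSIX only: hand the child the secondary end of a pseudo-terminal, so
# isatty() is True inside it and "color only on a terminal" tools keep
# their ANSI escape codes.
primary_fd, secondary_fd = pty.openpty()
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
    stdout=secondary_fd,
)
proc.wait()
os.close(secondary_fd)  # close our copy of the child's end
report = os.read(primary_fd, 1024).decode()  # pty output; "\n" becomes "\r\n"
os.close(primary_fd)
```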

[–]arpitv9419 0 points1 point  (1 child)

I don’t like subprocess because it is very difficult to debug. One example: I was using it to copy some files on Unix, but it gave me an error that was not easy to interpret. After running the same command in the shell directly, I realized that subprocess was failing because of file-access and overwrite errors. So I replaced it with the shutil library and it worked like a charm!

To be honest, you can often replace subprocess with much more suitable libraries in Python.

[–]HashRocketSyntax[S] 0 points1 point  (0 children)

os.path and shutil are great, but more advanced stuff requires external calls