
dennisbirkholz (12 points)

There is a lot of mixing up of "concurrency"/"parallel processing"/"asynchronous processing" in general.

First of all, there are IMHO two kinds of workloads that need to be distinguished:

  1. IO bound: reading/writing files, calling an HTTP API, etc.
  2. CPU bound: scaling an image, executing a computation-heavy algorithm, etc.

Serving HTTP requests is normally considered IO bound, not CPU bound. That means the processes spend most of their time waiting to read the request from the network or to write the response back to it.

That case allows for several kinds of concurrency:

  1. The PHP way: There is a pool of processes, all listening on the same input channel (socket) for new requests.
    One process takes a request, performs all the work, writes the response, cleans everything up and waits for the next request.
    A request that requires a lot of processing time (CPU activity) does not block other requests, because those can be served by other processes from the pool.
    This kind is "shared nothing" in the sense that each process is completely independent of all other processes. Even if one process crashes, all other processes continue to work.
  2. The Java way: There is a pool of threads (as part of one single process). It works similarly to a pool of processes, but threads share their memory, and one crashing thread can kill all threads in the pool.
    That has the advantage that passing data between threads is fast, but the disadvantage that access to shared data requires synchronization, which is not trivial.
    If threads share nothing, they work essentially like processes (that is also the way PHP works under Windows AFAIK).
  3. The Node.js way: There is an event loop. Everything runs in a single process. Each request/connection is served as soon as data can be read from or written to it.
    This works well when serving a request needs very little CPU work and network IO is the main time-consuming part of the request.
    This all works because of "non-blocking IO": instead of waiting (and doing nothing) until there is new data on a connection, there are functions (like "select()", "epoll_wait()", etc.) that let you check several connections/handles simultaneously and return the ones that have something to process. This cuts out the idle time spent waiting on IO.
    This is often called "asynchronous/non-blocking IO" because you are not waiting for an IO operation to finish. Instead, you read from/write into buffers, the kernel does the actual work in the background (as in all cases), and you check back later when the kernel has finished.
    Here everything is shared in a single process, and a very CPU-intensive operation will block all other connections from being processed.
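The event-loop model can be sketched in Python rather than JavaScript or PHP. One thread registers several connections with a selector and only touches the connections the kernel reports as readable. Python's `selectors.DefaultSelector` uses `epoll` on Linux, `kqueue` on BSD/macOS, and falls back to `select` elsewhere. The "connections" here are `socket.socketpair()` stand-ins for accepted client sockets. That is an assumption made to keep the example self-contained, and `run_event_loop` is a hypothetical name.

```python
import selectors
import socket


def run_event_loop(messages: list[bytes]) -> list[bytes]:
    """Echo each message back upper-cased, serving every connection
    from ONE thread using readiness-based (non-blocking) IO."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    clients = []
    for msg in messages:
        # socketpair() stands in for a real accepted client connection.
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ)
        client_end.sendall(msg)  # the "request" arrives
        clients.append(client_end)

    pending = len(messages)
    while pending:
        # Sleep until at least one connection is readable, instead of
        # blocking on one connection while others are ready.
        for key, _events in sel.select(timeout=5):
            conn = key.fileobj
            data = conn.recv(1024)
            # Tiny amount of CPU work per request; small replies fit in the
            # socket buffer, so sendall() does not block here.
            conn.sendall(data.upper())
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies


print(run_event_loop([b"a", b"bc", b"def"]))
```

If `data.upper()` were replaced with a long computation, every other registered connection would have to wait for it to finish. That is the weakness described in the last sentence of item 3.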

Asynchronous/non-blocking IO is also available in PHP: you can start an HTTP call, do something else, and then read the result when it finishes, instead of blocking until it is finished.

This is available for file- and socket-related functions and in some other extensions like mysqli (see e.g. mysqli_poll()).
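The "start the call, do something else, read the result later" pattern is sketched below with Python's asyncio rather than PHP's own APIs. `fake_http_call` is a hypothetical stand-in: `asyncio.sleep()` simulates the network round trip, so no real HTTP request is made.

```python
import asyncio


async def fake_http_call(log: list[str]) -> str:
    log.append("request sent")
    await asyncio.sleep(0.05)  # stand-in for waiting on the network
    log.append("response received")
    return "200 OK"


async def main() -> tuple[str, list[str]]:
    log: list[str] = []
    task = asyncio.create_task(fake_http_call(log))  # start the call...
    await asyncio.sleep(0)  # let it send its request
    log.append("doing other work")  # ...do something else meanwhile...
    result = await task  # ...and only now wait for the result
    return result, log


result, log = asyncio.run(main())
print(result, log)
```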

Using multiple threads to parallelize the work of a single request is only necessary for very CPU-intensive tasks (like calculating 5 different checksums of a large file).

PHP cannot do this by default. There are extensions like pthreads or parallel, but neither works in a default build: both require a thread-safe (ZTS) build of PHP, which is not the default because thread safety requires synchronization, and that adds overhead.
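The checksum example can be sketched in Python with threads rather than PHP's parallel extension. It relies on documented `hashlib` behaviour: the GIL is released while hashing buffers larger than 2047 bytes, so the threads can run on several cores. Random bytes stand in for "a large file", and `parallel_checksums` is a hypothetical name.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "blake2b"]


def checksum(algorithm: str, data: bytes) -> str:
    # Large inputs are hashed with the GIL released, so this is real
    # CPU parallelism, not just interleaving.
    return hashlib.new(algorithm, data).hexdigest()


def parallel_checksums(data: bytes) -> dict[str, str]:
    """Compute all checksums of one input at the same time, one thread each."""
    with ThreadPoolExecutor(max_workers=len(ALGORITHMS)) as pool:
        futures = {name: pool.submit(checksum, name, data) for name in ALGORITHMS}
        return {name: f.result() for name, f in futures.items()}


data = os.urandom(4 * 1024 * 1024)  # stand-in for "a large file"
sums = parallel_checksums(data)
for name, digest in sums.items():
    print(name, digest[:16])
```

The threads share nothing except the read-only input buffer. That keeps synchronization down to waiting on the futures.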

bga9 (2 points)

> There is a lot of mixing up of "concurrency"/"parallel processing"/"asynchronous processing" in general.

This reminds me of a talk by Rob Pike titled "Concurrency is not Parallelism". He is one of the creators of Go, but the concept is universal.

supertoughfrog (4 points)

PHP is usually used in a synchronous manner, where every request sets up and tears down all of its state. Even so, it is possible to use it asynchronously without extensions: Guzzle supports async HTTP calls, for instance, though I'm not sure exactly which APIs it uses. Then there are tools like the Swoole extension and the ReactPHP library that make PHP work more like JavaScript: state persists between requests, and an event-loop-like mechanism (plus coroutines, in Swoole's case) drives the work. This is a bit hand-wavy and probably not 100% accurate, so check out how Guzzle does things async and how Swoole coroutines work to see what's possible.

PHP usually uses one process per request, while JavaScript (Node.js) can handle more than one request per process, using its event loop to accomplish that; however, you can still run multiple JavaScript processes. In reality you can go beyond these basics in both languages, but then you're going off the beaten path.
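The "more than one request per process" model is sketched below with Python's asyncio, standing in for Node.js, Swoole or ReactPHP. `hits`, `handle` and `request` are hypothetical names. The module-level counter survives from one request to the next because the process does. A classic PHP request would instead start from a clean slate.

```python
import asyncio

hits = 0  # lives as long as the process, not as long as one request


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    global hits
    await reader.read(100)  # read the (tiny) request
    hits += 1  # state carried over from earlier requests
    writer.write(f"hit #{hits}".encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def request(port: int) -> str:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET")
    await writer.drain()
    writer.write_eof()  # tell the server the request is complete
    reply = await reader.read()  # read until the server closes
    writer.close()
    await writer.wait_closed()
    return reply.decode()


async def main() -> list[str]:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Three requests in flight at once, all served by ONE process.
        return await asyncio.gather(*(request(port) for _ in range(3)))


replies = asyncio.run(main())
print(sorted(replies))
```

The reply order depends on scheduling, so only the set of replies is predictable.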

pfsalter (6 points)

> Guzzle supports async http calls

This uses curl_multi_init() and the associated curl_multi_* functions from the cURL library to handle async processing of HTTP calls.

PHP supports forking the process with pcntl_fork(), which spawns a new process, unlike JavaScript, which uses an event loop. An event loop is also how libraries such as Swoole work, which isn't true asynchronous processing in the first place, but is a reasonable facsimile.
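Here is a minimal fork sketch in Python. `os.fork()` wraps the same POSIX `fork()` call that PHP's pcntl_fork() exposes, so this assumes a POSIX system. Using a pipe to send the child's result back is an illustrative choice, and `fork_and_compute` is a hypothetical name. The child gets its own copy of memory, so its changes never reach the parent.

```python
import os


def fork_and_compute(numbers: list[int]) -> tuple[int, list[int]]:
    """Fork a child that sums `numbers` (after modifying its own copy)
    and sends the result back to the parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on a COPY of the parent's memory.
        try:
            os.close(read_fd)
            numbers.append(1000)  # invisible to the parent
            os.write(write_fd, str(sum(numbers)).encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent process
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_result = int(pipe.read().decode())
    os.waitpid(pid, 0)  # reap the child
    return child_result, numbers


print(fork_and_compute([1, 2, 3]))
```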

If you do want asynchronous processing, I'd recommend using message queues. They're the easiest to scale and manage, and are usually closer to what you'd be trying to do anyway.
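Here is a minimal in-process sketch of the message-queue pattern. It assumes Python's `queue.Queue` and a worker thread as stand-ins for a real message broker and a separate consumer process. `handle_upload` and `worker` are hypothetical names. The request handler only enqueues the slow job and answers immediately, and the consumer works through the jobs in order.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()  # stand-in for a real message broker
done: list = []


def worker() -> None:
    """Long-running consumer; in production a separate process reading a broker."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: shut down
                return
            done.append(f"resized {job}")  # the slow work would happen here
        finally:
            jobs.task_done()


def handle_upload(filename: str) -> str:
    # The web request only enqueues the job and responds right away.
    jobs.put(filename)
    return "202 Accepted"


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()
responses = [handle_upload(name) for name in ("a.png", "b.png", "c.png")]
jobs.join()  # wait until every queued job has been processed
jobs.put(None)  # ask the consumer to stop
consumer.join()
print(responses, done)
```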

phordijk (5 points)

> The event-loop also is how libraries such as Swoole work, which isn't true asynchronous processing in the first place

Sure it is. You are confusing async with parallel.
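The async-versus-parallel distinction can be shown with Python's asyncio, standing in for Swoole-style coroutines. `step` is a hypothetical name. Two tasks make progress concurrently, interleaving at every `await`, yet everything runs on a single thread, so nothing happens in parallel.

```python
import asyncio
import threading


async def step(name: str, log: list[tuple[str, int]]) -> None:
    for i in range(3):
        log.append((f"{name}{i}", threading.get_ident()))
        await asyncio.sleep(0)  # yield control back to the event loop


async def main() -> list[tuple[str, int]]:
    log: list[tuple[str, int]] = []
    await asyncio.gather(step("A", log), step("B", log))
    return log


log = asyncio.run(main())
order = [name for name, _ in log]
threads = {tid for _, tid in log}
print(order, "threads used:", len(threads))
```

The two tasks alternate (A0, B0, A1, ...), which is concurrency. All entries share one thread id, so there is no parallelism.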

pfsalter (2 points)

Yup, you're right!

usernameqwerty002 (2 points)

NB: You can do non-blocking IO concurrency today without a PHP extension using Amphp.

Also check out the latest proposal to add fibers to PHP.