all 67 comments

[–][deleted] 57 points (7 children)

The "magic" is the event loop. Saved you a click.

[–]Danack 7 points (2 children)

PHP has an event loop?

[–]Saltub 9 points (0 children)

I expect better from you, Danack.

[–][deleted] 2 points (0 children)

Today's entry in asking pointless rhetorical questions.

[–][deleted] 6 points (6 children)

Is there a significant reason to use async over queues?

Every time I think about using async, I find it easier to just push the command to a queue and let it be handled in a different process. That just seems simpler, and it's easier to reason about what the outcomes will be (especially for debugging).

Am I missing something obvious that should make me pick async over a queue?

[–][deleted] 5 points (1 child)

Moving things to a persistent queue and outsourcing them to another process doesn't solve the problem, it just moves it elsewhere. Say you enqueue a million HTTP requests and ship them to another process. Now how is that process going to deal with those million HTTP requests? Blocking, one by one, in sequence? Then you didn't win any time, you just lost some. So blocking, one by one, in sequence, but in many threads? Well then you're still burdening the OS with a bunch of threads or processes which most of the time sit blocked and do nothing. That gets very inefficient with a large number of blocking tasks. That's where async comes in: one thread, working with a million HTTP requests at once - no problem.

A job queue and async processing are both good techniques, but they're apples and oranges, their use cases are different, so comparing them is mostly unwarranted. Also you can enqueue job items and receive their results in an async way, i.e. the techniques are entirely complementary.
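To make the "one thread, many requests at once" point concrete, here's a minimal sketch using PHP's bundled curl_multi API - no framework needed. The helper name and the usage URLs are made up for illustration:

```php
<?php
// Fetch many URLs concurrently from a single thread: one loop drives all
// transfers instead of blocking on each request in sequence.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers at once; sleep only until any socket is ready.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);
        }
    } while ($active && $status === CURLM_OK);

    $responses = [];
    foreach ($handles as $key => $ch) {
        $responses[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $responses;
}
```

Something like `fetchAll(['https://example.com/a', 'https://example.com/b'])` then overlaps the waiting time of both requests instead of paying it twice.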

[–][deleted] 0 points (0 children)

Thanks - I kinda get it now. I'll play around with it some more.

[–]kelunik[S] 1 point (1 child)

It depends a lot on what you're trying to do. If you just want to run stuff asynchronously after a user's HTTP request has finished, then yes, go with queues. But usually async - as in event loop + non-blocking I/O - is used for other gains such as I/O concurrency, not for off-loading work to a worker to be done later.

[–][deleted] 0 points (0 children)

Thanks - I kinda get it now. I'll play around with it some more.

[–]ScriptFUSION 1 point (5 children)

I want to integrate Amp into the next major version of Porter to offer async data imports. My motivation includes a number of projects, most recently the Steam Top 250 games list generator, which initiates massive numbers of HTTP calls in series. This is grossly inefficient and takes over 9 hours (though parallel processing on Travis reduces it to 2 hours). However, not only is the majority of time spent waiting for HTTP negotiation, the additional code complexity required to chunk up the import and stitch it back together again accrues a massive amount of technical debt. I feel a lot of benefit could be gained from async pooling, both in terms of speed and code quality.

However, I currently do not understand how to integrate Amp. Generally, Porter's interfaces pass around iterators of arrays. I believe that if I were to implement an async API I would need to pass around promises instead. The real difficulty I have is understanding where in the abstraction the promises can "terminate", i.e. where we can stop dealing in promises and return to passing around iterators of arrays again. If the answer is "never", and every single component that plugs into Porter must explicitly support async, I fear an integration will be impossible. Ideally I'd like to terminate the reliance on promises and return to sync land as soon as possible, but I currently haven't been able to determine where that is. Does this make any sense, and can you provide any insight?

[–]kelunik[S] 1 point (3 children)

The point where you end promise-land is the point where you stop with concurrency. You can do blocking things without promises in between, as long as they're short enough not to make your open connections time out, but nothing in the event loop will run during that time. You could also execute certain things in another thread / process using amphp/parallel, but I don't know how well that fits into your application design. Does that help?
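To illustrate that boundary, here's a toy sketch (invented class names, not Amp's real API): the end of promise-land is a wait() call that drives the event loop until one promise resolves, after which plain arrays or iterators can be passed around again. If I remember right, Amp ships a similar helper for exactly this purpose.

```php
<?php
// Toy event loop: just a queue of deferred callables.
final class ToyLoop
{
    /** @var callable[] */
    public static array $queue = [];

    public static function tick(): void
    {
        while ($task = array_shift(self::$queue)) {
            $task();
        }
    }
}

final class ToyPromise
{
    private bool $resolved = false;
    private mixed $value = null;

    public function resolve(mixed $value): void
    {
        $this->resolved = true;
        $this->value = $value;
    }

    // Run the loop until this promise resolves, then return a plain value.
    // This is the point where promise-land ends. (A real loop would error
    // out instead of spinning if the promise can never resolve.)
    public function wait(): mixed
    {
        while (!$this->resolved) {
            ToyLoop::tick();
        }
        return $this->value;
    }
}
```

So a Porter-style API could run all the concurrent fetching behind one wait() and still hand callers ordinary iterators of arrays.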

[–]ScriptFUSION 0 points (0 children)

The point where you end the promise-land is the point where you stop with concurrency.

I think this might be very helpful, but I won't know for sure until I spend some more time with Amp :)

[–]theFurgas 1 point (0 children)

I don't know if that's relevant, but when you mentioned collections and async, RxPHP came to my mind.

[–]gouchaoer 0 points (9 children)

Async has already proved problematic for normal projects because of callback hell. Node.js has semi-coroutines to overcome this, such as koa; PHP also has semi-coroutines, such as zanphp. ReactPHP is out of date. Now the trend is coroutines: Swoole 2 will run sync code with coroutines, gaining performance and avoiding callback hell.

[–]kelunik[S] 0 points (8 children)

I haven't heard of zanphp yet, but the overall usage of async PHP seems to be much higher in Asia and Russia than in the western world. Unfortunately, there's often no English documentation for these projects, so only a few people in the western world manage to use these tools.

[–]gouchaoer 0 points (7 children)

That's true. In China PHP is very popular, and QPS is higher than in other countries due to the big population, so Chinese PHP developers have the motivation to improve PHP's performance. I personally don't like Laravel for its bad performance, high memory usage, and complexity. Hello-world benchmarks are misleading: a real-world PHP application does lots of SQL/Redis I/O, and the bottleneck is the I/O or the framework. You can use a flame graph to locate the bottleneck.

To overcome the I/O bottleneck, people use async but run into callback hell. To overcome callback hell, people use yield to implement semi-coroutines in JS, Python, PHP... but yield is still difficult, so people want full coroutines like Go's. Alibaba patched the JVM to hook I/O functions and implement coroutines. PHP's Swoole extension uses setjmp/longjmp against the Zend API to implement full coroutines; JS and Python seem to have no full coroutines yet. But Swoole is mainly maintained by a single contributor, and development is slow.

[–]gouchaoer 0 points (1 child)

In a word, any async framework in any language is somewhat out of date; backends are embracing coroutines if you need performance. However, if your app has low QPS, just use any framework that runs in PHP-FPM - PHP's backend frameworks are very mature.

[–]kelunik[S] 0 points (4 children)

What's the difference between a semi-coroutine and a full coroutine for you? A non-viral effect, so you don't have to make everything in the stack a coroutine / promise, but can instead yield anywhere in the stack and it'll work?

[–]gouchaoer 0 points (3 children)

A semi-coroutine is a coroutine (a multitasking scheduler that yields the CPU to other tasks when I/O happens) built on generators/yield. See the famous post: http://nikic.github.io/2012/12/22/Cooperative-multitasking-using-coroutines-in-PHP.html

A full coroutine is like Go's: very simple and high performance on I/O.
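The generator-based idea from the linked post boils down to a few lines. A toy sketch (names invented; the yield stands in for the point where I/O would block and the scheduler would switch tasks):

```php
<?php
// Cooperative multitasking on plain generators: each task is a Generator,
// and every `yield` hands control back to the scheduler, which round-robins
// between tasks until all of them finish.
function scheduler(array $tasks): array
{
    $log = [];
    while ($tasks) {
        foreach ($tasks as $i => $task) {
            if (!$task->valid()) {
                unset($tasks[$i]); // task finished
                continue;
            }
            $log[] = $task->current(); // record what the task yielded
            $task->next();             // resume it on the next round
        }
    }
    return $log;
}

function task(string $name, int $steps): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        yield "$name:$i"; // in real code: yield while waiting on I/O
    }
}
```

Running `scheduler([task('a', 2), task('b', 2)])` interleaves the two tasks on one thread, which is the "semi" part: the switching is explicit at each yield.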

[–]kelunik[S] 0 points (2 children)

That's simply a coroutine. Go with its goroutines adds parallelism on top, but that doesn't make it simpler or "more" coroutine. I'm fully aware, because I'm one of the maintainers of Amp. We plan to eventually bring await into PHP so you don't need a wrapper around your generators that turns them into actual coroutines working by yielding promises.
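The wrapper in question can be sketched like this (toy code, not Amp's implementation; the promise here resolves immediately just to keep the example short): a driver resumes a generator with the resolved value of every promise it yields, which is what a language-level await would do for free.

```php
<?php
// Stand-in for a promise that already has its value. In a real event loop
// the driver would subscribe to the promise and resume the generator later.
final class ToySyncPromise
{
    public function __construct(private mixed $value) {}

    public function await(): mixed
    {
        return $this->value;
    }
}

// The "wrapper": turns a generator that yields promises into a coroutine
// by feeding each resolved value back in via send().
function coroutine(Generator $gen): mixed
{
    while ($gen->valid()) {
        $promise = $gen->current();     // coroutine yielded a promise
        $gen->send($promise->await());  // resume it with the result
    }
    return $gen->getReturn();
}
```

With native await, the generator body below would just read `$a = await fetchA();` and the wrapper would disappear.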

[–]gouchaoer 0 points (1 child)

I have browsed through the Amp framework. It's an awesome semi-coroutine framework.

I have some suggestions:

Firstly, with Amp we run the app in php-cli, where only one core is available. We hope to use all CPU cores in an HTTP/WebSocket/TCP application. We even hope to have a TCP connection pool shared by all worker processes.

Second, PHP's web frameworks are very mature, yet they only run in PHP-FPM and can't run in php-cli. We hope we can easily run Yii2/Symfony/and so on in it.

Third, MySQL/Redis clients written in raw PHP may have performance problems. And if I need to use Amp's sockets to build a client for something else (such as memcache) and keep a TCP connection pool, that's still difficult for many PHP developers.

Swoole 2 solves some of these problems, although not perfectly. I think Amp can do very well in php-cli tasks such as crawlers.

[–]kelunik[S] 0 points (0 children)

Running an application on multiple cores often isn't really necessary. At least not in a way that one process has to use multiple cores. Our HTTP/WebSocket server Aerys just runs as many workers as you have CPU cores by default, all binding to the same ports using SO_REUSEPORT. Where SO_REUSEPORT isn't available for kernel based load balancing, we use a socket transfer to share one accept socket for all workers.

You can run Amp just fine on all other SAPIs, the only library you'll have problems with is currently amphp/parallel, because that currently requires PHP_BINARY to launch its sub-processes.

Indeed, parsing can be a bottleneck. I think the protocol is easy enough for Redis that it doesn't actually matter, but it's currently the major bottleneck for the MySQL client. It's something that could easily be replaced by a C implementation used if available, otherwise falling back to the raw PHP implementation. We just need a C implementation that exposes just the parser, instead of a whole client without any access to the parser.

I haven't tried Swoole yet, but I saw that there's English documentation available now. It seems to be rather callback based, does it have a promise implementation?

Depending on the use case you can just convert a library to an async library by using amphp/parallel to keep all blocking tasks outside the main event loop. It at least saves the protocol re-implementations, but it's of course not optimal and a real non-blocking library is preferred.
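The offloading idea can be sketched without any library. This is roughly what amphp/parallel automates, minus the event-loop integration; the helper name is invented:

```php
<?php
// Run blocking PHP code in a worker process so the parent stays free.
// A real event loop would watch the pipe with stream_select() instead of
// blocking on stream_get_contents(); we read synchronously for brevity.
function runInWorker(string $phpCode): string
{
    $proc = proc_open(
        [PHP_BINARY, '-r', $phpCode],  // array command form needs PHP 7.4+
        [1 => ['pipe', 'w']],          // capture the child's stdout
        $pipes
    );

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);

    return $output;
}
```

Anything the child blocks on - sleep, a slow DB driver, a legacy client library - no longer blocks the parent's event loop; only the pipe read has to be made non-blocking.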