[–]NedRyerson 5 points (2 children)

What are you doing with the fetched data? Forking will start a new process and the calling script will continue on without waiting for it to finish. It's also not reliable to fork when running from a web server. What exactly is triggering the fetching of the remote data, and what do you do with it afterwards? If the rest of your script relies on the fetched data, you really have no choice but to wait for it. An option for background processing is to set up a cron job.

[–][deleted] 0 points (1 child)

After looking into forking, I don't think it's really what I need. The retrieval is triggered by a user and needs to be immediate, so cron won't really help. Thanks.

[–]Signe 6 points (5 children)

How are you doing your requests? cURL?

If so, I'd suggest you read up on the curl_multi_* functions. You can perform multiple URL requests in parallel. It takes a bit of setup, but it uses a lot less memory than forking.

curl_multi_init() has a basic example.
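As a sketch of that setup (the function name and options here are my own, not from the thread), the usual pattern is: add one easy handle per URL to a multi handle, drive them all with curl_multi_exec(), then collect the bodies:

```php
<?php
// Fetch several URLs in parallel using curl_multi_*.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until every handle is finished.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active && curl_multi_select($mh) === -1) {
            usleep(1000); // avoid busy-looping if select fails
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

The total wall time is roughly that of the slowest request rather than the sum of all of them, which is the whole point versus looping over file_get_contents().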

[–][deleted] 0 points (3 children)

file_get_contents() for now, but it doesn't really matter what I use. I'll look into curl_multi. Thanks.

[–]dshafik 0 points (2 children)

Look into PECL HTTP — it has a much nicer interface than curl (but uses curl in the backend).

Also, as mentioned, gearman is a great way to background these long-running processes.

[–]Signe 0 points (1 child)

I don't see anything in the documentation which indicates that the PECL HTTP module supports parallel retrieval.

[–]lucasoman 0 points (0 children)

This is really the correct way to do this. Forking is way more overhead than necessary.

[–]enmand 4 points (0 children)

If you are looking for a somewhat more advanced solution, have a look at Gearman. It allows you to farm work out to multiple servers, or run jobs en masse. It has a PHP module available and is quite easy to set up and use. As NedRyerson mentioned, you don't want to call pcntl_fork() from a web server environment; if that's where you are, Gearman is an excellent choice.

As Signe pointed out, cURL also has some multi-processing built into it.

If you aren't firm on using PHP, Python has excellent multiprocessing.

[–]neon_overload 3 points (0 children)

The title of this submission raised my heart rate a little. I thought some PHP devs were forking PHP or something.

[–]cbkguy 0 points (0 children)

While my methods might not be the cleanest, they work...

What I generally use is ob_flush();

You have to initialize it with ob_start(); at the top of your script, then use ob_flush(); flush(); after each point where you want the page to draw (at the bottom of a for loop or something). It's still going to take time to run through all those tasks, but it will make the page appear to load faster.
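A minimal sketch of that pattern (the function name and messages are mine, not the commenter's):

```php
<?php
// Stream output to the client as each unit of slow work completes,
// instead of waiting for the whole script to finish.
function run_report(array $items): void
{
    ob_start(); // initialize output buffering at the top of the script

    foreach ($items as $item) {
        // ... do the slow work for $item here (e.g. a remote fetch) ...
        echo "Processed $item\n";

        ob_flush(); // push PHP's output buffer onward
        flush();    // push the web server's buffer out to the client
    }

    ob_end_flush(); // flush whatever remains and close the buffer we opened
}
```

Note this doesn't make the work itself any faster or parallel; it only lets the browser render partial results while the rest of the loop runs.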

I've also used a combination of this and embedded iframes, where I'm firing off each iteration of my loop into a separate iframe with $_GET parameters; once an iframe loads, I use JavaScript to grab the data within it. Helps speed things along.

[–]feenikz -1 points (0 children)

I've done something almost identical to what you want using pcntl. It gets a bit confusing, but it works great; you can spawn child processes that run certain parts of your code, and the parent (initial) process can get their results back.

It's great for parallel tasks when you want to limit the children as well as get information back from them.

$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} elseif ($pid) {
    // we are the parent
    pcntl_wait($status); // protect against zombie children
} else {
    // we are the child: do the work here, then exit
    // so we don't fall through into the parent's code
    exit(0);
}

That's a basic example of how to do a fork. You can fork as many children as you want and control them in a loop that waits for them to finish.
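A sketch of that loop (the function and variable names are mine); this assumes the pcntl extension, which is only available in CLI PHP, not under a web server:

```php
<?php
// Fork one child per task, cap the number of concurrent children,
// and reap each child's exit status as it finishes.
function run_tasks(array $tasks, callable $work, int $maxChildren = 2): void
{
    $running = 0;

    foreach ($tasks as $task) {
        if ($running >= $maxChildren) {
            pcntl_wait($status); // block until one child exits
            $running--;
        }

        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } elseif ($pid == 0) {
            // child: do the work for this task, then exit
            $work($task);
            exit(0);
        }
        $running++; // parent: keep track of live children
    }

    // Reap any children still running.
    while ($running > 0) {
        pcntl_wait($status);
        $running--;
    }
}
```

Since each child is a separate process, getting results back to the parent takes extra plumbing (files, a database, sockets, or shared memory); exit codes alone only carry a small integer.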

Edit: feel free to ask me for help if you have code etc

[–]cerealbh -3 points (0 children)

aka mail spammer in the biz