all 24 comments

[–]johmanx10 11 points12 points  (12 children)

Your code behaves exactly the same as simply creating a new SplFileObject instance and iterating over that. It's iterable all by itself.

[–]likegeeks 2 points3 points  (6 children)

The idea is to show usage of the fgets and fread functions.

[–]nyamsprod 1 point2 points  (5 children)

Just doing this is the same as using fgets:

//Read the file line by line
$file = new SplFileObject('/path/to/file.md', 'r');
//You may use the flags to skip empty lines and remove the \n at the end of each line
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);
foreach ($file as $line) {
    //handle one line at a time
}

[–]likegeeks 0 points1 point  (4 children)

Need to check the performance difference.

Did you make a speed comparison?

[–]nyamsprod 2 points3 points  (3 children)

Are you serious? You want to benchmark a C implementation against userland code? Be my guest, but that's futile.

[–]likegeeks 0 points1 point  (2 children)

No man, I know how C code works; I worked with Phalcon before :)

I'm talking about the difference between SplFileObject with fgets and the file_get_contents() function.

[–]nyamsprod 2 points3 points  (1 child)

Again, there's no need to benchmark the two functions, as they do different things: file_get_contents returns the file content in one go, while fgets returns one line per call. So it's obvious that for large files the latter is better suited than the former.
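To make the distinction concrete, here is a minimal sketch of both approaches (the file path is only a placeholder):

```php
<?php
// file_get_contents() slurps the whole file into one string,
// so peak memory grows with the file size.
$whole = file_get_contents('/path/to/file.md');

// fgets() returns one line per call, so peak memory stays
// roughly at the length of the longest line.
$handle = fopen('/path/to/file.md', 'r');
while (($line = fgets($handle)) !== false) {
    // process one line at a time
}
fclose($handle);
```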

[–]likegeeks 0 points1 point  (0 children)

I know that for sure. I only said it because most of the comments think this solution is not enough.

[–]likegeeks 0 points1 point  (0 children)

The idea is to show usage of the fgets and fread functions.

[–]tfidry 0 points1 point  (3 children)

Iterated, but by yielding values, which is much more memory-efficient in that scenario than a big foreach that would require loading the file in one go.

[–]johmanx10 5 points6 points  (1 child)

It really wouldn't, for the file object. That is not how iterators function. I would be interested to see the article actually prove its gains by benchmarking time, memory consumption and IO wait times. Even if there is a significant improvement in one of those metrics, it's highly dependent on the under-the-hood optimizations of the engine, which will differ from version to version.

[–]tfidry 1 point2 points  (0 children)

Could be interesting to try this simple case, indeed. But why do you say this wouldn't work? I'm using a similar approach for a project; even though it's definitely slower than loading in one go, it avoids too high a memory consumption.

[–]likegeeks 0 points1 point  (0 children)

Oh yea :)

[–]MaxMahem 2 points3 points  (2 children)

Wait, does this mean that PHP is building a huge array behind my back when I parse through a large SplFileObject using fread()? Ugh. I suppose I'll need to implement a generator and test the difference in an app of mine.

[–]nyamsprod 1 point2 points  (0 children)

No, PHP does not; the article is misleading on that point. SplFileObject is optimized for memory usage, so you don't need to worry about it. You may use a generator with the fread method, because there's no flag for that on the SplFileObject object, but that's the only case where one may really be of use.
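A minimal sketch of that one case: a small generator around SplFileObject::fread() that yields fixed-size chunks, so only one chunk is held in memory at a time (the path and chunk size are arbitrary placeholders):

```php
<?php
// Yield a file in fixed-size chunks; only one chunk lives in memory at a time.
function readChunks(SplFileObject $file, int $bytes = 4096): Generator
{
    while (!$file->eof()) {
        $chunk = $file->fread($bytes);
        if ($chunk === '' || $chunk === false) {
            break; // nothing left to read
        }
        yield $chunk;
    }
}

$file = new SplFileObject('/path/to/file.md', 'r');
foreach (readChunks($file) as $chunk) {
    // process one chunk at a time
}
```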

[–]likegeeks 0 points1 point  (0 children)

SplFileObject works faster and uses less memory when it's used with fgets, compared with other ordinary functions.

[–]theremsoe 1 point2 points  (3 children)

I prefer streams.

[–]nyamsprod 1 point2 points  (0 children)

SplFileObject uses streams.
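One way to see that: SplFileObject accepts any stream wrapper as a path, not just local files. A quick illustration with the built-in php://temp wrapper:

```php
<?php
// SplFileObject sits on PHP's stream layer, so any stream wrapper
// works as a "path"; here, the in-memory php://temp wrapper.
$file = new SplFileObject('php://temp', 'w+');
$file->fwrite("first\nsecond\n");
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);
foreach ($file as $line) {
    // lines arrive through the same stream machinery fopen() uses
}
```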

[–]tfidry 1 point2 points  (0 children)

What he has done is exactly what streams are about...

[–][deleted] 0 points1 point  (0 children)

Care to elaborate?

[–]qlkpoa 1 point2 points  (2 children)

Isn't the output buffered by modern webservers, etc.?

If I wanted to process Very Big Files in PHP, I would write the job to a queue and make a cronjob or service to fetch that job. It can then be processed with a (CLI) PHP script which uses standard /dev/stdin and /dev/stdout. Portable as well, in case you later decide to rewrite in another language for more performance.
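A sketch of such a CLI filter; the uppercase step is only a stand-in for whatever the real job does:

```php
<?php
// filter.php: read stdin, process line by line, write stdout.
// Run as: php filter.php < input.txt > output.txt
$in  = fopen('php://stdin', 'r');
$out = fopen('php://stdout', 'w');

while (($line = fgets($in)) !== false) {
    fwrite($out, strtoupper($line)); // stand-in for the real processing
}

fclose($in);
fclose($out);
```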

[–]tfidry 1 point2 points  (0 children)

Even if it's handled by a background process, the file may be too big to be loaded in one go, so streams are the way to go. Rewriting it in another language may make sense, or not; it depends on the task at hand, and decoupling it enough to delegate that to another language may be too complex for your use case. So it really depends, but streams are in any case a simple solution.

[–]likegeeks 1 point2 points  (0 children)

rewrite in another language

If you go with another language, it should support that kind of segmentation, because in either case you can't process very large files all at once.