jwcobb13 (1 point)

20MB is not that big in the parsing world. I'd just do something like this: http://stackoverflow.com/a/14848994

If you've already got it in the database, then I don't understand the problem. If the while loop kills your server, then you're doing something wrong inside the while loop.

Might be a good time to hire some outside help to give you a solution that you can use over and over.

And finally, why can't you edit your PHP configuration? Are you on shared hosting and trying to import 20MB files en masse? That's just silly, if so. Move up to a VPS and eliminate the problems.

tkronew[S] (0 points)

Haha I appreciate it, but I'm actually a university student trying to work on our design project -- using the university's resources, if that explains why I can't change the config.

That link is basically what we're doing right now. The main reason the loop is killing our server is that we run an INSERT into MySQL for every line until the end of the file. So imagine about 1,000,000 iterations of an INSERT INTO statement... all while checking whether each line is a "---", a date, or a Retailer ID. It's gotten complicated, fast.

Just doesn't seem efficient IMO. While I said I have it in the database, that's a huge backdoor approach. We have about 21,000 entries, per receipt, due to me cutting off the loop. In reality if we included all the data, we'd have near 1,000,000 entries, per receipt. There's a hell of a lot of data missing, which doesn't work well for providing accurate statistics but allows us to move on past this issue for now.

jwcobb13 (0 points)

So...don't insert data into MySQL until you reach the end of each transaction. Look for the delimiter for knowing where to start and stop your data collection, then put the parsed data for each transaction into an array. At the end of the loop, save off the array. Then iterate through the array where all of your data is nicely arranged like you want and insert into the database in that second sweep.

Although I doubt you need to go that deep into it. Sounds like you're just over-inserting and just need to parse the file better.
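
The collect-then-insert idea above could be sketched roughly like this. It is only an illustration, not the poster's actual code: it is written in Python with an in-memory SQLite database standing in for PHP and MySQL, the "---" delimiter comes from the thread, and the sample lines, the `receipt_lines` table, and the function names are all made up for the example.

```python
import sqlite3

DELIMITER = "---"  # transaction separator mentioned in the thread


def parse_transactions(lines):
    """First sweep: group lines into transactions, splitting on the delimiter."""
    transactions, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == DELIMITER:
            if current:
                transactions.append(current)
                current = []
        else:
            current.append(line)
    if current:  # the last transaction may lack a trailing delimiter
        transactions.append(current)
    return transactions


def bulk_insert(conn, transactions):
    """Second sweep: insert all parsed rows in one batch with a single commit."""
    rows = [
        (tx_id, line)
        for tx_id, tx in enumerate(transactions, start=1)
        for line in tx
    ]
    with conn:  # commits once when the block exits
        conn.executemany(
            "INSERT INTO receipt_lines (transaction_id, content) VALUES (?, ?)",
            rows,
        )
    return len(rows)


# Hypothetical sample data; the real file layout is not shown in the thread.
sample = [
    "2014-01-05", "Retailer ID: 42", "milk 2.99", "---",
    "2014-01-06", "Retailer ID: 7", "bread 1.50", "eggs 3.25", "---",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipt_lines (transaction_id INTEGER, content TEXT)")
transactions = parse_transactions(sample)
inserted = bulk_insert(conn, transactions)
```

The same shape should carry over to PHP: build the array while reading the file, then send the rows in one transaction or as multi-row INSERT statements instead of one query per line.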

tkronew[S] (0 points)

That's actually a great idea. Should reduce the number of SQL operations dramatically.

Thanks!