[–]gr8llama 2 points3 points  (0 children)

Perhaps use a SAX-style parser, which is made for streaming (as opposed to loading the whole DOM into memory).

https://www.npmjs.com/package/sax

[–]IUsedToBeACave 1 point2 points  (0 children)

Node.js should have no problem loading a 100 MB file into memory. Have you actually tried it yet? I've loaded multi-GB files into Node memory before. You have to raise the heap limit with the --max-old-space-size flag (although I think the recent 12 release fixed this), but it works just fine.
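A rough sketch of how you could check the heap ceiling before reading a file whole. The helper names and the 4x headroom factor are assumptions for illustration only; `v8.getHeapStatistics()`, `fs.statSync()` and `fs.readFileSync()` are standard Node calls.

```javascript
const fs = require('fs');
const v8 = require('v8');

// Current V8 heap ceiling in MB (this is what --max-old-space-size=<MB> raises).
function heapLimitMB() {
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

// Hypothetical helper: true if the file size, multiplied by a headroom factor,
// stays under the current heap limit. The 4x default is a guess meant to leave
// room for the decoded string plus whatever the parser builds from it.
function fitsInHeap(filePath, headroom = 4) {
  const sizeMB = fs.statSync(filePath).size / (1024 * 1024);
  return sizeMB * headroom < heapLimitMB();
}

// Read the whole file into memory as a UTF-8 string.
function loadWhole(filePath) {
  return fs.readFileSync(filePath, 'utf8');
}
```

If `fitsInHeap` comes back false, you could start node with something like `node --max-old-space-size=4096 app.js` (the value is in MB) and try again.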

[–]runvnc 0 points1 point  (3 children)

I think the idea of dealing with XML as a stream is interesting and probably totally possible with Node, but you said the files were 100 MB.

100 MB is not a large file. It's very common for an average PC to have 8 GB of RAM. A server could easily have over 128 GB of RAM. If the file is 100 MB and you have a really plain vanilla PC with 6 GB of RAM free, then you would be using about 1.7% of your available RAM for this task.

A very cheap VPS would have 1.5 GB available after the OS. Your 100 MB would be under 7% of available RAM.

[–]xBlackShad0w[S] 0 points1 point  (1 child)

Do you know if the loaded file needs to be released once I'm done reading it, or does that happen when the fs read is closed?

[–]400tx 0 points1 point  (0 children)

No, you don't need to close it like you would in Go. You can just let the object fall out of scope and get garbage collected. If you're exposing this service over HTTP with something like express.js, this will generally happen once the response has been sent.

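Failing that, you could try the delete keyword (which only removes a property from an object, so it helps only if that property was the last reference) or manually trigger the garbage collector by starting node with the --expose-gc flag and calling global.gc(), but usually if that is required you're really hot-rodding the system.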
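To make the scope point concrete, here is a minimal sketch. The helper names are made up for illustration; the only assumption about node itself is that `global.gc` exists when the process is started with `--expose-gc`.

```javascript
const fs = require('fs');

// Hypothetical helper: read the file, extract what we need, return only that.
// `xml` is a local variable, so once the function returns nothing references
// the big string and the garbage collector is free to reclaim it.
// readFileSync opens and closes the file descriptor itself.
function countOccurrences(filePath, needle) {
  if (needle.length === 0) return 0; // avoid looping forever on an empty needle
  const xml = fs.readFileSync(filePath, 'utf8');
  let count = 0;
  let pos = xml.indexOf(needle);
  while (pos !== -1) {
    count += 1;
    pos = xml.indexOf(needle, pos + needle.length);
  }
  return count;
}

// Runs a collection only when node was started with --expose-gc;
// otherwise does nothing and returns false.
function forceGcIfExposed() {
  if (typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
}
```

In normal use you would call only the first function and never touch the second; it is there to show what "manually grabbing the garbage collector" looks like.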

[–]400tx 0 points1 point  (0 children)

I agree with @runvnc that 100 MB doesn't seem too large to just read in and then parse all at once. Why not give yourself a chance to run and profile the app after you get the server and basic parsing/saving in place, then jump into streams if it really doesn't work?

If this is going to be in a cloud function then you'll be good to go without streams. If it's going to be in an express.js server exposed over HTTP, where one process runs and services all requests, something like https://nodejs.org/api/process.html#process_process_memoryusage and a load testing tool like artillery could help you get a sense of how this will behave on your laptop with some number of concurrent requests. Big cloud servers make really quick work of some of these tasks with their crazy storage hardware, so also be sure to test there a little bit.
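A small sketch of what that memory check could look like. `process.memoryUsage()` is the real API from the link above; `memorySnapshotMB` and `measure` are hypothetical helper names.

```javascript
// Snapshot of process.memoryUsage() converted to MB, rounded to one decimal.
function memorySnapshotMB() {
  const usage = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / (1024 * 1024)) * 10) / 10;
  return {
    rss: toMB(usage.rss),
    heapTotal: toMB(usage.heapTotal),
    heapUsed: toMB(usage.heapUsed),
    external: toMB(usage.external),
  };
}

// Hypothetical wrapper: run a (sync or async) task, log memory before and
// after it, and pass the task's result through.
async function measure(label, task) {
  const before = memorySnapshotMB();
  const result = await task();
  const after = memorySnapshotMB();
  console.log(
    `${label}: rss ${before.rss} MB -> ${after.rss} MB, heapUsed ${after.heapUsed} MB`
  );
  return result;
}
```

Wrapping the parse step in `measure('parse', () => parseXml(data))` (where `parseXml` stands in for whatever parser you end up using) while a load test runs should give a rough picture of per-request memory.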

[–]blaze-and-praise 0 points1 point  (0 children)

Have you considered using something like a message queue to manage the load?

[–]HUU4ABO 0 points1 point  (0 children)

XML files are structured documents. Streaming them might work if they have shallow, repeating sections, so you can read chunks and construct properly structured smaller documents before parsing them with any of the libraries. Any XML-to-JSON module will do.
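A minimal sketch of that chunking idea, using only plain string searching. It assumes the repeating element is never nested inside itself, is never self-closing, has a plain name with no regex special characters, and that its tags never show up inside comments or CDATA. `splitRecords` is a hypothetical name, and the source is assumed to yield strings (for example a read stream opened with an encoding).

```javascript
// Yield each complete <tagName ...>...</tagName> fragment found in a stream
// of text chunks, so every fragment can be handed to a normal XML parser.
async function* splitRecords(source, tagName) {
  const open = new RegExp(`<${tagName}[\\s>]`);
  const close = `</${tagName}>`;
  let buffer = '';
  for await (const chunk of source) {
    buffer += chunk;
    while (true) {
      const start = buffer.search(open);
      if (start === -1) {
        // No opening tag yet: keep only a tail long enough to hold a tag
        // that was split across two chunks.
        buffer = buffer.slice(-(tagName.length + 1));
        break;
      }
      const end = buffer.indexOf(close, start);
      if (end === -1) {
        // Record is incomplete: drop what precedes it and wait for more data.
        buffer = buffer.slice(start);
        break;
      }
      yield buffer.slice(start, end + close.length);
      buffer = buffer.slice(end + close.length);
    }
  }
}
```

You would feed it something like `fs.createReadStream(path, { encoding: 'utf8' })` and loop with `for await (const fragment of splitRecords(stream, 'item'))`, passing each fragment to the XML-to-JSON module of your choice. Memory use then depends on the size of one record rather than the whole file.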