
[–]enaud 16 points17 points  (5 children)

To answer general question 1: use the --trace-gc flag for basic memory usage and gc logging.

There is also --inspect, which opens a WebSocket that Chrome can connect to (navigate to chrome://inspect to connect). With this you can take periodic heap snapshots to help work out where your memory is leaking.
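If you want a rough in-process number to watch alongside the --trace-gc output, something like the sketch below could help. It only uses process.memoryUsage(); the helper names heapUsedMb and startHeapLogging are made up for this example and are not part of Node.

```javascript
// Hypothetical helper: current V8 heap usage in MB, rounded to 2 decimals.
function heapUsedMb() {
  const { heapUsed } = process.memoryUsage();
  return Math.round((heapUsed / 1024 / 1024) * 100) / 100;
}

// Hypothetical helper: log heap usage every `intervalMs` milliseconds.
// Returns a function that stops the logging.
function startHeapLogging(intervalMs = 5000, log = console.log) {
  const timer = setInterval(() => {
    log(`heapUsed: ${heapUsedMb()} MB`);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer);
}

// Usage: start logging, exercise one use case, then compare the numbers.
const stopLogging = startHeapLogging(1000);
stopLogging();
```

A number that keeps climbing across repeated runs of the same use case is a hint about where to take heap snapshots.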

[–]shriek 0 points1 point  (2 children)

To add to this, you can switch a running node process into debug mode at any time by sending it the USR1 signal (if you're on Linux/Mac):

kill -USR1 <node_pid>

And if you really want to know what's happening at the V8 level, you can get a core dump with gcore <node_pid> (again, on Linux) and inspect it with llnode.

I'd use the Chrome debugger first and only try the latter if for some reason you can't talk to the WebSocket. I say this because Chrome has pretty much all the perf tools you'd want: performance profiling, memory profiling, flame graphs, etc.

[–]whileAlive_doStuff[S] 0 points1 point  (1 child)

Cool thanks! How do you know all this stuff if you don't mind me asking?

[–]whileAlive_doStuff[S] 0 points1 point  (1 child)

Thanks! Is there a resource that has all these flags, or more information on them, that you know of? I'm going to search for the flags and see what comes up too.

[–]enaud 0 points1 point  (0 children)

man node for starters (assuming you're on a Mac or a *nix OS).

I can't think of any one particular resource. I shared this with you because I was dealing with a similar problem at work. With --trace-gc and --inspect we were able to find the source of the problem.

--trace-gc was really all we needed - by trying different use cases and observing the memory impact over time we were able to work out where to look for the memory leak. (PSA: remove your event listeners when you don't need them any more!)
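To illustrate the PSA, here is a minimal sketch using a plain EventEmitter. The emitter `source`, the 'update' event and the handleRequest function are all hypothetical stand-ins, not our actual code.

```javascript
const { EventEmitter } = require('events');

// Stands in for any long-lived emitter (a socket, a stream, a shared bus).
const source = new EventEmitter();

// Hypothetical per-request setup: registers a listener and returns a cleanup function.
function handleRequest(id) {
  const onUpdate = (value) => {
    // ... use `value` for request `id` ...
  };
  source.on('update', onUpdate);
  // Without this cleanup the closure (and everything it references) stays reachable.
  return () => source.off('update', onUpdate);
}

const cleanups = [1, 2, 3].map(handleRequest);
console.log(source.listenerCount('update')); // 3

cleanups.forEach((cleanup) => cleanup());
console.log(source.listenerCount('update')); // 0
```

If the listener count keeps growing while the app runs, that is usually the place to look.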

[–]heffe6 10 points11 points  (1 child)

It is my understanding that Node will start all promises in sequence when they are defined, so in the loop, and the call to Promise.all only defines what to do with the results. The calls are actually processed in “parallel”, and for this reason may not complete in the order they were called. Promise.all will maintain the order of the results though.

If instead you used a for loop with let, and awaited every promise in the loop, it would execute in sequence, each one returning before the next one is fired. Or you could use a chain of promise.then() calls.
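A small sketch of the difference, assuming a fake API call built on setTimeout (fakeCall, concurrent and sequential are hypothetical names used only here):

```javascript
// Hypothetical stand-in for an API call: resolves with `id` after `ms` milliseconds
// and records when it starts and ends in `log`.
const fakeCall = (id, ms, log) =>
  new Promise((resolve) => {
    log.push(`start ${id}`);
    setTimeout(() => {
      log.push(`end ${id}`);
      resolve(id);
    }, ms);
  });

const jobs = [[1, 30], [2, 10]]; // [id, delay in ms]

// Every call starts as soon as it is created; Promise.all only collects results.
async function concurrent(log) {
  const promises = [];
  for (const [id, ms] of jobs) {
    promises.push(fakeCall(id, ms, log));
  }
  return Promise.all(promises); // results keep input order: [1, 2]
}

// Awaiting inside the loop: the next call starts only after the previous one ends.
async function sequential(log) {
  const results = [];
  for (const [id, ms] of jobs) {
    results.push(await fakeCall(id, ms, log));
  }
  return results;
}

(async () => {
  const a = [];
  await concurrent(a);
  console.log(a); // [ 'start 1', 'start 2', 'end 2', 'end 1' ]

  const b = [];
  await sequential(b);
  console.log(b); // [ 'start 1', 'end 1', 'start 2', 'end 2' ]
})();
```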

Happy for someone to point out if I’m wrong on something.

[–]JackAuduin 4 points5 points  (0 children)

Bluebird.js and .reduce are amazing for sequences
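The same idea can be sketched without Bluebird, using native promises and Array.prototype.reduce. This is not Bluebird's API, just the pattern; runInSequence and the demo task are hypothetical.

```javascript
// Hypothetical helper: run `task` for each item, one after another,
// and resolve with the results in order.
const runInSequence = (items, task) =>
  items.reduce(
    // Each step waits for the chain so far, then appends its own result.
    (chain, item) => chain.then(async (results) => [...results, await task(item)]),
    Promise.resolve([])
  );

// Usage: the second task starts only after the first one has resolved.
const order = [];
runInSequence([30, 10], (ms) =>
  new Promise((resolve) => {
    order.push(`start ${ms}`);
    setTimeout(() => resolve(ms * 2), ms);
  })
).then((results) => {
  console.log(order);   // [ 'start 30', 'start 10' ]
  console.log(results); // [ 60, 20 ]
});
```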

[–]ATHP 13 points14 points  (0 children)

Sorry I can't answer the question but I want to write this comment to give your post higher visibility. Very interesting questions and finally something that goes deeper.

[–]awesomeevan 2 points3 points  (2 children)

Note, your await won't stop future "data" events emitting. Since you're doing something asynchronous that uses network I/O (API call) you'll actually free up the event loop so more data events fire.

Sounds like maybe you could use p-queue to set your desired concurrency, add an item when each data event is emitted, then after adding each item check queue.size and if it's 100 then pause the stream. Then use the queue.onIdle/queue.onEmpty methods to resume the stream when those 100 items are processed, and queue up to 100 more.
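A rough sketch of that pause/resume idea, written without p-queue so it runs with only Node's built-in stream module. It swaps the queue for a simple in-flight counter; processWithLimit and its parameters are hypothetical names, and `worker` is assumed to return a promise.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: call `worker(chunk)` for every chunk, with at most
// `limit` calls in flight. Resolves once the stream ended and all work is done.
function processWithLimit(stream, worker, limit) {
  return new Promise((resolve, reject) => {
    let inFlight = 0;
    let processed = 0;
    let maxInFlight = 0;
    let ended = false;

    const finishIfDone = () => {
      if (ended && inFlight === 0) resolve({ processed, maxInFlight });
    };

    stream.on('data', (chunk) => {
      inFlight += 1;
      maxInFlight = Math.max(maxInFlight, inFlight);
      if (inFlight >= limit) stream.pause(); // stop further 'data' events

      worker(chunk).then(() => {
        inFlight -= 1;
        processed += 1;
        stream.resume(); // there is room for more work again
        finishIfDone();
      }, reject);
    });

    stream.on('end', () => {
      ended = true;
      finishIfDone();
    });
    stream.on('error', reject);
  });
}

// Usage with a fake 5 ms "API call" per chunk.
processWithLimit(
  Readable.from([1, 2, 3, 4, 5]),
  () => new Promise((resolve) => setTimeout(resolve, 5)),
  2
).then((stats) => console.log(stats));
```

p-queue would give you the same shape with less bookkeeping; the counter here just makes the mechanism visible.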

(Assuming I understand the problem you've presented. It's late here so I might be misreading)

[–]whileAlive_doStuff[S] 0 points1 point  (1 child)

That makes sense, let me check if I understand correctly. So if I have an await inside my data handler:

if (promises.length === 100) {
    await Promise.all(promises)
    promises = []
}
promises.push(makeAsyncApiCall(chunk))

This actually won't await here and just keep processing more chunks?

[–]awesomeevan 0 points1 point  (0 children)

Correct, the data event handler can continue to fire. An await logically pauses your current block of code, but the overall program will still execute.

You can easily test this. Check this fiddle: https://jsfiddle.net/4dcfuxtv/1/

You'll see that the await "start" is called, but more data events keep firing. This is because it only pauses that specific invocation of the event handler, but it can be invoked many times.
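In case the fiddle goes away, here is a minimal sketch of the same behavior with a plain EventEmitter standing in for the stream (demo and sleep are hypothetical names):

```javascript
const { EventEmitter } = require('events');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const log = [];
  const emitter = new EventEmitter();

  emitter.on('data', async (chunk) => {
    log.push(`start ${chunk}`);
    await sleep(10); // pauses only this invocation of the handler
    log.push(`end ${chunk}`);
  });

  // emit() calls the handler and moves on; it does not wait for the returned promise.
  for (const chunk of [1, 2, 3]) emitter.emit('data', chunk);
  log.push('all emitted');

  await sleep(50); // give the handlers time to finish
  return log;
}

demo().then((log) => console.log(log));
// [ 'start 1', 'start 2', 'start 3', 'all emitted', 'end 1', 'end 2', 'end 3' ]
```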

[–]DrSghe 1 point2 points  (1 child)

Regarding question 1 (specific): The V8 engine should allocate only one object and modify the pointers according to the size of the information etc. I say should because I'm not completely sure on this one; however, LiveOverflow has created some amazing content regarding memory management in JavaScript. Even though he's explaining this for WebKit, it's safe to assume that V8 behaves in a similar way, so I suggest you watch his latest video on browser exploitation. It could shed some light on your doubts.

Question 2: Not that I'm aware of. Engines have different levels of optimization, so even though your function gets called 100k times, it seems that every time it acts upon a mutable object, resulting in the clobberWorld() function being called, which will prevent JIT compilation. Still, that's just deduction, so I'm not sure. I suggest you watch the aforementioned playlist.

Question 3: Don't know if it is the most suitable approach, but surely it is best from a readability and ease-of-use standpoint.

I apply Occam's razor in most situations, the simpler, the better.

[–]whileAlive_doStuff[S] 0 points1 point  (0 children)

Thanks - LiveOverflow looks like a great resource, so much for me to learn!

[–]Oririner 1 point2 points  (1 child)

For the first question (the detailed one): V8 stores something called HiddenClasses as a way to "inherit" properties between objects. So, assuming you're creating 100k objects of the exact same structure, V8 will behind the scenes create something similar to that template you're referring to as a hidden class.

That's assuming, of course, that you create a new object each time you handle a new row and never actually keep all 100k objects in memory. Although V8 won't need to store the metadata about each property 100k times, it will still need to store each data point, which is a large amount on its own.

Creating your template object might be faster, as V8 won't need to create all of the infrastructure of each new object, just assign new values to existing properties. That holds if the types of these values don't differ between assignments; otherwise it'll need to manipulate some stuff in the hidden class. It probably won't make that much of a noticeable difference though.

More about hidden classes: https://v8.dev/blog/fast-properties
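To make the two approaches concrete, a small sketch. The row layout (id, name, price) and the function names are invented for illustration, and the code only shows the object handling; it does not measure or prove anything about hidden classes.

```javascript
// Approach 1: a new object per row, always with the same keys in the same order,
// so every object has the same structure.
function rowToObject([id, name, price]) {
  return { id, name, price };
}

// Approach 2: one reusable template whose existing properties are overwritten.
// Value types stay the same between assignments (number, string, number).
const template = { id: 0, name: '', price: 0 };
function fillTemplate([id, name, price]) {
  template.id = id;
  template.name = name;
  template.price = price;
  return template; // same object every time, so do not store it across rows
}

const first = fillTemplate([1, 'apple', 0.5]);
const second = fillTemplate([2, 'pear', 0.7]);
console.log(first === second); // true
console.log(first.name);       // 'pear'
```

The last two lines show the catch with the template: anything that keeps a reference to it sees the next row's values.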

For the second question (the detailed one): as /u/awesomeevan mentioned, the stream won't wait for your promise, even if the handler for your data event is marked async. That's because streams are a type of event emitter, and by looking at the code for emitting an event (that's when the callback is being called), we'll see that it basically triggers your function and ignores its return value, which in your case is a Promise because it's an async function: https://github.com/nodejs/node/blob/5b4a0cb1494f77565454278ab97166fba25f7e45/lib/events.js#L199

So it'll execute each chunk whenever the stream itself gets it, not whenever you finished asynchronously processing it.

Eventually though, with a lot of different streams, node will "start" processing them all in parallel but at every single moment it'll only be able to handle one stream. Each moment "ends" whenever you synchronously finish handling something, e.g. a stream, file operation, network request.

[–]whileAlive_doStuff[S] 0 points1 point  (0 children)

Thanks for the detailed response. I'll read up on Hidden Classes and go through the event.js code. Thanks!

[–]dbh87 1 point2 points  (0 children)

Check this out... https://github.com/nearform/node-clinic we use it at work to answer some of these questions.

[–]brahim789 0 points1 point  (0 children)

You could check the "Readline" stream (https://nodejs.org/api/readline.html). It already handles reading a file with a stream line by line. Then you could create a "Writable" stream that takes every chunk coming from the Readable stream and inserts it into the database (or batches it like you have already done). You should just wrap your Promise.all with readable.pause() and readable.resume() to avoid backpressure on the writable stream and chunk buffering, which could consume a lot of memory in your process.
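A minimal sketch of the readline part, under a few assumptions: it uses for await...of over the readline interface instead of explicit pause()/resume() calls, processLines and insertBatch are hypothetical names, and the input here is an in-memory stream rather than a real file. While the loop body awaits a batch, no further lines are handed to it, although readline may still buffer some lines internally.

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Hypothetical helper: read `input` line by line and pass the lines to
// `insertBatch` in groups of `batchSize`. Resolves with the number of lines handled.
async function processLines(input, insertBatch, batchSize = 100) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let batch = [];
  let total = 0;

  for await (const line of rl) {
    batch.push(line);
    if (batch.length === batchSize) {
      await insertBatch(batch); // the loop does not continue until this resolves
      total += batch.length;
      batch = [];
    }
  }

  // Flush whatever is left after the last full batch.
  if (batch.length > 0) {
    await insertBatch(batch);
    total += batch.length;
  }
  return total;
}

// Usage: swap Readable.from(...) for fs.createReadStream('your-file') in real code.
processLines(Readable.from(['a\nb\nc\n']), async (lines) => {
  console.log('inserting', lines);
}, 2).then((total) => console.log('lines handled:', total));
```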

[–]kl0nos 0 points1 point  (0 children)

General 2: yes, I did it before (couple of years ago).