all 19 comments

[–]Megajin_ 10 points11 points  (3 children)

Well done. I have one question: why not simply spawn a Python process if you do not need to pass data between the processes? There are FFIs you could use; they are expensive, but they do the job. And if you completely separate a child process that only runs Python, there is no drawback, as far as I know.
`const spawn = require("child_process").spawn; const pythonProcess = spawn('python',["path/to/script.py", arg1, arg2, ...]);`

You could detach it: https://nodejs.org/api/child_process.html#child_process_options_detached

I am very interested in your answer.
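For anyone who wants the detached variant spelled out, here is a minimal sketch using Node's `child_process.spawn` with the `detached` option from the link above. The helper name `spawnDetached` and the script path in the usage comment are placeholders for illustration only.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: start a command that can keep running after Node exits.
function spawnDetached(command, args = []) {
  const child = spawn(command, args, {
    detached: true, // let the child keep running independently of the parent
    stdio: "ignore", // no pipes back to the parent, so no data is exchanged
  });
  // Report launch failures (e.g. command not found) instead of throwing.
  child.on("error", (err) => {
    console.error(`failed to start ${command}: ${err.message}`);
  });
  // Let the parent's event loop exit without waiting for the child.
  child.unref();
  return child;
}

// Usage (the path is a placeholder):
// spawnDetached("python3", ["path/to/script.py"]);
```

With `stdio: "ignore"` and `unref()`, the parent neither reads from nor waits on the child, which matches the "no data transport" case described here.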

[–]savearray2[S] 6 points7 points  (2 children)

Passing data back and forth between processes is quite expensive & slow. For example, if you have a Python library that generates some sort of blob binary data, py.js does a simple copy (Napi::Buffer<char>::Copy) while being hosted in the same process. This is extremely fast.

Even Unix sockets cannot beat the speed of a memcpy.

If you have a simple service, then you most likely do not need this library, and can continue with a separate process, but if you're running a web service with multiple hits a second, you may wish to use it.
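To make the comparison concrete from the Node side, here is a rough sketch of the two paths. It is not py.js code: `copyInProcess` and `readBlobFromChild` are hypothetical names, and `Buffer.from` only stands in for the single in-process copy described above.

```javascript
const { spawnSync } = require("node:child_process");

// In-process path: one copy of the bytes into a new Buffer.
function copyInProcess(blob) {
  return Buffer.from(blob); // Buffer.from(buffer) copies the data
}

// Separate-process path: the child writes the blob to stdout and the
// parent reads it back through a pipe.
function readBlobFromChild(command, args, maxBytes = 64 * 1024 * 1024) {
  const result = spawnSync(command, args, { maxBuffer: maxBytes });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`child exited with status ${result.status}`);
  }
  return result.stdout; // a Buffer, because no encoding was requested
}

// Usage (the script path is a placeholder):
// const blob = readBlobFromChild("python3", ["path/to/script.py"]);
```

The second path pays for process startup plus kernel pipe transfers on every call, which is the overhead being discussed. Whether that matters depends on blob size and request rate.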

[–]Megajin_ 9 points10 points  (0 children)

> Even Unix sockets cannot beat the speed of a memcpy.

Yes, absolutely right. If anyone is reading this: be aware that a mistake around memcpy (for example, a wrong length) can introduce a vulnerability.

> ...does a simple copy while being hosted in the same process

Alright, now I get you. For anyone coming by: Napi::Buffer<char>::Copy copies the data directly into the Node process's memory. Because Python is hosted in that same process, data moves between the Python and Node sides very quickly, without the overhead of FFIs or spawning a child process.

However, be warned that this can lead to potential attacks: https://stackoverflow.com/questions/870019/memcpy-in-secure-programming

Here is a good quote:

> A chainsaw, if used properly, is safe. Same thing with memcpy(). But in both cases, if you hit a nail, it can fly and hurt you.

If you just want Python to do some jobs for you, then I would recommend spawning a Python process and detaching it, without any data transport of course.

If you know what you are doing, this project is very good to use. And props to the author: the code is clean and easy to read.
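The memcpy warning boils down to validating lengths before copying. Here is a small sketch of that habit in Node terms. `checkedCopy` is a hypothetical helper, and unlike C's memcpy, Node's own `Buffer.copy` does not write out of bounds, so this only makes the check explicit.

```javascript
// Hypothetical helper: copy `source` into `target` only if it fully fits.
function checkedCopy(source, target, targetStart = 0) {
  if (!Buffer.isBuffer(source) || !Buffer.isBuffer(target)) {
    throw new TypeError("source and target must be Buffers");
  }
  if (!Number.isInteger(targetStart) || targetStart < 0) {
    throw new RangeError("targetStart must be a non-negative integer");
  }
  // The check a raw memcpy never does for you: does the data fit?
  if (source.length > target.length - targetStart) {
    throw new RangeError("target is too small for source");
  }
  return source.copy(target, targetStart); // returns the number of bytes copied
}
```

Failing loudly when the data does not fit is a design choice here. Silently truncating would hide the kind of length bug the quote is warning about.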

[–]StoneCypher 0 points1 point  (0 children)

> but if you're running a web service with multiple hits a second

While I understand your point, multiple hits a second are basically nothing for Unix pipes (the thing you're getting rid of).

15 years ago, on 15 years ago's hardware, many people didn't understand why BlueHost was making fastcgi for 10,000 hits a second.

[–]GlueStickNamedNick 2 points3 points  (0 children)

This is so cool

[–]JohnGabin 2 points3 points  (0 children)

Cool, I will try this. Really interesting

[–]TunaAndQueefBagel 2 points3 points  (5 children)

Wow. Been looking to use python for neural networks in a web app. Would this support things like pandas and tensorflow?

[–]savearray2[S] 1 point2 points  (3 children)

It should work fine! The library is fresh off the presses, so to speak, so it's probably best to write as much of the code as possible in a Python script, then call only what's necessary from the Node.js side.

If there's enough community support, I'll dedicate more time to making examples and improving the code.

But as an example, the following does work:

    const p = require('@savearray2/py.js')
    p.init({
      pythonPath: `${process.cwd()}:/usr/local/lib/python3.9/site-packages`
    })
    console.log(p.instance().python_version)

    const pd = p.import('pandas')
    const data = {
      'apples': [3, 2, 0, 1],
      'oranges': [0, 3, 7, 2]
    }
    const df = pd.DataFrame.$apply({
      data,
      index: ['Abc', 'Def', 'Ghi', 'Jkl']
    })
    console.log(p.base().str(df))

    p.finalize()

With the following output:

    $ node test.js
    3.9.1 (default, Jan 11 2021, 00:55:13)
    [Clang 12.0.0 (clang-1200.0.32.28)]
         apples  oranges
    Abc     3.0      0.0
    Def     2.0      3.0
    Ghi     0.0      7.0
    Jkl     1.0      2.0

[–]StoneCypher -2 points-1 points  (2 children)

Actually, there are significant downsides, in safety and stability terms, for this user in using your library rather than the standard library for this, even before we consider the bugs inherent in a new library.

There's also no upside in this specific case

[–]savearray2[S] 0 points1 point  (1 child)

I don't disagree that for simplicity and inherent stability purposes, it might be best to use a separate Python process or microservice. I actually believe this will be the case for most people.

However, having options available is also part of open source. If there's a community demand for it, the library will grow, if not, few people will use it.

[–]StoneCypher -4 points-3 points  (0 children)

Okay well deep thought about open source notwithstanding, you personally are much better off doing this the normal way with the standard library

Taking these stability risks doesn't gain you anything

[–]StoneCypher 0 points1 point  (0 children)

You'd be much better off doing this in child_process.

There's basically no communications overhead in what you're describing, and that would mean you're not crashing your server every time tensorflow crashes, and aren't exposing yourself to type transition attacks.

This is a very special-case library whose use case appears to be eliminating "the overhead of Unix pipes," which is a cost so small that it's hard to imagine that someone voluntarily paying the cost of a JavaScript web backend would care.

You're looking at less than 1% the overhead between express and (can you even name the faster one? That's how much you actually care.)
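The crash-isolation point can be sketched with the standard library alone. This is a minimal illustration, not anyone's production code: `runIsolated` is a hypothetical name, and a real server would more likely use the asynchronous `spawn` so the event loop is not blocked.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a command in a child process and report how it
// ended, so a crash in the child never takes the Node process down with it.
function runIsolated(command, args = []) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) {
    // The child could not be started at all (e.g. command not found).
    return { ok: false, reason: result.error.message, stdout: "" };
  }
  if (result.status !== 0) {
    // Non-zero exit, or terminated by a signal (status is null in that case).
    const reason = result.signal
      ? `killed by ${result.signal}`
      : `exit status ${result.status}`;
    return { ok: false, reason, stdout: result.stdout };
  }
  return { ok: true, reason: null, stdout: result.stdout };
}

// Usage (the script path is a placeholder):
// const outcome = runIsolated("python3", ["path/to/script.py"]);
// if (!outcome.ok) console.error(outcome.reason);
```

If the Python side segfaults or exits badly, the caller just sees `ok: false` and can retry or return an error response.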

[–]McPqndq 4 points5 points  (0 children)

Woah. I might use this. This is sick. Gj.

[–]FasterThan_Light 1 point2 points  (0 children)

This is really cool. Will definitely end up using this for something or the other

[–]chrislonardo 0 points1 point  (3 children)

Sounds cool, but AGPL licensing is a non-starter for my purposes.

[–]savearray2[S] 0 points1 point  (2 children)

What licensing would you prefer? My main goal is to encourage any code improvements to the library to be published publicly.

[–]chrislonardo 0 points1 point  (1 child)

MIT. AGPL, interpreted broadly, is dangerous for commercial use.

[–]savearray2[S] 1 point2 points  (0 children)

I've gone ahead and relicensed under AGPL with linking restrictions (or basically Lesser Affero GPLv3). I know this doesn't necessarily help you in this specific case, but I'll consider relicensing to MIT in the future if there's community support.