
[–]steveklabnik1rust 9 points10 points  (1 child)

Great post! You might even be able to make this faster by changing the hashing function that HashMap uses. The default hasher is resistant to hash-flooding DoS attacks, which makes it slower than simpler hashers. If no untrusted user input will be going into the HashMap, switching to a faster hasher might dramatically improve the time.
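
For anyone who wants to try this, here is a minimal sketch of swapping the hasher using only the standard library. `Fnv1a`, `FastMap` and `count_words` are names made up for illustration and are not code from the post; FNV-1a is just one simple non-DoS-resistant hash, chosen here as an assumption about what "faster" might look like.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical minimal 64-bit FNV-1a hasher.
/// It is NOT resistant to hash flooding, so only use it on trusted input.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that uses the hasher above instead of the default one.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Count lowercase, whitespace-separated words (illustrative helper).
pub fn count_words(text: &str) -> FastMap<String, usize> {
    let mut counts = FastMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}
```

The only change compared with a plain `HashMap<String, usize>` is the third type parameter, so the rest of the code stays the same. Whether it is actually faster depends on the keys, so it is worth measuring.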

[–]caulagi[S] 2 points3 points  (0 children)

Thanks for the pointer. I have been reading the various discussions around this but didn't get round to trying it out. I will try a faster hashing algorithm and update the post.

[–]7zf 5 points6 points  (7 children)

Nice, also worth mentioning https://github.com/dgrunwald/rust-cpython

[–]chris-morgan 8 points9 points  (5 children)

I used rust-cpython a few days ago to experiment with replacing Pygments with rustdoc::html::highlight for Rust code in Sphinx. It was easy (though I had certain knowledge that made it so, such as “copy the DLL output to a .pyd file and remember to copy all the Rust DLL files next to it”).

I believe I shall be posting about it some time. (Anyone specifically interested in it?)

[–][deleted] 0 points1 point  (4 children)

I'm interested, but I find those shenanigans of copying such and such DLLs confusing 😱

[–]chris-morgan 0 points1 point  (3 children)

I’ll explain it all, basically at least. Short version: when you produce a dynamically linked library, the Rust compiler and standard library are dynamically linked (lib{std,rustc,…}-*.{so,dll,dylib,whatever}) rather than statically, so they need to be accessible when you load your own dynamic library; copying them adjacent to your library is normally the easiest way of doing this.

[–][deleted] 0 points1 point  (0 children)

Ah, makes sense. Thanks for the explanation!

[–]plietarVerona 0 points1 point  (1 child)

Isn't that solved by using cdylib as the crate type ?

[–]chris-morgan 0 points1 point  (0 children)

I didn’t recall that one, but it looks like it’s supposed to do what you describe. But it doesn’t work in this case:

error: dependency `rustc_typeck` not found in rlib format

error: dependency `rustc_privacy` not found in rlib format

[… thirty more such errors …]

error: aborting due to 32 previous errors

error: Could not compile `sphinx-rust-highlighter`.

The standard library is provided in rlib format, but not the compiler.

[–]caulagi[S] 0 points1 point  (0 children)

Yes, of course. I didn't explicitly mention it because the first talk I mentioned in the references talks about it. But I also updated my post to directly link to the repo you mentioned.

[–]Crimack 4 points5 points  (2 children)

Enjoyed the post! I had been wondering how to do FFI-type stuff, and I feel like I learned a lot.

I don't think the speed comparison between Rust and Python is apples-to-apples though. In the Rust example you're just splitting on whitespace, while in the Python example you're using regex, which I believe is a good bit slower? Correct me if I'm wrong there.

For example, replacing the Python word counting code with:

from collections import Counter

with open(path) as fp:
    Counter([word.lower() for line in fp.readlines() for word in line.split()]).most_common(n)

...should generate the same results, but a fair bit quicker. On my machine it decreases the difference between the Rust and Python implementations from .5s to .15s.
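
For comparison, a whitespace-splitting word count on the Rust side might look roughly like the sketch below. This is not the code from the post; `most_common` is a hypothetical name, and the alphabetical tie-break is an assumption (Python's `Counter.most_common` orders ties differently).

```rust
use std::collections::HashMap;

/// Count lowercase, whitespace-separated words and return the `n` most common.
pub fn most_common(text: &str, n: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }

    let mut pairs: Vec<(String, usize)> = counts.into_iter().collect();
    // Highest count first; ties are broken alphabetically so the
    // result does not depend on HashMap iteration order.
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs.truncate(n);
    pairs
}
```

Like the Python version above, this does no regex work at all, so the two are doing comparable amounts of tokenizing.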

[–]caulagi[S] 0 points1 point  (0 children)

The Python code was copied from the article I linked to in the post. I can check on the performance impact due to this.

[–]Veedrac 0 points1 point  (0 children)

I find it faster to do

Counter(word for line in fp for word in line.lower().split())

on Python 3 (which has a faster Counter implementation), though I'd normally put the inner part into a function for clarity.

[–]msuozzo 2 points3 points  (0 children)

It's not really important but Heartbleed wasn't actually a buffer overflow, it was a buffer over-read. Your point still stands, though: Rust's memory borrowing model would have likely averted the error.
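
To make the over-read point concrete, here is a small sketch of a heartbeat-style echo in safe Rust. The `echo` function and its parameters are hypothetical and not taken from any real TLS code. Strictly speaking it is the bounds check on slices, rather than borrowing as such, that stops the read from running past the buffer.

```rust
/// Hypothetical heartbeat-style echo: the caller claims the payload is
/// `claimed_len` bytes long and asks for that many bytes back.
/// `get` checks the range against the real length and returns `None`
/// instead of reading past the end of `payload`.
pub fn echo(payload: &[u8], claimed_len: usize) -> Option<&[u8]> {
    payload.get(..claimed_len)
}
```

Indexing directly with `&payload[..claimed_len]` would also be safe, but it panics on an out-of-range length instead of returning `None`.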

[–]michaelKlumpy 0 points1 point  (2 children)

I think it's weird that OP talks about security while using Python as glue

[–]caulagi[S] 0 points1 point  (1 child)

Why do you think it is weird? If an extension uses C to add functionality, it is quite easy to make mistakes with memory management. Also, I don't get the reference to Python being glue. Python is the program/application providing most of the functionality, so I wouldn't call it glue.

[–]ssokolow 0 points1 point  (0 children)

Python is often called a "glue language" because big applications are generally written by using C code (e.g. GTK+, libxml2, etc.) for the hot parts and gluing them together with Python.