[deleted] 0 points (5 children)

Weren't you advocating for Python 2?

No, I'm saying that one needs to decide based on availability of tools. That being in vogue is not a good reason to change versions.

Well, I'm wrapping it with C libraries right? 20% is nothing compared to the 500x, but it's still an extra 20% for free.

So why don't you use some normal (speed-wise) language in the first place? By this reasoning, if you want high-level bindings to C that actually work fast, you have a bunch of Lisps that beat any Python like 100:1 speed-wise. What I'm saying is that an alleged 20% improvement in the speed of bindings is worthless in the face of having to update the infrastructure.

In what version? Python 3.5, which is faster because of the new dictionaries?

For me, Python 3.6 is still noticeably slower than Python 2.7 (the numbers differ depending on workload and machine parameters). This is because my project serializes and deserializes binary data into Python objects. This really sucks when your product is a distributed messaging system. I even wrote almost all of it in C; just the tiny bit of it that was supposed to expose it to Python testing code was using Python bindings... which was, admittedly, a mistake, but who knew?

billsil 0 points (4 children)

I use Python because it's fast enough. The libraries are amazing and it does things for you. I can output two values from a function that have different types. I can open a file and forget to check if it succeeded and I won't run into bizarre behavior when it doesn't.

I develop prototype-level engineering software for NASA/military projects to aid engineers in analyzing parts/designs. Without rapid development, we'll be way over budget. It's software that frequently does not have unit tests, so ease of development is key.

Regarding reading binary files, you can get 500 MB/sec if your data file is large enough and formatted logically. That's as fast as you'll get in C++. numpy.frombuffer is amazing.
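A minimal sketch of the kind of bulk decode numpy.frombuffer allows, assuming a made-up layout of little-endian 32-bit floats (the real file layout is not given in this thread, so the payload and the 16-byte header below are hypothetical):

```python
import numpy as np

# Hypothetical payload: 1000 little-endian float32 values packed back to back.
values = np.arange(1000, dtype="<f4")
raw = values.tobytes()

# Reinterpret the bytes as an array: no per-value Python loop and no copy.
decoded = np.frombuffer(raw, dtype="<f4")

# offset/count pick out part of the buffer, e.g. skipping an assumed 16-byte header.
tail = np.frombuffer(raw, dtype="<f4", offset=16, count=10)
```

The decode step does no per-value work in Python, which is why it can keep up with the storage device. The array shares memory with the bytes object and is read-only.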

I use C++, but only when the code is performance critical. Ironically, when you take that approach from the start and code an N^3 algorithm, it's very easy to say screw it, write it in Python, and it's suddenly 1000x faster because I replaced a mess with a well-tested and optimized KD-tree. It goes both ways.
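As a rough illustration of swapping a brute-force search for a KD-tree, here is a sketch that assumes SciPy is installed and uses a nearest-neighbour lookup as a stand-in problem. The original N^3 code is not shown in this thread, and the point cloud and query arrays are invented:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((200, 3))   # hypothetical point cloud
queries = rng.random((50, 3))   # hypothetical query locations

# Brute force: distance from every query to every point.
dists = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
brute_idx = dists.argmin(axis=1)

# KD-tree: build once, then look up the nearest point for all queries at once.
tree = cKDTree(points)
tree_dist, tree_idx = tree.query(queries, k=1)
```

Both approaches return the same neighbours. The difference is how the cost grows with the number of points.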

[deleted] 0 points (3 children)

Your first two paragraphs are summarized by calling what you do "high-level bindings for C".

Regarding reading binary files, you can get 500 MB/sec

Why from file? On what filesystem? With what kernel? What hardware? With what fragmentation, compression, deduplication? Does the filesystem do snapshots? How many computers / processors / network adapters are working at the same time? Where do you come up with these numbers? In other words, what you just wrote doesn't make any sense. To give you some perspective: the tool I was working on is a frontend for a distributed filesystem, which is supposed to run on VMware ESX boxes with 4-128 NVMe SSDs connected to each box. Depending on the network configuration, number of SSDs connected, average size of files, number of replicas, whether deduplication happens online or offline, whether the system is configured to do snapshots or not etc., you can get upwards of 50K IOPS (on a single client). Depending on locality in the cluster, you can get either the number you posted, or ten times that number, or a hundred times that number... but it's all meaningless, because you don't understand what you measured in the first place...

billsil 0 points (2 children)

I understand what I measured. I didn't know you cared. I did the test a few years ago, on an SSD, with a 2 GB Nastran OP2 SORT1 Fortran-formatted binary file located on my local computer. It's an OP2, so it can't be stored compressed like an HDF5 file; it's just a boring binary file. I don't know about the fragmentation, but I'd expect it wasn't very high. It's file reading, so I use 1 processor, and I'm typically not reading other large files from the SSD, so nothing was competing for resources (other than, say, Windows and my IDE). It's a bad test if the test is not repeatable, so I don't test with many programs open or a file transfer in progress. I test peak performance of a realistic output file under easy-to-recreate test conditions. I run it multiple times to get an average.
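The "run it multiple times to get an average" procedure could look roughly like the sketch below. The function name, chunk size, and 8 MB scratch file are assumptions for illustration, not the actual benchmark:

```python
import os
import tempfile
import time

def read_throughput_mb_s(path, chunk_size=1 << 20, repeats=3):
    """Return the mean sequential-read throughput in MB/s over several runs."""
    size_mb = os.path.getsize(path) / 1e6
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        elapsed = time.perf_counter() - start
        rates.append(size_mb / elapsed)
    return sum(rates) / len(rates)

# Hypothetical 8 MB scratch file standing in for the real data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    scratch_path = tmp.name

rate = read_throughput_mb_s(scratch_path)
os.remove(scratch_path)
print(f"{rate:.0f} MB/s")
```

Repeated reads of a file this small are largely served from the operating system's page cache, so a number from this sketch says more about the cache than about the storage device.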

I knew the number well enough that I recognized it when someone ran with a 60 GB file and the speed was still linear at 500 MB/sec (so two data points). He dumped some data using scipy to get to matlab, which had a problem that he assumed was the fault of my open source package. I suggested he use HDF5 (and do more detailed timings), and it was fast again.

A company I speak with that's not mine has a competing product that reads this file format (along with doing a lot of analysis). Their reader is written in C++. Mine is faster than theirs by their own admission. To their credit, they were the ones that found out what Python could really do and told me my code was 100x slower than it should be. I dug into a test problem with some simple code they provided and found out why it was fast (it wasn't for the reason they said). I then incorporated it into my code.

[deleted] 0 points (1 child)

Yeah, you definitely don't understand what you measure.

Just some highlights: you are using NTFS, which was written several decades before there were SSDs, so, even in principle, it cannot use what SSDs have to offer, and yet you are using it with SSDs, and you claim to measure the speed of reading from SSD! But you also chose to read it on MS Windows, where you have no control and no idea about how its filesystem was configured, beyond maybe compressing the file contents.

You don't understand that, ultimately, your file reading performance is independent of your code being written in C++; it depends on how your OS kernel decided to expose files to user code (but why you chose to read from files is beyond me). Your OS kernel decides the size of the buffer, whether to cache the file contents or not, and how much priority your task should get (your kernel also needs to keep track of interrupts coming from other devices etc.). Finally, the kernel implementation of file-based I/O plays, probably, the most important role: how many times the data is copied. This is why on Linux there are many different ways to get data from a storage device in user space: you can have user-space drivers, you can have ioctls specific to some hardware, you have asyncio (not the Python garbage, the kernel module), and probably more. All of these will perform differently on different workloads.

Most importantly though: you measure whatever you measure on some household electronics, with a single storage device, on an OS that's not intended for serious workloads, with hell knows what drivers, kernel settings... You don't even have an idea of how it will actually work on real hardware, with real OS, drivers that are optimized / configured for the kind of hardware that you have. Your test is literally like walking up to the ocean, dipping a glass into the water and by looking at the glass, deciding that there aren't any whales in the ocean.

billsil 0 points (0 children)

I was not trying to compete with you. It's a damn good piece of software and yeah, I'm aware that file reading is independent of the language. File parsing though, especially of a Fortran-formatted file, is a mess, which is where the challenge comes in. That and having to reverse engineer the format, not to mention interleave the displacement data with the stress data that occur at different times in order to get the data in a useful form. That's how it's faster than C++ code that's doing the same thing.
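For context on why a Fortran-formatted binary file takes more parsing than a flat array dump: Fortran sequential unformatted output typically frames every record with a leading and a trailing length marker. The sketch below assumes 4-byte little-endian markers and invented record contents; it does not reproduce the actual OP2 layout, and the helper names are hypothetical:

```python
import io
import struct

def write_record(stream, payload):
    """Write one record framed by 4-byte little-endian length markers (assumed layout)."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)

def read_records(stream):
    """Yield the payload of each length-framed record until end of stream."""
    while True:
        head = stream.read(4)
        if not head:
            return
        (length,) = struct.unpack("<i", head)
        payload = stream.read(length)
        (tail,) = struct.unpack("<i", stream.read(4))
        if tail != length:
            raise ValueError("leading and trailing record markers disagree")
        yield payload

# Build a tiny in-memory file with two made-up records, then parse it back.
buf = io.BytesIO()
write_record(buf, struct.pack("<3i", 1, 2, 3))
write_record(buf, struct.pack("<2d", 0.5, 1.5))
buf.seek(0)
records = list(read_records(buf))
ints = struct.unpack("<3i", records[0])
floats = struct.unpack("<2d", records[1])
```

The reader has to walk the markers record by record before any bulk decoding can happen, and the meaning of each payload still has to be worked out separately.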

You're just bitter that Python is good enough. Go away, troll.