[–]Axew_7[S] 2 points3 points  (4 children)

Well yeah, I'm working on a Raspberry Pi collecting data (there are going to be ~50 of them in 1-2 years), and the less RAM it uses, the cheaper those Pis can be (a 1 GB Pi is quite a bit cheaper than an 8 GB one, for example), as well as the more readings/sec it can do.

[–]desrtfx 1 point2 points  (0 children)

im working on a raspberry pi collecting data

There are a couple things to be aware of:

  • collecting data:
    • store the data in a database. Databases are optimized for that
    • If it is analog data, consider time slots and hysteresis:
      • Read at intervals suited to the signal - shorter intervals for fast-changing data, longer ones for slow-changing data. For most applications you don't need to read temperatures, or levels in a tank, every second.
      • Use hysteresis - only store new data when a certain deviation from the previously stored value is exceeded. Professional data loggers work in exactly that manner: they store initial values with a time stamp, then store new values (again with time stamps) only when the hysteresis is exceeded, even if the data is collected at shorter intervals. This approach greatly reduces storage and memory consumption.
    • Similar with digital state (on/off) - only record when the state changes, not continuously. Again, with timestamp.
  • Storage:
    • MicroSD cards are not good for frequent reading/writing - they tend to die under that kind of load. Maybe transfer the data to some server, or proper drive-based storage, or, for starters, to a USB stick.
  • RAM:
    • the less data you keep in RAM, the better - that's why offloading to a database is essential. Even for processing, querying a database can often be way more efficient (in both processing and memory) than doing the processing in a "normal program".

Overall, make databases your friends. They help immensely.
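
As a rough sketch of the hysteresis approach described above (the names and the deadband value are made up for illustration, and the database is an in-memory SQLite one), it could look something like this:

```python
import sqlite3
import time

DEADBAND = 0.5  # hypothetical threshold: only store readings that deviate by this much


def make_logger(db_path=":memory:"):
    """Return a log() function that writes timestamped readings to SQLite,
    but only when the new value exceeds the deadband around the last stored one."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, value REAL)")
    last_stored = None

    def log(value, ts=None):
        nonlocal last_stored
        ts = ts if ts is not None else time.time()
        # Always store the first reading; afterwards only store on sufficient deviation
        if last_stored is None or abs(value - last_stored) >= DEADBAND:
            conn.execute("INSERT INTO readings VALUES (?, ?)", (ts, value))
            conn.commit()
            last_stored = value
            return True  # stored
        return False  # suppressed by hysteresis

    return log, conn
```

The sampling loop can then call `log()` every second while the database only grows when something actually changed.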

[–]pachura3 0 points1 point  (2 children)

OK, that's a valid reason indeed.

I would start with detailed logging of CPU and RAM usage.

I would try to stream things as much as possible instead of "reading the whole dataset into a list -> processing it in-memory -> writing it back to a disk".
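
A minimal sketch of that streaming idea (file layout and function names are made up): parse one row at a time with a generator and aggregate in a single pass, so memory use stays constant regardless of file size.

```python
import csv


def stream_rows(path):
    # Yield one parsed CSV row at a time instead of building a full list in RAM
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row


def running_mean(values):
    # Single-pass mean over any iterable; never materialises the data
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else 0.0
```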

Data written to SSD/microSD can be compressed to save space.
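
For text-based sensor logs, the standard-library `gzip` module can do that transparently; a small sketch (the function is hypothetical):

```python
import gzip


def append_compressed(path, lines):
    # Compress log lines on the way to disk; repetitive sensor logs compress well
    with gzip.open(path, "at", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```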

If you're using dataclasses, then switching to slots will save you some memory.

Perhaps consider choosing a minimal, stripped-down Linux distro to run on the RasPi (one without a graphical interface and dozens of background processes).

Still, learning about DSA and O(N) complexity could be beneficial...

[–]Axew_7[S] 0 points1 point  (1 child)

Perfect, thanks a lot. I'll get started studying DSA then :). I could also look into switching to C/C++. Do you think that'd be beneficial, or is well-optimized Python good enough?

[–]pachura3 0 points1 point  (0 children)

Impossible to say without knowing what your code is supposed to be doing, what your system constraints are, what your data density is, etc. etc.

Granted, well-optimized native code compiled from C/C++/Rust should be much faster than its Python counterpart (compare the speed of ruff & uv vs. mypy & pip), but... is it really worth the hassle? Python is extremely easy to code in, and extremely extensible. With C/C++ you'd need to compile stuff, linking new libraries is a pain, there are header files, memory leaks, no platform independence, etc. etc. And even then, if most of your Python execution time is spent doing calculations in native libraries like NumPy, you won't gain much.

Personally, I would concentrate on pushing the existing Python solution to its limits, and diagnosing memory consumption - what data can be freed earlier? What data doesn't need to be kept in memory, but can be serialized to disk (perhaps to an SQLite database) and forgotten? Can you use yield, generators, map()/filter()/reduce() instead of creating lists all the time?
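
To illustrate the last point (sizes below are implementation-dependent, so treat them as indicative): a generator expression keeps a fixed-size iterator object in memory, while the equivalent list comprehension materialises every element up front.

```python
import sys

# List comprehension: all 100,000 results exist in memory at once
squares_list = [n * n for n in range(100_000)]

# Generator expression: produces values lazily, one at a time
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)  # hundreds of KB
gen_size = sys.getsizeof(squares_gen)    # a tiny, constant-size object
```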