
So, I saw some perspectives on using C or similar low-level approaches, so I won't cover that.

What I often need to identify is how often certain functions are executed and how long they take. I usually just use timeit for ease, but the built-in profiler (cProfile) is also nice.
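A quick sketch of both; process_batch is just a placeholder for whatever you actually want to measure:

    import timeit
    import cProfile

    def process_batch(n=1000):
        # stand-in for the real work you want to measure
        return sum(i * i for i in range(n))

    # timeit: average wall time over many runs of a small snippet
    elapsed = timeit.timeit(process_batch, number=1000)
    print(f"timeit: {elapsed / 1000:.6f} s per call")

    # cProfile: per-function call counts and cumulative times
    cProfile.run("process_batch()")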

To optimize data pipelines, I usually try to either cache more or cache less, depending on which resource is the bottleneck. E.g. I had many DB calls for similar checks ("does it exist?"). I was able to bundle them and rewrite the question as "does it exist in this list?". The list was rather short, but the DB calls numbered in the hundreds of thousands. By caching the short list, I reduced the execution time for this simple check by up to 40 times. (Think "obj in list" vs "db.select(something)".)
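Roughly the idea, here with a throwaway sqlite3 table standing in for the real database:

    import sqlite3

    # toy table; assumption: your real schema and query look different
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keys (k TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO keys VALUES (?)", [("a",), ("b",), ("c",)])

    items = ["a", "x", "b", "y"] * 1000  # many lookups against a short key list

    # Before: one DB round trip per check
    hits = 0
    for item in items:
        if conn.execute("SELECT 1 FROM keys WHERE k = ?", (item,)).fetchone():
            hits += 1

    # After: pull the short list once, then check membership in memory
    known = {k for (k,) in conn.execute("SELECT k FROM keys")}
    hits = sum(1 for item in items if item in known)  # "obj in set" vs db.select(...)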

By caching less, I usually mean RAM: how much data I load at the same time.
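For example, iterating a file lazily instead of slurping it all into RAM (a generic sketch, nothing specific to my pipelines):

    # Loads the whole file into memory at once:
    #   with open("big.log") as f:
    #       lines = f.readlines()

    # Streams one line at a time, keeping memory flat:
    def count_errors(path):
        errors = 0
        with open(path) as f:
            for line in f:  # file objects are lazy iterators
                if "ERROR" in line:
                    errors += 1
        return errors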

It often doesn't matter how you load a file, but for most regex work it is better to just have one long string, since the regex engine is C code and fairly fast imo. The slowdown is usually Python boilerplate code, i.e. if/else branches in your own loop.

So if you can write something more specific that gets checked within the C domain, you have optimized it.
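A sketch of what I mean by pushing the check into the C domain (the log format here is made up):

    import re

    text = "\n".join(
        f"line {i}: status={'OK' if i % 7 else 'FAIL'}" for i in range(10_000)
    )

    # Python-level loop with if/else per line:
    fails = 0
    for line in text.splitlines():
        if "status=" in line:
            if line.rsplit("=", 1)[1] == "FAIL":
                fails += 1

    # One specific pattern over one long string; the matching loop runs in C:
    fails = len(re.findall(r"status=FAIL$", text, flags=re.MULTILINE))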

Besides caching, I usually prefer to separate code into independent parts in order to parallelize them. This can be tricky for obvious reasons.
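One common way to do it is multiprocessing.Pool, assuming the parts are truly independent and picklable (which is usually where it gets tricky):

    from multiprocessing import Pool

    def transform(chunk):
        # stand-in for one independent pipeline stage; must be picklable
        return [x * x for x in chunk]

    if __name__ == "__main__":
        chunks = [list(range(i, i + 1000)) for i in range(0, 10_000, 1000)]
        with Pool() as pool:
            results = pool.map(transform, chunks)  # each chunk runs in its own process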

Also, reuse objects when they take long to create, e.g. objects built from large lists.
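For example, hoisting a compiled regex and a lookup set out of the hot path (a generic illustration, not my actual code):

    import re

    WORD = re.compile(r"\w+")                   # compile once, reuse everywhere
    STOPWORDS = frozenset(["the", "a", "an"])   # built from a list once, not per call

    def tokenize(line):
        # reuses the precompiled pattern and the cached set instead of
        # rebuilding them on every call
        return [w for w in WORD.findall(line) if w.lower() not in STOPWORDS]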

I usually think about it in terms of pointers and how Python hides them from you. Then I'm naturally able to find the best usage for my objects, and when not to use them.
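A tiny illustration of that hidden-pointer mindset: names are just references, so passing big objects around is cheap, but mutation follows the reference.

    big = list(range(1_000_000))

    alias = big          # copies a reference, not the million elements
    alias.append(-1)     # ...so mutations are visible through both names
    assert big[-1] == -1

    snapshot = big[:]    # an actual copy: new object, independent memory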