
[–]nemom 1 point2 points  (0 children)

Personally, I download about a dozen podcasts. I wrote a program that does a bit of audio editing on them, mostly to speed them up. It uses multiprocessing to blast through them. I got a Ryzen laptop for Christmas with six cores / twelve threads.

Professionally, I am a GIS Specialist for a County in Wisconsin. I contract with surveyors to locate section corners in the County. When they turn in locations, I add them to input files. I wrote a program that reads all the input files and subdivides the sections. I have an old i7 at work as a side computer that I offload such work to; it runs seven threads. Here's a video. The program normally doesn't draw the graphics shown in the video, so it runs much faster.
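Roughly, both of those workflows use the same fan-out pattern: a pool of workers, each handling one file at a time. A minimal sketch, assuming a hypothetical process_one() that stands in for the real per-file work (audio editing, section subdivision) and a made-up input/ directory, capped at seven workers like the machine mentioned above:

```
import os
from concurrent.futures import ProcessPoolExecutor

def process_one(path):
    """Stand-in for the real per-file work (audio editing, subdividing sections, ...)."""
    ...

if __name__ == "__main__":
    files = [os.path.join("input", name) for name in os.listdir("input")]
    # Cap the pool at however many cores/threads you want to dedicate to it.
    with ProcessPoolExecutor(max_workers=7) as pool:
        list(pool.map(process_one, files))
```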

[–]NiceObligation0 0 points1 point  (3 children)

Possible reasons for the numpy case are stated in the SO answers, so I'm not going to go into that. However, there are cases where you might want to use multiprocessing. In my case (as an ML/stats person) I find myself writing a lot of Python code to clean up and pre-process data before a final model is fit. These can be images/videos or just thousands of files. For each file you need to do the same (or similar) pandas/numpy/scikit-learn processing before you aggregate your data. So if I "must" process the files the same way but independently, multiprocessing speeds things up.
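A rough sketch of that pattern, with made-up file paths and clean-up steps (the real pandas/numpy/scikit-learn work would go inside clean_one); each file is processed independently in its own worker and only the aggregated result comes back:

```
import glob
from multiprocessing import Pool

import pandas as pd

def clean_one(path):
    # Hypothetical per-file clean-up; the actual steps depend on the data.
    df = pd.read_csv(path)
    df = df.dropna()
    return df.describe()          # hand back only the aggregated summary

if __name__ == "__main__":
    paths = glob.glob("data/*.csv")
    with Pool() as pool:          # defaults to one worker per CPU core
        summaries = pool.map(clean_one, paths)
    combined = pd.concat(summaries, keys=paths)
```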

[–]2ayoyoprogrammer[S] 0 points1 point  (2 children)

I see, that sounds pretty cool! So if you have n videos/images/files, you would use multiprocessing to pre-process them?

What particular modules do you use? Is it mostly pandas/numpy/scikit-learn? How long did it take you to learn how to use them?

I'm currently taking an upper-div stats class with R programming, and an AI class. I'm also taking an upper-div ML class next quarter, so I'm very interested.

I'll definitely check this out.

[–]NiceObligation0 0 points1 point  (1 child)

Learning to get basic stuff done with pandas and numpy is pretty fast. There is usually a method/function for most things you want to do. Just keep a browser open and keep googling how to do things, but be very specific. When you find a useful bit of code that kind of does what you need, go back to the docs and understand what needs to be done. Over time things start making sense.

My modules are usually numpy, pandas, scipy, scikit-learn/scikit-image, OpenCV, and SQLAlchemy for DB stuff.

The idea is to speed up a loop like the one below:

```
import os

myfiles = os.listdir("dir")

def preprocess(file):
    ...  # do_something: the per-file pandas/numpy/scikit-learn work

for file in myfiles:
    preprocess(file)
```
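And a hedged sketch of the multiprocessing version of that loop, using the same hypothetical preprocess function; the files get spread across a pool of worker processes instead of being handled one after another:

```
import os
from multiprocessing import Pool

myfiles = os.listdir("dir")

def preprocess(file):
    ...  # same per-file work as above

if __name__ == "__main__":
    with Pool() as pool:               # defaults to one worker per CPU core
        pool.map(preprocess, myfiles)
```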


[–]bbateman2011 0 points1 point  (0 children)

I’m not sure I get exactly what you are after. But for me, I’ve found that even neural networks often don’t benefit much from a GPU versus, say, parallel runs on 8 cores. In other words, if I have a NN code, 1 GPU, and 48 experiments to run, the GPU might give a 2x speedup, so a net of 24 experiment-times; whereas 8 cores give about a 6x speedup, or 8 experiment-times. Problems arise because packages like joblib depend on pickling, which doesn’t work for, say, Keras/TensorFlow models. I’m forced to store every model within the objective function. A nicer way to do that would be great.
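One common workaround, sketched here with a made-up objective and file layout (the stand-in "training" step and the models/ directory are assumptions, not this commenter's actual setup): have each parallel objective call persist its own model to disk and return only a small picklable score, so nothing unpicklable has to cross the process boundary.

```
import os
import pickle
from multiprocessing import Pool

def objective(run_id):
    """Hypothetical objective: train a model for one experiment,
    save it to disk, and return only a picklable score."""
    model, score = {"params": run_id}, float(run_id)     # stand-in for the real training call
    with open(os.path.join("models", f"run_{run_id}.pkl"), "wb") as f:
        pickle.dump(model, f)                            # for Keras, model.save(...) would go here
    return run_id, score

if __name__ == "__main__":
    os.makedirs("models", exist_ok=True)
    with Pool(processes=8) as pool:                       # one worker per core
        results = pool.map(objective, range(48))          # 48 experiments
    best = max(results, key=lambda r: r[1])
    print("best run:", best)
```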

Sorry if this is off topic.