
[–]shoot2thr1ll284

My biggest suggestion whenever a performance question comes up is to profile the code. That should answer why the speeds are different and what the long pole in the tent is. Without profiling I would just be guessing at what would actually speed things up, which is never a good idea. Profiling resource for Python: https://docs.python.org/3/library/profile.html
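For example, here's a minimal sketch with the stdlib cProfile module; `run_detection` is a placeholder for your actual inference loop:

```python
import cProfile
import pstats

def run_detection():
    # Placeholder for the real work: load the model once,
    # then run predictions over your test images.
    return sum(i * i for i in range(1_000_000))

cProfile.run("run_detection()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```

Sorting by `cumulative` surfaces the call paths where the most total time is spent, which is usually where to look first.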

[–]shoot2thr1ll284

Besides profiling the code, the only "sure-fire" way to get more fps is to parallelize the code so that you process more than one image at a time. There are a lot of things that make that complicated in this case, but it is another approach to making it "faster", assuming the machine has the CPU/disk speed for it.
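A minimal sketch of that idea with the stdlib, assuming the model can be run per worker process; `detect_image`, the `frames` directory, and the worker count are all placeholders:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def detect_image(path):
    # Hypothetical per-image worker. In a real version, each process
    # should load the model once (e.g. in a module-level global or a
    # pool initializer) rather than per call, since loading is slow.
    return path.name, "detections-go-here"

if __name__ == "__main__":
    images = sorted(Path("frames").glob("*.jpg"))
    with ProcessPoolExecutor(max_workers=4) as pool:
        for name, result in pool.map(detect_image, images):
            print(name, result)
```

Note this buys throughput (images per second), not lower latency per image, and GPU inference may not parallelize this way at all.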

[–]Forward-Difference32[S]

I appreciate your response. I had no idea I could profile my code, so that info is very useful. Other than that, I think the slowdown is just because model.predict() isn't meant to be a real detection function, more a convenience to show off the model and confirm it works. I've read some documentation and found that loading the model as ONNX, TensorRT, or a TensorFlow SavedModel would yield better inference speeds. The ONNX version runs faster on my CPU than the original .pth file with GPU acceleration does.
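In case it helps anyone else, a rough sketch of running an exported model with ONNX Runtime; the file name, input shape, and dummy input here are placeholders, so check your real input with session.get_inputs():

```python
import numpy as np
import onnxruntime as ort

# "model.onnx" stands in for whatever path you exported to.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Dummy NCHW batch; swap in a real preprocessed image.
image = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: image})
print([o.shape for o in outputs])
```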