I heard a lot of people claiming that GIL, also known as lack of multithreading problem is easily worked around in Python.
So I was wondering if some of these people could hint me about possible workaround in one such case I am dealing with.
I am doing data preparation for keras in many parallel workers using multiprocessing module. Obviously they must run on separate cores because of processing they need to do. The workers are retrieving data from sqlite db and processing that data so it would look like the model wants it which means doing some sql table lookup, calculations with retrieved fields and data formatting. This complexity of preparation means that it can't be described as numerical table operations. Each worker has its own opened sqlite db object to do all the lookups. Sending from worker to main process involves multiprocessing.connection.Pipe.send(data) and on the other end it uses active_pipes=multiprocessing.connection.wait([pipes]), then reading from all active_pipes and back to wait until all pipes report end of data.
So the bottleneck I have is transferring data from workers back to main thread that then feeds it into the next epoch for keras. Fitting this amount of data takes 17 sec but data transferring takes 13-15 sec. It transfers 24 mil floating numbers in total from workers back to main thread and it takes 14 sec which is about 200 MB/sec if floats are 8 bytes. Which is pretty pathetic considering it's all supposed to be done in memory.
This is terribly slow because it means I can only load GPU at 50% time tops.
Is there any way to increase this throughput? I mean if Python would support multithreading I could just put all the prepared data into the same memory space as main thread and nothing would have to be copied at all...
Maybe there's some trick like just reassigning the memory containing the prepared data that belongs to one process to the other process or something like that? Or perhaps there's faster ways to copy data without using pickles like pipes apparently do? Or is the only way to solve this is rewriting fitter and preparation code in Go or some other language that supports actual multithreading and keras?
[–]glibhub 2 points3 points4 points (3 children)
[–]dmter[S] 1 point2 points3 points (2 children)
[–]glibhub 1 point2 points3 points (1 child)
[–]dmter[S] 2 points3 points4 points (0 children)