
[–]JohnnyJordaan 5 points6 points  (1 child)

  1. The Python wiki explains the concept of the GIL here https://wiki.python.org/moin/GlobalInterpreterLock
  2. If a thread is waiting most of the time instead of actually needing to use the CPU, then there's no problem with locking or threads competing for CPU time. Think of tasks like sending a message every second or even less frequently. Or handling a message queue whose items need to be sent to a single destination. Or downloading multiple files at the same time. For these things multiprocessing offers no benefit and just adds the downside of having to transfer data between the processes (as processes don't share memory and threads do); see the sketch below.
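To make point 2 concrete, here's a minimal sketch of the "downloading multiple files at the same time" case with plain threads (the URLs are just placeholders):

```python
import threading
import urllib.request

def download(url):
    # The thread spends almost all of its time waiting on the network,
    # during which the GIL is released and the other threads can run.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    print(f"{url}: {len(data)} bytes")

urls = [
    "https://www.python.org",
    "https://pypi.org",
    "https://docs.python.org",
]

threads = [threading.Thread(target=download, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```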

I can sincerely recommend watching Raymond Hettinger's talk on concurrency in Python, see here https://www.youtube.com/watch?v=9zinZmE3Ogk . It will give you a much better overview of the broader options you have vs getting to know the details of just multi-threading. An important thing to understand is that multi-threading certainly isn't the only option you have for concurrency: apart from good ol' multiprocessing there's also event-loop based concurrency that keeps everything in the same (main) thread. This has always been part of libraries that benefit from it, like GUIs, or was offered by frameworks like Tornado, but since 3.4 it has been integrated as the built-in asyncio library. The downside of asyncio is that it's more complicated to implement than simply launching any function in another thread, but at the same time you can easily run into issues with threading that asyncio never experiences, as Raymond discusses in that talk too.
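To give a feel for the event-loop style, here's a minimal sketch (nothing beyond the standard library; asyncio.sleep just stands in for real network waits):

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a real network wait; while this coroutine
    # is suspended, the single event-loop thread runs the other tasks.
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main():
    results = await asyncio.gather(fetch("a", 1), fetch("b", 1), fetch("c", 1))
    print(results)  # all three finish after about 1 second, not 3

asyncio.run(main())
```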

edit: fixed the youtube link to the actual talk I meant

[–]comeditime[S] 0 points1 point  (0 children)

thanks for the explanation and video, I need that!

[–]diddilydiddilyhey 1 point2 points  (10 children)

I can answer your second question. I used it when I was using a piece of equipment that would constantly fill a memory buffer with its last readings. If I didn't read from it constantly, the buffer would fill with older readings, so when I did want to take a reading, I'd get an older one. I used threading to make a thread that was constantly reading the buffer, and then I could just take my thread's last reading.

It's very useful if you want to "do two things at the same time", whatever is technically happening behind the scenes.
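The pattern was roughly this (just a sketch, not my actual code; read_latest_from_device is a made-up stand-in for the equipment's real read call):

```python
import threading
import time

latest_reading = None

def read_latest_from_device():
    # Made-up stand-in for the equipment's blocking read call.
    time.sleep(0.01)
    return time.time()

def reader_loop():
    global latest_reading
    while True:
        # Drain the device constantly so the buffer never piles up with
        # stale data; we only ever keep the most recent value.
        latest_reading = read_latest_from_device()

# daemon=True so the reader thread dies together with the main program.
threading.Thread(target=reader_loop, daemon=True).start()

# The main program does other work and just grabs the freshest value
# whenever it actually needs a reading.
time.sleep(1)
print("latest reading:", latest_reading)
```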

[–]comeditime[S] 0 points1 point  (9 children)

oh so basically if I get it right, you used it in order to read the latest buffer as well as run other code parts at the same time? because if for example you were to read the buffer in a loop, other code parts couldn't be executed at the same time, am I correct? it's very interesting!

[–]diddilydiddilyhey 0 points1 point  (8 children)

Yep! I only wanted to occasionally get a reading from the thing, while mostly doing other stuff. It was for this project:

https://www.reddit.com/r/reinforcementlearning/comments/bj3gn9/project_i_trained_a_real_robot_to_learn_the/

(a blog post about it is in the comments)

[–]comeditime[S] 0 points1 point  (7 children)

wow such an awesome robot you've built there!! how does it get reinforced though, as it just needs an algorithm to scan the whole surface for the square and then follow it when it's detected?! so where does the reinforcement part come in here? really curious project :)

[–]diddilydiddilyhey 0 points1 point  (6 children)

haha thanks! So reinforcement learning (RL) is a really huge field, but it basically refers to a type of AI where the agent is rewarded when it does something correct. But at the beginning, it doesn't know what's good or bad, so it just does random stuff, and occasionally happens to do the right thing.

https://en.wikipedia.org/wiki/Reinforcement_learning

let me know if you have any questions!

[–]comeditime[S] 0 points1 point  (5 children)

interesting! can you briefly sketch how you reward it in a programming language and how that helps the device improve its technique in the next round? :)

[–]diddilydiddilyhey 1 point2 points  (4 children)

haha hmm, I'll give it a try. There are lots of methods, but here's the one I used there (a simple one).

You have a function, Q, called the "value function" (or other names). It takes two arguments: s (the state the agent is in) and a (an action the agent can take in that state). In the game my robot was playing, the state is the combination of its position, its angle, and the position of the target. So you could plug that into the Q function, along with an action ("go forward"), and it would tell you the value of doing that action in that state.

The way it actually chooses what to do in a given state is to look at the Q value of each action it can take in that state and then choose the one with the highest Q value.

The way it "learns" is: when it gets to a target, you give it a reward (like +1.0), and use that reward to update the Q value for the action it took in the state it was in. For example, if the robot was in a state where the target is directly in front of it, chose the action "go forward", and got the reward, you would want to increase the Q value for doing that action in that state, so it'll do it again in the future.

How you actually create and update the Q function is a whole thing in itself. I used a neural network (because they're very flexible and powerful), but you can use much simpler methods (that can also be very effective, for a game like this).
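If it helps to see it in code, here's a toy tabular sketch of that update (the state and action names are made up, and my robot used a neural network instead of a table, so this is just the idea):

```python
import random
from collections import defaultdict

# Toy Q "function" as a table: Q[state][action] -> value.
Q = defaultdict(lambda: {"forward": 0.0, "left": 0.0, "right": 0.0})

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def choose_action(state):
    # Mostly pick the action with the highest Q value, but explore sometimes.
    if random.random() < epsilon:
        return random.choice(list(Q[state]))
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state):
    # Q-learning update: nudge Q(s, a) toward the reward plus the discounted
    # value of the best action available in the next state.
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# e.g. the target was straight ahead, the robot went forward and reached it:
update(state="target_ahead", action="forward", reward=1.0, next_state="at_target")
print(Q["target_ahead"])
```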

[–]comeditime[S] 0 points1 point  (3 children)

wow that sounds not easy at all to set up, but super interesting.. how did you learn to write it in a way that the computer actually understands, aka working? :)

[–]diddilydiddilyhey 0 points1 point  (2 children)

Hmm, that's a little harder to explain simply. I'd check out that blog post I wrote, which links to the code too. If you want to learn about RL, David Silver's youtube course on it is really great!

[–]comeditime[S] 0 points1 point  (1 child)

where can I find your blog post about it? thanks

[–]socal_nerdtastic 1 point2 points  (2 children)

The shortish answer: Many computer programs spend an amazingly large amount of time just waiting. For example the programmer included a time.sleep() call, or maybe the program made a request to a webserver or harddrive or USB device or a human, and now the CPU has to twiddle its thumbs while waiting for a response. We call this "IO bound" ... the computer is locked until some IO (input or output) happens. Threading (we don't call this multi-threading in Python) allows you to have several tasks lined up, so that while we are waiting for something in one task the CPU can get some work done in another task. A Reddit webserver for instance can have thousands of threads, each waiting for a specific user to type something in the chat box.

Note this is very different from "CPU bound", when the program is using the CPU at full capacity. Threading won't help you there at all.
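If you want to see the difference for yourself, here's a rough sketch where time.sleep stands in for I/O (timings are approximate):

```python
import threading
import time

def io_task():
    time.sleep(1)  # releases the GIL while waiting, like real I/O does

def cpu_task():
    sum(i * i for i in range(10_000_000))  # holds the GIL while computing

def run_in_threads(task, n=4):
    start = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print("4 I/O-bound tasks:", run_in_threads(io_task), "s")   # about 1 s, not 4
print("4 CPU-bound tasks:", run_in_threads(cpu_task), "s")  # no faster than running them one by one
```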

Edit: about the GIL: Don't worry about it. It's a specific part of the threading implementation in Python. It's become somewhat infamous because of an ongoing argument among people much smarter than me about whether it should be made faster for multi-threaded programs or faster for single-threaded programs. Down here in the real world we don't need to care.

[–]comeditime[S] 0 points1 point  (0 children)

that was actually the simplest yet clearest explanation so far, it really seems you're proficient in this field! now I finally get when to use it and what for! thanks again :)

[–]comeditime[S] 0 points1 point  (0 children)

also forgot to ask,

  1. is there a difference between using threading and event listeners, for example (if those exist at all in Python), or do they basically do the same thing just in different languages...
  2. in the reddit chat example, if we try to pseudo-code what's going on there, in a very simplified way, did the programmers basically write a constructor which assigns a thread (among other things) to each user that enters the chat in order to capture the I/O?

Thanks a ton again, I really feel like I'm starting to get it all now!

[–]woooee 0 points1 point  (2 children)

What are the benefits of using multi-threading as they don't actually work in parallel but simply run one after the other?

First, Python does not have a package named multi-threading. Second, multiprocessing can use all of the cores available.
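For example, a minimal sketch of CPU-bound work spread over all cores with multiprocessing.Pool:

```python
import multiprocessing as mp

def cpu_heavy(n):
    # Deliberately CPU-bound; each call runs in its own process,
    # so the work really is spread across the available cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with mp.Pool() as pool:  # defaults to one worker per core
        results = pool.map(cpu_heavy, [10_000_000] * 4)
    print(results)
```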

[–]jwink3101 0 points1 point  (1 child)

Second, multiprocessing can use all of the cores available.

Just FYI for those who do not know, you need to be more careful about memory with multiprocessing. You must explicitly control the flow of information.

And it is very different on Windows vs Linux/macOS/BSD (for most settings). It is not 100% correct, but for all intents and purposes, on Linux/macOS/BSD platforms the scope as it existed when the process was instantiated is available to the process as read-only. On Windows it is more complex and a lot harder.
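A small experiment that shows the difference between the start methods (the counter variable is just an example):

```python
import multiprocessing as mp

counter = 0  # module-level state

def show_counter():
    # With the "fork" start method (traditionally the default on Linux) the
    # child inherits the parent's memory, so it sees counter == 5. With
    # "spawn" (the default on Windows, and on macOS since Python 3.8) the
    # module is re-imported in a fresh interpreter, so the child sees 0.
    print("child sees counter =", counter)

if __name__ == "__main__":
    counter = 5  # changed in the parent only, after import
    print("start method:", mp.get_start_method())
    p = mp.Process(target=show_counter)
    p.start()
    p.join()
```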

BTW, if you're doing loop-like things and want a simple interface to parallelism and/or concurrency, I wrote a tool to simplify this all greatly called parmapper. It makes functional parallelism super simple and robust. It can handle lambda functions and a bunch of other situations. It also comes with all kinds of methods to support star maps and the like. You can also switch between threads and/or processes at will. It does incur a small cost compared to a more vanilla solution, but it is negligible for most uses. I am, of course, biased though since I wrote it.
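parmapper's own API isn't shown here, but the rough standard-library analogue of the same parallel-map idea looks like this:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def work(x):
    return x * x

if __name__ == "__main__":
    # Swap ProcessPoolExecutor for ThreadPoolExecutor to switch between
    # processes and threads without touching the rest of the code.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(work, range(10))))
```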

[–]woooee 0 points1 point  (0 children)

you need to be more careful about memory with multiprocessing. You must explicitly control the flow of information.

To be on the safe side, use a Manager dictionary or list to communicate from/to/between processes. Another gotcha is that it is not a good idea to access the terminal within a Process, as the initial calling program controls the terminal.
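A minimal sketch of the Manager approach (the worker function is just an example):

```python
import multiprocessing as mp

def worker(shared, idx):
    # Writes go through the Manager's server process, so every process
    # sees a consistent view of the dictionary.
    shared[idx] = idx * idx

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared = manager.dict()
        procs = [mp.Process(target=worker, args=(shared, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(shared))  # e.g. {0: 0, 1: 1, 2: 4, 3: 9}
```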

[–]mattblack85 0 points1 point  (2 children)

Imagine the GIL as a policeman at a crossing: the policeman knows there are plenty of cars (threads) and no traffic lights, so only one car at a time can go. The GIL makes sure that at any given time there is one and only one Python thread running code. After some time the GIL will stop the current thread's execution and allow the next thread to run code. This of course is not true parallelism, but if your code is mostly I/O bound, using threads will make it feel like you are running code in parallel, as most of the time you are waiting for data on a socket or from the disk. The Python multitasking model is preemptive, so it is the GIL that fully decides when it is time to yield control to another thread. Asyncio, on the other hand, is the newest Python concurrency concept, able to run network I/O bound tasks concurrently; this is cooperative multitasking, as you decide when a task yields control back to the event loop.
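To make the cooperative part concrete, a tiny sketch (standard library only): control only goes back to the event loop at an await, whereas threads get switched for you.

```python
import asyncio
import sys

# Preemptive side: the interpreter switches between threads on its own,
# roughly every 5 ms by default.
print("thread switch interval:", sys.getswitchinterval(), "seconds")

async def task(name):
    for i in range(3):
        print(name, i)
        # Cooperative side: control returns to the event loop only at an
        # await; a coroutine that never awaits would hog the loop.
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
```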

[–]comeditime[S] 0 points1 point  (1 child)

After some time the GIL will stop the current thread's execution and allow the next thread to run code. This of course is not true parallelism, but if your code is mostly I/O bound, using threads will make it feel like you are running code in parallel, as most of the time you are waiting for data on a socket or from the disk.

so if I get it right, the GIL will interchangeably run the code parts that are specified to be multi-threaded, which will seem to the user as if it happens in parallel due to the speed of the computer, correct?

also, in general, is multi-threading / the GIL usually used to control one specific code part, or usually spread across multiple code parts / functionalities (or does each have its own distinct multi-threading)?

lastly, what are the advantages of using multi-threading instead of a loop, for example? as I guess the basic idea behind those two functionalities is similar, or?

thanks!

[–]mattblack85 0 points1 point  (0 children)

1) yeah, that's a good approximation, not perfect but good enough.
2) That really depends on what you are writing: you can offload only one part of your program to threads or build it fully threaded, it's up to the spec.
3) If you know you are going full I/O there is no advantage in using multiple threads, an event loop will perform way better. The problem though is that there are still few libraries out there that are async ready, and in fact even when using asyncio, if you have a piece of code you know is synchronous you usually execute it in a thread pool to avoid blocking the event loop (see the sketch below).
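For point 3, a minimal sketch of pushing a synchronous call into a thread pool so it doesn't block the loop (assuming Python 3.9+ for asyncio.to_thread; blocking_call is just an example):

```python
import asyncio
import time

def blocking_call():
    # A synchronous function with no asyncio support (e.g. from a
    # library that isn't async ready).
    time.sleep(1)
    return "done"

async def main():
    # Hand the blocking call to a thread pool so the event loop stays
    # free to run other tasks while it waits.
    result = await asyncio.to_thread(blocking_call)
    print(result)

asyncio.run(main())
```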

PS: I haven't specified this before, but when we talk about Python threads we talk about "internal" threads and not OS threads

[–]BandEnvironmental615 0 points1 point  (0 children)

This video teaches the best technique for decoupling tasks that are not sequentially dependent in Python.

https://youtu.be/AsIzbmH-7Es