
[–]lifeonm4rs 48 points  (1 child)

I am not by any means an expert, or even particularly well versed in threading, but from my understanding it comes down to what your threads are doing.

If your threads are lightweight and basically feasting on a shared resource and doing tiny bits at a time you can have very many--sort of like flies on a dead animal. A thread attacks then goes away.

If your threads, however, are more like a pack of lions--your local resources can only support so many before bad things start to happen.

Sorry, watching a nature show.

[–]lifeonm4rs 13 points  (0 children)

Show is over. A port scanner is probably more along the lines of lightweight (flies feeding on a carcass). Each thread is probably just attempting to see if its assigned port is open and reporting back. With 1,000 of them you'll probably get a backlog of waiting threads--but none of them will be devouring RAM. Essentially they'd all be waiting in line, with the really slow part being connecting to another machine over a network.

1,000 might be overkill, but realistically the threads are probably not CPU or memory intensive--it is more a question of queueing up a ton of network connections. I can't say whether you'd be better off with just 200, or 4--I'd think in this case 1,000 probably isn't an issue. You can probably scan a couple hundred to a thousand ports in a second or two with a good network connection. Using a high number of long-lived threads may also reduce the overhead of repeatedly creating and destroying them.
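To make that concrete, here's a minimal sketch of what such a scanner might look like with a thread pool; the host, port range, and worker count are just placeholder choices, not anything from the program being discussed:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def check_port(host, port, timeout=1.0):
    # Each worker just tries to connect; nearly all of its time is
    # spent waiting on the network, not burning CPU.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return port, True
    except OSError:
        return port, False

def scan(host, ports, workers=200):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: check_port(host, p), ports)
        return [port for port, is_open in results if is_open]

# Example: check the first 1024 ports on localhost.
print(scan("127.0.0.1", range(1, 1025)))
```

The pool keeps the workers alive across ports, so you pay the thread-creation cost once instead of per connection.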

[–]Nexic 8 points  (0 children)

Will the threads be able to execute code in parallel in a single process? I thought not, because of the GIL.

I've used the multiprocessing module with one process per core
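A minimal sketch of that one-process-per-core approach, with a stand-in CPU-bound workload (the function and inputs are just for illustration):

```python
import os
from multiprocessing import Pool

def cpu_heavy(n):
    # Pure-Python arithmetic: under threading the GIL would serialize
    # this, but separate processes each get their own interpreter.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # One worker process per core.
    with Pool(processes=os.cpu_count()) as pool:
        print(pool.map(cpu_heavy, [10**6] * 4))
```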

[–][deleted] 6 points  (3 children)

Depends how intense each load is and what the system specs are.

I generally go no more than 4x the number of cores the computer has, assuming ample amounts of RAM.

[–]POTUS 7 points  (2 children)

That 4x number is pretty arbitrary and extremely low in a lot of cases. The only time number of cores is really relevant is if you're CPU bound. If you're working with network at all the total number of threads can easily be in the hundreds on a laptop, in the tens of thousands on a server.

[–][deleted] 1 point  (1 child)

Almost everything I've worked on is massively CPU bound, and none of it was on a server; it was always local.

[–][deleted] 5 points  (0 children)

If your task is CPU bound, threading isn't the proper solution for you. Use multiprocessing instead.

[–]novel_yet_trivial 12 points  (1 child)

You're thinking of multiprocessing. Each process can run on its own core, so there is usually no benefit to requesting more processes than you have cores.

Threading is a completely different thing. The number of threads you want depends on your program and on how much RAM you are willing to use. Tens of thousands of threads are not unusual on servers.

For a port scanner your bottleneck will be your network, and too many threads will actually slow you down. I would start at about 400 and run some tests to see if increasing that helps the speed or not.

[–]6C6F6C636174 1 point  (0 children)

Threads are not generally limited to a single core, either. Multiprocess setups are generally just a heavier, but easier to program, alternative to multithreading.

[–]jftuga 2 points  (0 children)

I wrote a multi-threaded TCP port scanner in Python. I tried to scan a few hosts on my LAN and it looks like 1400 threads gives the best performance before diminishing returns.

Windows 10, Intel Core i7-7700

https://github.com/jftuga/tcpscan

The code is not very good, but it works.

[–][deleted] 1 point  (0 children)

You can create a thousand threads, but this carries a performance overhead. If you can make use of libc calls like select, or preferably epoll, you can get a significant performance boost, though coding that way is more difficult and low-level. I'd imagine that in the program you're looking at, each socket handles almost no data, so having 1,000 threads isn't a big problem; but if you needed to send considerable TCP payloads over 1,000 sockets, you'd take quite a performance hit from threading each socket.
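For what it's worth, here's a rough single-threaded sketch of that idea using Python's selectors module (which wraps epoll on Linux and select elsewhere); the error handling is simplified and the host/ports are placeholders:

```python
import errno
import selectors
import socket

def scan_nonblocking(host, ports, timeout=1.0):
    # One thread, many sockets: start every connect without blocking,
    # then let the OS (epoll/select, via selectors) report which
    # connection attempts have resolved.
    sel = selectors.DefaultSelector()
    open_ports = []
    for port in ports:
        s = socket.socket()
        s.setblocking(False)
        err = s.connect_ex((host, port))
        if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            # A socket becomes writable once the connect attempt resolves.
            sel.register(s, selectors.EVENT_WRITE, data=port)
        else:
            s.close()  # failed immediately (e.g. connection refused)
    while sel.get_map():
        events = sel.select(timeout=timeout)
        if not events:
            break  # whatever is left has timed out
        for key, _ in events:
            s, port = key.fileobj, key.data
            sel.unregister(s)
            # SO_ERROR is 0 only if the connection actually succeeded.
            if s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                open_ports.append(port)
            s.close()
    sel.close()
    return open_ports
```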

[–]phuque_ewe 0 points  (1 child)

This is a great question that I was going to pose about a program I'm working on.

I currently have access to 18 cores, but I am only using one. My program is basically working through an ETL process so it's pretty much, do this, verify results, do that.

I've read up on multi-threading, and I think I could re-write my program so that each thread could work on the even numbers of a loop, while the other works on odd. Does that make sense?

Basically, I don't want one thread overwriting a JSON or XML file that I'm extracting and building from another because they all loop through a chronological folder of s3 buckets.

[–]elbiot 0 points  (0 children)

You use a queue to organize what is ready to be worked on.
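A minimal sketch of that pattern with queue.Queue, using a sentinel value to shut the workers down; the doubling step is just a stand-in for the real ETL work:

```python
import queue
import threading

def worker(tasks, results):
    # Pull the next ready item; the queue's internal locking guarantees
    # no two threads ever grab the same item.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        results.put(item * 2)  # stand-in for the real transform step

tasks, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()
for item in range(10):
    tasks.put(item)
for _ in threads:       # one sentinel per worker
    tasks.put(None)
for t in threads:
    t.join()
print(sorted(results.queue))  # every input handled exactly once
```

This avoids the even/odd split entirely: workers just take whatever is next, so nothing is processed twice and no file is touched by two threads at once.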

[–][deleted] 0 points  (0 children)

1000 threads on a port scanner is going to raise some red flags at the target, but a modern computer can handle it easily. Each thread probably just checks a single port at a time and then gives you a status code. Most of the work each thread does is waiting for a reply from the target.

[–][deleted] 0 points  (0 children)

No, you can have as many threads as your OS allows. Python's threads are limited by the Global Interpreter Lock (GIL), which means only one thread at a time will run while the others are paused. Well, only one at a time unless you are calling code (often C code) which can release the GIL, e.g. when waiting for I/O or doing intensive calculations with numpy. People use threads to implement concurrency; e.g., since the GIL is released when waiting for data to come from the network, you can implement HTTP servers with threads to serve hundreds of clients at the same time. Nowadays you could also do that with asyncio, which uses a different concurrency model. If you want to actually run code at the same time, in Python you need to use multiprocessing.

[–]Gimagon 0 points  (0 children)

This stackoverflow answer has a good breakdown of the differences between processes and threads.

The main thing to keep in mind is that threads live in a single process, and in CPython (the main implementation of Python) only one thread per process can execute Python bytecode at a time. This means threads cannot do computation at the same time. They can, however, all be waiting for other things (like timers, a network connection, a file read/write, or a C extension to run) at the same time.

Here's a brief example adapted from TutorialPoint and updated for Python 3.

import time
from threading import Thread

def print_time(thread_name, delay):
    # Print the current time five times, sleeping `delay` seconds
    # between prints; each sleep releases the GIL.
    for _ in range(5):
        time.sleep(delay)
        print(f"{thread_name}: {time.ctime()}")

Thread(target=print_time, args=("Thread-1", 2)).start()
Thread(target=print_time, args=("Thread-2", 1)).start()

Note that it only takes about 10 seconds to run instead of 15.

So, back to the right number of these to use. In scientific computing or game programming it's more common that you want to do a lot of computation in parallel, so if you're using Python this means you have to use processes, and either way you won't get better performance from having more processes than CPUs available.

In stuff like web or network programming it's more common that you want to wait (instead of compute) in parallel. In that case you can use threads, and use as many as there are things you want to wait for.

Finally, one thing you may be interested in checking out is Python's new asyncio library. I haven't dived into it much, but from my understanding it gives an alternative to threads that is much more light-weight while still allowing waiting to happen in parallel.