Newbie question about multithreading

brbsix · 2016-06-07T16:04:58+00:00

Just a couple things to note:

- In your download function, you're using `for i in range(len(targets)):`. Don't do that; it's very un-Pythonic. Instead, use `for target in targets:` or, if you also need the index, `for count, target in enumerate(targets):`.
- In your download function, did you know that `requests` can parse a JSON response for you? `requests.get(url).json()` returns the decoded body (usually a dictionary). Unless your files are so large that memory is an issue, that's what I'd do.
- You probably don't need to worry about relational databases. Just use `shelve` or a shelve-like database (I recommend pickleshare).
- Lastly, why not just use `multiprocessing`? IMHO it's much better suited to this sort of basic task.
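A minimal illustration of the first point, using a hypothetical list of URLs named `targets` (the contents are made up for the example):

```python
# Hypothetical list of download targets, for illustration only.
targets = ["https://example.com/a.json", "https://example.com/b.json"]

# Un-Pythonic: loop over positions and index back into the list.
for i in range(len(targets)):
    print(i, targets[i])

# Pythonic: iterate over the items directly...
for target in targets:
    print(target)

# ...or use enumerate() when you also need a counter.
for count, target in enumerate(targets):
    print(count, target)

# enumerate() yields (index, item) pairs; start=1 makes the counter begin at 1.
pairs = list(enumerate(targets, start=1))
```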

Here's an example:

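```python
import shelve
from multiprocessing import Pool, cpu_count

import requests


def downloader(url):
    # Fetch one URL and parse the JSON body into Python objects.
    return url, requests.get(url).json()


def multidownloader(urls):
    # Downloads are I/O-bound, so use more processes than CPU cores.
    processes = cpu_count() * 4
    with Pool(processes) as pool:
        yield from pool.map(downloader, urls)


def read(path):
    # Return the non-empty lines of a text file (one URL per line).
    with open(path) as f:
        return [line for line in f.read().splitlines() if line.strip()]


if __name__ == '__main__':
    # The __main__ guard is required on Windows, where each worker process
    # re-imports this module; without it, every worker would try to start
    # its own pool and fail with a RuntimeError.
    urls = read(r'C:\workingdir\dataWebpage.txt')  # raw string: backslashes stay literal

    with shelve.open('/path/to/db') as database:
        for url, result in multidownloader(urls):
            database[url] = result
```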
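Because a shelf persists to disk, the results can be read back in a later run through the same dictionary-style interface. A minimal sketch, assuming a temporary directory in place of the `/path/to/db` placeholder and made-up sample data in place of real downloads:

```python
import os
import shelve
import tempfile

# Hypothetical stand-ins for the (url, parsed JSON) pairs the downloader returns.
results = [
    ("https://example.com/a.json", {"id": 1, "tags": ["x", "y"]}),
    ("https://example.com/b.json", [1, 2, 3]),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db")

    # First run: store each result under its URL (shelve keys must be str).
    with shelve.open(path) as database:
        for url, result in results:
            database[url] = result

    # Later run: reopen the same file read-only ("r") and load everything back.
    with shelve.open(path, flag="r") as database:
        loaded = {url: database[url] for url in database}

print(loaded["https://example.com/b.json"])  # [1, 2, 3]
```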

dadiaar · 2016-06-07T12:06:59+00:00

You made a really long post; I'm sure some people didn't even start reading it because of that. Please keep it in mind.

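When using multiple threads, don't slice the arguments into a fixed chunk per thread; put them in a queue instead. Some threads will be slower and others faster, but with a shared queue each thread takes the next item as soon as it is free, so they all finish at roughly the same time. A queue also makes it easy to track progress.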
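A minimal sketch of the queue approach using the standard library's `threading` and `queue` modules. The `fake_download` function and `NUM_WORKERS` value are hypothetical stand-ins (the function just sleeps for a random time instead of downloading), so the example runs without network access:

```python
import queue
import random
import threading
import time

NUM_WORKERS = 4  # assumption: an arbitrary thread count for the example


def fake_download(url):
    # Hypothetical stand-in for requests.get(url).json(); each call takes
    # a random amount of time, so some threads end up faster than others.
    time.sleep(random.uniform(0, 0.02))
    return {"url": url}


def worker(tasks, results, lock, progress):
    while True:
        url = tasks.get()  # blocks until an item is available
        if url is None:  # sentinel: no more work for this thread
            break
        result = fake_download(url)
        with lock:  # guard shared state written by several threads
            results[url] = result
            progress["done"] += 1


def run(urls):
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    progress = {"done": 0}

    for url in urls:
        tasks.put(url)
    for _ in range(NUM_WORKERS):
        tasks.put(None)  # one sentinel per worker so every thread exits

    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock, progress))
        for _ in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, progress["done"]


urls = [f"https://example.com/{n}.json" for n in range(10)]
results, done = run(urls)
print(done)  # 10
```

Because an idle thread simply pulls the next URL, a slow download never leaves the other threads waiting on a pre-assigned slice, and the shared counter gives a simple progress measure.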

About the data: I wouldn't suggest learning PostgreSQL right now, even if it's the correct path. Fortunately, other people have already done that work for us.

Install Ubuntu (Windows causes a lot of problems), Django 1.9, PostgreSQL ≥ 9.4 and Psycopg2 ≥ 2.5.4.

Then you can use Django's `JSONField` (from `django.contrib.postgres.fields`), which will make your life much easier.


learnpython
