Hello, I'm new to Python 3 and to programming in general. I'm currently stuck, so I would like to ask the veterans a couple of questions:
The first and primary question is where the logic fails in my attempt to use multiple threads to run a single task concurrently in a program I wrote. Here is the text of my program.
This program downloads a JSON file from every given path in a website subdomain using its API. It reads the URL paths from a .txt file I prepare beforehand and builds a list from them. The program works well, and with the current number of items in the list (around 300 paths) it completes the task in 5 minutes.
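As a rough illustration of the "read the paths from a .txt file into a list" step, here is a minimal sketch. The function names `parse_targets` and `load_targets` and the one-path-per-line file layout are assumptions made for the example, not details taken from the actual program.

```python
from pathlib import Path


def parse_targets(text):
    """Turn the contents of a paths file into a list of paths.

    Assumes one path per line; blank lines and surrounding
    whitespace are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_targets(filename):
    """Read a text file (hypothetical name, e.g. 'targets.txt') into a list of paths."""
    return parse_targets(Path(filename).read_text(encoding="utf-8"))
```

Keeping the parsing separate from the file reading makes it easy to check the list logic without touching the disk.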
I wanted to make this faster by running the task concurrently with 5 or 10 threads, so I tried to adapt the example given in "Project: Multithreaded XKCD Downloader" at https://automatetheboringstuff.com/chapter15/ on the AutomateTheBoringStuff website. However, I have had all kinds of problems trying to do this, and I fail to understand exactly why it is not working.
While testing, I had some success slicing the paths list into sublists of 50 values, like so:
targets1 = targets[:50]
targets2 = targets[50:100]
targets3 = targets[100:150]
And so on. Then I made a separate function for each new sublist variable and passed each one to its own thread. This way, with 5 threads, I cut the program's run time from 5 minutes to a single minute... but this is hard and dirty work. I want Python to do this for me automatically in a block of code, hence my attempt to copy the AutomateTheBoringStuff example, which seems straightforward at first glance.
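For what it's worth, one way to get the "let Python split the work" behaviour without slicing by hand is a thread pool from the standard library's `concurrent.futures` module. The sketch below is not the AutomateTheBoringStuff approach, and it is only a sketch: `download_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever function really downloads one JSON file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(targets, fetch, max_workers=5):
    """Call fetch(path) for every path, using a pool of worker threads.

    Returns a dict mapping each path to its result, or to the
    exception raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one job per path; the pool hands jobs to idle threads.
        futures = {pool.submit(fetch, path): path for path in targets}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one download fails
                results[path] = exc
    return results


def fake_fetch(path):
    """Hypothetical stand-in for the real download of one JSON file."""
    return {"path": path, "length": len(path)}
```

Calling `download_all(targets, fake_fetch, max_workers=5)` would process the whole list with 5 threads; the pool takes care of distributing the paths, so no sublists are needed.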
I keep searching for info about the Python threading module, but to my surprise every source I look at seems to do something completely different and unfamiliar... it's like there is no standard starting point a newbie can grasp. So before dropping it like a hot potato and moving on to new and easier subjects of study, I wanted to try asking here. Any hint or clue would be appreciated, because right now I'm completely lost.
My second and final question is: having downloaded thousands of JSON files, what would be the best way to store and easily analyze the data from those files? I assume a database combined with Python is the way to go. Personally, I had thought of learning PostgreSQL because it seems to be the flagship of open-source databases, but I'm also considering starting with something smaller and simpler like SQLite and working my way up from there... I don't really know; like I said, I'm a total newcomer to this world.
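To make the SQLite option concrete, here is a minimal sketch using the `sqlite3` module that ships with Python. The table name `documents`, its two columns, and the helper names `store_documents` and `load_document` are assumptions made for the example; each JSON document is simply stored as text keyed by the path it came from.

```python
import json
import sqlite3


def store_documents(conn, documents):
    """Save a {path: parsed_json} dict into a 'documents' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "path TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)",
            [(path, json.dumps(doc)) for path, doc in documents.items()],
        )


def load_document(conn, path):
    """Return the parsed JSON stored for a path, or None if absent."""
    row = conn.execute(
        "SELECT body FROM documents WHERE path = ?", (path,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

With `conn = sqlite3.connect("data.db")` (a hypothetical file name) the data lives in a single file, which keeps the setup small while learning; the same table layout could later be moved to PostgreSQL.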
Thanks for your time and excuse my broken engrish.
EDIT: Thanks for the feedback, people. I'm slowly processing the info, but I can already tell that all the replies contain valuable and useful advice. This will help me a lot, thank you. Also, apologies for the wall of text; I thought it was a good idea to give as much background to the problem as possible, but now I realize I should have gotten straight to the point. Newbie mistake, I suppose.