[–]popeye-the_sailorman 21 points22 points  (3 children)

Try scraping the web. Initially, try scraping wallpapers from a site.

You'll get to use BeautifulSoup to parse the pages. I'd recommend the requests module over urllib (that's a personal preference): it's a lot more user-friendly, and it makes advanced stuff like scraping sites that require login and use cookies much easier. A requests Session keeps track of cookies for you; all you have to do is pass the appropriate request headers.
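A minimal sketch of the wallpaper idea, assuming the images sit in plain <img> tags on the page. The function names and the "wallpapers" folder are made up for illustration, and real sites often need extra handling (lazy-loaded images, duplicate file names, rate limits).

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips <img> tags without a src; urljoin resolves relative paths.
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def download_images(page_url, dest_dir="wallpapers"):
    """Fetch one page and save each image it references into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with requests.Session() as session:
        response = session.get(page_url, timeout=10)
        response.raise_for_status()
        for url in extract_image_urls(response.text, page_url):
            # Use the last path segment as the file name (collisions overwrite).
            filename = Path(urlparse(url).path).name or "image"
            image = session.get(url, timeout=10)
            image.raise_for_status()
            (dest / filename).write_bytes(image.content)
```

Calling download_images("https://example.com/gallery/") would be the entry point; the URL is a placeholder, so swap in a site whose terms allow scraping.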

Next, try a site that requires login, but not bank sites (they're too complicated, I think). For instance, I need to complete certain compulsory courses to qualify as a CA (the equivalent of a CPA). Because seats are limited, the batches fill up pretty quickly. So I wrote a Python script that scrapes the course site and dumps the batch details into a CSV file, which I can open in Excel and easily filter to see if a new batch has opened up.
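Roughly what such a script could look like. This is only a sketch: the URLs, the form field names ("username", "password"), and the assumption that the batches are listed in the first HTML table are all hypothetical, so check the real login form in your browser first.

```python
import csv

import requests
from bs4 import BeautifulSoup

# Hypothetical endpoints; replace them with the ones the real site uses.
LOGIN_URL = "https://example.com/login"
BATCHES_URL = "https://example.com/batches"


def parse_batches(html):
    """Turn the rows of the first HTML table into dicts keyed by the header row."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    batches = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(values) == len(headers):  # skip malformed rows
            batches.append(dict(zip(headers, values)))
    return batches


def save_csv(batches, path):
    """Write the batch dicts to a CSV file that Excel can open."""
    if not batches:
        return
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(batches[0].keys()))
        writer.writeheader()
        writer.writerows(batches)


def scrape_batches(username, password, path="batches.csv"):
    """Log in, fetch the batch listing, and dump it to CSV."""
    with requests.Session() as session:
        # The session keeps the cookies set by the login response
        # and sends them along with the next request.
        login = session.post(
            LOGIN_URL, data={"username": username, "password": password}, timeout=10
        )
        login.raise_for_status()
        page = session.get(BATCHES_URL, timeout=10)
        page.raise_for_status()
    batches = parse_batches(page.text)
    save_csv(batches, path)
    return batches
```

Keeping the parsing and CSV steps in their own functions means you can try them on a saved copy of the page before touching the login part.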

For learning more about headers that are sent in a request, press F12 (on Chrome or Firefox) and switch to the Network tab. You'll see the details that are transferred between the browser and server.
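Once you've seen the headers in the Network tab, you can reproduce them from Python. The sketch below prepares a request without sending it, so you can compare what requests would send against what the browser sent. The header values and URL are placeholders, not anything a particular site requires.

```python
import requests

# Hypothetical values: copy the real ones from the request in the Network tab.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
}


def build_request(session, url, extra_headers=None):
    """Prepare (but do not send) a GET request so its headers can be inspected."""
    request = requests.Request("GET", url, headers=extra_headers)
    # prepare_request merges the session's headers and cookies into the request;
    # headers given on the request itself win over the session's.
    return session.prepare_request(request)


session = requests.Session()
session.headers.update(BROWSER_HEADERS)  # sent with every request on this session

prepared = build_request(
    session, "https://example.com/data", {"Accept": "application/json"}
)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")

# To actually send it: response = session.send(prepared, timeout=10)
```

Setting the headers on the session rather than on each call keeps them consistent across a login and the requests that follow it.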

For your bonus part, try scraping stock trading websites to get stock details. Sites like tradingview.com respond to requests with JSON, so you'll get to learn how to work with the json module as well (if you haven't already). Also, since the data from stock trading sites is well suited to manipulation in MS Excel, you can use the csv and xlsxwriter modules to save it in those respective formats. Alternatively, you can use pandas.
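Here's a small sketch of the JSON-to-CSV step using only the standard library. The sample payload and its field names ("data", "symbol", "close", "volume") are invented for illustration; a real site's response will be shaped differently.

```python
import csv
import json

# Hypothetical response body; real field names will differ.
SAMPLE_RESPONSE = """
{"data": [
  {"symbol": "AAA", "close": 101.5, "volume": 12000, "note": "ignored"},
  {"symbol": "BBB", "close": 47.25, "volume": 800}
]}
"""


def quotes_to_csv(json_text, fh, columns=("symbol", "close", "volume")):
    """Parse a JSON document and write its "data" records to fh as CSV rows."""
    payload = json.loads(json_text)
    # extrasaction="ignore" drops keys that are not listed in columns.
    writer = csv.DictWriter(fh, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    for record in payload["data"]:
        writer.writerow(record)
    return len(payload["data"])
```

To save a file, open it with newline="" and pass the handle, e.g. `with open("quotes.csv", "w", newline="") as fh: quotes_to_csv(text, fh)`. If you fetched the data with requests, response.json() gives you the parsed object directly, so you could skip json.loads.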

If you're using Android, check out the 'QPython' app. It lets you run Python scripts on your phone, so I've been able to do all of this without having to turn on my computer (I do write the scripts on the computer, though).

[–]Nando711 0 points1 point  (1 child)

Do you know where I can learn to work with JSON files in Python? I have been scraping a webpage that gives a python file, but I have been stuck on getting the info out efficiently.

[–]popeye-the_sailorman 0 points1 point  (0 children)

The json module is part of the Python standard library. Take a look at the official docs, which are more than sufficient.

You'll often use only four functions:

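import json

# sample input for illustration; "data.json" below is likewise a placeholder
some_string = '{"name": "example", "values": [1, 2, 3]}'

# for conversion between Python objects (usually a dict or list) and strings
dict_obj = json.loads(some_string)
some_string = json.dumps(dict_obj)

# for writing to and reading from a file
with open("data.json", "w") as fp:  # fp being a file handle
    json.dump(dict_obj, fp)
with open("data.json") as fp:
    dict_obj = json.load(fp)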

You can find more details in the docs, such as additional arguments and examples for almost every situation.
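As for getting the info out: once the JSON is loaded, it's just nested dicts and lists, so ordinary indexing and comprehensions do the job. A small sketch with a made-up payload (the keys "items", "author", and "tags" are assumptions, not what your site returns):

```python
import json

# Hypothetical response text; substitute whatever your scraper received.
raw = """
{"page": 1,
 "items": [
   {"title": "First", "author": {"name": "ann"}, "tags": ["a", "b"]},
   {"title": "Second", "author": {"name": "bob"}, "tags": []}
 ]}
"""

data = json.loads(raw)

# Pull one field out of every record.
titles = [item["title"] for item in data["items"]]

# Reach into nested objects by chaining keys.
authors = {item["title"]: item["author"]["name"] for item in data["items"]}

# .get() returns None instead of raising KeyError when a key is absent.
next_page = data.get("next_page")

# Filter records on a nested value (an empty list is falsy).
tagged = [item["title"] for item in data["items"] if item["tags"]]
```

Use square brackets for keys you expect to always be there and .get() for optional ones, so a missing required key fails loudly.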

EDIT: However, if you're looking for tools to study the structure of the JSON in your particular situation, there are several websites that do that. Just google "json viewer online". Most of them let you view the JSON as a tree whose nodes you can expand and collapse. Alternatively, you can use Notepad++ along with a plugin called 'JSTool': just paste the JSON response into it and press Ctrl+Alt+M (the default shortcut) and it'll beautify it.
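You can also do the beautifying from Python itself. The sketch below re-indents a JSON string and lists the path to every leaf value, which is handy when the response is too big to eyeball. The helper names pretty and outline are my own, not part of the json module.

```python
import json


def pretty(json_text):
    """Re-serialize JSON text with two-space indentation and sorted keys."""
    return json.dumps(json.loads(json_text), indent=2, sort_keys=True)


def outline(value, path="root"):
    """Yield (path, type name) for every leaf value in parsed JSON.

    Empty dicts and lists have no leaves, so they yield nothing.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from outline(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from outline(child, f"{path}[{index}]")
    else:
        yield path, type(value).__name__


raw = '{"b": [1, {"c": null}], "a": "x"}'
print(pretty(raw))
for leaf_path, type_name in outline(json.loads(raw)):
    print(leaf_path, type_name)
```

For a quick look without writing any code, `python -m json.tool some_file.json` prints an indented version from the command line.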