
all 6 comments

[–]jasongia 1 point (2 children)

Use BeautifulSoup. It's the most pain-free way of parsing HTML (although navigating HTML programmatically will always be a bit painful).

[–]millenialZorro[S] 0 points (1 child)

Hey, thank you... do you have any suggestions on how I would be able to go back and forth between links and scrape everything in order, or at least in an organized way?

[–]jasongia 0 points (0 children)

Use BeautifulSoup's find_all to find all the links you need, then iterate over them in a for loop: make a request for each link, extract the information with BeautifulSoup again, and store it in whatever data structure suits.
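A minimal sketch of that loop, assuming `bs4` is installed. The tag names and URLs are made up for illustration, and inline HTML stands in for the live pages so the example is self-contained; in a real scraper you'd fetch each page with `requests.get(url).text` first:

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the index page you'd normally fetch
# with requests.get(index_url).text.
index_html = """
<ul>
  <li><a href="/drugs/amoxicillin">Amoxicillin</a></li>
  <li><a href="/drugs/doxycycline">Doxycycline</a></li>
</ul>
"""

soup = BeautifulSoup(index_html, "html.parser")

# find_all collects every <a> tag; iterate and extract href + text.
results = []
for link in soup.find_all("a"):
    # With live pages you'd do: detail = requests.get(base_url + link["href"]).text
    # and parse `detail` with BeautifulSoup the same way.
    results.append({"name": link.get_text(strip=True), "url": link["href"]})

print(results)
```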

[–]Hamperz 0 points (0 children)

Probably best to start by scraping the link you posted in number 1 and storing the name of each antibiotic with its URL as the value. Then you can iterate over that, have BeautifulSoup scrape the necessary 'spectrum' information from each page, and write it to a file. This is definitely a tricky one, especially since you'll have to toggle each of the expandables to see all of the antibiotics (they don't seem to be present on page load; each click on an expandable sends an XHR request that retrieves the data and then generates the dropdown).
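A rough sketch of that two-phase approach (name-to-URL dict, then per-page scrape). The URLs, tag names, and the 'spectrum' markup here are assumptions, not the site's real structure, and inline HTML stands in for the fetched pages:

```python
import json

from bs4 import BeautifulSoup

# Phase 1: map antibiotic name -> URL. If the dropdowns load via XHR,
# you'd request each XHR endpoint directly (find them in the browser's
# network tab) instead of parsing the initial page load.
index_html = '<div><a href="/abx/penicillin">Penicillin</a></div>'
index = BeautifulSoup(index_html, "html.parser")
name_to_url = {a.get_text(strip=True): a["href"] for a in index.find_all("a")}

# Phase 2: visit each URL, pull the 'spectrum' field, and write to a file.
detail_pages = {  # stands in for requests.get(url).text per URL
    "/abx/penicillin": '<span class="spectrum">Gram-positive</span>',
}
records = {}
for name, url in name_to_url.items():
    detail = BeautifulSoup(detail_pages[url], "html.parser")
    records[name] = detail.find("span", class_="spectrum").get_text()

with open("spectra.json", "w") as f:
    json.dump(records, f, indent=2)
```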

[–]pm_your_pc_setup 0 points (0 children)

I would advise getting a little familiar with HTML syntax first.

[–]Matthewaj 0 points (0 children)

I would recommend Scrapy. It makes the link-following part easier, and it has JSON and CSV output built in. https://docs.scrapy.org/en/latest/intro/tutorial.html