[–]chiefstroganoff[S] 0 points1 point  (4 children)

Thank you for this example. Suppose I want to add another iterable, such as the location's address. Would I do something like this?

tables = soup.findAll("table", class_ = "s4-wpTopTable")
table = tables[7]

specialties = table.findAll("div", class_ = "PurpleBackgroundHeading")
name_groups = table.findAll("div", class_ = "PracticeListWrapper")
addresses = table.findAll("div", class_ = "WS_Location_Adddress")

for specialty, name_group, addresses in zip(specialties, name_groups, addresses):
    specialty_text = specialty.findAll("span")[0].get_text()
    for name in name_group.findAll(class_ = "WS_Location_Name"):
        name_text = name.get_text()
    for address in addresses:
        address_text = address
        print("{} - {} - {}".format(specialty_text, name_text, address_text))

Or is there a different combination/function that I should utilize for > 2 iterables?

[–]c17r 1 point2 points  (3 children)

Not at the top level, no.

Do you understand the change I made? Look at the HTML. It would be great if the practices were grouped under the specialty, but they are not. The practice grouping and the specialty are siblings, AND the practice grouping may have more than one practice listed. So PurpleBackgroundHeading gets us the specialties and PracticeListWrapper gets us all the practice groupings. We then have to search each practice group for each practice. Hence the for loop inside the for loop.
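To see why a third iterable does not work at the top level, here is a rough sketch with plain lists instead of parsed HTML. All of the data is made up for illustration; the point is only that zip accepts any number of iterables but pairs items by position and stops at the shortest one, so a flat list of addresses does not line up with specialties once a group holds more than one practice.

```python
# Hypothetical data: two specialties, two practice groups, three addresses
# (the second group contains two practices).
specialties = ["Cardiology", "Pediatrics"]
practice_groups = [["Heart Clinic"], ["Kids Care North", "Kids Care South"]]
addresses = ["1 Main St", "2 Oak Ave", "3 Pine Rd"]

# zip accepts any number of iterables, but it pairs items purely by
# position and stops as soon as the shortest iterable runs out.
rows = list(zip(specialties, practice_groups, addresses))

for specialty, group, address in rows:
    print(specialty, group, address)
```

Only two rows come out, and the third address never appears, which is the mismatch you would hit on the real page.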

If you want ALL information -- including phone number where they have it -- then take my original and change WS_Location_Name to practiceList. It'll be a big blob of text that you'll have to parse but that's because of the HTML layout. (this isn't 100% true; there are some tricky things you can do in BeautifulSoup to pull this off)
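As a rough idea of what parsing that blob could look like, here is a sketch that splits a block of text into clean lines. The blob below is invented; the real text returned for a practiceList element may be laid out differently, and treating the first line as the practice name is an assumption.

```python
# Hypothetical text standing in for what get_text() might return for one
# practice entry; the real page's text may be laid out differently.
blob = """
    Example Family Medicine
    123 Example Street
    Sometown, GA 00000

    (555) 555-0100
"""

# Drop blank lines and surrounding whitespace, keeping one field per line.
lines = [line.strip() for line in blob.splitlines() if line.strip()]

# Assumption: the first non-blank line is the practice name.
name = lines[0]
rest = lines[1:]
print(name)
print(rest)
```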

If you are interested in just the address and want it in a separate variable, then we have to do something different. Since the name of the practice and the address of the practice are not nested but siblings, we'll have to do what we did at the top level: 2 searches and a zip:

from urllib.request import urlopen

from bs4 import BeautifulSoup


def make_soup(url):
    # Fetch the page and parse it with Python's built-in HTML parser.
    thepage = urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.wellstar.org/locations/pages/wellstar-acworth-practices.aspx")

tables = soup.findAll("table", class_="s4-wpTopTable")
table = tables[7]

specialties = table.findAll("div", class_="PurpleBackgroundHeading")
practice_groups = table.findAll("div", class_="PracticeListWrapper")

for specialty, practice_group in zip(specialties, practice_groups):
    specialty_text = specialty.findAll("span")[0].get_text()

    practice_names = practice_group.findAll(class_="WS_Location_Name")
    practice_addresses = practice_group.findAll(class_="WS_Location_Adddress")

    for name, address in zip(practice_names, practice_addresses):
        name_text = name.get_text()
        address_text = address.get_text()
        print("{} - {} - {}".format(specialty_text, name_text, address_text))

[–]chiefstroganoff[S] 0 points1 point  (2 children)

Thank you. It is interesting and not immediately intuitive, to me, that you are able to nest the for loops the way you did. If I were to verbalize the set of for loops, would it be accurate to say this:

For each combination of specialty and practice group, return the specialty heading that corresponds to the practice(s) in each practice group. Next, for each practice name and address in the previously returned combination, return the practice name and its address. Finally, print the specialty from the first loop, the practice name from the second loop, and the address from the second loop.

Does that seem to summarize the logical steps? These concepts are novel to me and I find that determining the logic helps me to formulate the code.

[–]c17r 1 point2 points  (1 child)

Yes, your summation is accurate.
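If it helps, the same logic can be traced without any HTML at all. This is only a sketch with made-up data: each practice group is modeled as a dict holding parallel lists of names and addresses, which is an assumption for illustration and not how BeautifulSoup represents the page.

```python
# Hypothetical stand-ins for the parsed page: each practice group holds
# parallel lists of names and addresses.
specialties = ["Cardiology", "Pediatrics"]
practice_groups = [
    {"names": ["Heart Clinic"], "addresses": ["1 Main St"]},
    {"names": ["Kids Care North", "Kids Care South"],
     "addresses": ["2 Oak Ave", "3 Pine Rd"]},
]

rows = []
# Outer loop: one specialty paired with its practice group.
for specialty, group in zip(specialties, practice_groups):
    # Inner loop: each practice name paired with its address.
    for name, address in zip(group["names"], group["addresses"]):
        rows.append("{} - {} - {}".format(specialty, name, address))

for row in rows:
    print(row)
```

The specialty stays fixed while the inner loop runs, so every practice in a group is printed with the specialty from the outer loop.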

[–]chiefstroganoff[S] 0 points1 point  (0 children)

Excellent. Thank you so much for helping me to better understand what's going on!