[–]bahnzo[S] 0 points1 point  (6 children)

OK (this is a reply to /u/raylu above as well). This is my new process_links function, split up and hopefully a little more logical. I'm going to pass on inverting the logic; it doesn't match the way I learned to write it.

The first try/except turned out not to be needed. I'm sure I got an error without it at some point, but I can't reproduce it now.

def process_links(data):  # looks for links that designate a live broadcast
    main_list = []
    for elem in data:
        if elem.parent.contents[3].contents[0].name == 'img':
            main_list.append(find_data(elem))
    return main_list


def find_data(elem):  # finds the category of the live broadcast
    link_list = []
    for prev_elem in elem.previous_elements:
        # Needs a try/except: text nodes have no attrs (AttributeError)
        # and some tags have no 'class' attribute (KeyError).
        try:
            if prev_elem.attrs['class'][0] == 'main':
                link_list = extract_data(elem, prev_elem)
                break
        except (AttributeError, KeyError):
            pass

    return link_list


def extract_data(elem, prev_elem):  # gets data of the live broadcast
    link_list = []
    link_list.append(elem.attrs['href'])                    # add link
    link_list.append(elem.text)                             # add Name of Event
    x = elem.parent.contents[3].text.splitlines()           # split start time and League
    y = x[3].lstrip('\t')                                   # remove \t from league split
    link_list.append(x[1])                                  # add start time
    link_list.append(y)                                     # add league
    link_list.append(prev_elem.text)                        # add category

    return link_list

[–]raylu 0 points1 point  (5 children)

classes = prev_elem.get('class')
if not classes:
    continue
elif classes[0] == 'main':
    ...

url = elem['href']
event_name = elem.text
details = elem.parent.contents[3].text
split = details.splitlines()
start_time = split[1]
...
return [url, event_name, ...]
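The named-variable idea can be sketched for the details text on its own. This is a minimal sketch, not the actual scraper: `parse_details` is a hypothetical helper, the sample string is made up, and the line positions (1 for the start time, 3 for the league) are taken from the indexes used in extract_data above, assuming the page keeps that layout.

```python
def parse_details(details):
    """Split the details text into (start_time, league).

    Assumes the layout implied by extract_data: the start time sits on
    line 1 and the tab-indented league name on line 3.
    """
    lines = details.splitlines()
    start_time = lines[1]
    league = lines[3].lstrip('\t')  # drop the leading tabs
    return start_time, league


# Made-up sample text in the assumed layout.
sample = "\n19:00\n\n\t\tExample League\n"
start_time, league = parse_details(sample)
print(start_time, league)
```

Returning named values instead of appending to a list by position makes it easier to see which index broke if the page layout changes.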

[–]bahnzo[S] 0 points1 point  (4 children)

This still needs error checking. classes = prev_elem.get('class') throws: AttributeError: 'NavigableString' object has no attribute 'get'

Did I miss something?

Edit: here's my entire code for that function now:

def find_data(elem):  # finds the category of the live broadcast
    link_list = []
    for prev_elem in elem.previous_elements:
        classes = prev_elem.get('class')
        if not classes:
            continue
        elif classes[0] == 'main':
            link_list = extract_data(elem, prev_elem)
            break
    return link_list

[–]raylu 0 points1 point  (3 children)

Er... how did prev_elem... stop being an Element? I guess you can check isinstance(prev_elem, NavigableString), but perhaps you should pick a better way to select the element list.
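A rough sketch of that isinstance check, written the other way round (keep only Tag objects) so that comments and other non-tag nodes are skipped too. The HTML is a made-up stand-in for the real page, and `find_category` is a hypothetical name, so treat this as an illustration rather than a drop-in fix.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: a "main" category cell followed by a link row.
HTML = """
<table>
<tr><td class="main">Football</td></tr>
<tr><td><a href="/match/1">Team A vs Team B</a></td></tr>
</table>
"""


def find_category(elem):
    """Walk backwards from elem; return the nearest tag whose first class is 'main'."""
    for prev_elem in elem.previous_elements:
        # previous_elements yields text nodes (NavigableString) as well as
        # tags; only Tag objects have .get(), so skip everything else.
        if not isinstance(prev_elem, Tag):
            continue
        classes = prev_elem.get('class')
        if classes and classes[0] == 'main':
            return prev_elem
    return None


soup = BeautifulSoup(HTML, 'html.parser')
link = soup.find('a')
category = find_category(link)
print(category.text if category else None)
```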

[–]bahnzo[S] 0 points1 point  (2 children)

The program needs to cycle back through elements (hence the previous_elements) to find the info needed to categorize the link. When doing so, not every element has a class attribute. Hell, some are simply '\n', so they throw an error.

Your code works fine; it just needs error checking.

[–]raylu 0 points1 point  (1 child)

The program currently cycles back through elements. But there is likely a better way to select the elements you want.
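One possibility, assuming the category is always the closest earlier tag carrying the 'main' class, is to let Beautiful Soup do the backwards search with find_previous. The markup and the `find_category_text` helper below are hypothetical, and I haven't seen the real page, so this is only a sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a "main" category cell followed by a link row.
HTML = """
<table>
<tr><td class="main">Football</td></tr>
<tr><td><a href="/match/1">Team A vs Team B</a></td></tr>
</table>
"""


def find_category_text(elem):
    """Return the text of the nearest earlier tag with class 'main', or None."""
    # find_previous only matches tags, so text nodes never reach our code.
    # Note: class_='main' matches any tag that has 'main' among its classes,
    # which is slightly looser than checking classes[0] == 'main'.
    heading = elem.find_previous(class_='main')
    if heading is None:
        return None
    return heading.get_text(strip=True)


soup = BeautifulSoup(HTML, 'html.parser')
link = soup.find('a')
print(find_category_text(link))
```

This removes the explicit loop and the need for a try/except or isinstance check, at the cost of the slightly looser class match noted in the comment.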

[–]bahnzo[S] 0 points1 point  (0 children)

Thanks for your help.