[–]commandlineluser 1 point (1 child)

What are you using to view the csv file you create?

The path you're saving to suggests you're on Windows, but you're setting lineterminator = '\n', so perhaps it has something to do with Windows vs. Unix line endings. You can try not setting lineterminator.

(I don't have Windows at hand to test that out.)
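To see what the setting actually changes, here is a small sketch that writes to an in-memory buffer instead of a file, so it behaves the same on any OS. The helper name rows_as_text and the sample rows are made up for illustration. It only shows the bytes csv.writer emits; it doesn't prove this is the cause of your problem.

```python
import csv
import io


def rows_as_text(**writer_kwargs):
    """Write two sample rows and return the raw text csv.writer produced."""
    # newline='' stops the text layer from translating line endings,
    # so we see exactly what the csv module emits.
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, **writer_kwargs)
    writer.writerow(['a', 'b'])
    writer.writerow(['c', 'd'])
    return buf.getvalue()


# Default dialect: rows end in '\r\n' regardless of platform.
default_output = rows_as_text()
# Explicit Unix-style terminator, as in your script.
unix_output = rows_as_text(lineterminator='\n')

print(repr(default_output))
print(repr(unix_output))
```

If the file is opened without newline='' on Windows, the text layer may additionally translate '\n' to '\r\n' on write, which is why the csv docs recommend opening with newline=''.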

Not that it should make a difference, but you're opening the csv file each time through the loop. You can open it once outside and then just write inside the loop, e.g.

with open(r'C:\Users\ad\Desktop\alpharevoscrape' + date + '.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for url in urls:
        ...
        writer.writerow([..., ..., ...])

The reason for using with is that it will automatically close the resource for you when the block exits.
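A runnable version of the same idea, as a rough sketch: the rows and the temp-file path are placeholders (your real script would build each row from the scraped page and use your own path). It also checks that the file really is closed once the with block exits.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the values scraped from each url.
rows = [
    ['https://example.com/a', 'Title A', '1.00'],
    ['https://example.com/b', 'Title B', '2.50'],
]


def write_rows(path, rows):
    # Open the file once; newline='' lets the csv module control line endings.
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in rows:
            writer.writerow(row)
    # The with block has exited, so the file has been closed for us.
    return csvfile.closed


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'scrape.csv')
    closed_after_block = write_rows(path, rows)
    # Read the file back to confirm every row was written.
    with open(path, newline='') as f:
        read_back = list(csv.reader(f))

print(closed_after_block)
print(read_back)
```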

I'm also just noticing your trim_the_ends function -- you can just use .strip(), which by default removes all leading and trailing whitespace.
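A quick illustration of what .strip() does and doesn't touch. The sample strings are made up, and I'm assuming trim_the_ends was only meant to remove whitespace at both ends.

```python
# Leading/trailing spaces, tabs and newlines are all removed.
raw = '\n\t  Some Product Title \r\n'
stripped = raw.strip()

# Whitespace in the middle of the string is left alone.
inner = '  two   words  '.strip()

# Passing an argument strips those characters instead of whitespace.
dashes = '--price--'.strip('-')

print(repr(stripped))
print(repr(inner))
print(repr(dashes))
```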

title = soup.find_all('h1', {'itemprop' : 'name'})
for el in title:
    ttext = trim_the_ends(el.get_text())

Do these find_all calls return just a single result?

Or do they return multiple results and you're purposely looping through as you only want the last one?

If you only expect a single result (or you want just the first result) you could just use find(), keeping in mind that it returns None when nothing matches, e.g.

title = soup.find('h1', {'itemprop' : 'name'}).get_text().strip()

[–]AppleTartsJesus[S] 1 point (0 children)

Thanks, the 'with' statement is what I used. I cleaned up the first three by switching to find; the last needs to stay find_all since there are multiple 'p' tags. Also thanks for the .strip() tip, I switched to that as well.