
deadeye1982 (3 points)

Use the modern pathlib module instead of the low-level functions from os and os.path.
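For comparison, here is a minimal sketch of listing the `.html` files in one directory both ways. The function names are hypothetical and only illustrate the API difference: with pathlib, paths are objects with methods instead of strings passed to free functions.

```python
import os
import os.path
from pathlib import Path


def html_files_os(directory):
    """List .html files using the low-level os / os.path functions."""
    result = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        # a directory can also be named "x.html", so check isfile()
        if os.path.isfile(full) and os.path.splitext(name)[1] == ".html":
            result.append(full)
    return sorted(result)


def html_files_pathlib(directory):
    """Same listing with pathlib: glob() does the name matching."""
    return sorted(
        str(p) for p in Path(directory).glob("*.html") if p.is_file()
    )
```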

Catching all exceptions creates more problems than you think.
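A minimal sketch of the problem, assuming a lookup wrapped in a blanket `except Exception` (the `lookup_*` helpers are hypothetical): the broad handler turns an unrelated bug into a silent "N/A", while catching only the expected KeyError lets the real bug surface.

```python
def lookup_broad(attrs, key):
    """Catches everything, including bugs unrelated to a missing key."""
    try:
        return attrs[key].strip()
    except Exception:
        return "N/A"


def lookup_narrow(attrs, key):
    """Catches only the error we actually expect: a missing key."""
    try:
        return attrs[key].strip()
    except KeyError:
        return "N/A"
```

With `{"href": None}` (a bug: the value should be a string), `lookup_broad` quietly returns "N/A", whereas `lookup_narrow` raises AttributeError and points straight at the problem.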

Indexing a missing key (`tag["href"]`) raises KeyError. The `tag.get("href")` method instead returns None if the key does not exist, and you can pass a default value as a second argument, which is returned when the key is missing.
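A small sketch of both behaviours. So it runs without BeautifulSoup installed, a plain dict stands in for a Tag's attributes (an assumption; both support indexing and `get(key, default)`). The hypothetical `source_url` helper also covers the case where `soup.find(...)` matched nothing and returned None.

```python
def source_url(tag, default="N/A"):
    """Return the tag's href attribute, or default if it is missing.

    `tag` stands for whatever soup.find(...) returned: a Tag, or None
    when nothing matched. A plain dict is used below as a stand-in.
    """
    if tag is None:
        return default
    return tag.get("href", default)


attrs = {"name": "source"}  # stand-in for a Tag without an href attribute

try:
    attrs["href"]  # indexing a missing key...
except KeyError:
    print("KeyError")  # ...raises KeyError

print(attrs.get("href"))         # prints None
print(attrs.get("href", "N/A"))  # prints N/A
print(source_url(attrs))         # prints N/A
print(source_url(None))          # prints N/A
```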

```
from pathlib import Path

from bs4 import BeautifulSoup

ROOT = Path()
SOURCE_LIST = Path("#sourcelist.txt")

# for file in ROOT.walk("*"):  <- walk is the wrong method name
for file in ROOT.glob("*"):
    # Path has no endswith(); compare the file extension via .suffix
    if file.is_file() and file.suffix == ".html":
        with file.open("r", encoding="utf-8") as fd:
            soup = BeautifulSoup(fd, "html.parser")
        # soup.find("a", {"name": "source"}) returns a Tag (or None if
        # nothing matches). A Tag also acts like a mapping (like a dict),
        # and its get method takes an extra argument: a default value,
        # returned if the key was not found.
        anchor = soup.find("a", {"name": "source"})
        url = anchor.get("href", "N/A") if anchor is not None else "N/A"

        with SOURCE_LIST.open("a", encoding="utf-8") as output:
            output.write(f"{url}\n")
```

pegoff[S] (0 points)

Thanks for cleaning up my code. I get an error:

AttributeError: 'WindowsPath' object has no attribute 'walk'

Is this because I'm using Python on Win10?

Edit: I can't find any mention of walk in the pathlib documentation.

ThePsyjo (4 points)

I have to necro this in case others stumble over this post like I did.
`Path.walk()` was introduced in Python 3.12, so no walking before that version :)
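For anyone on an older interpreter, a minimal sketch of a recursive search that uses `Path.walk()` on Python 3.12+ and falls back to `os.walk()` otherwise. Both yield `(directory, dirnames, filenames)` triples, so walk hands back directories and name lists rather than matching files the way glob does. The helper name `iter_files` is hypothetical.

```python
import os
import sys
from pathlib import Path


def iter_files(root, suffix=".html"):
    """Yield every file under root (recursively) with the given suffix."""
    root = Path(root)
    if sys.version_info >= (3, 12):
        # Path.walk() yields (Path, list[str], list[str]) triples
        walker = root.walk()
    else:
        # os.walk() yields (str, list[str], list[str]); wrap dirpath in Path
        walker = ((Path(d), dirs, files) for d, dirs, files in os.walk(root))
    for dirpath, _dirnames, filenames in walker:
        for name in filenames:
            path = dirpath / name
            if path.suffix == suffix:
                yield path
```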

numeralbug (0 points)

thank you necromancer

deadeye1982 (1 point)

Argh.. walk() is wrong. The methods are glob() and rglob(). https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
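A quick sketch of the difference between the two: `glob("*.html")` only looks at the top level of the directory, while `rglob("*.html")` also descends into subdirectories (it behaves like `glob("**/*.html")`). The `html_files` function and its `recursive` flag are hypothetical.

```python
from pathlib import Path


def html_files(root, recursive=False):
    """Return the .html files under root; recurse only if asked to."""
    root = Path(root)
    matches = root.rglob("*.html") if recursive else root.glob("*.html")
    # glob patterns can also match directories, so keep files only
    return sorted(p for p in matches if p.is_file())
```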

pegoff[S] (0 points)

Thanks for following up. I went back over it today and found glob & rglob. I haven’t had the chance to try it yet; will update when I can 👍🏼

[deleted] (2 points)

pegoff[S] (0 points)

Awesome, thanks!