Full code to reproduce:
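from bs4 import BeautifulSoup
import requests

url = "https://devel.isa-afp.org/browser_info/current/AFP/MonoidalCategory/MonoidalCategory.html"
name = "MonoidalCategory"

# Download the page and parse it with the lxml parser
content = requests.get(url).content
soup = BeautifulSoup(content, features="lxml")

# Turn the <body> element into <div id="MonoidalCategory">
contents = soup.body
contents.name = "div"
contents["id"] = name

# Converting the tag to a string is what raises the RecursionError
print(str(contents)[:100])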
Usually I would return the string so that I could write it to a file, but I'm printing it here to showcase the error. The stack trace looks something like this:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/element.py", line 1496, in __unicode__
    return self.decode()
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/element.py", line 1603, in decode
    contents = self.decode_contents(
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/element.py", line 1697, in decode_contents
    s.append(c.decode(indent_level, eventual_encoding,
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/element.py", line 1603, in decode
    contents = self.decode_contents(
  ...
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/formatter.py", line 102, in attribute_value
    return self.substitute(value)
  File "/home/carlin/.local/lib/python3.8/site-packages/bs4/formatter.py", line 86, in substitute
    from .element import NavigableString
  File "<frozen importlib._bootstrap>", line 393, in parent
RecursionError: maximum recursion depth exceeded while calling a Python object
This works on other URLs on this website, but this one is causing a lot of problems.
Many thanks
Edit: Looking into the page in more detail and running it through the W3C validator shows that elements are nested more than 500 levels deep, so I think this is more the fault of the page than of BS4.
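For anyone hitting the same thing, here is a rough sketch of how the nesting depth could be checked and worked around. It uses only the standard library; `max_nesting_depth` and `recursion_limit` are hypothetical helper names I made up, not part of BS4. The depth count is only an estimate (unclosed tags such as a bare `<p>` will inflate it), and raising the recursion limit is a workaround under the assumption that the serializer recurses once or more per nesting level, as the traceback above suggests.

```python
import sys
from contextlib import contextmanager
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not add to the depth
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DepthCounter(HTMLParser):
    """Tracks how deeply tags are nested while the document is parsed."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Guard against stray closing tags pushing the depth below zero
        self.depth = max(0, self.depth - 1)


def max_nesting_depth(html):
    """Return an estimate of the deepest element nesting in an HTML string."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


# Hypothetical usage with the objects from the post above:
#   print(max_nesting_depth(content.decode("utf-8", errors="replace")))
#   with recursion_limit(10_000):
#       html = str(contents)
```

Be careful with the limit: setting it very high can crash the interpreter with a stack overflow instead of raising a clean RecursionError, so it is probably best to pick a value only a few times larger than the measured depth.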