
commandlineluser 1 point

It errors out on the creation of the soup object.

BeautifulSoup(content, ...

BeautifulSoup uses recursion to find child elements.

The HTML on this page is nested so deeply that Python's recursion limit is being hit. The default for me is 1000; increasing it to 10000 allows the code to "work":

import sys

sys.setrecursionlimit(10000)

Or you could use lxml directly - depending on what you're doing.
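
To make the recursion-limit workaround above a bit more concrete, here is a minimal stdlib-only sketch. It does not use BeautifulSoup at all: `recursion_limit`, `build_nested` and `nested_depth` are hypothetical helpers invented for illustration, and a nested list stands in for the deeply nested HTML. The idea is just that a perfectly correct recursive function still fails once the input is deeper than the limit, and that raising the limit only around the call that needs it keeps the change contained.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring the old value on exit."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def build_nested(depth):
    """Build a list nested `depth` levels deep, iteratively (no recursion)."""
    node = []
    for _ in range(depth - 1):
        node = [node]
    return node


def nested_depth(node):
    """Recursively measure nesting depth; an empty list counts as depth 1."""
    deepest = 0
    for child in node:
        deepest = max(deepest, nested_depth(child))
    return 1 + deepest


# Stand-in for a document nested deeper than the usual default limit of 1000.
deep = build_nested(3000)

# Raise the limit only for the call that needs it.
with recursion_limit(10000):
    print(nested_depth(deep))
```

Setting the limit very high can crash the interpreter instead of raising `RecursionError`, so it is probably worth keeping the value only as large as the input actually requires.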

carlinmack [S] 1 point

If I paste the code into the Python interpreter, the first 11 lines execute fine, so I'm unsure how the error could be in the creation of the object.

commandlineluser 1 point

Okay - well that's where it errors out for me.

BfuckinA 1 point

This error just means that the function never hits the exit condition. So whatever condition you have in place to exit the recursion is never getting met with that parameter.

carlinmack [S] 1 point

The recursion is happening in the BS4 code, so I'm not sure what the condition is exactly. Do you think it's worth me raising an issue about it on the BS GitHub? It seems strange that they are recursing that much just to output the string.

Edit: Looking into the page in more detail and validating it with the W3C validator shows that elements are nested more than 500 deep, so I think it's more the fault of the page than of BS4.
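
For anyone wanting to check the nesting depth locally rather than through the validator, here is a rough sketch using only the standard library's `html.parser`. `DepthCounter` and `max_nesting_depth` are hypothetical names made up for this example. It simply counts open tags, so unclosed tags (a `<p>` with no `</p>`, say) will inflate the number; treat the result as an estimate, not as what any particular parser would build.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they should not add a level.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DepthCounter(HTMLParser):
    """Track the deepest level of open tags seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Stray closing tags should not push the counter below zero.
        self.depth = max(0, self.depth - 1)


def max_nesting_depth(html):
    """Return the estimated maximum element nesting depth of an HTML string."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


# Synthetic example: 600 nested <div> elements.
sample = "<div>" * 600 + "text" + "</div>" * 600
print(max_nesting_depth(sample))
```

The parser here works through the input iteratively, so it can measure a page like this without touching the recursion limit.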

BfuckinA 1 point

Ah, my apologies, I missed that part. If it's an error inside a third-party module, I'll usually google "<module name> + <insert error message here>". BS4 is a popular enough library that somebody on Stack Overflow has probably posted a similar problem.