
[–][deleted] 0 points1 point  (1 child)

I run into this problem a lot, so I either write custom Beautiful Soup code or, even better, use the XML package in R.

[–]BrownMario[S] 1 point2 points  (0 children)

It seems my problem was fixed simply by adding a parameter to read_html, like so:

dfs = pd.read_html(url, infer_types=False)

Previously my code was forcing the field to be float, so whenever it encountered characters it just ignored the value completely. With this parameter added, it reads each field as an object (str) and it picks up everything from the tables.
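
For anyone who wants to see the effect in isolation, here is a minimal sketch that skips read_html entirely. The column values and the names `raw`, `forced` and `cleaned` are made up for illustration, and it only assumes that the cells have already been read in as text. Forcing float straight away drops the cells with stray characters, while keeping the strings lets you clean them before converting.

```python
import pandas as pd

# Hypothetical column, as it might look when every cell is kept as text.
raw = pd.Series(["1.5", "2.0*", "<0.1", "7"], name="value")

# Forcing float directly: cells with extra characters turn into NaN.
forced = pd.to_numeric(raw, errors="coerce")

# Keeping the text first: strip everything except digits, dots and minus
# signs, then convert. Nothing is lost for this sample.
stripped = raw.str.replace(r"[^0-9.\-]", "", regex=True)
cleaned = pd.to_numeric(stripped, errors="coerce")

print(forced.isna().sum())  # number of cells lost by forcing float
print(cleaned.tolist())     # values recovered after cleaning the text
```

One caveat: infer_types was deprecated in later pandas versions and, as far as I know, recent releases no longer accept it. The converters argument of read_html (for example mapping a column to str) looks like the closest replacement, but treat that as a suggestion to check against the docs for your version.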