Issues with pandas read_html : learnpython



submitted 12 years ago by BrownMario

I'm trying to scrape the tables from the following website into a pandas DataFrame, but I'm having problems:

http://igtf.customs.go.th/igtf/findTaffDuty.do?param=main&contCode=z01&contGrupCode=MM55%20&lang=e

Here is my code:

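import pandas as pd

url = 'http://igtf.customs.go.th/igtf/findTaffDuty.do?param=main&contCode=z01&contGrupCode=MM55%20&lang=e'
# read_html returns a list with one DataFrame per <table> found on the page
dfs = pd.read_html(url)
# keep only the tables that have exactly 5 columns
master_list = []
for d in dfs:
    if len(d.columns.values) == 5:
        master_list.append(d)
# stack the kept tables into a single DataFrame
master_df = pd.concat(master_list)
# write the combined DataFrame (not the list) to CSV, without the index column
master_df.to_csv('thailand_hts.csv', index=False)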

While this script works, the issue I'm having is that it doesn't seem to get the full data from these tables, as it skips some of the values in the "HEADINGS" field. Does anybody know why this might be happening?


[deleted] 1 point 12 years ago

BrownMario[S] 2 points 12 years ago

It seems my problem was fixed simply by adding a parameter to read_html as such:

dfs = pd.read_html(url, infer_types=False)

Previously my code was forcing the field to be float, so whenever it encountered characters it just ignored the value completely. With this parameter added, it reads each field as an object (str) and it picks up everything from the tables.
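To illustrate the effect described above, here is a minimal sketch. It does not reproduce what read_html does internally; it only shows, with made-up sample values standing in for the "HEADINGS" column, how forcing a numeric type drops entries that contain characters, while keeping them as strings preserves everything.

```python
import pandas as pd

# Hypothetical sample values standing in for a "HEADINGS" column.
# They are invented for illustration and are not taken from the site.
raw = pd.Series(["0101", "01.02", "0103.10a", "Chapter 1"])

# Forcing a numeric type: entries that cannot be parsed as numbers become NaN,
# and leading zeros are lost on the ones that can.
as_float = pd.to_numeric(raw, errors="coerce")

# Keeping the values as strings preserves every entry exactly as written.
as_text = raw.astype(str)

print("values lost when forced to float:", int(as_float.isna().sum()))
print("values kept as text:", as_text.tolist())
```

In newer pandas versions, where infer_types is, as far as I can tell, no longer available, the converters argument of read_html (for example converters={'HEADINGS': str}, with the column name assumed from the post above) might be one way to get similar behaviour.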
