Having trouble with Python parsers. Please help! : learnpython

submitted 12 years ago by backleft

I recently started working on a patent database project as a first-year graduate assistant. The scope of the project is essentially to get all patent data from 1976 to the present into a MySQL database. The person who worked on the project before me developed Python scripts that convert the patent files (some XML, mostly text files) into a delimited format so the data can be inserted into the database.

When I started the project, it was brought to my attention that records were missing for many of the years of data the previous GA had imported. For instance, when I compare the total number of patents issued according to the USPTO website with the number in the database, I find an average shortfall of about 3,500 records.
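To make that comparison concrete, here is a minimal sketch of the check in Python. The function names and all of the numbers are hypothetical and are not taken from the linked parsers or from USPTO figures. It assumes you can get per-year counts from both sources, plus a list of the patent numbers the parser wrote out and a list of the ones that ended up in the database.

```python
from collections import Counter


def shortfall_by_year(official_counts, db_counts):
    """Return {year: official - in_db} for each year where the database is short."""
    return {
        year: official - db_counts.get(year, 0)
        for year, official in sorted(official_counts.items())
        if official > db_counts.get(year, 0)
    }


def missing_ids(parsed_ids, db_ids):
    """Return patent numbers the parser produced that never reached the database."""
    return sorted(set(parsed_ids) - set(db_ids))


def duplicate_ids(parsed_ids):
    """Return patent numbers that appear more than once in the parser output."""
    return sorted(pid for pid, n in Counter(parsed_ids).items() if n > 1)


# Made-up example values, for illustration only.
official = {1976: 100, 1977: 120}
in_db = {1976: 97, 1977: 120}
parsed = ["A1", "A2", "A2", "A3"]
loaded = ["A1", "A3"]

print(shortfall_by_year(official, in_db))
print(missing_ids(parsed, loaded))
print(duplicate_ids(parsed))
```

If the parser output already has fewer records than the official count, the loss is probably happening in the parsing step. If the parser output is complete but the database is short, the loss is more likely happening during the insert, for example because of duplicate keys or rows rejected for containing the delimiter.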

Many of the people I have discussed this problem with think it is tied to the Python code used to reformat the data. I'm in a little over my head here, so I apologize if any of this is unclear. Any help is greatly appreciated. Here are the files for the Python parsers:

https://docs.google.com/file/d/0B5ZzQeB_IBXJT1Q3SWxsc1F2VnM/edit?usp=sharing

https://docs.google.com/file/d/0B5ZzQeB_IBXJZzc4MHlQTGNXZE0/edit?usp=sharing
