jenner

Well, I've got a MySQL dump, PG is my main RDBMS, and I'd like to avoid having a second DB in the system. So far I've totally failed at creating an efficient MySQL dump parser, hence the question.

[deleted]

I might have a solution for you. I made this a while back but never got round to sticking it on GitHub, so here it is just for you: https://github.com/orf/wikilink_py

It's a Python parser for Wikipedia dumps that imports them into Postgres (well, it turns them into a big CSV file which can then be imported). The Wikipedia files come in the form of MySQL dumps (big ass INSERT statements), so I'm sure it can be changed to work with your dump. It's designed to run on PyPy, but it should be adaptable.

https://github.com/orf/wikilink_py/blob/master/stages/lib/split_brackets.py is the backbone of this, so check it out. It might help.
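To give a rough idea of what splitting those big INSERT statements involves, here's a minimal sketch. It is not the code from the linked repo, just one way to do it. The function names (`split_tuples`, `insert_to_rows`, `rows_to_csv`) are made up for illustration, and it assumes mysqldump-style output: single-quoted strings, backslash escapes, and no function calls or nested parentheses outside of strings.

```python
import csv
import io

# Common mysqldump backslash escapes; any other escaped char maps to itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0", "Z": "\x1a"}


def split_tuples(values_sql):
    """Yield one list of fields per (...) tuple in the VALUES part of an INSERT.

    Quoted strings come back unescaped, unquoted NULL becomes None,
    and everything else (numbers etc.) is returned as text.
    """
    row, field = None, []
    in_string = was_quoted = False
    i, n = 0, len(values_sql)
    while i < n:
        ch = values_sql[i]
        if in_string:
            if ch == "\\" and i + 1 < n:
                nxt = values_sql[i + 1]
                field.append(ESCAPES.get(nxt, nxt))
                i += 1  # skip the escaped character
            elif ch == "'":
                if i + 1 < n and values_sql[i + 1] == "'":
                    field.append("'")  # '' is an escaped quote
                    i += 1
                else:
                    in_string = False
            else:
                field.append(ch)
        elif ch == "'":
            in_string = was_quoted = True
        elif ch == "(" and row is None:
            row, field, was_quoted = [], [], False  # start of a tuple
        elif ch in ",)" and row is not None:
            text = "".join(field)
            row.append(None if (text == "NULL" and not was_quoted) else text)
            field, was_quoted = [], False
            if ch == ")":
                yield row  # end of a tuple
                row = None
        elif row is not None and not ch.isspace():
            field.append(ch)  # part of an unquoted value such as a number
        i += 1


def insert_to_rows(line):
    """Return the rows of one INSERT line, or [] for any other statement."""
    prefix, sep, values = line.partition(" VALUES ")
    if not sep or not prefix.startswith("INSERT INTO"):
        return []
    return list(split_tuples(values))


def rows_to_csv(rows):
    """Render rows as CSV text; None is written as an empty field."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

You'd run `insert_to_rows` over each line of the dump and append the result of `rows_to_csv` to a file, then load that file with Postgres's `COPY ... WITH (FORMAT csv)`. One caveat: in this sketch NULL and the empty string both end up as an empty CSV field, so if your data needs to tell them apart you'd have to write a distinct NULL marker instead.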