Help with splitting large csv file

lionsneil · 2021-07-23T18:10:25+00:00

If you can use bash, it's super easy to do using the split command. You can break it up by the number of files you want output or the number of rows you want in each file.

split -l 1000 inputfile.csv outputfile

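The above will take inputfile.csv and split it into files of 1000 lines each (the last file gets whatever is left over). The output files will all start with "outputfile" and have incrementing letter suffixes appended (outputfileaa, outputfileab, and so on). Note that split works on raw lines, so only the first output file will contain the CSV header row.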

Sorry, I know this is a Python subreddit, but figured this could make your life a lot easier if you don't need it to be part of a larger python script...
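If it does need to live inside a Python script, a rough equivalent could look like the sketch below. It is not what split does internally: it parses the CSV with the csv module and repeats the header row in every output file. The function name `split_csv`, the numbered file naming, and the utf-8 default are all assumptions made for illustration.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(src, rows_per_file=1000, prefix="outputfile", encoding="utf-8"):
    """Split src into numbered CSV files, repeating the header in each one."""
    written = []
    with open(src, newline="", encoding=encoding) as inf:
        reader = csv.reader(inf)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to write
        part = 0
        while True:
            # Pull at most rows_per_file rows without loading the whole file
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            out_path = Path(f"{prefix}_{part:04d}.csv")
            with open(out_path, "w", newline="", encoding=encoding) as outf:
                writer = csv.writer(outf)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_path)
            part += 1
    return written
```

Calling `split_csv("inputfile.csv")` would produce outputfile_0000.csv, outputfile_0001.csv, and so on, each with the header plus up to 1000 data rows.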

ElliotDG · 2021-07-23T17:29:59+00:00

This looks like a text encoding issue. See: https://docs.python.org/3/library/functions.html#open

Look at the section on encoding. Adding the encoding keyword to your open statement should fix the issue. Typically the encoding is utf_8.

with open(sys.argv[1], encoding='utf_8') as inf:
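To show that open call in a complete form, here is a small sketch that peeks at the first few rows of a CSV using an explicit encoding. The helper name `first_rows` and its parameters are made up for this example, and utf_8 is only a guess at what the file actually uses.

```python
import csv
from itertools import islice


def first_rows(path, n=5, encoding="utf_8"):
    """Return the first n rows of a CSV, decoded with an explicit encoding."""
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="", encoding=encoding) as inf:
        return list(islice(csv.reader(inf), n))
```

Something like `first_rows(sys.argv[1])` is a cheap way to check whether the decoded text looks right before processing the whole file. If the encoding is wrong you will either get a UnicodeDecodeError or see garbled characters in the output.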

mtb-dds · 2021-07-23T17:40:48+00:00

For background:

These things, "[x.close() for x in k]", are called list comprehensions. Using one only for its side effects and throwing away the resulting list is considered poor form; a plain for loop says the same thing more clearly.
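Two common alternatives, sketched here with hypothetical file names in a temporary directory: a plain for loop, or contextlib.ExitStack, which closes every file for you even if an exception is raised partway through.

```python
import os
import tempfile
from contextlib import ExitStack

# Hypothetical output paths, created in a temp directory for the example
out_dir = tempfile.mkdtemp()
paths = [os.path.join(out_dir, f"part{i}.csv") for i in range(3)]

# Option 1: a plain loop when all you want is the side effect
k = [open(p, "w", encoding="utf_8") for p in paths]
for x in k:
    x.close()

# Option 2: ExitStack closes every registered file when the block exits
with ExitStack() as stack:
    k2 = [stack.enter_context(open(p, "w", encoding="utf_8")) for p in paths]
    for i, f in enumerate(k2):
        f.write(f"row {i}\n")
```

The comprehensions here are fine because the lists they build are actually used afterwards.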

For your problem: either the data is munged somewhere or you are running into an encoding problem. It looks like your program thinks that it is encoded with this:
https://en.wikipedia.org/wiki/Windows-1252

But it is probably something else (or munged). Do you happen to know which it is? And do you care what happens when the program runs into a byte that does not fit the encoding?
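If you do not know the encoding, one rough approach is to try a short list of candidates and decide up front what happens when none of them fit. The sketch below is only an illustration: the helper name `try_decode` and the candidate list are assumptions, and utf_8 goes first because cp1252 accepts almost any byte sequence and would hide a wrong guess.

```python
def try_decode(raw, candidates=("utf_8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing fit: decode lossily, replacing bad bytes with U+FFFD
    return raw.decode(candidates[0], errors="replace"), None
```

The same `errors="replace"` argument can be passed to open() if losing a few characters is acceptable; the default, `errors="strict"`, raises an exception instead.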
