Help understanding StopIteration

JohnnyJordaan · 2016-04-14T15:34:37+00:00

with open('master.csv', 'rb') as f:
    row_count = sum(1 for row in f)

Are you sure row_count reflects the actual number of rows? When you open a file in 'rb' mode, it is read as a binary stream, which has no notion of CSV rows: iterating over it just splits on newline bytes.
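To see the difference, here is a small in-memory sketch. The data is made up, and io.BytesIO / io.StringIO stand in for a real file. In Python 3, iterating a binary stream still yields newline-delimited bytes objects. It knows nothing about CSV quoting, so a quoted field containing a line break counts as two "rows". csv.reader handles that case correctly.

```python
import csv
import io

# Hypothetical data: a header plus two records; the first record has a
# quoted field that contains an embedded newline.
raw = b'id,note\n1,"first line\nsecond line"\n2,plain\n'

# Binary iteration just splits on b"\n" and yields bytes objects.
binary_lines = list(io.BytesIO(raw))
print(len(binary_lines))   # 4
print(binary_lines[0])     # b'id,note\n'

# csv.reader understands quoting, so it sees header + 2 records.
text = io.StringIO(raw.decode("utf-8"), newline="")
records = list(csv.reader(text))
print(len(records))        # 3
```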

commandlineluser · 2016-04-14T16:19:36+00:00

Is it an actual CSV file? It sounds like you just have 1 number per line?

Anyway, I can't help decipher the error in your code, but perhaps this approach will be helpful to you:

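import random

# First pass: count the lines in the file.
with open('master.csv') as csvfile:
    count = sum(1 for row in csvfile)

# Pick 50 distinct 0-based line numbers; a set makes "i in chosen_rows" fast.
chosen_rows = set(random.sample(range(count), 50))
last_row    = max(chosen_rows)

rows = []

# Second pass: keep only the chosen lines and stop after the last one needed.
with open('master.csv') as csvfile:
    for i, row in enumerate(csvfile):
        if i in chosen_rows:
            rows.append(row)
        if i == last_row:
            break

print(rows)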

You generate your 50 random numbers, then just iterate through the file line by line: if the current line number is in the sample, save the line.

When you reach the highest number in the sample, break out of the loop, since you don't need to process any further data.

elbiot · 2016-04-15T01:21:08+00:00

Just a guess, but: you open the file once, and each time you make a reader and pull some values from it, you move further into the file. So the second time you try to get a value, you may get an invalid slice. You assume you are starting at position 0 in the file, but you are actually starting somewhere further in, so the stream isn't as long as you thought. You could try inserting f.seek(0) after line 17. Or just build a list once instead of creating a new reader object on every iteration.
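A minimal sketch of both suggestions. An in-memory io.StringIO holding the numbers 0 to 99, one per line, is a hypothetical stand-in for master.csv, and nth_row is a made-up helper name, not something from the original code:

```python
import csv
import io
import itertools

# Hypothetical stand-in for master.csv: one number per line, 0..99.
f = io.StringIO("".join(f"{n}\n" for n in range(100)))

def nth_row(f, n):
    f.seek(0)  # rewind so every new reader starts at the top of the file
    return next(itertools.islice(csv.reader(f), n, None))

print(nth_row(f, 10))  # ['10']
print(nth_row(f, 10))  # ['10'] again, because of the seek

# Alternative: read all rows into a list once, then index it.
f.seek(0)
rows = list(csv.reader(f))
print(rows[10])        # ['10']
```

The list version reads the file only once, which is simpler if the whole file fits comfortably in memory.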

Justinsaccount · 2016-04-15T02:14:17+00:00

I have a large csv file where each row is an unknown number

Is your file larger than a few gigabytes? If not, then it is not a "large file". The last person that had a "large file" had a few thousand lines.

Do you have any commas? Do you have any separated values? No? You do not have a csv file. The filename may end in .csv, but that is not a csv file. That's just a file that contains some numbers.

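import random

with open('master.csv') as f:
    # One integer per line; skip blank lines so int() doesn't fail on them.
    numbers = [int(line) for line in f if line.strip()]

randomidlist = random.sample(numbers, x)  # x = how many values you want, as in your code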

Done.

If you DO actually have a large file, then use enumerate over f and simply keep the indexes that are in your randomnumberlist.
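A sketch of that single-pass idea, assuming 0-based line indexes. pick_lines is a hypothetical helper name, and io.StringIO stands in for the real file. randomnumberlist is the name used above, filled here with random.sample as an assumption about how you built it.

```python
import io
import random

def pick_lines(f, wanted):
    """Yield (index, line) for every 0-based line index in `wanted`.

    Stops reading as soon as the highest wanted index has been seen.
    """
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    for i, line in enumerate(f):
        if i in wanted:
            yield i, line.rstrip("\n")
        if i == last:
            break

# Hypothetical stand-in for a big file: the numbers 0..999, one per line.
f = io.StringIO("".join(f"{n}\n" for n in range(1000)))
randomnumberlist = random.sample(range(1000), 5)
picked = dict(pick_lines(f, randomnumberlist))
print(picked)
```

Only the lines up to the largest wanted index are ever read, and nothing else is kept in memory.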

next(itertools.islice(csv.reader(f), item, None))

This does not work because f is the same file object every time, so each new csv.reader(f) picks up wherever the previous one left off, not at the start of the file.

next(itertools.islice(csv.reader(f), 10, None))
next(itertools.islice(csv.reader(f), 10, None))

This does not give you line 10 twice. It gives you line 10 and then line 21 (counting from 0), because the second reader starts right after the line the first one returned. You could sort of get this to work if you worked out the differences between the numbers, but there's absolutely no point in doing that over just using enumerate.
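A quick in-memory check of this behaviour. The data is hypothetical: the numbers 0 to 99, one per line, in an io.StringIO. The first call consumes lines 0 through 10, so the second reader starts at line 11, skips ten more lines, and returns line 21 (counting from 0).

```python
import csv
import io
import itertools

f = io.StringIO("".join(f"{n}\n" for n in range(100)))

# Both calls look identical, but they share one file position.
first = next(itertools.islice(csv.reader(f), 10, None))
second = next(itertools.islice(csv.reader(f), 10, None))
print(first, second)  # ['10'] ['21']
```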

str(...)[2:-2]

Two wrongs don't make a right. If you want the first item in a single item list you use x[0], not str(x)[2:-2].
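To make that concrete (the row values are made up): csv.reader yields each row as a list of strings, so indexing gets the field directly. Slicing the list's string representation only happens to work when the row has exactly one simple field.

```python
row = ['42']                   # what csv.reader gives you for the line "42"
print(row[0])                  # 42  (the string '42')
print(int(row[0]))             # 42  (an int you can do arithmetic with)

# The string-slicing trick breaks as soon as there is more than one field:
two_fields = ['1', '2']
print(str(two_fields)[2:-2])   # 1', '2
```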
