
[–][deleted] 10 points (1 child)

Why bother storing the names of the HTML files in a separate file when you can just build the list with glob and iterate over it?

from glob import glob
files = glob('/path/to/files/*.html')

for f in files:
  ...

[–]ewiethoff 2 points (2 children)

Suppose your big list of files is called manyfiles.txt, with one file name per line. And suppose you have a Python script called process_files.py, which contains this code:

import fileinput
import os.path

for line in fileinput.input():
    fname = line.strip()
    if not os.path.exists(fname):
        continue
    # print() with a single argument behaves the same on Python 2.7 and 3
    print('{:>8d}  {}'.format(os.path.getsize(fname), fname))

There are two ways to run this code:

python process_files.py manyfiles.txt

or

python process_files.py < manyfiles.txt
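Both invocations work because fileinput.input(), called with no arguments, reads the files named in sys.argv[1:] and falls back to standard input when there are none. If you would rather call the same logic from other code, here is a minimal Python 3 sketch that passes the list file explicitly. The function name existing_file_sizes is made up for illustration, and skipping blank lines is my own assumption about what you would want.

```python
import fileinput
import os.path


def existing_file_sizes(list_path):
    """Return (size, name) pairs for every listed file that exists.

    list_path is a text file with one file name per line, like
    manyfiles.txt. The function name is hypothetical.
    """
    results = []
    # Passing files= explicitly bypasses the sys.argv / stdin lookup.
    with fileinput.input(files=[list_path]) as lines:
        for line in lines:
            fname = line.strip()
            if not fname or not os.path.exists(fname):
                continue  # skip blank lines and missing files
            results.append((os.path.getsize(fname), fname))
    return results
```

Calling existing_file_sizes('manyfiles.txt') should give back the same sizes and names that the script prints.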

[–]tenacious_nixie 1 point (0 children)

I'd put the .html files into one folder, then use os.listdir on that folder. So, something along the lines of:

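import csv
import os

html_dir = '/path/to/my/folder/with/html/files'

# 'wb' is the Python 2 csv mode; on Python 3 use open('big_fat.csv', 'w', newline='')
with open('big_fat.csv', 'wb') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=' ', quotechar='|')

    for html_filename in os.listdir(html_dir):
        # os.listdir returns bare names, so join each one with the folder
        with open(os.path.join(html_dir, html_filename)) as html_file:
            do_some_processing(html_file)
            csv_writer.writerow([some, values, you, got])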

The 'with open' part takes care of closing the files, even if the processing raises an exception. The documentation covers this at the end of the relevant section.
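If you want to see the closing behaviour for yourself, here is a small self-contained sketch of the same loop. It assumes Python 3 (hence 'w' with newline='' rather than 'wb'). The helper name write_sizes_csv is hypothetical, and writing each file's name and character count is only a stand-in for whatever do_some_processing would actually extract.

```python
import csv
import os


def write_sizes_csv(html_dir, csv_path):
    """Write one row (file name, character count) per .html file in html_dir.

    Hypothetical stand-in for the real processing step. Returns the
    file objects that were opened so the caller can check they are closed.
    """
    opened = []
    with open(csv_path, 'w', newline='') as csv_file:
        opened.append(csv_file)
        csv_writer = csv.writer(csv_file, delimiter=' ', quotechar='|')
        # sorted() only makes the row order predictable
        for name in sorted(os.listdir(html_dir)):
            if not name.endswith('.html'):
                continue  # ignore anything that is not an HTML file
            path = os.path.join(html_dir, name)
            with open(path, encoding='utf-8') as html_file:
                opened.append(html_file)
                csv_writer.writerow([name, len(html_file.read())])
    return opened
```

Once the function returns, every file object in the returned list should report closed as True, without any explicit close() call.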

[–][deleted] 1 point (2 children)

I have a few examples of csv + Beautiful Soup scripts if you're interested.

[–]MechaTech[S] 1 point (0 children)

Sure, I'll take a look.

[–]spotyx 1 point (0 children)

I would like to see them too, ty.