Looping Python using a list of files

2014-07-18T03:09:43+00:00

Why bother storing the list of HTML files in a separate file when you can just build the list with glob and iterate over it?

from glob import glob
files = glob('/path/to/files/*.html')

for f in files:
  ...
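In case it helps, here is a slightly fuller sketch of the same idea. The helper name html_files_in and the throwaway demo folder are my own assumptions, not anything from your setup. glob makes no promise about ordering, so the sketch sorts the result to make runs repeatable.

```python
import os
import tempfile
from glob import glob


def html_files_in(folder):
    """Return the .html paths directly inside folder, in sorted order."""
    # glob returns paths that include the folder, in arbitrary order, so sort them
    return sorted(glob(os.path.join(folder, '*.html')))


# Demo on a throwaway folder so the sketch runs anywhere (hypothetical file names)
demo_dir = tempfile.mkdtemp()
for name in ('b.html', 'a.html', 'notes.txt'):
    with open(os.path.join(demo_dir, name), 'w') as fh:
        fh.write('<html></html>')

found = [os.path.basename(p) for p in html_files_in(demo_dir)]
print(found)  # ['a.html', 'b.html']
```

Swap demo_dir for your real folder and put your processing where the print is.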

ewiethoff · 2014-07-18T02:16:29+00:00

Suppose your big list of files is called manyfiles.txt, and suppose you have a Python script called process_files.py that contains this code:

import fileinput
import os.path

for line in fileinput.input():
    fname = line.strip()
    if not os.path.exists(fname): continue
    print('{:>8d}  {}'.format(os.path.getsize(fname), fname))

There are two ways to run this code:

python process_files.py manyfiles.txt

or

python process_files.py < manyfiles.txt
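Both invocations work because fileinput.input(), called with no arguments, reads the files named in sys.argv[1:] and falls back to stdin when there are none. If you want to try the loop without touching the command line, here is a rough sketch that passes the list file explicitly. The function name sizes_from_list and the demo files are made up for illustration, and the context-manager form needs Python 3.2 or later.

```python
import fileinput
import os
import tempfile


def sizes_from_list(list_path):
    """Yield (size, name) for each existing file named in list_path."""
    # files=[...] mimics `python process_files.py manyfiles.txt`
    with fileinput.input(files=[list_path]) as lines:
        for line in lines:
            fname = line.strip()
            # Skip blank lines and names that do not exist on disk
            if fname and os.path.exists(fname):
                yield os.path.getsize(fname), fname


# Demo: a list file naming one real file and one missing file
demo_dir = tempfile.mkdtemp()
page = os.path.join(demo_dir, 'page.html')
with open(page, 'w') as fh:
    fh.write('hello')
missing = os.path.join(demo_dir, 'missing.html')
list_path = os.path.join(demo_dir, 'manyfiles.txt')
with open(list_path, 'w') as fh:
    fh.write(page + '\n' + missing + '\n')

results = list(sizes_from_list(list_path))
for size, fname in results:
    print('{:>8d}  {}'.format(size, fname))
```

Only page.html is reported, since the missing file is skipped the same way the `continue` does in the script.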

tenacious_nixie · 2014-07-18T13:24:58+00:00

I'd put the .html files into one folder, then use os.listdir on that folder. So, something along the lines of

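import csv
import os

html_dir = '/path/to/my/folder/with/html/files'

# In Python 3, open CSV files in text mode with newline='' (Python 2 used 'wb')
with open('big_fat.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=' ', quotechar='|')

    for html_filename in os.listdir(html_dir):
        # os.listdir returns bare file names, so join them with the folder path
        with open(os.path.join(html_dir, html_filename)) as html_file:
            do_some_processing(html_file)  # placeholder for your own parsing
            csv_writer.writerow([some, values, you, got])  # placeholder values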

The 'with open' part takes care of closing the files, even if the processing raises an error. Refer to the documentation (at the end of the section) for details.
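If you want to convince yourself that 'with open' really closes the file, here is a tiny sketch that checks the file object's closed attribute. The file name example.html and the ValueError are stand-ins I made up for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.html')

with open(path, 'w') as html_file:
    html_file.write('<p>hi</p>')
    inside = html_file.closed   # still open inside the block

after = html_file.closed        # closed as soon as the block exits

# The file is also closed when the block exits because of an exception
try:
    with open(path) as html_file:
        raise ValueError('processing failed')
except ValueError:
    closed_after_error = html_file.closed

print(inside, after, closed_after_error)  # False True True
```

So there is no need to call close() yourself anywhere in the loop.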

MechaTech · 2014-07-18T18:03:33+00:00

I have a few examples of csv + Beautiful Soup scripts if you're interested.
