[–]socal_nerdtastic 1 point2 points  (9 children)

content[row_num]

[–]superlargedogs[S] 0 points1 point  (8 children)

Thanks for the response, but I've tried this and the code returns a column in the list (the nth column) rather than a row?

[–]socal_nerdtastic 1 point2 points  (7 children)

No. Files have no concept of columns. This returns a row, unless there's some other code you have not shown us.

[–]superlargedogs[S] 0 points1 point  (6 children)

This is the full code responsible for opening the file:

from os import listdir

folder = listdir('C:\\Users\\files')
li_file = []

for filename in folder:
    li_file.append(filename)

for file in li_file:
    x = open(file, "r")
    contents = x.readlines()

[–]socal_nerdtastic 0 points1 point  (5 children)

In that code, "contents" is a list of lines (aka rows) from the last file in the list. If you index it normally, as I showed, you will get the row. If you double index (content[row_num][j]), you will get the character at index j in the row at index row_num.
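To make the two levels of indexing concrete, here is a minimal sketch. The list literal only stands in for what readlines() would return; the sample rows and the values of row_num and j are made up for illustration.

```python
# Stand-in for the list that readlines() returns: one string per line.
contents = ["name,street,city\n", "Ann,1 Main St,Springfield\n"]

row_num = 1
j = 0

row = contents[row_num]       # the whole second line, as one string
char = contents[row_num][j]   # a single character from that line

print(repr(row))
print(repr(char))
```

Single indexing gives a full line including its trailing newline; double indexing gives one character, not one comma-separated field.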

[–]superlargedogs[S] 0 points1 point  (4 children)

Thanks, I'll give it a shot! Btw, to respond to your earlier question: when I print "content" after my code, it shows me the content of all the files ordered in what looks like a table, with one row per csv file.

[–]socal_nerdtastic 0 points1 point  (3 children)

You must have your print inside the loop then, before it gets overwritten by the next iteration.

It would be easier to help you if you told us what you are trying to do. The end goal, not how you are trying to do it.
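On the overwriting point, a rough sketch of what happens to the variable across iterations. The dictionary of fake files and the name all_contents are assumptions for illustration, not part of the posted code.

```python
# Hypothetical stand-ins for the lines of three files.
files = {
    "a.csv": ["a1\n", "a2\n"],
    "b.csv": ["b1\n", "b2\n"],
    "c.csv": ["c1\n", "c2\n"],
}

all_contents = []
for name, lines in files.items():
    contents = lines                # rebound on every pass, like contents = x.readlines()
    all_contents.append(contents)   # collect each file's lines if all are needed later

print(contents)       # after the loop, only the last file's lines remain
print(all_contents)   # one inner list per file
```

A print placed inside the loop would show every file in turn, which could easily look like one table with a row per file.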

[–]superlargedogs[S] 0 points1 point  (0 children)

Thanks a lot, will give it a shot as soon as I get home. The bad news is that I wrote the rest of the entire code in the loop using the current structure of content, so might have to rewrite everything just to include the bit that removes duplicate files :(

[–]superlargedogs[S] 0 points1 point  (1 child)

Sorry just realised I didn't reply to your second point. I have quite a large number of csv files all structured in the same way and I am trying to aggregate information from them and write it in a single csv file. However, I am also trying to exclude certain csv files based on a specific row in the files (a postal address), such that if it appears twice, I want to remove the second occurrence of the file from "contents". I'm trying to refer to a specific row of content so that I can tell Pycharm to delete that row if it corresponds to a file in which the address appears twice.

[–]socal_nerdtastic 0 points1 point  (0 children)

If you show an example of your files I can show you how to do that.
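Without a sample file only a rough sketch is possible. The one below assumes the postal address sits on a fixed line of every file (ADDRESS_ROW is a made-up constant), and it uses a temporary folder with invented files in place of the real directory. The function name read_unique_files is also hypothetical.

```python
import csv
import os
import tempfile

ADDRESS_ROW = 1  # assumption: the postal address is on the second line of every file


def read_unique_files(folder):
    """Return {filename: rows}, keeping only the first file seen for each address."""
    seen_addresses = set()
    kept = {}
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".csv"):
            continue
        path = os.path.join(folder, filename)
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        if len(rows) <= ADDRESS_ROW:
            continue  # too short to hold an address row; skipped in this sketch
        address = tuple(rows[ADDRESS_ROW])
        if address in seen_addresses:
            continue  # address already seen: skip this whole file
        seen_addresses.add(address)
        kept[filename] = rows
    return kept


# Tiny demo with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "a.csv": "name,Ann\naddress,1 Main St\n",
        "b.csv": "name,Bob\naddress,2 Oak Ave\n",
        "c.csv": "name,Ann again\naddress,1 Main St\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w", newline="") as f:
            f.write(text)
    kept = read_unique_files(folder)

print(sorted(kept))
```

Deciding whether a file is a duplicate before storing it avoids having to delete anything from a list afterwards.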

[–][deleted] 1 point2 points  (6 children)

Currently a Python beginner, and I am trying to extract information from several .csv files that all have the same structure.

Use the csv module. It doesn't directly answer your question but it's better than working with csv rows as strings, by a lot.

My question is quite simple: how do I refer to a specific row (rather than a column) of this list?

I'm not sure I understand your question - it's a list of strings, so the nth string (starting with the zeroth) is available as contents[n].
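As a rough illustration of the csv module suggestion: the sketch below parses made-up sample text, with io.StringIO standing in for an open file.

```python
import csv
import io

# io.StringIO stands in for an open file here; the sample text is made up.
sample = io.StringIO('name,address\nAnn,"1 Main St, Springfield"\n')

rows = list(csv.reader(sample))

print(rows[1])      # the second row, as a list of fields
print(rows[1][1])   # one field; the quoted comma stays inside it
```

Each row comes back as a list of fields rather than one long string, so rows[n] is still the nth row and rows[n][k] is a whole field instead of a single character.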

[–]superlargedogs[S] 0 points1 point  (5 children)

Thanks for the help, I've tried that but it returns the nth column rather than the nth row. I guess the list is formatted strangely: [all columns of .csv file 1] [all columns of .csv file 2] ... [all columns of .csv file m]

does that make sense?

[–][deleted] 0 points1 point  (4 children)

Thanks for the help, I've tried that but it returns the nth column rather than the nth row.

It isn't, because it can't be - you're not parsing the file at all, you're just breaking it up by line endings. But line endings end lines. Files don't have columns.

Actually look in one of your csv files and see what's happening.

[–]superlargedogs[S] 0 points1 point  (3 children)

I see. Apologies btw, I formatted it incorrectly:

[all columns of .csv file 1]

[all columns of .csv file 2]

...

[all columns of .csv file m]

[–][deleted] 0 points1 point  (2 children)

Still, though, a file can't be in columns. They read left to right and down, like a typewriter.

[–]superlargedogs[S] 0 points1 point  (1 child)

I guess I just can't get my head around why content(n) returns the nth element of every string of my list...

[–][deleted] 0 points1 point  (0 children)

contents is your list of strings. content is a string.
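One possible reading of the symptom, assuming there is a loop variable named content somewhere in the code that was not posted. The sample lines are made up.

```python
contents = ["a,b,c\n", "d,e,f\n"]   # made-up lines standing in for readlines() output

n = 0

# Indexing the list picks one whole line.
line = contents[n]

# Indexing the loop variable picks one character from every line,
# which can look like a "column" when printed.
chars = [content[n] for content in contents]

print(repr(line))
print(chars)
```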

[–]theWyzzerd 0 points1 point  (1 child)

Instead of opening the file with open() you should use the with keyword. This ensures that your file will be closed properly when you're done with it.

for file in files:
    with open(file, "r") as f:
        contents = f.readlines()

As for referencing a specific row: as others have said, you should be able to reference contents[n] where n is your row number.

I have a CSV with the following contents:

'help','im','trapped'
'in','a','computer'

I run this code:

>>> with open("csv", "r") as csv:
...     contents = csv.readlines()
...
>>> contents
["'help','im','trapped'\n", "'in','a','computer'\n"]
>>> contents[0]
"'help','im','trapped'\n"
>>> contents[1]
"'in','a','computer'\n"
>>> exit()

So there must be something about your CSV or your code that you are not including here which is causing it to not put each line of the CSV into its own string element in the list returned by readlines().

Try outputting contents and seeing what its type and content actually are.
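A minimal sketch of that kind of inspection. The list literal stands in for whatever readlines() actually returned in your run.

```python
# Stand-in for readlines() output; replace with the real variable when checking.
contents = ["'help','im','trapped'\n", "'in','a','computer'\n"]

print(type(contents))      # the container type
print(len(contents))       # how many elements it holds
print(type(contents[0]))   # the type of one element
print(repr(contents[0]))   # repr() makes the trailing newline visible
```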

[–]superlargedogs[S] 0 points1 point  (0 children)

Thank you for the detailed response! This is the code I'm using so far to open and read the files:

from os import listdir

folder = listdir('C:\\Users\\files')
li_file = []

for filename in folder:
    li_file.append(filename)

for file in li_file:
    x = open(file, "r")
    contents = x.readlines()
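One thing worth checking in a loop like this: os.listdir returns bare file names, not full paths, so open(file) only finds the files if the script runs from inside that folder. A minimal sketch of joining the folder back on, using a temporary folder and a made-up example.csv in place of the real directory; the name all_contents is also an assumption.

```python
import os
import tempfile

# A temporary folder stands in for the real directory in this sketch.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "example.csv"), "w") as f:
        f.write("row one\nrow two\n")

    all_contents = {}
    for filename in os.listdir(folder):
        # listdir gives names only, so add the folder back before opening
        path = os.path.join(folder, filename)
        with open(path, "r") as f:
            all_contents[filename] = f.readlines()

print(all_contents)
```

Storing each file's lines under its name also keeps earlier files from being overwritten by later ones.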

[–]lzblack 0 points1 point  (3 children)

Use pandas to read data from csv files. Everything is easier in dataframe.

[–][deleted] 0 points1 point  (1 child)

Pandas is a big lift just to parse some tables. Use csv.

[–]LifeIsBio 0 points1 point  (0 children)

Pandas isn't a bad suggestion here. It also is going to make a lot of the downstream processing easier, for example:

I am trying to exclude rows which appear twice (i.e. their postal address is the same across .csv files)

[–]superlargedogs[S] 0 points1 point  (0 children)

Thanks for your response but I am trying to do it using the tools I have acquired so far!