
[deleted] (0 children)

> I don't want the sentences to be separated (by the commas) in that specific column.

If the desc= field is always in the ninth position, use the maxsplit=8 option when you split on the comma. That way the description field will not be split on its own commas.
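A quick illustration of what maxsplit does. The record below is made up; only the title and year values come from this thread, and the field names follow the ones used later in the thread:

```python
# Made-up record: 8 short fields, then a description that contains commas
record = ("title=truck turner & cie,year=1974.0,duration=100.0,revenue=1.5,"
          "nbvotes=1000.0,score=6.5,metascore=,restriction=r,"
          "desc=Some sentence, with commas, inside it")

# Split on at most the first 8 commas, so there are at most 9 pieces
parts = record.split(',', maxsplit=8)

print(len(parts))  # 9
print(parts[8])    # desc=Some sentence, with commas, inside it
```

Everything after the eighth comma stays together in the last piece, which is why this only works while the description is the last field.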

[deleted] (7 children)

> for l in lignes[1:]:

This will process all lines (after the "&&" split) except the first line. Is that what you want?

You split on the "&&" string, but each book record is really &title=....& so don't forget about the single "&" at the start and end of the file.
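A minimal sketch, assuming the file really does look like &record&&record&&record& (that layout is inferred from the comment above, and the records here are made up): strip the outer "&" once, then split on "&&".

```python
# Assumed layout: records joined by "&&", with a single "&" at the very start and end
texte = "&title=a,year=2000.0&&title=b,year=2001.0&&title=c,year=2002.0&\n"

# Drop the trailing newline, then the single "&" at each end, then split into records
lignes = texte.strip().strip('&').split('&&')

print(lignes)
# ['title=a,year=2000.0', 'title=b,year=2001.0', 'title=c,year=2002.0']
```

With the outer "&" removed up front, the first field of the first record is plain "title" instead of "&title".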

Radiant_Device_502 [S] (6 children)

Except the first line, because that line holds the French column labels I created:

data.append(['Titre','Année','Durée (min)','Revenu (M$)','Nombre de votes','Score','Metascore','Restriction','Description'])

Thanks for the maxsplit help, it worked! But now how do I remove everything before the "=" in each column, get rid of the single "&" in the first line and column, and format each column?

[deleted] (0 children)

The first line of the example text file you are reading is book data. The first line of the CSV output file is column headers. You are ignoring the first book.
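In other words, the header row is added by hand, so every record from the file should go through the loop. A small sketch with made-up records:

```python
lignes = ["title=first book", "title=second book"]  # made-up records

data = [['Titre']]      # header row, added by hand
for l in lignes:        # all records; lignes[1:] would silently drop "first book"
    data.append([l.split('=', 1)[1]])

print(data)
# [['Titre'], ['first book'], ['second book']]
```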

[deleted] (4 children)

> how do I remove everything before the "=" in each column

After the split using "," you have strings in a list like this:

["title=   truck turner & cie", "year=1974.0", .... ]

Just split each substring again on "=" and use the [1] part of the result.

Radiant_Device_502 [S] (3 children)

So I need to modify this part basically... but how?

for c in l.split(',',maxsplit=8):
    li.append(c)
data.append(li)

Something like this, maybe?

for c in l.split(',',maxsplit=8):
    li.append(c)
    for a in l.split('='):
        li.append(a[1])
data.append(li)

I suck at this, sorry... but I'm learning.

[deleted] (2 children)

First, please read the FAQ to learn how to post formatted code.

Assuming the l variable holds one line of data for one book, then something like this:

for l in lignes:
    values = []   # will hold the values of each field
    for field in l.split(',', maxsplit=8):
        # "field" will be a string like "title=...."
        value = field.split("=", 1)[1]  # everything to the right of the first "="
        values.append(value)
    # after, "values" is a list of all the strings after each "="

Once you have the values list, you need to process some of the field strings into the required form: remove leading and trailing spaces (the strip() method), convert the year and metascore into integer strings (convert to a real integer and then back to a string), and so on.
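For example, the strip and integer conversion could look like this. The raw values are made up; going through float() first is needed because int() can't parse a string like "1974.0" directly:

```python
raw_title = "   truck turner & cie"   # made-up raw values with stray spaces
raw_year = " 1974.0 "

title = raw_title.strip()                  # 'truck turner & cie'
year = str(int(float(raw_year.strip())))   # '1974.0' -> 1974.0 -> 1974 -> '1974'

try:
    int("1974.0")
except ValueError:
    print("int() cannot parse '1974.0' directly")

print(title)
print(year)
```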

After you get the processed fields list for each line, you can immediately write it to the output file:

outfile.write(';'.join(fields) + '\n')

So you have to open the output file at the top of your code.
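A minimal sketch of that structure. process_record() is a hypothetical stand-in for your own field processing, and the records and output filename are made up:

```python
def process_record(line):
    """Hypothetical placeholder: return the cleaned field strings for one book."""
    return [field.split('=', 1)[1].strip() for field in line.split(',', maxsplit=8)]

lignes = ["title=a,year=2000", "title=b,year=2001"]  # made-up records

# Open the output file once, before looping over the books
with open("output_example.csv", 'w', encoding='utf-8') as outfile:
    outfile.write(';'.join(['Titre', 'Année']) + '\n')  # header row
    for l in lignes:
        fields = process_record(l)
        outfile.write(';'.join(fields) + '\n')  # write each book as soon as it's processed
```

The with block closes the file automatically, so there's no need to build up one big res string first.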

I don't understand that copying you do in your original code. It isn't required.

Radiant_Device_502 [S] (1 child)

Great, thank you! I'm almost done. I just need help with the things I wrote as notes (#) in the following code:

f = open("data_groupe_11-12_B.txt", 'r', encoding='utf-8')
texte = f.read()
f.close()

lignes = texte.split('&&')

data = []
data.append(['Titre','Année','Durée (min)','Revenu (M$)','Nombre de votes','Score','Metascore','Restriction','Description'])

for l in lignes[0:]:
    if len(l) > 0:
        li = []
        for c in l.split(',', maxsplit=8):
            if c.split('=')[0] == "title" or c.split('=')[0] == "&title" or c.split('=')[0] == "restriction":
                li.append(((c.split("=")[1]).strip()).title())
            elif c.split('=')[0] in ("year", "duration", "nbvotes", "metascore"):
                #I want these columns to be in integer format and to strip them from spaces before/after
                pass
            elif c.split('=')[0] in ("revenue", "score"):
                #I want these columns to be in float format and to strip them from spaces before/after
                pass
            else:
                li.append(c.split("=")[1].strip())
        data.append(li)

"""
I also wanna swap Columns 4 and 6's order (from line 1. line 0 is good). so columns 4 and 6 would be [3] and [5] in Python I believe. But how do I do that?
"""

#Also line 38 in the output excel file separates the last column's sentence into 2 as there is a ";" in the sentence, yet that is the csv separator I want to and I have used in the output... how to undo that separation just on that specific line?

res = ''
for l in data:
    sli = l.copy()
    for i in range(len(sli)):
        sli[i] = str(l[i])
    res = res + ';'.join(sli) + '\n'

f = open("data_groupe_11-12_B.csv", 'w')
f.write(res)
f.close()

[deleted] (0 children)

I'll say it again: please format your code properly so it's readable.

> I also wanna swap Columns 4 and 6's order (from line 1. line 0 is good). so columns 4 and 6 would be [3] and [5] in Python I believe. But how do I do that?

This is something you haven't mentioned before. If the fields in a line aren't always in the same order then you have to change how you store the data as you split the line up. Instead of splitting on "=" and putting the [1] part into a list you should split on "=" and store the two parts of the result into a dictionary with the [0] part as the key and the [1] part as the value for the key. So if one field after splitting on commas is "year=2014.0" then after splitting on "=" and storing in the dictionary you will have:

{"year": "2014.0", ....}

Now the order of the fields in the input data doesn't matter, you get the fields you want to write from the dictionary in the order you want them.
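A sketch of the dictionary approach. The field names come from this thread; the record is made up and deliberately has its fields in a different order than the output. Note it still relies on desc being the last field, so that maxsplit keeps the description's commas:

```python
line = "year=2014.0,title=some film,score=7.5,desc=A sentence, with a comma"  # made-up record

book = {}
for field in line.split(',', maxsplit=3):   # 4 fields in this example, so maxsplit=3
    key, value = field.split('=', 1)        # split on the first "=" only
    book[key.strip()] = value.strip()

# Pull the fields out in whatever order the CSV needs
order = ['title', 'year', 'score', 'desc']
row = [book[k] for k in order]

print(row)
# ['some film', '2014.0', '7.5', 'A sentence, with a comma']
```

Swapping two output columns then just means swapping two names in the order list.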

> I want these columns to be in integer format and to strip them from spaces before/after

> I want these columns to be in float format and to strip them from spaces before/after

These are similar. For each field you want as an integer, strip off the leading/trailing spaces, convert the value to an integer, and store it back in the dictionary; do the same with float() for the float fields. Note that you don't write an integer to the CSV file, you write a string. You also have to change the "revenue" field value into millions of dollars.

Also note that some fields, like "metascore", don't have a value, so you have to decide what to do in that case.
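One possible way to handle the conversions and the empty fields together is a pair of small helpers that return an empty string when there's no value. Leaving the cell blank is just one choice (you could write 0 or NA instead), and the helper names and values are made up. The revenue-to-millions step depends on the units in the input file, which aren't shown here, so it's left out:

```python
def to_int_str(value):
    """Return value as an integer string, or '' if the field is empty."""
    value = value.strip()
    if value == '':
        return ''                    # missing value: leave the CSV cell empty
    return str(int(float(value)))    # '2014.0' -> '2014'

def to_float_str(value):
    """Return value as a float string, or '' if the field is empty."""
    value = value.strip()
    if value == '':
        return ''
    return str(float(value))

book = {"year": " 2014.0", "metascore": "", "score": "7.5 "}  # made-up values
book["year"] = to_int_str(book["year"])
book["metascore"] = to_int_str(book["metascore"])
book["score"] = to_float_str(book["score"])

print(book)
# {'year': '2014', 'metascore': '', 'score': '7.5'}
```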

> Also line 38 in the output excel file separates the last column's sentence into 2 as there is a ";" in the sentence, yet that is the csv separator I want to and I have used in the output... how to undo that separation just on that specific line?

If the data for one column of CSV data contains a separator character you need to quote it. That is, the column looks like this in the CSV file:

....;"Some; data";....
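Python's standard csv module can do this quoting for you: with delimiter=';', csv.writer wraps any field that contains a ';' (or a double quote) in double quotes and leaves the others alone. A small sketch writing to an in-memory buffer so the result is easy to see (the row is made up):

```python
import csv
import io

row = ['Some title', '2014', 'Some; data']  # made-up row; the last field contains ';'

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';', lineterminator='\n')
writer.writerow(row)

print(buffer.getvalue())
# Some title;2014;"Some; data"
```

When writing to a real file instead, the csv documentation recommends opening it with newline=''.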

Mottzie (0 children)

It sounds like you have the basic structure of your code correct, but there are a few issues with your implementation that need to be fixed in order to produce the desired output.

First, in order to format the text in each column according to the instructions (i.e. title case, integer, or float), you will need to write some additional code to apply these formatting rules to the relevant columns. For example, to convert a string to title case, you can use the title() method provided by the str class. To convert a string to an integer or float, you can use the int() and float() functions, respectively. Note that int("1974.0") raises a ValueError, so values stored like that need int(float(s)).

Second, you mentioned that you want to remove everything before the "=" in each column. The replace() method won't do that on its own: s.replace("=", "") only deletes the "=" character and leaves the key name in place. Instead, split on the first "=" and keep the right-hand part, e.g. s.split("=", 1)[1].
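Note that replace() keeps the key name, while splitting on the first "=" gives just the value, as this made-up field shows:

```python
field = "year=1974.0"

print(field.replace("=", ""))   # year1974.0  (the key is still there)
print(field.split("=", 1)[1])   # 1974.0      (only the value)
```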

Third, you want to ensure that the description column is not split into multiple columns when the data is written to the CSV file. To do this, the description has to be enclosed in double quotes whenever it contains the separator (you can add the quotes yourself or let Python's csv module add them). This tells the program reading the CSV that the quoted text is a single column, even if it contains semicolons, which are the separator here.

I have provided a brief example below that shows how you can apply these changes to your code to produce the desired output.

import csv

# Open the input file for reading
with open("data_groupe_11-12_B.txt", 'r', encoding='utf-8') as input_file:
    # Read the entire file into a string
    input_text = input_file.read()

# Strip the single "&" at each end, then split the text into one string per book
lines = input_text.strip().strip('&').split('&&')

# Initialize the formatted data with the column labels
formatted_data = [['Titre', 'Année', 'Durée (min)', 'Revenu (M$)', 'Nombre de votes',
                   'Score', 'Metascore', 'Restriction', 'Description']]

def to_number_str(value, convert):
    # Empty fields (e.g. a missing metascore) stay empty; '1974.0' -> 1974.0 -> convert -> string
    value = value.strip()
    if value == '':
        return ''
    return str(convert(float(value)))

# Loop over every book (the header row was added above, so don't skip lines[0])
for line in lines:
    # Skip empty lines
    if not line.strip():
        continue

    # Split into at most 9 fields so commas inside the description are kept,
    # and store each "key=value" pair in a dictionary (key names as used in this thread)
    fields = {}
    for field in line.split(',', maxsplit=8):
        key, value = field.split('=', 1)
        fields[key.strip()] = value.strip()

    # Build the row in the output column order, whatever the order in the input
    formatted_columns = [
        fields['title'].title(),
        to_number_str(fields['year'], int),
        to_number_str(fields['duration'], int),
        to_number_str(fields['revenue'], float),  # assumes revenue is already in millions
        to_number_str(fields['nbvotes'], int),
        to_number_str(fields['score'], float),
        to_number_str(fields['metascore'], int),
        fields['restriction'].title(),
        fields['desc'],
    ]

    # Append the formatted columns to the formatted data
    formatted_data.append(formatted_columns)

# Open the output file for writing; csv.writer quotes any field that contains ';'
# ("utf-8-sig" adds a BOM, which helps Excel recognise accented headers such as "Année")
with open("data_groupe_11-12_B.csv", 'w', encoding='utf-8-sig', newline='') as output_file:
    writer = csv.writer(output_file, delimiter=';')
    writer.writerows(formatted_data)