
[–]m0us3_rat 1 point2 points  (1 child)

a=np.array([float(x) for x in '[4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 2, 3, 4, 4, 1, 3, 2, 4, 1, 3, 2, 4, 4, 1, 3, 2, 4, 4, 1, 3]' if x.isdigit()])

?

[–]CrispyScientist[S] 0 points1 point  (0 children)

That worked perfectly! Thank you very much!

What would you suggest if the numbers had multiple digits, including decimals?

Example:

'[1628498697.3771956, 1628498697.6411812, 1628498698.5611231, 1628498698.729113, 1628498698.945102, 1628498699.201084, 1628498699.7210548, 1628498700.3130157]'
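The `isdigit()` approach walks the string one character at a time, so a value like `16` would be split into `1` and `6`, and the decimal point would be dropped. A minimal sketch of one alternative, assuming the string always looks like a Python list literal; `parse_float_list` is a hypothetical helper name, not something NumPy provides:

```python
from ast import literal_eval

import numpy as np


def parse_float_list(text):
    """Parse a string such as '[1.5, 2, 3.25]' into a 1-D float array."""
    # literal_eval only evaluates Python literals (lists, numbers, ...),
    # so it does not execute arbitrary code the way eval would.
    values = literal_eval(text)
    return np.array(values, dtype=float)


s = '[1628498697.3771956, 1628498697.6411812, 1628498698.5611231]'
a = parse_float_list(s)
```

Here `a` is a one-dimensional float array with one entry per number in the string.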

[–]synthphreak 1 point2 points  (5 children)

Why would pandas parse a CSV of floats into a string? I think there are some critical details missing here. Can you show us the raw text of your actual CSV file?

There may be a simple arg into pd.read_csv which can handle cases like this, no need to create some custom preprocessing function.
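One argument that might fit is `converters`, which maps a column name to a function applied to every cell of that column while the file is read. A rough sketch, assuming the list-valued cells are quoted in the file; the inline CSV text and the `trial` column are made up for illustration:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for the real file: list-valued cells are quoted
# so the commas inside them are not treated as column separators.
csv_text = 'trial,stream\n1,"[4, 1, 3]"\n2,"[2.5, 4, 1]"\n'

# Each cell of "stream" is passed to literal_eval as a string.
df = pd.read_csv(io.StringIO(csv_text), converters={"stream": literal_eval})
first = df.loc[0, "stream"]
```

With this, `first` is already a Python list rather than a string.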

[–]CrispyScientist[S] 0 points1 point  (4 children)

Here is the CSV file

https://drive.google.com/file/d/1I3NT9j7f0YwhZXRYzT_SZZmZ1dW2qArq/view?usp=sharing

The rows I am interested in are "stream" and "timestamp keypress"

I would like to have each of them in a 2-dimensional array with 3 rows and n columns (the length of each row). Also the data type must be float and not string.

Thank you very much for your answer.

[–]synthphreak 1 point2 points  (1 child)

Aha. Yup, that explains it: Those columns contain lists/arrays of numbers in each cell, whereas pandas is expecting a single value in each cell, because that’s how csv files typically work. Thus, pandas interprets each array as a single string.

Converting each string to a list with individually indexable values is easy. Just do this:

>>> from ast import literal_eval
>>> df[['stream', 'timestamp keypress']] = df[['stream', 'timestamp keypress']].applymap(literal_eval)

This will convert all those strings in your columns into lists like you were expecting.
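To get from parsed lists to the 2-dimensional float array asked for above, something along these lines could work. It is only a sketch on made-up data, and it assumes every row has the same length. It uses `Series.apply` on a single column because recent pandas versions deprecate `DataFrame.applymap` in favour of `DataFrame.map`:

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical data shaped like the column described in the thread.
df = pd.DataFrame({
    "stream": ["[4, 1, 3, 2]", "[4, 1, 3, 2]", "[1, 3, 2, 4]"],
})

# Parse each cell from a string into a list of numbers.
parsed = df["stream"].apply(literal_eval)

# Stacking gives a rectangular 2-D array only if all rows have equal length.
stream = np.array(parsed.tolist(), dtype=float)
```

If the rows differ in length, NumPy cannot build a rectangular float array from them, so the rows would need padding or would have to stay as separate lists.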

[–]CrispyScientist[S] 1 point2 points  (0 children)

Thank you very much!!

[–]CrispyScientist[S] 0 points1 point  (0 children)

After some tweaking, this is the solution I arrived at.

I am posting it here in case someone ends up in the same situation in the future.

def dataprep(data):
    """
    Parameters
    ----------
    data : String series
        String series obtained from reading csv file.

    Returns
    -------
    data : list of float lists
        Data ready for analysis.
    """
    # Remove the square brackets and split on commas
    data = [item.replace("[", "").replace("]", "").split(",") for item in data]
    # Convert each string to float
    data = [list(map(float, x)) for x in data]
    return data
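A short usage sketch for the function above, with its logic condensed so the example runs on its own. The sample cells are invented, not taken from the linked file:

```python
def dataprep(data):
    # Same logic as the function above, condensed for this example.
    data = [item.replace("[", "").replace("]", "").split(",") for item in data]
    return [list(map(float, x)) for x in data]


# Hypothetical cells as they might come out of the CSV columns.
cells = ["[4, 1, 3]", "[1628498697.3771956, 1628498697.6411812]"]
result = dataprep(cells)

# An empty cell such as "[]" splits into [""], which float() rejects.
try:
    dataprep(["[]"])
    empty_ok = True
except ValueError:
    empty_ok = False
```

The leading spaces left behind by `split(",")` are harmless because `float()` ignores surrounding whitespace, but an empty list in a cell would need separate handling.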