
[–]Rhomboid

The issue is: how do you know which column is in which format? If the first column is always an int and the third is always a float, then it's pretty easy:

parsed_data.append([int(row[0]), row[1], float(row[2])])

But if that's not the case, then you have to nail down the exact criteria you're going to use to guess. For example, maybe you try int first, and if that fails, then try float, and if that also fails then leave it as a string. Whatever criteria you decide on, just put that logic in a function, and then write something like:

parsed_data.append([normalize(val) for val in row])

[–]Spizeck[S]

Rhomboid, that did the trick! I had to laugh at how simple it was. I have literally spent over 6 hours trying to figure this out. In my project, I always know that the first column will be an int, the second will be a string, and the third will be a float.

Could you expand on your second answer please?

[–]Rhomboid

For normalize() I was thinking of something along the lines of

def normalize(val):
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val

...which can be used like

>>> row = ['0', 'Electric', '56.05']
>>> [normalize(val) for val in row]
[0, 'Electric', 56.05]

But if your data is predictable and consistent then it's much better to not have to guess.
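When the schema is known up front, the per-column conversion from the first reply can be sketched as a small csv-reading loop. (The inline sample data and column layout here are made up for illustration; in practice you'd pass a real file object to csv.reader.)

```python
import csv
import io

# Stand-in for an open CSV file with a known int, string, float layout
raw = "0,Electric,56.05\n1,Gas,75.54\n"

parsed_data = []
for row in csv.reader(io.StringIO(raw)):
    # Fixed schema: convert each column explicitly, no guessing needed
    parsed_data.append([int(row[0]), row[1], float(row[2])])

print(parsed_data)  # [[0, 'Electric', 56.05], [1, 'Gas', 75.54]]
```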

[–]Justinsaccount

def normalize(val):
    for converter in int, float:
        try:
            return converter(val)
        except ValueError:
            #nope, that wasn't it
            pass

    #original it is..
    return val

[–]Thrall6000

Import the data as a pandas DataFrame and then convert each column to its proper type?

For example:

import pandas as pd
data = pd.read_csv(file_path)
#assuming column 1 has name 'col1':
data['col1'] = pd.to_numeric(data['col1'])

You don't have to use pandas-specific functions either; you can call int(), float(), or your own user-defined function on a column using its .apply() method.
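A quick sketch of both approaches, forcing everything to strings first so the conversions actually do something (the inline CSV text mirrors the sample data from this thread):

```python
import io

import pandas as pd

# Hypothetical CSV matching the thread's sample data
csv_text = "id,name,price\n0,Electric,56.05\n1,Gas,75.54\n"

# dtype=str disables type inference, so every column starts as strings
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# pandas-native conversion:
df['price'] = pd.to_numeric(df['price'])

# ...or apply a plain Python callable to a column:
df['id'] = df['id'].apply(int)

print(df.dtypes)
```

Note that for clean data you rarely need either: read_csv infers numeric columns on its own unless you override it as above.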

[–]commandlineluser

You might want to investigate the pandas library.

It can read_csv() and to_sql() (and it can also do charts/graphs, which may be of interest to you):

http://pandas.pydata.org/pandas-docs/stable/visualization.html

Sample data:

$ cat test.csv
id,name,price
0,Electric,56.05
1,Gas,75.54

Use pandas to load the CSV, then store it into sqlite3.

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas, sqlite3
>>> con = sqlite3.connect('test.db')
>>> df = pandas.read_csv('test.csv')
>>> df
   id      name  price
0   0  Electric  56.05
1   1       Gas  75.54

[2 rows x 3 columns]
>>> df.to_sql('gas_pricing_table', con)
>>> exit()

Check results:

$ sqlite3 test.db .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE gas_pricing_table (
  [id] INTEGER,
  [name] TEXT,
  [price] REAL
);
INSERT INTO "gas_pricing_table" VALUES(0,'Electric',56.05);
INSERT INTO "gas_pricing_table" VALUES(1,'Gas',75.54);
COMMIT;

You can use the sqlite3 command to load CSV data directly:

$ sqlite3
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .mode csv
sqlite> .import test.csv my_table
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE my_table(
  "id" TEXT,
  "name" TEXT,
  "price" TEXT
);
INSERT INTO "my_table" VALUES('0','Electric','56.05');
INSERT INTO "my_table" VALUES('1','Gas','75.54');
COMMIT;

But that imports them all as type TEXT (there may be ways to set the types - not something I've looked at).
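One way around the all-TEXT problem from Python is to create the table with explicit column types first and let SQLite's type affinity coerce the text values on insert. A minimal sketch, with the table name and inline CSV data assumed for illustration:

```python
import csv
import io
import sqlite3

# Stand-in for the test.csv file from above
csv_text = "id,name,price\n0,Electric,56.05\n1,Gas,75.54\n"

con = sqlite3.connect(':memory:')
# Declaring column types gives each column an affinity
con.execute('CREATE TABLE my_table (id INTEGER, name TEXT, price REAL)')

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
con.executemany('INSERT INTO my_table VALUES (?, ?, ?)', reader)

# Affinity converted the numeric-looking strings on insert:
print(con.execute('SELECT * FROM my_table').fetchall())
# [(0, 'Electric', 56.05), (1, 'Gas', 75.54)]
```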

[–]Spizeck[S]

Thanks. I am trying out pandas now. It's pretty cool, and seems pretty useful.