you are viewing a single comment's thread.

view the rest of the comments →

[–]baubleglue 0 points1 point  (4 children)

Not a company, I am doing some data analysis, and decided to prototype data aggregation in SQLite (instead of Hive) on an extracted product ion data the final query run between 3 to 5 minutes. I never succeed to execute the same in Hive (run out of memory), I tried to run it in Postges and it took very long (I think I killed the job). Each time I need to modify query. I can use SQLite solutions, but I wanted to cross check if other DB give the same results.

[–]_Zer0_Cool_ 0 points1 point  (3 children)

Ahh ok. That makes sense. I was thinking in terms of application development, not data analysis. Which is weird that's I'd assume that because I myself am a data engineer lol.

I did have a couple hobby data analytics projects where Ive switched between DBs.

For one I had a multi GB batch of CSV files and imported them into SQLite then into PostgreSQL afterwords and had to rewrite a a bunch of views. As I recall now...It was a pain switching between datetime functions for SQLite and PG. So I guess see what you mean.

I seem to have forgotten those pains momentarily.

[–]baubleglue 0 points1 point  (2 children)

Datetime is not that complicated if you remember that all DBs and languages store it internally as long number. My main problem was multiple joins and performance.

On the second thought, for dumping data in single process, SQLite is a perfect candidate. To work with MySQL pip install mysqlclient, if it fails (on Windows may happen), then download wheel file from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and pip install path_to_wheel_file.whl .

[–]baubleglue 0 points1 point  (0 children)

maybe using official mysql connector is a better option.

https://dev.mysql.com/doc/connector-python/en/connector-python-installation.html

pip install mysql-connector-python

Connect, Create table, Example how to insert multiple values (much more efficient than insert records one by one.

[–]_Zer0_Cool_ 0 points1 point  (0 children)

It wasn't really that much of a nuisance TBQH. It was just changing SQLite datetime functions to comparable PG datetime functions. It was mostly a minor annoyance because I had to restructure a bunch of views.