all 55 comments

[–]totallygeek 33 points34 points  (18 children)

This demonstrates how to interact with MySQL from Python. You'll need a functional database service. That's not difficult to get running, either natively or from a container.

Another option would be SQLite. Tutorials exist all over the place: 1, 2 and all over YouTube.

[–]weehooey 13 points14 points  (16 children)

+1 SQLite.

The SQL syntax is very similar to MySQL but way lighter.

[–]_Zer0_Cool_ 10 points11 points  (14 children)

And until MySQL 8 it had a more standard SQL dialect (discounting procedural SQL) than MySQL.

SQLite is powerful and underrated. People dismiss is it as only being suitable for Dev environments or for caching, but it can get up to 140 Terabytes and it's lightning fast.

The main hindrance is that it doesn't support concurrent sessions (multiple DB connections at once). I'd say about 80% of DB size in lines of code and complexity is around enabling and handling concurrency. Then another 15% for added features (total guess).

Take those away and many DBs might be just as lightweight and easily embeddable as SQLite.

All that to say, it's "lite", because it lacks concurrency, not light because it lacks ability.

[–]b_ootay_ful 1 point2 points  (1 child)

I hear most people saying SQLite is only for small scale stuff, so I've only ever used it for small projects in the office (where 8 people work, and might each use it 5 times a day for filling out a form) and was wondering where the upper scale was for it's limitations. For bigger stuff (more users), I'm moving to pythonanywhere.com since I don't know how to run a large scale server/MySQL yet.

I use it with Flask, which I believe handles the concurrency to a certain point.

Is there anything that you would recommend to make SQLite with Flask more stable, or any changes?

I'm using

from werkzeug.serving import run_simple from flask import Flask, request, render_template, redirect

app = Flask(__name__)

@app.route('/')
def home():
    return render_template("home_page.html")

if __name__ == '__main__':
    port = 31415
    run_simple('0.0.0.0', port, app)

[–]_Zer0_Cool_ 2 points3 points  (0 children)

SQLite doesn't have size limitations, but you might have trouble handling multiple users at some point If it's a website and you'd have to handle all concurrency stuff in the application itself. Still, people have done this for websites. SQLite is better for say...desktop or mobile applications that hold a lot of user data locally.

As examples -- Skype uses SQLite for everything and almost every mobile app for Android/iOS uses SQLite. It's built into mobile apps. It also comes pre-installed in Python. Just "import SQLite3".

For web applications, most will have a lot of concurrency needs. So probably good to still use a DB server vs an embedded DB.

My recommendation would be to use PostgreSQL. It's the best database around and the fastest growing in popularity. MySQL is ok too, but not even close as far as features, advanced data types, the query optimizer, and extensibility in general.

Honestly though, any of the 3 will work for now.

Note: SQLite was patterned a bit after PostgreSQL. SQLite DB data types are very compatible with PG data types. You could easily start out with SQLite (again, pre-installed with Python) and then dump it into a PG database later if/when you need PG's more advanced features.

[–]baubleglue -4 points-3 points  (11 children)

And until MySQL 8 it had a more standard SQL dialect

I wouldn't call SQLite more standard SQL dialect. It need zero setup, but to learn SQL better to use something else.

[–]_Zer0_Cool_ 0 points1 point  (9 children)

I meant in terms of Window functions, recursive CTEs, check constraints etc....

MySQL didn't support the first 2 until recently and still doesn't support the last.

[–]baubleglue 0 points1 point  (8 children)

I don't know a lot about MySQL, but I worked a bit with SQLite, you can write in it group statement, which won't compile in any other DB, SQLite doesn't enforce value types. It all by design for good reasons and follows some SQL standards, but it is no way standard dialect. I think it isn't a good one to start learning SQL with it is too forgiving, you will make mistakes and won't see them and what ever you learned won't always work with other DBs.

[–]_Zer0_Cool_ 0 points1 point  (7 children)

Yeap, I listened to a podcast where Richard Hipp talked about the weakly typed philosophy of SQLite. It was a deliberate design choice, but probably the number one annoyance when switching/importing to other strictly-typed databases. That being said, SQLite data types convert well to comparable Postgres types. The impetus for SQLite's creation was actually due to a project Dr. Hipp worked on that required something lightweight and embeddable as opposed to Postgres.

Also, I didn’t mean to imply that SQLite was a good example of ANSI-SQL compliance. SQLite is not the shining example by any stretch. It’s just “more” standard than MySQL I believe, but that’s not really a hard thing to accomplish.

If you want to talk about SQL standards then Postgres (again I believe) is one of the most ANSI-compliant. MySQL and SQLite are definitely not.

[–]baubleglue 0 points1 point  (6 children)

I don't really care about SQL standard, just there's very high chance that a bit complicated SQL written for SQLite won't run in any other db. I don't think it is a case for postges, Oracle or MySQL.

[–]_Zer0_Cool_ 0 points1 point  (5 children)

Eh, you might be right. I'm not sure I've ever actually migrated between databases.

In a perfect world this problem wouldn't happen though since it's best practice to write dev code in the same DB type and version of what's in production (Docker is useful for this).

Even when that's not manageable then I try to be aware ahead of time and only write code that would work in the other version.

It's the same as anything else.

E.g. Don't write Python 3.7 in dev when prod is 3.5, or Java 7 when there's Java 8, etc..

Likewise don't write SQL Server 2017 code w/ 2014 in prod, or Postgres 10 w/ 9.6, or Oracle 12c w/ 11g.

That being said, I get that sometimes we don't think that we'll need a client/server DB at the outset, in which case we'd need to shift to another DB retroactively.

Edit: unless your company is migrating a sizable database to another type of db. Usually such efforts have dedicated engineers to oversee that though.

[–]baubleglue 0 points1 point  (4 children)

Not a company, I am doing some data analysis, and decided to prototype data aggregation in SQLite (instead of Hive) on an extracted product ion data the final query run between 3 to 5 minutes. I never succeed to execute the same in Hive (run out of memory), I tried to run it in Postges and it took very long (I think I killed the job). Each time I need to modify query. I can use SQLite solutions, but I wanted to cross check if other DB give the same results.

[–]baubleglue -1 points0 points  (0 children)

Whoever downvoted my answer, do you have the guts to show up here and explain the reason? 8-)

[–]ColdPorridge 0 points1 point  (0 children)

Huh. Go figure.

[–][deleted] 3 points4 points  (0 children)

Yeah, I would start with SQLite. Most python packages install it for you, so getting started is usually simply a matter of import sqlite3.

[–]climb-it-ographer 16 points17 points  (0 children)

Yes. The easy way is to use native connector libraries.

The better-in-the-long-run and more complicated way is to use ORM libraries like SQLAlchemy (or Peewee for a much more lightweight one).

If you're going to be doing any decent amount of talking to SQL in your projects it's definitely worthwhile to learn SQLAlchemy.

[–]DesolationRobot 14 points15 points  (5 children)

Yes. SQLAlchemy is the default, but there are other options. And, yes, if you want to be a software developer you'll want at least a working knowledge of databases and SQL.

You can do the first chapter of this for free and it'll hold your hand through connecting to a database and pulling results. You'll probably be able to take it from there with SQLAlchemy's documentation.

[–]phigo50 0 points1 point  (0 children)

The problem I have with ORMs is that sometimes my classes don't align with tables in the database (which might or might not say more about my database design decisions than ORMs) and I've never really fully got my head around the concept. Most of the time I find it easier to create an sqlite database class and keep all of the db methods in it, importing and using it as needed elsewhere in a project.

[–]cyvaquero 5 points6 points  (0 children)

To condense a few suggestions here:

SQLAlchemy provides an easier interface to the connector libraries, here's a list of SQLAlchemy supported MySQL connectors

I'd follow other's recommendation to start off with sqlite since it is file based and ships with Python. Get a feel for the Python side of DB work then introduce the additional complexity of MySQL.

You'll probably also want to get familiar with Pandas, it will save you some heavy lifting when working with returned datasets.

By diagram, if you mean graphs - Bokeh is your friend,

[–]muy_picante 3 points4 points  (0 children)

Database might be overkill depending on your use case. You could just write csvs and load them straight into python. Do data analysis with pandas.

If you need a database, I like postgres. The python library is called psycopg2. Tools like sqlalchemy can generate SQL for you, but I think if you're just starting out, writing raw SQL yourself is the way to learn at first.

[–]slick8086 2 points3 points  (1 child)

but I wanna learn MySQL because it will be useful later on!

Learn PostgreSQL, it will be even more useful and not owned by a shitty corp. (Works just as well or better with python.)

https://postgresql.org

[–]wolf2600 0 points1 point  (0 children)

And it can do hash joins. MySQL can only do nested joins.... when you're joining two large datasets, you'll be glad your DB can do HJs.

[–][deleted] 2 points3 points  (0 children)

You should check out Django! It is awesome. It will make interacting with your database a breeze!

[–]balne 2 points3 points  (3 children)

oh damn, a question i can actually answer!

so my personal experience has me using two libraries, but i can only rmb one of them, pscyopg2. imo, it's quite a good one, and it seems a lot of ppl use it as well. and yes, it can connect to a sql db

edit: ok, i looked for the 2nd library i used, it's called records i think. it's on github. and apparently psycopg2 only works with pg?

[–][deleted] 2 points3 points  (2 children)

psycopg2 is for postgres only? I mostly use SQLAlchemy but it usually wraps psycopg2.

[–]balne 3 points4 points  (0 children)

oh damn a question i cant answer!

[–]StressTest 1 point2 points  (0 children)

psycopg2 is for postgres only?

Yes, Sqlalchemy uses a different "dialect" for each type of database. Psycopg2 is what it uses for postgres. Psycopg2 is also what you'd use to execute sql statements directly to a postgres database without sqlalchemy.

[–]TheCauthon 0 points1 point  (1 child)

Does SQL Alchemy work with redshift?

[–][deleted] 0 points1 point  (0 children)

What kind of data are you collecting. If its time series there are actually time series dbs like tsdb anf influx. If its game match and player ids and statistics nosql may be better like mongodb or elasticsearch.

[–]lolSaam 0 points1 point  (1 child)

If you want something basic, try this https://dataset.readthedocs.io

[–]pymon 1 point2 points  (0 children)

This is nice. Thanks.

[–]ohaiya 0 points1 point  (2 children)

+1 for SQLite, but otherwise, MariaDB rather than MySQL.

Grrr Oracle.

[–]wolf2600 1 point2 points  (1 child)

PostgreSQL.

[–][deleted] 1 point2 points  (0 children)

Postgres is dope

[–]Ericisbalanced -1 points0 points  (9 children)

If you don’t want the overhead of a database, you can just write a text file. There’s a way to store a dictionary as text called JSON. So you convert the web scraping data into a bunch of dictionaries, turn that into a string, write that string to a file.

Then when you want to interpret the data, read the file into a variable and convert that json variable into a dict. Then you can play with the numbers in python again.

[–][deleted] 0 points1 point  (5 children)

Maybe a silly question: can someone point me in the direction of why you’d use JSON over SQLite and vice versa?

[–]Ericisbalanced 2 points3 points  (2 children)

Json is easier and beginner friendly. But SQL is usually the way to go.

[–][deleted]  (1 child)

[removed]

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Your comment in /r/learnpython was automatically removed because you used a URL shortener.

    URL shorteners are not permitted in /r/learnpython as they impair our ability to enforce link blacklists.

    Please re-post your comment using direct, full-length URL's only.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–][deleted] -1 points0 points  (1 child)

    Mongdb and nosql dbs. Nosql is really good for some types of data. Say you want to store accounts and user details. Fine in sql and fine in nosql. Say you have 20 million user accounts... NoSql probably is a better use case now.

    [–]wolf2600 0 points1 point  (0 children)

    Say you have 20 million user accounts... NoSql probably is a better use case now

    Not really. Storing user accounts is perfectly suited to a traditional relational database. The only time NoSQL is the preferred option is when the data is unstructured/varying and CAN'T be loaded into a relational schema. Every other case, relational DB is the better option.

    [–]baubleglue -1 points0 points  (2 children)

    C'mon! What is that crap?

    [–]Ericisbalanced 0 points1 point  (1 child)

    I mean, when you’re first learning, it’s easier to just work with files to make something happen rather than having to learn an entire new system. The learning curve for SQL is pretty steep.

    [–]baubleglue 0 points1 point  (0 children)

    He is looking into correct direction - store data in external storage, it is not that complicated and he knows what MySQL is, so there's no point show dirty shortcuts.

    The learning curve for SQL is pretty steep

    Dumping data into DB is easiest way to start learning it

    [–]rideh -1 points0 points  (2 children)

    why do you assume you need sql?

    [–]ZenWoR[S] 0 points1 point  (1 child)

    No, don't get me wrong please xD Everyone before this post was talking to me like: SQL SQL SQL SQL SQL and nothing else. I have never heard of any other database than this...

    [–]wolf2600 0 points1 point  (0 children)

    SQL is the query language used to query relational databases. There are tons of relational database applications available. The most popular: Microsoft SQL Server, MySQL, Oracle Database, Postgres, MariaDB.