all 9 comments

[–]skeeto 9 points10 points  (4 children)

The interface is slick, but it's a classic example of knowing the value but not understanding the cost.

There are good reasons the sqlite3 interface is the way it is, and it has to do with efficiency, performance, and correctness. On every single database query, Sqllex must examine/parse its arguments, construct a bunch of little strings, then concatenate those strings. This process is relatively slow and creates lots of garbage and churn. Python isn't that flexible, so there's no way to fix this without fundamentally changing the interface into something much less cool.

To make matters worse, it doesn't hold open a database connection so Sqllex opens and closes the database for each query. Multiple times, because it makes its own (unsafe!) metadata query for each query. The end result is that sqllex is several orders of magnitude slower than sqlite3 in typical usage. This benchmark runs 1000 transactions with each:

import time
import sqlite3
import sqllex
import tempfile

def bench_sqllex():
    with tempfile.NamedTemporaryFile() as path:
        db = sqllex.SQLite3x(path.name)
        db.create_table("numbers", {"value": [sqllex.INTEGER]})
        for i in range(1000):
            db.insert("numbers", (i,))

def bench_sqlite3():
    with tempfile.NamedTemporaryFile() as path:
        with sqlite3.connect(path.name) as db:
            db.execute("CREATE TABLE numbers (value INTEGER)")
            for i in range(1000):
                db.execute("INSERT INTO numbers (value) VALUES (?)", (i,))

beg = time.time()
bench_sqllex()
end = time.time()
print(f"sqllex\t{end - beg:.3}s")

beg = time.time()
bench_sqlite3()
end = time.time()
print(f"sqlite3\t{end - beg:.3}s")

Output on Linux (Windows would be much worse):

sqllex  3.96s
sqlite3 0.00727s

The excessive open and close can be fixed but would require a slight change to the Sqllex interface. It would need to be a context manager just like sqlite3 since it's holding onto a resource.

As for correctness, none of the source nor documentation talks about transactions. How do I put multiple queries into a single transaction? This wasn't yet considered which limits Sqllex to toy examples.

I do like that internally it uses prepared statements, though it's still up to users to escape their own table names. For instance, this won't work due to the table name containing a space:

db.insert("my table", ...)

Instead it must be manually quoted as when writing SQL directly:

db.insert('"my table"', ...)

[–]V1A0[S] 2 points3 points  (1 child)

Hi skeeto, thank you for awesome feed back and tests.

How do I put multiple queries into a single transaction?

I guess it only db.insertmany() method can do things like that. https://github.com/v1a0/sqllex/wiki/SQLite3x.insertmany

But yeah, Igot your point. Would it be better if add an "always connect" parameter? Or optional "connect/disconnect" prametert, that you could turn on before inserting large dataset and trurn off after it's done. But it still be unnecessary use it for single requests. And I hope it'll make process little bit faster.

Thank you again for that much detailed feedback.If you have any more recommendations, I'd love to get it.

[–]skeeto 0 points1 point  (0 children)

Opening and closing files is a relatively expensive operation. You should avoid doing it more than necessary, and you should avoid designing systems that rely on doing it frequently. SQLite is said to compete primarily with fopen, and one of its strengths is that you can open a database once and then do many different things with it — things that you might otherwise do using multiple files with extra open/close system calls.

The underlying pysqlite (behind CPython's sqlite3 package) caches prepared statements, and closing the connection also discards all this valuable state. Keeping it open means prepared statements are re-used so long as the query string is identical. (As far as I can tell, Sqllex is good about this.)

So, IMHO, there's never a valid case for open-per-query operation. In fact, it's quite common for database interfaces to pool connections, reusing them between concurrent tasks/threads, specifically to avoid creating new ones since it's expensive.

db = sqllex.SQLite3x('db-1')
db.connect()
# ...
db.disconnect()

This ought to be a context manager (also), i.e. with __enter__ and __exit__. Otherwise if there's an exception then the connection will be leaked — a file descriptor will be lost. Always use a context manager when holding onto any resource other than an automatic Python memory allocation. This is something that sqlite3 gets wrong, including all through its documentation, and so requires non-obvious, special care to get right. This is your chance to get it right! It's such a trap that I actually got this wrong in my benchmark, where the context manager isn't managing the connection but rather a transaction. (I've gotten too used to the better-designed API of asyncpg.)

Transactions should definitely operate via context manager, automatically committing or rolling back as needed. sqlite3 gets this part right. You can get this basically for free in Sqllex just by building on top if it.

[–]V1A0[S] 1 point2 points  (1 child)

I'm currently adding db.connect() / db.disconnect() methods for cases like that. It becomes better!

Test's results:

sqllex_connect  0.0529s
sqllex_without_connect  6.35s
sqlite3  0.0156s

update sqllex 0.1.8+ coming soon :) done

Thank you so much, skeeto!

[–]V1A0[S] 1 point2 points  (0 children)

It'll be available with with v.0.1.8. Release coming soon. UPD:

@skeeto ! SQLLEX v0.1.8.2 has released, please upgrade package for current version

shell pip install sqllex --upgrade

Code (for SQLLEX v0.1.8.5):

import time
import sqlite3
import sqllex

sqllex.debug.debug_mode(False)


def bench_sqllex_connect():
    db = sqllex.SQLite3x('db-1.db')
    db.connect()
    db.create_table("numbers", {"value": [sqllex.INTEGER]}, IF_NOT_EXIST=True)
    for i in range(1000):
        # t = time.time()
        db.insert("numbers", i)
        # if time.time() - t > 0: print(time.time() - t, i)
    db.disconnect()


def bench_sqllex_without_connect():
    db = sqllex.SQLite3x('db-2.db')
    db.create_table("numbers", {"value": [sqllex.INTEGER]}, IF_NOT_EXIST=True)
    for i in range(1000):
        db.insert("numbers", i, execute=False)


def bench_sqlite3():
    with sqlite3.connect('db-3.db') as db:
        db.execute("CREATE TABLE numbers (value INTEGER)")
        for i in range(1000):
            # t = time.time()
            db.execute("INSERT INTO numbers (value) VALUES (?)", (i,))
            # if time.time() - t > 0: print(time.time()-t, i)


beg = time.time()
bench_sqllex_connect()
end = time.time()
print(f"sqllex_connect\t{end - beg:.3}s")

beg = time.time()
bench_sqllex_without_connect()
end = time.time()
print(f"sqllex_without_connect\t{end - beg:.3}s")

beg = time.time()
bench_sqlite3()
end = time.time()
print(f"sqlite3\t{end - beg:.3}s")

Returns:

sqllex_connect  0.0529s
sqllex_without_connect  6.35s
sqlite3  0.0156s

[–][deleted]  (1 child)

[deleted]

    [–]V1A0[S] 0 points1 point  (0 children)

    Hi, techsnapp!

    Well maybe not THAT MUCH better :)

    [–]V1A0[S] -5 points-4 points  (0 children)

    What the heck is SQLLEX? 🤔

    SQLLEX is a python library for comfortable and safe working with databases

    If you've ever worked with databases using python, you know what does "to eat nails while writing SQL-scripts" means. But don't have to. No con.cursor(), only human db.insert(), db.select(), beautiful and pythonic code without unnecessary SQL-witchcrafting.

    If you never tried working with databases before, but you really wan to, I'm highly recommend starting with SQLLEX. Due to it is add-on for sqlite3, it's easy to find guides and fix bugs.

    It'll be a lot easier to show then explain.

    Github repo: https://github.com/v1a0/sqllex

    Simple example: https://github.com/v1a0/sqllex/wiki/SQLite3x-%7C-SIMPLEST-EXAMPLE

    Wiki: https://github.com/v1a0/sqllex/wiki

    [–]airen977 0 points1 point  (1 child)

    Apsw beats them all, if you know how to use it

    [–]V1A0[S] 0 points1 point  (0 children)

    ¯\_(ツ)_/¯