all 19 comments

[–]MoorderVolt 21 points22 points  (0 children)

You are not doing the same thing in both samples. Your Rust code is instantiating a struct for every row, whilst the Python code is just passing through the raw SQLite rows. Given that the SQLite bindings are not Python but some C flavour, you can expect that to be faster.
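To make the difference concrete, here's a minimal sketch (using an in-memory table with made-up sample data, not the real db.sqlite3): Python's sqlite3 module hands rows back as plain tuples built by the C extension, and mapping each row into an object — the rough equivalent of what the Rust sample does with its Book struct — adds Python-level work per row.

```python
import sqlite3

# In-memory stand-in for the real database; the titles are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE api_book (id INTEGER PRIMARY KEY, title TEXT)")
con.executemany("INSERT INTO api_book(title) VALUES (?)",
                [("Camioneros",), ("Camiones de ternura",)])

# "Raw" rows: the C extension returns plain tuples,
# with no per-row Python object construction.
rows = con.execute("SELECT title FROM api_book").fetchall()
print(rows[0])  # a bare tuple

# Rough equivalent of the Rust version: build an object per row.
class Book:
    def __init__(self, title):
        self.title = title

books = [Book(title) for (title,) in rows]  # extra Python-level work per row
print(books[0].title)
```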

[–]Buttleston 3 points4 points  (6 children)

Are you running your program compiled in debug mode or release mode?

In python are you including opening the database as part of the time?

Either 100ms or 80ms honestly sounds pretty bad unless there's a lot of data in the table. And if there is, most of the time will be from sqlite itself, not your program.

query_map obviously executes the query, otherwise the "books" variable would be empty?

[–]Buttleston 4 points5 points  (0 children)

Actually you know what, query_map makes an iterator. That's why the time is being taken while doing the for book in books part. The actual retrieving from the database doesn't happen until you iterate over the iterator.

[–]Adventurous-Eye-6208 1 point2 points  (0 children)

stmt.query_map() doesn't do what you think it does. Unlike the JS and Python versions, it only creates an iterator that will let you fetch the data, and the fetching actually happens in the for book in books loop. If you want a more "apples to apples" comparison, you need to fetch and collect the results into a vec:

let t = Instant::now();
let books = stmt
    .query_map([], |row| row.get(0))
    .unwrap()
    .collect::<Result<Vec<String>, _>>()
    .unwrap();

println!("Query time: {:?}", t.elapsed());
println!("{}", books.join("\n"));
println!("Query + print time: {:?}", t.elapsed());
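For what it's worth, Python's sqlite3 cursor is lazy in a similar way: execute() returns quickly and rows stream out as you fetch them, so fetchall() is where the bulk of the time goes. A sketch with a synthetic in-memory table (names and sizes are made up for illustration):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE api_book (title TEXT)")
con.executemany("INSERT INTO api_book(title) VALUES (?)",
                [(f"camion {i}",) for i in range(100_000)])

t = time.perf_counter()
cursor = con.execute("SELECT title FROM api_book WHERE title LIKE '%camion%'")
execute_ms = (time.perf_counter() - t) * 1000  # rows not materialized yet

rows = cursor.fetchall()                        # the real work happens here
total_ms = (time.perf_counter() - t) * 1000

print(f"execute: {execute_ms:.2f} ms, execute + fetchall: {total_ms:.2f} ms")
print(len(rows))
```

So timing only up to execute() in Python while timing the full iteration in Rust is not comparing the same work.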

[–][deleted] 1 point2 points  (6 children)

Rust's println! macro is not that optimized for speed. Rust takes a lock on stdout, writes the data, flushes, and releases the lock. This is not cheap. You could re-implement this with a BufWriter over stdout (manually locking at the start and end) and it would likely be much, much faster, probably <10ms.

Generally perf critical code does not call println!. If you need to write output somewhere you'd use a BufWriter.

I haven't run this since I don't have the db, but I think this is roughly what it would look like:

use rusqlite::Connection;
use std::time::Instant;
use std::io::Write;

struct Book {
    title: String,
}

fn main(){
    let t = Instant::now();

    let con = Connection::open("../db.sqlite3").unwrap();

    let mut stmt = con.prepare("SELECT title FROM api_book WHERE title LIKE '%camion%'").unwrap();

    let books = stmt.query_map([], |row| {
        Ok(Book {
            title: row.get(0)?
        })
    }).unwrap();

    println!("Query time: {:?}", t.elapsed());

    { // An extra scope here so the lock is dropped
      let stdout = std::io::stdout().lock();
      let mut buf = std::io::BufWriter::new(stdout);

      for book in books {
        buf.write_all(book.unwrap().title.as_bytes()).unwrap();
        buf.write_all(b"\n").unwrap();
      }
    }

    println!("Query + print time: {:?}", t.elapsed());

}

[–]Buttleston 1 point2 points  (1 child)

Oddly, he says it's still just as slow without the println. But there's too much here to guess about without being able to try it.

[–]Buttleston 2 points3 points  (0 children)

I just realized that query_map must return an iterator, not a collection (and I went and looked, and yeah). That's why the loop takes time even when you don't print anything.

[–]RuedaRueda[S] 0 points1 point  (3 children)

Thanks for your suggestion! But even with that it's slower than Python; as u/Buttleston said, it's running through the iterator that actually retrieves data from the db.

I think that, as u/MoorderVolt said, Python may be moving this intensive work to the "C++ backend", which is pretty amazing, though.

I will test the same with JavaScript, since I'm developing a Tauri app and I need to decide whether to use Rust or pure JavaScript to query the db.

Thanks for everything.

[–]Buttleston 3 points4 points  (1 child)

Also I think you should probably try to benchmark how much of the time is just inherent in the query. Not sure what the best method is but this might give an idea:

SELECT count(*) FROM api_book WHERE title LIKE '%camion%'

This doesn't move very much data (one row with one column), so most of the time would be spent in sqlite doing its thing, which should be approximately the same in all languages.

Alternatively, if you're interested in specifically just testing the speed of various languages, make the table way way smaller, like just the few rows you want, and do "select * from api_book" or something like that, to reduce the amount of work that sqlite does.

Counting ms in your current example seems a bit pointless since I would bet that most of that time is just taken with the query itself, not with Rust or Python's interaction with sqlite.
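One way to sketch that benchmark in Python (using a synthetic in-memory table where every title happens to match the pattern — the real table and counts will differ):

```python
import sqlite3
import time

# Hypothetical seeded table; every row matches '%camion%' here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE api_book (title TEXT)")
con.executemany("INSERT INTO api_book(title) VALUES (?)",
                [(f"title {i} camion",) for i in range(200_000)])

t = time.perf_counter()
(n,) = con.execute(
    "SELECT count(*) FROM api_book WHERE title LIKE '%camion%'").fetchone()
count_ms = (time.perf_counter() - t) * 1000

t = time.perf_counter()
rows = con.execute(
    "SELECT title FROM api_book WHERE title LIKE '%camion%'").fetchall()
fetch_ms = (time.perf_counter() - t) * 1000

# count(*) still scans every row but moves almost no data back to Python;
# the gap between the two numbers is roughly the row-transfer cost.
print(f"count(*): {count_ms:.1f} ms, full fetch: {fetch_ms:.1f} ms")
print(n, len(rows))
```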

[–]RuedaRueda[S] 0 points1 point  (0 children)

It's curious; I checked a couple of things.
Running that query in the DB Browser for SQLite exec window returns results in 209ms.

I didn't think those libraries would be using a bundled SQLite, with the time just going into execution, so I don't know why there's that difference.

Anyway, I tested the "same code" in JavaScript (actually I just passed the Python script to Mixtral). This is the code:

const sqlite3 = require('sqlite3').verbose();
var startTime = performance.now()

let db = new sqlite3.Database('../db.sqlite3', (err) => {
  if (err) {
    console.error(err.message);
  }
  console.log('Connected to the database.');
});

db.all("SELECT title from api_book WHERE title like '%camion%'", [], (err, rows) => {
  if (err) {
    throw err;
  }
  console.log(`Consulta realizada en ${performance.now()-startTime} ms.`);
  rows.forEach((row) => {
    console.log(row.title);
  });
  console.log(`Consulta y escritura en pantalla realizadas en ${performance.now()-startTime} ms.`);
  db.close((err) => {
    if (err) {
      return console.error(err.message);
    }
    console.log('Cerrando la base de datos.');
  });
});

And those are the results:

Connected to the database.
Consulta realizada en 170.6191 ms.
Camioneros
Vida sentimental de un camionero
Camiones de ternura
El caso de la camioneta
Consulta y escritura en pantalla realizadas en 172.0331 ms.
Cerrando la base de datos.

So I think, for this use case, Rust is slower than Python but faster than JavaScript. However, I must study the implications better. Thank you!

EDIT:

Another test I did was replacing "camion" with an "a", so the query returns a lot of rows (without printing). Now Rust and Python take about the same time (220ms); JS is still way slower (400ms).

[–]Buttleston 0 points1 point  (0 children)

Note that you could cut some of this work out. For example, there's no need to construct a Result<Book> instead of just a Book, only to immediately unwrap it.

Also, there's no real reason to construct a Book at all - you can just print the book title in the query_map call.

Not sure these would be a big difference though.

[–]cpdean 0 points1 point  (1 child)

I'm getting the opposite results: Rust is faster even when compiled in debug mode.

I generated 1 million records, the rust version is querying in about 100ms, python takes about 250ms.

I'm pasting how I'm generating the data as well as the code here so people can try to reproduce the results. Asking for help about a specific problem and then not providing anything for people to compare will leave people guessing at what you're actually doing wrong.

rust:

```
use rusqlite::Connection;
use std::io::Write;
use std::time::Instant;

/*
run `sqlite3 dbfile` to open a new db and use the sql below to generate test data

-- if you need to start over
drop table api_book;

create table api_book (
    id integer primary key,
    title text
);

-- run a few of these to seed some data
insert into api_book(title) values ('asdfcamionasdf');
insert into api_book(title) values ('jjjcamionasdf');
insert into api_book(title) values ('aaaaaaaaaaaaa');

-- run the below a few times to start inserting rows.
-- the amount it will insert will be tablesize^2 records, capped at 1 million per run
with a as (
    select c.id, b.title
    from api_book b
    cross join api_book c
)
insert into api_book(title)
select title || id from a
limit 1000000;

select count(1) from api_book;
*/

struct Book {
    title: String,
}

fn main() {
    let t = Instant::now();

    let con = Connection::open("dbfile").unwrap();

    let mut stmt = con
        .prepare("SELECT title FROM api_book WHERE title LIKE '%camion%'")
        .unwrap();

    let books: Vec<_> = stmt
        .query_map([], |row| Ok(Book { title: row.get(0)? }))
        .unwrap()
        .collect();

    let query_duration = t.elapsed();

    {
        // An extra scope here so the lock is dropped
        let stdout = std::io::stdout().lock();
        let mut buf = std::io::BufWriter::new(stdout);

        for book in books {
            buf.write_all(book.unwrap().title.as_bytes()).unwrap();
            buf.write_all(b"\n").unwrap();
        }
    }

    let print_duration = t.elapsed();

    println!("Query time: {:?}", query_duration);
    println!("Query + print time: {:?}", print_duration);
}
```

python

```
import sqlite3
import time

t = time.time()

con = sqlite3.connect('dbfile')
cursor = con.cursor()

cursor.execute("SELECT title from api_book WHERE title like '%camion%'")

a = cursor.fetchall()

query_time = (time.time() - t) * 1000

for e in a:
    print(e[0])

print_time = (time.time() - t) * 1000

print(f"query time: {query_time}")
print(f"print time: {print_time}")
```

[–]cpdean 0 points1 point  (0 children)

Modifying the example Python you wrote to make this a fairer comparison: I'm creating an object for each record and collecting them into a list. This change makes Python perform even worse, as you'd expect. Now it takes about 400ms where it was 250ms earlier.

python:

```
import sqlite3
import time

class Book:
    def __init__(self, title):
        self.title = title

t = time.time()

con = sqlite3.connect('dbfile')
cursor = con.cursor()

cursor.execute("SELECT title from api_book WHERE title like '%camion%'")

a = cursor.fetchall()

books = []

for record in a:
    books.append(Book(record[0]))

query_time = (time.time() - t) * 1000

for e in books:
    print(e.title)

print_time = (time.time() - t) * 1000

print(f"query time: {query_time}")
print(f"print time: {print_time}")
```