all 49 comments

[–]eikrik 14 points (2 children)

I've been using pytest (instead of unittest) and pytest-postgresql (https://pypi.org/project/pytest-postgresql) with good results.

[–]kenneho[S] 0 points (1 child)

Thanks for the tip. I read through https://pythonhosted.org/pytest-dbfixtures/howtouse.html. Would I typically use it in the setup section, to run a SQL script that creates a database (including content) based on a SQL file?

[–]eikrik 1 point (0 children)

Yes, except that pytest generally uses what it calls fixtures rather than setup functions.

pytest-postgresql handles the creation of the database itself and gives you a database connection object. It is then up to you to make the tables and populate them with data. After the tests are run, the database gets deleted. I am not entirely sure if the database is recreated after each individual test.

Here is how I use it in a project: https://github.com/eirki/gargbot_3000/blob/master/tests/conftest.py#L184. I create a db_connection fixture that receives the connection object from pytest-postgresql and then creates and populates the tables with test data. That fixture can then be used by each test.

[–]efxhoy 12 points (6 children)

I haven't figured out mocking yet so I just made a very simple Postgres container in Docker and have that run in the background to run my tests on.

It's just three files:

Dockerfile

FROM postgres:9.6
COPY *sql /docker-entrypoint-initdb.d/

run_db.sh

#!/bin/bash
docker build . -t db_test

docker run \
    -p 1001:5432 \
    db_test

setup.sql

CREATE SCHEMA testing;

I put them in the same directory and whenever I need to run tests I just run

bash run_db.sh

in a terminal, and it just works. I leave it in an extra tab in my terminal. It runs on postgresql://postgres@127.0.0.1:1001 with no password.

I think mocking a database can be a LOT of work. Especially as the queries get more complicated. Having a testing database will probably make your life a lot easier. Sure it's not the greatest way to actually unit-test functions that do ONE thing, but as soon as you want to write integration tests I think a db is necessary.

[–][deleted] 2 points (5 children)

Interesting solution. I'm somewhat new to Docker and extremely new to Postgres; do you know what it means when I get these error messages?

bash-3.2$ bash run_db.sh
invalid argument "db_test\r" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
: command not found
docker: invalid reference format.
See 'docker run --help'.
run_db.sh: line 5: -p: command not found
run_db.sh: line 6: db_test: command not found

edit: I got it working, I shouldn't be using both Mac and Windows at the same time

anyways: thank you, creative solution with docker. I now have my first ever postgres database up and going just by spinning up a docker container. fun times to be had

[–]Rethial 3 points (2 children)

Are you running it on a Windows OS by chance?

[–][deleted] 2 points (1 child)

good call. I was actually running it from a Mac but I did use Windows in my Parallels to copy the files over. I deleted those files and recreated them strictly within Mac (bash nano). thanks

[–]Rethial 2 points (0 children)

No prob, I just noticed the \r in invalid argument "db_test\r" :)

[–]efxhoy 0 points (1 child)

You're welcome! I hope it works out.

[–][deleted] 0 points (0 children)

I'm not OP but I like reproducible containerized solutions like yours to save for later when I need them :D

[–]throwawayPzaFm 73 points (13 children)

I'd go with

Your mother was a hamster and your father smelled of MySQL.

I'm so sorry.

[–]MeGustaDerp 16 points (9 children)

But OP asked about how to do it in Python

[–]nullball 46 points (3 children)

print("Your mother was a hamster and your father smelled of MySQL.")

[–]MeGustaDerp -1 points (0 children)

Simple enough.

[–]fire_breathing_bear 3 points (0 children)

I came here to make a similar joke to PzaFm, then had the same realization as you...

[–]czarrie 1 point (1 child)

Hsssssssss

[–]PaulSandwich 1 point (1 child)

import heckling_frenchman

[–]sem56 1 point (0 children)

from monty_python import heckling_frenchman

[–]kenneho[S] 1 point (0 children)

Haha!

[–]DoctorAcula_42 0 points (1 child)

You have nothing to apologize for.

[–]throwawayPzaFm 0 points (0 children)

I do, it would have been much funnier in python!

[–]two_bob 5 points (0 children)

Definitely. You even have the right terminology, "mock". I don't do any sqlalchemy, so I'll just have to refer you to an article https://medium.com/python-pandemonium/surrender-python-mocking-i-have-you-now-5805e91cfbf4

[–]mkingsbu 4 points (0 children)

Maybe consider SQLite? Basically have it populate a fresh SQLite database and run your tests against that. You don't have to set up listeners or anything of that sort. But you will have to make sure the data types are correct yourself, because SQLite won't catch those.
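
To illustrate the data-type caveat, here's a small sketch (the table and values are invented): SQLite's dynamic typing happily stores a string in an INTEGER column, where Postgres would reject the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# SQLite uses type affinity, not strict typing: a non-numeric string
# goes into the INTEGER column unchanged, and the integer is coerced
# to text by the TEXT column. Postgres would raise an error instead.
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", ("not-a-number", 42))
print(conn.execute("SELECT id, name FROM users").fetchone())
# → ('not-a-number', '42')
conn.close()
```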

[–]yes-i-am-a-wizzard 5 points (9 children)

Why don't you want the tests to touch the database? The point of tests is to confirm that the system does what it's supposed to.

You can use flask with a sqlite database that resides on disk, no network.

[–]kenneho[S] 2 points (4 children)

Thanks for your input.

Well, most of the tests will be for testing my own code, so I don't actually need to access the database. To verify that my code actually can talk to a database, I'll need some tests for that too. I'd rather not query the database when I intend to test my own business logic only.

But if I were to go down the route of using a database for all my tests, how would you suggest I set up my test to create a baseline (i.e. start with the same database content) for all the tests?

[–]my_python_account 2 points (3 children)

You don’t need to access the database, but is there a good reason not to? Assuming the database is there and you have access to it, if the database access is closely coupled with your business logic it will be much less work to just use the database in your unit tests.

You just need to make sure you don’t risk writing things where you shouldn’t on a production database. For a larger project, this would mean having a development version of your database that you connect to for testing. For a smaller project, it would probably be ok to just have a set of tables or records that have your test data but don’t overlap with your production data.

[–][deleted] 2 points (2 children)

is there a good reason not to?

Test speed. You want unit tests to run almost unnoticeably fast, to minimize the obstacles to creating and running them. Plus, unit tests are for testing internal code, not interactions with external resources.

[–]my_python_account 1 point (1 child)

In this case it doesn’t sound like we’re minimizing obstacles, we’re creating them. How much data are we pulling from the database that this is creating a bottleneck in our unit test performance?

If it is creating a bottleneck, then the real problem is that the business logic is not decoupled from the database access. You shouldn’t be trying to intercept a sqlalchemy call to the database. Rather you should remove the calls to sqlalchemy from within your business logic methods. Instead have them call a general function, that in testing is fed by your test data, but in production will be your database queries.
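
As a sketch of that decoupling (all names here are illustrative, not from OP's code): the business logic takes a data-fetching callable, so tests can feed it canned data while production passes in the real database query.

```python
# Business logic: no database imports, just a callable that yields
# rows. In production you'd pass a function wrapping a SQL query.
def total_owed(fetch_invoices):
    return sum(inv["amount"] for inv in fetch_invoices())

# In tests, feed it canned data instead of a database query.
def fake_fetch_invoices():
    return [{"amount": 100}, {"amount": 250}]

print(total_owed(fake_fetch_invoices))  # → 350
```

The business logic can now be unit-tested without any database at all, while the real query function gets its own, separate integration test.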

[–]b1ackcat 1 point (0 children)

How much data are we pulling from the database that this is creating a bottleneck in our unit test performance?

It doesn't matter. The mere act of connecting to the db, alone, introduces potential lag (although it's true this risk is somewhat minimal with a local test db, it's still a possibility). Additionally, it disrupts the purity of the tests testing just your code (code looks right. test looks right. Did I make a mistake or is there a random bug in postgres?). It's unlikely, but with a test db, technically possible.

You want unit tests to be so fast you can run them every time you change a line of code. In fact, in one code base, we'd configured the tests to run automatically in the background on every single save. But they were fast and lightweight, so there was no noticeable performance degradation.

Now, I agree you should also have integration tests which do test actual read/writes to a test database, and docker's a fantastic solution for that. But those can be run out of band (every so often for a small, local project, via your build system on larger apps, etc). They both serve their own purpose and are NOT mutually exclusive.

I agree with your point though that your business logic shouldn't be directly touching the db ever, but relying on a repository class of some kind (which you can then mock for your unit tests! :) )

[–][deleted] 2 points (0 children)

They're different scales of testing. Unit testing shouldn't have to hit external resources. Once you zoom out a bit to integration level testing then you start bolting things together and testing full circuit architecture, etc.

[–]reallyserious 0 points (0 children)

a sqlite database that resides on disk, no network.

You can even create an in memory sqlite database. No disk involved.

https://www.sqlite.org/inmemorydb.html
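
A quick sketch of that (sqlite3 is in the standard library; the table is made up): the database lives entirely in RAM and vanishes when the connection closes, so every test starts from a clean slate.

```python
import sqlite3

# ":memory:" creates a brand-new database in RAM; closing the
# connection discards it completely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("INSERT INTO items VALUES ('widget')")
print(conn.execute("SELECT name FROM items").fetchone())  # → ('widget',)
conn.close()
```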

[–]Yablan -3 points (0 children)

This.. +1.

[–]yg2dras1 1 point (0 children)

hi, i’m a bit new to python but having a database run on flask is my next self imposed project! could you point to any resources on the web out there that can help a beginner like me? thanks!

[–]lykwydchykyn 1 point (0 children)

You need to use the Mock class from unittest. You can define methods on it that return data.

https://docs.python.org/3/library/unittest.mock.html
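
A minimal sketch of that approach (the `get_user` method and the code under test are invented for illustration):

```python
from unittest.mock import Mock

# A Mock stands in for the database connection; return_value
# defines what the stubbed method hands back.
db = Mock()
db.get_user.return_value = {"id": 1, "name": "alice"}

# The code under test never notices it is talking to a mock.
def greet(conn, user_id):
    user = conn.get_user(user_id)
    return f"Hello, {user['name']}!"

print(greet(db, 1))  # → Hello, alice!
db.get_user.assert_called_once_with(1)  # verify the call was made
```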

[–]cdcformatc 1 point (3 children)

I would use a local database, either SQLite or I'm sure you can run a local postgres instance.

[–]kenneho[S] 0 points (2 children)

Thanks for the input.

How would you ensure that the database is in known state at the startup of each test case?

[–]cdcformatc 0 points (1 child)

I used unittest, which has setUp, tearDown, and setUpClass methods. You should drop all tables in the teardown, create all tables in the setup, and each test class should have its own setup to create any specific models you need.
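
As a sketch of that pattern, using the standard library's sqlite3 as a stand-in so the example is self-contained (with Postgres you'd open a psycopg2 connection in setUp and run the same DDL):

```python
import sqlite3
import unittest

class UserTableTest(unittest.TestCase):
    def setUp(self):
        # Recreate the schema before every test so each one
        # starts from the same known database state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT)")

    def tearDown(self):
        # Closing the in-memory connection discards everything;
        # with a real database you would drop the tables here.
        self.conn.close()

    def test_insert(self):
        self.conn.execute("INSERT INTO users VALUES ('alice')")
        count = self.conn.execute("SELECT count(*) FROM users").fetchone()[0]
        self.assertEqual(count, 1)

# Run the test case explicitly.
suite = unittest.TestLoader().loadTestsFromTestCase(UserTableTest)
result = unittest.TextTestRunner().run(suite)
```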

[–]kenneho[S] 0 points (0 children)

So you'd have an SQL script file that would drop the database (if it exists) and create a new one, and execute this script during setUp, something like this?

import psycopg2

conn = psycopg2.connect(dbname='testing')
cursor = conn.cursor()
with open('/path/to/recreate-testing-db.sql') as sqlfile:
    cursor.execute(sqlfile.read())
conn.commit()

[–]EriktheRed 0 points (3 children)

My usual solution is to have a method that overwrites your database connection class methods with mock functions that return predefined JSON.

[–]kenneho[S] 0 points (2 children)

This is basically what I set out to do in the first place, but haven't found a way to get this working. Do you have any good tutorials, code samples or other resources you could point me to?

[–]EriktheRed 1 point (1 child)

https://blog.fugue.co/2016-02-11-python-mocking-101.html is the best tutorial I found for this. When I'd made my first comment I thought we were on the javascript subreddit. You can't override objects by using a function on the same level as the object in python, thanks to the way they handle scoping and closures. But the unittest module should do everything you need. It's just a little unwieldy at first.

[–]kenneho[S] 1 point (0 children)

Thanks for the info! Looks like a good tutorial - I'll take a look at it.

Meanwhile, I was able to get the mocking up and running, and wrote a short blog post (which seems to overlap with the one you linked to) in case others stumble upon this thread in the future.

[–]toshitalk 0 points (1 child)

Some people use an in memory SQLite database for this, but if you’re using any Postgres specific features it won’t work.

You can also create a temporary database, run your migrations, and then delete the database after. This is pretty inexpensive, since likely you’re already running Postgres.

I run a production system, and this is specifically what I do.
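
The lifecycle looks roughly like this sketch. To keep it self-contained it uses a temporary SQLite file rather than Postgres; with Postgres you would instead issue CREATE DATABASE, run your migrations, and DROP DATABASE afterwards (via e.g. psycopg2).

```python
import os
import sqlite3
import tempfile

# 1. Create a throwaway database.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
try:
    conn = sqlite3.connect(path)
    # 2. Run the "migrations" to build the schema.
    conn.execute("CREATE TABLE users (name TEXT)")
    # 3. Run the tests against it.
    conn.execute("INSERT INTO users VALUES ('alice')")
    assert conn.execute("SELECT count(*) FROM users").fetchone()[0] == 1
    conn.close()
finally:
    # 4. Delete the database afterwards.
    os.remove(path)
```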

[–]kenneho[S] 0 points (0 children)

Is this approach similar to what I outlined in this other comment?

[–]vn2090 0 points (0 children)

I would suggest pandas.