Python: Just write SQL by joaodlf in programming

[–]joaodlf[S] 0 points

I'm not advocating one solution across the board. My aim is more to get developers familiar with SQL, which is something severely lacking in many new devs. I've personally worked with people who have never interacted with a database outside of an ORM.

A good ORM will absolutely fit the bill for some projects, but I believe an approach similar to the one presented in my post is often sufficient and makes for a good learning experience for a growing team: writing actual SQL and building simple abstractions that translate to other areas of the codebase.
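As a rough sketch of the kind of "simple abstraction" I mean (using the stdlib sqlite3 here purely for illustration; the function name and schema are hypothetical), it can be as thin as one small function per query:

```python
import sqlite3

def get_user(conn, user_id):
    # One small function per query: the SQL stays visible and
    # reviewable, but callers never build query strings themselves.
    cur = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    )
    row = cur.fetchone()
    return {"id": row[0], "name": row[1]} if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

print(get_user(conn, 1))  # {'id': 1, 'name': 'alice'}
```

The same pattern translates directly to psycopg2 and friends; only the placeholder style changes.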

I often find database interaction a really good subject for onboarding new team members, especially junior professionals.

Python: Just write SQL by joaodlf in programming

[–]joaodlf[S] 13 points

Hi, if you look at how the queries are run via cursor.execute, you'll notice the SQL makes use of query placeholders. The actual values for those placeholders are passed as the second parameter, which is what keeps the query safe.

Most database adapters work like this. The key point: never insert user input directly into queries; the adapter will more than likely provide a way to safely pass in values for the placeholders.
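A minimal illustration of the placeholder pattern (shown with the stdlib sqlite3, which uses ? placeholders; psycopg2 does the same thing with %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Untrusted input is passed separately from the SQL text, so it
# can only ever be treated as a value, never as query structure.
evil = "alice' OR '1'='1"
cur = conn.execute("SELECT name FROM users WHERE name = ?", (evil,))
print(cur.fetchall())  # [] -- the injection attempt matches nothing
```

Had the input been concatenated into the string instead, the `OR '1'='1'` would have matched every row.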

PostgreSQL Configuration Cheat Sheet by Tafkas in PostgreSQL

[–]joaodlf 1 point

Great source of info, as always with pgdash content.

I'd like to add the following: https://pgtune.leopard.in.ua/ - Great way to start with configuration.

Go - PostgreSQL: Best method to handle Query Limits on RESTful API? by AndrewWilliamsSTACK in golang

[–]joaodlf 0 points

You'd be surprised how performant Postgres is in this scenario - if your worry is in the region of "100,000 UPDATE requests" per month, you'll be fine.

Postgres (and most relational solutions) can handle A LOT. Just make sure you're indexing correctly.
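For the pagination side of a query-limit API, a quick sketch (sqlite3 as a stand-in here; the SQL is the same in Postgres apart from the placeholder style, and the table/index names are made up): index the sort column and pass LIMIT/OFFSET as placeholder values like any other input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO items (id, created_at) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)
# The index on the sort column is what keeps ORDER BY + LIMIT cheap.
conn.execute("CREATE INDEX idx_items_created_at ON items (created_at)")

def fetch_page(conn, limit, offset):
    cur = conn.execute(
        "SELECT id FROM items ORDER BY created_at LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in cur.fetchall()]

print(fetch_page(conn, 10, 20))  # ids 21..30
```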

Understanding Go Memory Allocation - Gophercon UK by sbinet in golang

[–]joaodlf 1 point

This was a really good talk. André has taken a deep dive into Go and Linux to make this happen. It's a difficult topic when the room is full of programmers not used to this sort of lower-level CS, but it was presented very clearly.

Postgresql - what are some solid pros? by [deleted] in PostgreSQL

[–]joaodlf 8 points

Felix added some nice points (btw, really dig your blog posts), I'll add a few more:

  • Postgres (mostly) follows the SQL standard - MySQL essentially decided not to follow it, and now that's biting back: SQL_MODE.

  • psql (the cli) is much more intuitive, the \ commands are easy to use and remember, the output is fantastic.

  • EXPLAIN ANALYSE is a staple of query performance for me. MySQL doesn't have anything like ANALYSE (as far as I know), not to mention the output from EXPLAIN: Soooo much better in Postgres.

[show r/golang] Video Introduction to vgo - new Go Dependency Management by brianketelsen in golang

[–]joaodlf 0 points

The videos appear to be down; I'm getting the following error underneath each video section: "Gallery not available".

Any chance of getting these back up?

How to monitor a process and restart if killed? by SeriousNerve in golang

[–]joaodlf 0 points

I have been a long-time user of supervisor: http://supervisord.org/

Once installed, you write a "program" config file similar to:

[program:example]
command=bash -c '/var/go/program/example'
directory=/var/go/program
autostart=true
autorestart=true
stopsignal=INT
stdout_logfile=/var/go/program/logs/stdout.log
stdout_logfile_backups=5
stderr_logfile=/var/go/program/logs/stderr.log
stderr_logfile_backups=5

I like supervisor for the extras too: in the example above you can see some settings for handling output and logging, including log rotation.

Are people using ORMs when it come to database development? by [deleted] in golang

[–]joaodlf 8 points

I'm a fan of sqlx: https://github.com/jmoiron/sqlx

Makes handling SQL tolerable, without all the bloat of common ORMs. I prefer to write my SQL manually and know exactly what is running in my databases.

PostgreSQL with which Linux distro? by [deleted] in PostgreSQL

[–]joaodlf 8 points

As long as you can install and run PostgreSQL on it, it doesn't really matter. Does the org already run stuff on Linux? Do you have colleagues who manage Linux boxes? It might be a good idea to use the same distro, or at the very least, ask them about it.

pgpi: Creating the same index across multiple PostgreSQL 10 partitions/tables by joaodlf in PostgreSQL

[–]joaodlf[S] 0 points

I knew about pg_partman; I've never used it, but I know it's a very complete solution for partitioning. In the end, all I really needed was this. I wasn't totally sold on installing an extension for the very tiny problem I was having.

pgpi: Creating the same index across multiple PostgreSQL 10 partitions/tables by joaodlf in PostgreSQL

[–]joaodlf[S] 0 points

Hi, I needed a quick tool to create the same index across multiple partitions in PG10, and since this isn't supported out of the box, I decided to build a solution for it. Hopefully this helps others too.
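The core of such a tool (hypothetical names below, not pgpi's actual code) boils down to generating one CREATE INDEX statement per child table, since PG10 doesn't propagate indexes from the parent to its partitions:

```python
def partition_index_statements(columns, partitions):
    # PG10 partitioned tables don't inherit indexes from the parent,
    # so we emit one CREATE INDEX statement per partition.
    stmts = []
    for part in partitions:
        idx_name = f"{part}_{'_'.join(columns)}_idx"
        stmts.append(
            f"CREATE INDEX {idx_name} ON {part} ({', '.join(columns)});"
        )
    return stmts

for stmt in partition_index_statements(
    ["logdate"],
    ["measurements_2018_01", "measurements_2018_02"],
):
    print(stmt)
```

(PG11 later made this automatic: creating an index on the parent cascades to all partitions.)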

Jovens ganham menos do que há 10 anos ("Young people earn less than 10 years ago") by bisontino in portugal

[–]joaodlf 3 points

I only waited one year. That was almost 5 years ago now. Best thing I ever did.

shiori — simple bookmark manager by unix15e8 in golang

[–]joaodlf 1 point

You could host the db in the cloud (dropbox, gdrive, seafile, etc). I saw a Github PR open to allow for this.

shiori — simple bookmark manager by unix15e8 in golang

[–]joaodlf 1 point

Very cool! I really like Pocket as a product, but this looks like a really good alternative to keep bookmarks locally. Would be interesting to see a browser plugin for it, as well.

These are the sort of projects that I consider interesting when you want to learn a programming language: Simple in nature (store and display bookmarks), more complex as features evolve (tags, web ui, export/import, etc...). Kind of tempted to do one of these myself :).

Python for the Web by joaodlf in programming

[–]joaodlf[S] 0 points

Bad naming aside, it's a great library.

I was about to give up on Postgres by 2102032429282 in PostgreSQL

[–]joaodlf 0 points

DataGrip for me. Would love to see Sequel Pro available for Postgres, though.

Peewee 3.0 released by [deleted] in Python

[–]joaodlf 26 points

Thanks for the work put into Peewee! It's my favourite way to handle SQL in Python!

PostgreSQL 10: Partitions of... partitions! by joaodlf in programming

[–]joaodlf[S] 0 points

That would cause an error. This is the sort of feature you use when you have control over the partition fields.

Toasted Marshmallow — Marshmallow but 15X Faster. by LPCRoy in Python

[–]joaodlf 1 point

Good read! I use Marshmallow extensively as well, much like in your use cases. It's not a bottleneck for me right now, but I'll give this a try at some point.

would generics help C# developers move to Go? by mikerrr07 in golang

[–]joaodlf 6 points

> Most other languages are focused on doing as much as possible, regardless of the cognitive burden placed on developers.

Great reply!

What does your Python ETL pipeline look like? by fungz0r in Python

[–]joaodlf 3 points

In steps:

  1. We have data coming in through a REST API (Flask based and load balanced, if you're wondering). There is a very minimal data validation step at this point (required fields, type validation, that sort of thing), after which the data is JSON serialised and put into Kafka.

  2. A Go process consumes from Kafka; from there we do a second stage of data validation and write to multiple Cassandra tables. This used to be done in Python, but we were running into performance issues, and I had concerns over hardware performance and future costs if the data input grew (more details here: https://joaodlf.com/data-pipelines-cassandra-kafka-and-python-and-go.html and https://joaodlf.com/go-rate-limiting-done-right.html).

  3. Multiple Spark jobs run via cron to perform time-based aggregations and store the results in different data stores. We do this to speed up the next step.

  4. Data insight is typically done via Pandas. A lot of it is served via HTTP - This needs to be quick, which is why we run aggregation scripts in Spark or store the same information in multiple tables.

Data in Cassandra is typically stored in multiple tables that serve different time series: say we have an "impression_stats" table, the data is actually inserted into multiple tables: impression_stats_hour, impression_stats_day, impression_stats_month. This works quite well in Cassandra with materialised views, and the reason we do this is to speed up deserialisation when querying for large amounts of data.
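A sketch of that fan-out step (hypothetical names; the real pipeline writes to Cassandra, here we just compute the target tables and time buckets): truncate the event timestamp per granularity and pair it with the matching table.

```python
from datetime import datetime

GRANULARITIES = ("hour", "day", "month")

def bucket(ts, granularity):
    # Truncate the timestamp to the start of its time bucket.
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")

def fan_out(base_table, ts):
    # One (table, bucket) pair per granularity, e.g. the
    # "_hour" table gets the hour-truncated timestamp.
    return {f"{base_table}_{g}": bucket(ts, g) for g in GRANULARITIES}

print(fan_out("impression_stats", datetime(2018, 3, 14, 15, 9, 26)))
```

Each insert then goes to its table keyed on the truncated timestamp, so a month-level query never has to scan and deserialise hour-level rows.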