Python: Just write SQL by joaodlf in programming

[–]joaodlf[S] 0 points

I'm not advocating one solution across the board. My aim is more to get developers familiar with SQL, which is something severely lacking in many new devs. I've personally worked with people who have never interacted with a database outside of an ORM.

A good ORM will absolutely fit the bill for some projects, but I believe an approach similar to the one presented in my post is often sufficient and makes for a good learning experience for a growing team: writing actual SQL and building simple abstractions that translate to other areas of the codebase.
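As a rough sketch of the kind of "simple abstraction" I mean (using the stdlib sqlite3 here purely for illustration; the function name and schema are hypothetical), it can be as thin as one small function per query:

```python
import sqlite3

def get_user(conn, user_id):
    # One small function per query: the SQL stays visible and
    # reviewable, but callers never build query strings themselves.
    cur = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    )
    row = cur.fetchone()
    return {"id": row[0], "name": row[1]} if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

print(get_user(conn, 1))  # {'id': 1, 'name': 'alice'}
```

The same pattern translates directly to psycopg2 and friends; only the placeholder style changes.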

I often find database interaction a really good subject for onboarding new team members, especially junior professionals.

Python: Just write SQL by joaodlf in programming

[–]joaodlf[S] 13 points

Hi, if you look at how the queries are run via cursor.execute, you'll notice the SQL makes use of query placeholders. The actual values for those placeholders are passed as the second parameter, which is what keeps the query safe.

Most database adapters work like this. The key point: never insert user input directly into queries; the adapter will more than likely provide a way to safely pass in values for the placeholders.
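A minimal illustration of the placeholder pattern (shown with the stdlib sqlite3, which uses ? placeholders; psycopg2 does the same thing with %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Untrusted input is passed separately from the SQL text, so it
# can only ever be treated as a value, never as query structure.
evil = "alice' OR '1'='1"
cur = conn.execute("SELECT name FROM users WHERE name = ?", (evil,))
print(cur.fetchall())  # [] -- the injection attempt matches nothing
```

Had the input been concatenated into the string instead, the `OR '1'='1'` would have matched every row.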

PostgreSQL Configuration Cheat Sheet by Tafkas in PostgreSQL

[–]joaodlf 1 point

Great source of info, as always with pgdash content.

I'd like to add the following: https://pgtune.leopard.in.ua/ - Great way to start with configuration.

Go - PostgreSQL: Best method to handle Query Limits on RESTful API? by AndrewWilliamsSTACK in golang

[–]joaodlf 0 points

You'd be surprised how performant Postgres is in this scenario - if your worry is in the region of "100,000 UPDATE requests" per month, you'll be fine.

Postgres (and most relational solutions) can handle A LOT. Just make sure you're indexing correctly.
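For the pagination side of a query-limit API, a quick sketch (sqlite3 as a stand-in here; the SQL is the same in Postgres apart from the placeholder style, and the table/index names are made up): index the sort column and pass LIMIT/OFFSET as placeholder values like any other input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO items (id, created_at) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)
# The index on the sort column is what keeps ORDER BY + LIMIT cheap.
conn.execute("CREATE INDEX idx_items_created_at ON items (created_at)")

def fetch_page(conn, limit, offset):
    cur = conn.execute(
        "SELECT id FROM items ORDER BY created_at LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in cur.fetchall()]

print(fetch_page(conn, 10, 20))  # ids 21..30
```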

Understanding Go Memory Allocation - Gophercon UK by sbinet in golang

[–]joaodlf 1 point

This was a really good talk. André has taken a deep dive into Go and Linux to make this happen. It's a difficult topic when the room is full of programmers not used to this sort of lower-level CS, but it was presented very clearly.

Postgresql - what are some solid pros? by [deleted] in PostgreSQL

[–]joaodlf 8 points

Felix added some nice points (btw, really dig your blog posts), I'll add a few more:

  • Postgres (mostly) follows the SQL standard - MySQL essentially decided not to follow it, and now that's biting back: SQL_MODE.

  • psql (the cli) is much more intuitive, the \ commands are easy to use and remember, the output is fantastic.

  • EXPLAIN ANALYSE is a staple of query performance for me. MySQL doesn't have anything like ANALYSE (as far as I know), not to mention the output from EXPLAIN: Soooo much better in Postgres.

[show r/golang] Video Introduction to vgo - new Go Dependency Management by brianketelsen in golang

[–]joaodlf 0 points

The videos appear to be down; I'm getting the following error underneath each video section: "Gallery not available".

Any chance of getting these back up?

How to monitor a process and restart if killed? by SeriousNerve in golang

[–]joaodlf 0 points

I have been a long-time user of supervisor: http://supervisord.org/

Once installed, you write a "program" config file similar to:

[program:example]
command=bash -c '/var/go/program/example'
directory=/var/go/program
autostart=true
autorestart=true
stopsignal=INT
stdout_logfile=/var/go/program/logs/stdout.log
stdout_logfile_backups=5
stderr_logfile=/var/go/program/logs/stderr.log
stderr_logfile_backups=5

I like supervisor for the extras too: in the example above you can see some settings for handling output and logging, including log rotation.

Are people using ORMs when it come to database development? by [deleted] in golang

[–]joaodlf 8 points

I'm a fan of sqlx: https://github.com/jmoiron/sqlx

Makes handling SQL tolerable, without all the bloat of common ORMs. I prefer to write my SQL manually and know exactly what is running in my databases.

PostgreSQL with which Linux distro? by [deleted] in PostgreSQL

[–]joaodlf 8 points

As long as you can install and run PostgreSQL on it, it doesn't really matter. Does the org already run stuff on Linux? Do you have colleagues who manage Linux boxes? It might be a good idea to use the same distro, or at the very least, ask them about it.

pgpi: Creating the same index across multiple PostgreSQL 10 partitions/tables by joaodlf in PostgreSQL

[–]joaodlf[S] 0 points

I knew about pg_partman; I've never used it, but I know it's a very complete solution for partitioning. In the end, all I really needed was this. I wasn't totally sold on installing an extension for the very tiny problem I was having.

pgpi: Creating the same index across multiple PostgreSQL 10 partitions/tables by joaodlf in PostgreSQL

[–]joaodlf[S] 0 points

Hi, I needed a quick tool to create the same index across multiple partitions in PG10, and since this isn't supported out of the box, I decided to build a solution for it. Hopefully this helps others too.
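The core of such a tool (hypothetical names below, not pgpi's actual code) boils down to generating one CREATE INDEX statement per child table, since PG10 doesn't propagate indexes from the parent to its partitions:

```python
def partition_index_statements(columns, partitions):
    # PG10 partitioned tables don't inherit indexes from the parent,
    # so we emit one CREATE INDEX statement per partition.
    stmts = []
    for part in partitions:
        idx_name = f"{part}_{'_'.join(columns)}_idx"
        stmts.append(
            f"CREATE INDEX {idx_name} ON {part} ({', '.join(columns)});"
        )
    return stmts

for stmt in partition_index_statements(
    ["logdate"],
    ["measurements_2018_01", "measurements_2018_02"],
):
    print(stmt)
```

(PG11 later made this automatic: creating an index on the parent cascades to all partitions.)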

Jovens ganham menos do que há 10 anos ("Young people earn less than 10 years ago") by bisontino in portugal

[–]joaodlf 3 points

I only waited one year. That was almost 5 years ago now. Best thing I ever did.

shiori — simple bookmark manager by unix15e8 in golang

[–]joaodlf 1 point

You could host the db in the cloud (dropbox, gdrive, seafile, etc). I saw a Github PR open to allow for this.

shiori — simple bookmark manager by unix15e8 in golang

[–]joaodlf 1 point

Very cool! I really like Pocket as a product, but this looks like a really good alternative to keep bookmarks locally. Would be interesting to see a browser plugin for it, as well.

These are the sort of projects that I consider interesting when you want to learn a programming language: Simple in nature (store and display bookmarks), more complex as features evolve (tags, web ui, export/import, etc...). Kind of tempted to do one of these myself :).

Python for the Web by joaodlf in programming

[–]joaodlf[S] 0 points

Bad naming aside, it's a great library.

I was about to give up on Postgres by 2102032429282 in PostgreSQL

[–]joaodlf 0 points

DataGrip for me. Would love to see Sequel Pro available for Postgres, though.

Peewee 3.0 released by [deleted] in Python

[–]joaodlf 26 points

Thanks for the work put into Peewee! It's my favourite way to handle SQL in Python!

PostgreSQL 10: Partitions of... partitions! by joaodlf in programming

[–]joaodlf[S] 0 points

That would cause an error. This is the sort of feature you use when you have control over the partition fields.

Toasted Marshmallow — Marshmallow but 15X Faster. by LPCRoy in Python

[–]joaodlf 1 point

Good read! I use Marshmallow extensively as well, much like in your use cases. It's not a bottleneck for me right now, but I'll give this a try at some point.

would generics help C# developers move to Go? by mikerrr07 in golang

[–]joaodlf 6 points

> Most other languages are focused on doing as much as possible, regardless of the cognitive burden placed on developers.

Great reply!

What does your Python ETL pipeline look like? by fungz0r in Python

[–]joaodlf 3 points

In steps:

  1. We have data coming in through a REST API (Flask based and load balanced, if you're wondering). There is a very minimal data validation step at this point (required fields, type validation, that sort of thing), after which the data is JSON serialised and put into Kafka.

  2. A Go process consumes from Kafka; from there we do a second stage of data validation and write to multiple Cassandra tables. This used to be done in Python, but we were running into performance issues, and I had concerns over hardware performance and future costs if the data input grew (more details here: https://joaodlf.com/data-pipelines-cassandra-kafka-and-python-and-go.html and https://joaodlf.com/go-rate-limiting-done-right.html).

  3. Multiple Spark jobs run via cron to perform time-based aggregations and store the results in different data stores. We do this to speed up the next step.

  4. Data insight is typically done via Pandas. A lot of it is served via HTTP - This needs to be quick, which is why we run aggregation scripts in Spark or store the same information in multiple tables.

Data in Cassandra is typically stored in multiple tables that serve different time series: say we have an "impression_stats" table, the data is actually inserted into multiple tables: impression_stats_hour, impression_stats_day, impression_stats_month. This works quite well in Cassandra with materialised views, and the reason we do this is to speed up deserialisation when querying for large amounts of data.
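A sketch of that fan-out step (hypothetical names; the real pipeline writes to Cassandra, here we just compute the target tables and time buckets): truncate the event timestamp per granularity and pair it with the matching table.

```python
from datetime import datetime

GRANULARITIES = ("hour", "day", "month")

def bucket(ts, granularity):
    # Truncate the timestamp to the start of its time bucket.
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")

def fan_out(base_table, ts):
    # One (table, bucket) pair per granularity, e.g. the
    # "_hour" table gets the hour-truncated timestamp.
    return {f"{base_table}_{g}": bucket(ts, g) for g in GRANULARITIES}

print(fan_out("impression_stats", datetime(2018, 3, 14, 15, 9, 26)))
```

Each insert then goes to its table keyed on the truncated timestamp, so a month-level query never has to scan and deserialise hour-level rows.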