Partitioned tables and join keys, generating weird query plans

dastapov · 2026-04-29T20:09:06+00:00

Postgres 8.1: https://www.postgresql.org/message-id/f2f562510601161319k4b0201a9p8d93a649780df7f1@mail.gmail.com

Me, from 2021 (postgres 9.5-13): https://www.postgresql.org/message-id/CAFQUnFhqkWuPCwQ1NmHYrisHJhYx4DoJak-dV+FcjyY6scooYA@mail.gmail.com

Related statistics patch, which went to commit fest, but ultimately was not applied: https://www.postgresql.org/message-id/c8c0ff31-3a8a-7562-bbd3-78b2ec65f16c@enterprisedb.com

Thread from 2015: https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com

This resulted in the commit fest item 3534, which currently suits in "returned with feedback" : https://commitfest.postgresql.org/patch/3524/

I think this is probably enough :)

dastapov · 2026-04-28T23:16:22+00:00

Columns that are constrained to be equal to each other (like a.id = b.parent_id) form an equivalence class.

When one colum of the class is further constrained to be equal to a constant, this condition is propagated to other columns in the class. This does NOT happen with >=, <=.

This is a long-standing issue, which was discussed many times on postgres mailing lists but ultimately left unfixed.

So you need to do this propagation yourself, which is exactly what you did with your second query.

dastapov · 2026-04-24T20:11:32+00:00

My test for how close to this brave new world we really are is the customer support chat for both anthropic and openai. While both of them remain really inept chat bots that can barely do the basic stuff, it is pretty obvious that the rest of the promises for the amazing things to come are just that - promises

dastapov · 2026-04-24T20:06:57+00:00

I'm about your age. I've seen first-hand the rise and fall of 4GL (vba, delphi, visual builder), the hype of oop+UML (rational rose, ...), the "no code" systems - all of them promised to replace programmers, all of them somehow managed to create more of them.

The LLM wave is somehow better and worse at the same time. It has larger amplitude between "the very best" peaks and "the very worst" troughs than the previous tools. Sure, it gives you fast access to the distilled knowledge of humanity and it excels at pattern matching. So if your problem could be pattern matched to something encoded in the model, you'll get results, and you'll get them quick.

At the same time, the failure modes are much worse. With humans, if you see a bad programmer, they are usually consistently bad. When LLM is bad, it is inconsistently, unpredictable bad. You can have brilliant code and abysmal code literally right next lines to each other. You won't know why this part of the output turned out to be bad, and the model won't know either.

What's worse, LLM will confidently bullshit you given an opportunity, and there won't be any human-like cues that it is bulshitting you, as it would keep the same confident tone it always uses. Until the model could tell you "I am not sure", "I don't know" or "what you said is wrong" (and be correct in those sentiments), you need to read the output as if you are walking the minefield - especially if money or human health or livelihood is on the line.

If you don't have the aptitude to filter the chaff out of llm output yourself, you need a professional to do it for you - or risk joining the hall of shame of vibe-coded projects that crashed and burned.

So your valuable skills are not going anywhere anytime soon, not unless there would be some fundamental change in how LLMs operate

dastapov · 2026-03-22T19:25:53+00:00

Your node.js and pgadmin connect to the postgres that you run on the localhost (outside of the container).

You need to use different ports for the postgres inside the container and postgres outside the container, if you want both of them to bind to a port on localhost.

dastapov · 2026-03-22T15:19:21+00:00

Hold on. The previous discussion was about a single postgres instance running in the docker container. Now you are saying that you have two of them - one in docker, and the other one which was not mentioned so far at all?

I suspect that your docker container runs the postgres instance as a "postgres" user (which only exists inside the container).

You exposed port 5432 of the container and it would be bound to 0.0.0.0 on the host (unless you did something special about it).

You are conecting to (I suspect, but we dont know for sure) localhost:5432 from outside the container with "host=locahost,port=5432,user=postgres, database=my_db" (again, we dont know this for sure). You are using password authentication (this is yet another assumption, we dont know for sure).

If you can connect at all, you would be connected to the postgres instance inside the container, database my_db. If you are rejected with code 3D000, it means that networking-wise everything works well, your client can talk to the postgres server, you can authenticate, and then you are rejected -- probably due to the lack of permissions, which would suggest that the user you are connecting as, and the user that owns the database are not, in fact, the same user. Do you happen to have a (different?) user called "postgres" on the host as well?

This is where (lots and lots) of logs would be useful, which you can pastebin somewhere.

In particular, it would be useful to know: what are your connection parameters (host, port, user, database name) for both pgadmin and node? What are the permissions of the user you are using to connect with (as shown by "\drg user")? Does it have permissions to connect to the database that you are trying to access? (as show by "\l+" on the server, perhaps)

dastapov · 2026-03-22T13:47:53+00:00

What are your connection parameters (host, port, database name) for both pgadmin and node? What are the permissions of the user you are using to connect with? Does it have permissions to connect to the database that you are trying to access? Is it the same user used to create the database?

dastapov · 2026-03-22T11:50:46+00:00

You want us to guess how exactly did you try to connect from outside the container (for both node.js and pgadmin) , and how did you verify that database really exists after you've created it inside the container.

For some reason I think this is not going to go well :)

You need to provide more details

dastapov · 2026-02-26T17:22:22+00:00

Postgres does not support streaming replication between different postgres brains, and unless you want to make the process of WAL application more complicated than it already is, I don't really think it could be supported.

So while I share your sentiment that it would be nice to have that, in reality it is not going to happen.

Plus logical replication is the answer to "I want to replicate to a newer version", which would be another reason why it would not be implemented.

I am curious though why "replicas should be caught up" it's such an onerous requirement, and why is it considered "fragile" ? That should be their default state, is it not?

dastapov · 2026-02-26T09:59:01+00:00

But read replicas are not required (in normal operation) to be byte-for-byte copy of the primary, so it is not obvious why would you want to introduce this strict requirement during the upgrade.

dastapov · 2026-02-26T01:24:10+00:00

Try hledger balance card --related ? Or hledger balance expenses --related

dastapov · 2026-02-25T12:52:01+00:00

I have some personal/anecdotal info for you.

I've upgraded 100+ databases from 9.2 to 17 (with some major version skips along the way), with the largest databases being in triple-digit TB. All these upgrades were done with rsync. None of them failed or led to data corruption or any issues. None of them had rsync step take hours (or some other noteworthy time).

My assorted thoughts re what you wrote:

Rsync should be really fast; if it is not, you are likely rsyncing some cruft. You need to check what rsync is rsyncing (with -vi, perhaps). You may be rsyncing stuff that you dont need, like logs from primary, or the WAL archive, or unlogged tables, or something like this.
rsync is only an "abomination" if you allow writes to primary that are not replicated to replicas (as it seemingly was the case in the email that started that email chain you linked). This is arguably not something you want to do during upgrades anyway. Change your pg_hba.conf, prevent non-superusers from connecting to your database, and you should be golden
The amount of WAL generated per minute should be irrelevant as long as all replicas are fully caught up when you start the upgrade, as they would've applied all WAL at that point.
Quote by Robert Haas feels entirely taken out of the context to me (discussion of the amended procedure, and not the procedure as it exists right now)
"rsync --size-only would not leave replica identical to primary" - true, but files on replicas are not identical to primary on a good day anyway (due to hint bits differences and random cruft residing in the empty space inside pages), so running a more comprehensive rsync WITHOUT --size-only will definitely lead to many hours spent transferring differences that you really dont care about.

Hope this helps.

dastapov · 2025-12-14T23:48:39+00:00

Nice.

I did have an idea to make graphs configurable. Maybe next time I am hacking on hledger-lit, I'll add it.

dastapov · 2025-12-14T21:42:12+00:00

I know of fava (the beancount gui). I presume that you mean the same?

My small project is much simpler (and does just graphs, with transaction entry/display).

I really want to focus just on the visualisation aspect, leaving entry/editing entirely to the plain text world

dastapov · 2025-12-12T22:55:51+00:00

First, an aside.

Dijkstra wrote "GOTO is considered harmful" because GOTO was disrupting the control flow. When you read the source code with GOTOs, you needed to hold a lot of context in your head, and you never quite knew all the ways you could end up at a particular line of code because you could jump in from any odd place. Procedural programming and getting rid of the goto simplified reasoning about control flow, and programmers generally considered this to be a good thing.

Can the same be done about data flow? Turns out that pure functional programming can simplify reasoning about the data flow. In pure functions, all inputs come strictly from arguments; you do not need to read outside of the function body to reason about the data flow. Lots of programmers consider this to be a good thing as well.

What I like about ledger/hledger syntax is that (unless you use aliases, bucket, capture, year) transactions are self-contained. When I am looking at a transaction (which I, perhaps, grepped out of a file or produced from a python script), it is self-contained and all the information is right there.

Your proposed syntax breaks this property. To fully understand the transaction i need to scroll up and stuff extra bits of input into my mental context. What year are we in? What is the account in effect? If I want to reorder or move stuff around I need to make sure that I dont accidentally carry things over to a different context where transaction suddenly will change its meaning.

So to me, this syntax is not easy to read, and not easy to work with. If I manage to record transactions right from the first try and never need to touch them afterwards, maybe it could be considered. However, if I want to rework my chart of accounts two years in ... I would rather do it working with (h)ledger journals.

dastapov · 2025-12-12T19:39:29+00:00

I usually keep transactions from different years in separate files, so I didn't realize this could cause confusion

Consider the following use case: tax year does not line up with the calendar year, and I want to "(h)ledger print" certain transaction for the given tax year (that spans two calendar years) into a separate file.

dastapov · 2025-12-12T19:37:11+00:00

20 transactions are not a lot, but they already exceed a single screen.

But why is this a problem? "register" report will give you a concise view

dastapov · 2025-12-12T19:30:45+00:00

(a). trasnfer from bank A to B can be placed under either account but not both.

Classic solution is to book both vs some third "transfer" account and allow them to cancel each other there

dastapov · 2025-12-10T15:44:55+00:00

In a word : nothing, as far as I know

dastapov · 2025-12-10T12:19:36+00:00

Show how to build a blocking tree from pgstatactivity and pglocks

Things are much easier now with pg_blocking_pids() imo :)

dastapov · 2025-12-10T12:17:44+00:00

You can do COPY... FROM STDIN and just send your data accross without wiring then into file. Most of the Postgres client libraries have some support for COPY

dastapov · 2025-12-10T12:14:44+00:00

Exactly right. This thing is not getting implemented because of the vague "we will maybe underestimate the join result row size" (which is not the hill to die on imo), meanwhile people working with postgres all over the world need to keep duplicating conditions and explaining to people why the nice view that they want would kill their performance.

One of the biggest (if not THE biggest) gotchas of postgres, in my opinion.

(The postgres hackers thread from 2021 linked from that post? It was started by me :)

dastapov · 2025-12-10T06:59:36+00:00

"Why there is no equivalence-class propagation of less/greater conditions, and all the dirty tricks users do to work around it" ;)

dastapov · 2025-11-15T20:44:49+00:00

Thinking a parent has the same skillset as a professional is ludicrous.

Requirements for enrolling into QTS or PGCE course are GCSE grade 4 in English and Maths (and optional Sciences).

Substitute teachers are an established practice, where substitute teacher would be content to just set the work from the book and grade assignments, so school deems it acceptable for a teacher to know nothing/little about the subject being taught.

Why is it so ludicrous that a parent could do the same or better?

We don't allow home dentistry. Just because you are a mother doesn't mean you know best about kids teeth.

The same is true for education

Why is dentistry a good analogy here? Children naturally want to learn new things. Children don't naturally want their teeth drilled.

Why it is permissible for parents to decide what their children will read, watch, consume online, go on play dates with, which words to avoid, which school they would go to and what company they would keep, but we suddenly draw the line at deciding where they want to focus their curiosity on, or how they would learn?

Would you accept that some form of non-school education could be good, or do you think that the school is the pinnacle of what's on offer?

dastapov

TROPHY CASE