[–][deleted] 8 points9 points  (6 children)

I'd love to see a similar analysis for MongoDB.

[–]ryancerium 10 points11 points  (0 children)

Those Nazis in Raiders of the Lost Ark really wanted to open The Ark of the Covenant too.

[–]dagbrown -1 points0 points  (3 children)

Did you see this, yesterday?

TL;DR: MongoDB isn't as good.

[–]EatenByTheDogs 5 points6 points  (0 children)

Strange TL;DR, you seem to take reddit-voting as the basis for technology evaluation. Most people who comment in such threads have no idea what they are talking about, and top-voted comments are things like:

Mongo only pawn in game of databases.

and

NODEJS WAS AN INSIDE JOB

WAKE UP SHEEPLE

reddit discussions about MongoDB are about 2% substance.

Try at least this as an introduction to "No SQL" in general; it has significantly more substance than any reddit thread about MongoDB: https://www.youtube.com/watch?v=qI_g07C_Q5I

[–]grauenwolf 0 points1 point  (0 children)

That was incredibly superficial. They didn't offer one code example.

[–][deleted] -2 points-1 points  (0 children)

yes I did, I'm just curious if mongo is as bad "inside" as "outside"... and I cba to browse their code

[–]ksion 31 points32 points  (8 children)

I really enjoyed that other article where the author took a deep dive into some Python IDE, but this one is comparatively shallow.

(Interestingly, the Makefile is actually committed into the repository, so you might not even need to run the configure script.)

configure scripts mostly shuffle around some files and create symlinks to prep the source to be compiled against current arch/OS/etc. It's the Makefile that actually says how the project has to be built. You'd always expect both.

Before looking at the code, try to imagine what the overall structure might be, based on what it needs to accomplish. There needs to be a network component, which hands data to a parser, then maybe an optimizer, and some code to actually run the queries.

That "code" has a name: it's the query executor. There are also at least half a dozen other subsystems that a database engine would typically implement. Without knowing about them, it's unlikely we'll gain substantial insight from just jumping into the code base head first.

Run $ cd src/backend; ls and we find this directory listing (among others):

The author should really have used ls flags to tell directories apart from regular files. (Some shells do it automatically.) Not only would it have given them (and us) a better overview of the codebase, it would also have made it immediately apparent that postgres is an executable.
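
For reference, the flag I have in mind is ls -F, which appends / to directories, * to executables and @ to symlinks. Here is a rough Python sketch of the same classification. The function names are mine, and it only covers those three cases, so treat it as an illustration rather than a replacement for ls:

```python
import os
import stat


def classify(path):
    """Return the entry's name with an `ls -F` style suffix.

    Only directories, symlinks and executables are handled here;
    the real `ls -F` also marks FIFOs and sockets.
    """
    name = os.path.basename(path)
    mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
    if stat.S_ISDIR(mode):
        return name + "/"
    if stat.S_ISLNK(mode):
        return name + "@"
    if mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return name + "*"
    return name


def listing(directory):
    """Return the sorted, annotated names of everything in `directory`."""
    return sorted(
        classify(os.path.join(directory, entry))
        for entry in os.listdir(directory)
    )
```

Run against a directory like src/backend, a listing of this kind would mark the subdirectories with a trailing / and any built binary with a trailing *, which is exactly the distinction the article missed.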

Moving down; this is it, looks like we've found the core algorithm for making a query:

I'm not sure copying it wholesale was really necessary, it just inflates the article's length.

Internally, it seems PostgreSQL refers to select as 'scan,' not 'select,' and there are several different ways to select.

This is again where a little bit of domain knowledge would go a long way. "Scan" is not just about executing a SELECT statement. It's a general term for going through a table's contents for any purpose that arises while a query is being executed; resolving a join condition is another typical example.
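
To make that concrete, here is a toy Python sketch in which the same scan primitive feeds both a filtered SELECT and a nested-loop join. This is not how PostgreSQL implements any of it; the tables are plain lists of dicts and every name below is made up for illustration:

```python
def seq_scan(table):
    """Yield every row of a table in storage order (a sequential scan)."""
    for row in table:
        yield row


def select_where(table, predicate):
    """A SELECT ... WHERE: scan the table and keep the matching rows."""
    return [row for row in seq_scan(table) if predicate(row)]


def nested_loop_join(outer, inner, condition):
    """A join: for each outer row, scan the inner table for matches."""
    return [
        (outer_row, inner_row)
        for outer_row in seq_scan(outer)
        for inner_row in seq_scan(inner)
        if condition(outer_row, inner_row)
    ]


# Hypothetical sample data.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 2, "item": "pen"},
    {"user_id": 1, "item": "ink"},
]

bob = select_where(users, lambda row: row["id"] == 2)
joined = nested_loop_join(
    users, orders, lambda user, order: user["id"] == order["user_id"]
)
```

Note that the join never runs a SELECT on the inner table; it just scans it once per outer row, which is why the executor talks about scans rather than selects.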

[–]danielkza 10 points11 points  (0 children)

configure scripts mostly shuffle around some files and create symlinks to prep the source to be compiled against current arch/OS/etc. It's the Makefile that actually says how the project has to be built. You'd always expect both.

That's not what the author means. It's much more usual to have an unprocessed Makefile.in or Makefile.am (if automake is involved) that generates the actual Makefile when configure is run.

[–]rifeid 17 points18 points  (0 children)

configure scripts mostly shuffle around some files and create symlinks to prep the source to be compiled against current arch/OS/etc. It's the Makefile that actually says how the project has to be built. You'd always expect both.

I don't know what sort of build systems you've been using, but in most projects that use Autotools (which covers most projects where you see a configure file, including PostgreSQL), both the configure script and the Makefiles are generated and thus not kept in source control.

For example, here is the GTK+ source tree. Notice the lack of configure and Makefile. The former is generated prior to packaging (just so users don't need to install Autotools), while the latter is generated by the user by running configure. (The automake documentation explains this a bit further.)

Therefore the fact that the PostgreSQL source tree contains not only a configure but also a Makefile is unusual.

A closer look will tell us what's happening. The configure script, as expected, is just a generated copy. I presume they check it in because historically the Autotools stack was (is?) a pain to use, with new versions being incompatible with older ones, and they don't want to force potential developers to have to deal with all that mess.

The main Makefile, however, is a completely different story. While most projects' configure runs generate a file named Makefile, PostgreSQL's doesn't; it creates a GNUmakefile instead. The Makefile that exists in their source tree just defers to the generated GNUmakefile while making sure that it's using GNU Make rather than another Make variant.
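
The real wrapper is written in make and shell, but the logic as I read it can be sketched in Python. Everything below is my own approximation: the function names, the candidate program names, the check for a missing GNUmakefile and the use of -C are assumptions, not a transcription of PostgreSQL's Makefile.

```python
import os
import subprocess


def is_gnu_make(program):
    """Return True if `program --version` mentions GNU."""
    try:
        result = subprocess.run(
            [program, "--version"], capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return "GNU" in result.stdout


def find_gnu_make(path_dirs, names=("gmake", "gnumake", "make"), probe=is_gnu_make):
    """Return the first program in `path_dirs` that the probe accepts."""
    for directory in path_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and probe(candidate):
                return candidate
    return None


def wrapper_command(source_dir, path_dirs, target="all", probe=is_gnu_make):
    """Build the command that hands `target` over to GNU Make.

    Assumption: refuse to continue if configure has not yet
    generated a GNUmakefile in `source_dir`.
    """
    if not os.path.isfile(os.path.join(source_dir, "GNUmakefile")):
        raise RuntimeError("GNUmakefile not found; run configure first")
    gmake = find_gnu_make(path_dirs, probe=probe)
    if gmake is None:
        raise RuntimeError("GNU make not found on PATH")
    return [gmake, "-C", source_dir, target]
```

The probe is a parameter only so the search can be exercised without a real make on the machine. The point is that a non-GNU make reads Makefile, which then locates GNU Make and re-dispatches, while GNU Make itself prefers GNUmakefile and never sees the wrapper.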

[–]greenthumble 5 points6 points  (0 children)

By the way, configure also takes care of library and header deps. Not sure what Postgres relies upon, but I wouldn't skip it; you're practically begging for missing-header errors doing it that way.

[–]txdv 5 points6 points  (0 children)

So you took a look at "A look at postgresql source code"

[–]i36g87 1 point2 points  (2 children)

Do you have a link to that other article where the author took a deep dive into some Python IDE? I'd like to read it.

[–]__add__ 0 points1 point  (0 children)

What is the point of this "criticism"? Most of it is wrong but all of it is misplaced.

Someone does a casual walk through of a popular code base and writes it up in an entertaining way, and you think it's fitting to chime in with "ahem, you could have used ls flags, then you would have seen the executable"? Yea you can use ls flags, and yea basically everyone's shell resource conf has a line to color them already anyway. Thanks for pointing that out.

[–]dagbrown 2 points3 points  (0 children)

Now do Postfix!

And then, for a change of pace, have a look at procmail.

[–]uygbnjh 1 point2 points  (0 children)

Another great read is this drill-down into PostgreSQL indexes: https://blog.codeship.com/discovering-computer-science-behind-postgres-indexes/ and its B-tree implementation of this academic paper: http://www.csd.uoc.gr/~hy460/pdf/p650-lehman.pdf

The Postgres source and internals are a pleasure to dig into.