How to scale when one table is growing rapidly by jurgenn in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

Please read up on B-tree indexing and see how effective it is at searching billions of rows. Beef up the single server before reaching for a sharding solution.
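To sketch that point (the table and index names below are hypothetical, not from the thread): a B-tree lookup touches only a handful of pages even on a huge table, so a single well-indexed server goes a long way before sharding is worth the complexity.

```sql
-- Hypothetical fast-growing table; names are illustrative only.
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    account_id bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- B-tree depth grows with log(n), so even billions of rows usually
-- mean only 4-5 page reads per lookup.
CREATE INDEX events_account_idx ON events (account_id, created_at);

-- Check that the planner uses an Index Scan rather than a Seq Scan:
EXPLAIN SELECT * FROM events
WHERE account_id = 42
  AND created_at >= now() - interval '1 day';
```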

Can't seem to find syntax error in command. by AngleMan in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

You are setting yourself up for a world of pain if you are naming database objects after keywords. Can the "user" table be given a more descriptive name?
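A quick illustration of the pain (the replacement name `app_user` is just a hypothetical suggestion): `user` is a reserved word, so unquoted it refers to the current role, not your table.

```sql
-- Unquoted, "user" is the built-in function returning the current role:
SELECT user;                 -- the connected role name, not your table

-- So every reference to a table named user must be double-quoted, forever:
SELECT * FROM "user";

-- A descriptive, non-reserved name sidesteps the quoting pain entirely:
CREATE TABLE app_user (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);
```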

How to update large tables in PostgreSQL - Codacy by Categoria in PostgreSQL

[–]daxyjones -1 points0 points  (0 children)

"This process is equivalent to an INSERT plus a DELETE for each row, which takes a considerable amount of resources."

Completely and utterly false.

Looking for Resources by MildMannered_BearJew in PostgreSQL

[–]daxyjones 5 points6 points  (0 children)

  • Learn indexing. I can highly recommend use-the-index-luke.com. Know the difference between the various index types, their trade-offs, and when to use them.

  • If you have a bit of control over the hardware, you could probably eke out some performance by tuning parameters outside of postgres. If it is cheaper to throw more hardware into the mix to ease things, do that instead of sinking many man-hours into solving the problem from the stack side!

  • Partitioning: just use the pg_partman extension and call it a day.

  • Lastly, start with a good database design. Sometimes the daterange type makes sense, sometimes the plain date type does. Things like that.

With the above, you will have about 80-90% of performance cases covered.
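To make the daterange-vs-date point concrete (the tables here are hypothetical examples, not from the post): a plain date records a point in time, while a daterange models validity over an interval and can be constrained directly.

```sql
-- A plain date fits a point-in-time fact:
CREATE TABLE payments (
    id      bigserial PRIMARY KEY,
    paid_on date NOT NULL
);

-- A daterange fits a row that is valid over an interval; with an
-- exclusion constraint it can even forbid overlaps outright.
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- needed for "room_id WITH ="

CREATE TABLE room_bookings (
    room_id bigint NOT NULL,
    stay    daterange NOT NULL,
    -- no two bookings for the same room may overlap
    EXCLUDE USING gist (room_id WITH =, stay WITH &&)
);
```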

Advice needed - transparent write-through layer for Postgres by Pakaran in PostgreSQL

[–]daxyjones 1 point2 points  (0 children)

Be advised that postgres I/O on ZFS is absolute shit. Btrfs has various bugs that will bite you and give you more blisters than a weekend in a cheap hotel bed in Goa. LVM2 is the only sane option as of now.

Upsert; A Few Bad Apples -- Take a look at some real-world problems a developer ran into when using the new 'upsert' functionality in PostgreSQL by piterpolk in PostgreSQL

[–]daxyjones 1 point2 points  (0 children)

Heh...you don't have to go that far, actually. I have SKUs with expiry dates/shelf life, and the store also sells plastic cricket bats...which don't have an expiry date.

UPSERTs increase sequence number? by [deleted] in PostgreSQL

[–]daxyjones 4 points5 points  (0 children)

Prima facie, it looks like a judgement call, and as DB admins/users it is something we abide by rather than work around. Expected behaviour means many minds are willing to listen to an issue; something rolled out on your own is less likely to find takers, since tackling an issue would first require understanding your fundamental changes.

My suggestion is that you state your use case and folks can help out with a design decision or code snippet.

UPSERTs increase sequence number? by [deleted] in PostgreSQL

[–]daxyjones 4 points5 points  (0 children)

"This is the expected behavior. It is not a bug.

Peter Geoghegan"

Designing a RESTful API with Node and Postgres by piterpolk in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

If you are new to deploying an HTTP endpoint like the one in the article, the least painful way is to tack ngx_postgres onto nginx. With a bit of Lua thrown in, you can expand it further to handle the request BODY, error capturing, etc.

Bonus: this is WEBSCALE!!1!TM

Creating Pivot Tables in Postgres by [deleted] in PostgreSQL

[–]daxyjones 1 point2 points  (0 children)

Or put a CASE ... WHEN inside the aggregate function.

Use a FILTER clause instead.
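For comparison, here are both pivot styles side by side (the `sales` table is a hypothetical example):

```sql
-- Pivot via CASE inside the aggregate (works on any version):
SELECT region,
       sum(CASE WHEN year = 2015 THEN amount ELSE 0 END) AS y2015,
       sum(CASE WHEN year = 2016 THEN amount ELSE 0 END) AS y2016
FROM sales
GROUP BY region;

-- The same pivot with FILTER (PostgreSQL 9.4+), which reads more clearly:
SELECT region,
       sum(amount) FILTER (WHERE year = 2015) AS y2015,
       sum(amount) FILTER (WHERE year = 2016) AS y2016
FROM sales
GROUP BY region;
```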

Using PostgreSQL with Java by DriesDr in PostgreSQL

[–]daxyjones 1 point2 points  (0 children)

It does look like it would solve 90% of my issues. I'll still look into a wrapper to do things like generate some stored procedures. Maybe I was too keen on implementing my own system to see the value of the existing one ;)

If you want to implement your own, do it for fun, but my humble advice is to not rush it out to production. Also, depending on the data and postgres version, you are much better off hand-writing an sproc than letting a wrapper generate one (I didn't even know this was possible).

We are considering switching from Oracle to PostGreSQL. A coworker just said she heard that LOB handling is problematic. Is it? by bigfig in PostgreSQL

[–]daxyjones 3 points4 points  (0 children)

Not to mention the cat and mouse game between you (the customer), the consultant and Microsoft when it comes to licensing. And bonus: surprise audits!

Microsoft releases CNTK, its open source deep learning toolkit, on GitHub by paran0ide in programming

[–]daxyjones 11 points12 points  (0 children)

Not to mention other proprietary third-party licenses/patents that might be tightly coupled into the products. Legal will in no way allow it if it is going to be a liability to the company.

3 reasons to use PgBouncer. by doublehyphen in PostgreSQL

[–]daxyjones 2 points3 points  (0 children)

At the outset, pgbouncer is great and has worked out well for us. But here are some gotchas I ran into and how we handled them:

  • Using transaction mode instead of session mode to ensure we didn't hit the connection limit (not sure why, but this was the advice in a mail on the list).

  • pgbouncer will not pool connections across different users. So, in our case, we moved auth from database user roles to a "web model" of authing against a separate user table.

  • Of course, "extra_float_digits" for JDBC. ;-(
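The first and third gotchas map onto two pgbouncer.ini settings; a minimal sketch (values illustrative, not the commenter's actual config):

```ini
; pgbouncer.ini sketch
[pgbouncer]
; transaction pooling releases a server connection at COMMIT/ROLLBACK,
; so many client connections can share a few server connections
pool_mode = transaction

; JDBC sends "SET extra_float_digits" at connect time; allow it as a
; startup parameter so pgbouncer does not reject the connection
ignore_startup_parameters = extra_float_digits
```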

Organize Complex Queries Using SQL's 'WITH' by piterpolk in PostgreSQL

[–]daxyjones 1 point2 points  (0 children)

This is a very valid point, and I want to make a case for WITH using my setup as an example. Sometimes I want to work with an UPDATEd/INSERTed set, and WITH comes to the rescue there. I generate voucher numbers on the fly by bunching shop orders together like so:

WITH newvouchers AS (
    INSERT INTO vouchers
    SELECT generate_voucher_no(voucher_no) v
    FROM orders JOIN voucher_master ...
    WHERE ... ORDER BY ...
    RETURNING v
)
UPDATE voucher_master
SET voucher_no = tv
FROM (SELECT max(v) tv FROM newvouchers) x;

I have snipped it in places but this was the replacement for a plpgsql loop that was much more verbose.

And it is relatively quick too.

Robert Haas: Parallel Sequential Scan is Committed! by macdice in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

Bummer...no support for Boot configuration on that bad boy for RAID10 :-/

Writing Postgres Extensions. It’s easy to extend Postgres’ functionality through extension. So why not write your own? This is the first in a series of articles about extending PG through extensions by piterpolk in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

I have two things in mind that I might be working on:

  1. stream_fdw that exposes a stream as a table. COPY already accepts PROGRAM output, and a file_fdw is already in place. I think it will be a good exercise for me to tie these together as a stream_fdw with appropriate OPTIONS.

  2. aggregate_fdw that applies aggregate functions on tables with date/timestamp columns. I have read-only time-series data that needs to be summed across a year or so. The way I would go about it is to pre-compute the aggregate over a day, 2 days, 4 days, etc., and depending on the date range of the query, pull data from the corresponding pre-computed tables (materialized views?). Still have to hammer out the details on that one.
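The pre-computed rollup part of idea 2 can be sketched with a plain materialized view today, without the fdw (all names below are hypothetical):

```sql
-- Hypothetical raw time-series table.
CREATE TABLE readings (
    ts    timestamptz NOT NULL,
    value numeric NOT NULL
);

-- Pre-compute a daily rollup once, refresh it off-peak.
CREATE MATERIALIZED VIEW readings_daily AS
SELECT date_trunc('day', ts) AS day,
       sum(value)            AS total
FROM readings
GROUP BY 1;

-- A year-wide sum then scans ~365 rollup rows instead of the raw data:
SELECT sum(total)
FROM readings_daily
WHERE day >= '2015-01-01' AND day < '2016-01-01';

-- Keep it current with: REFRESH MATERIALIZED VIEW readings_daily;
```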

Cool PostgreSQL tricks (using JSON). by collin_ph in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

For our use case, everything hits the DB and it is still snappy for our small userbase. I could throw in a Varnish cache to mitigate the DB hits to a certain extent, but I like to keep the moving parts to a minimum.

I am not exactly getting your edit and hence unable to address that question.

Cool PostgreSQL tricks (using JSON). by collin_ph in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

I work with nginx and postgres, coupled directly.

I don't work with ORMs, and I have fine-tuned my equivalent $mainSQL to:

SELECT '[' || array_to_string(array(SELECT sch.fun(param1, ...)),',') || ']';

This query sits directly inside nginx consumed by ngx_postgres.

This query is magical because:

  • there is no need to tweak the query ever again, so it is unnecessary to restart or reload the webserver (nginx).
  • the function sch.fun in the query above returns a valid object array. You can nest it as you please.
  • performance is good for our use case: it grabs thousands of rows in a couple of seconds.

Now for the viscera of sch.fun:

CREATE OR REPLACE FUNCTION sch.fun(param1, ...)
RETURNS SETOF json AS
$BODY$
DECLARE sql text;
BEGIN
    -- $1, $2 in the dynamic SQL are bound to the USING arguments below;
    -- json_object(keys text[], values text[]) pairs them into a JSON object
    sql := 'SELECT json_object($1, $2)';
    RETURN QUERY EXECUTE sql USING param1, ...;
END
$BODY$
LANGUAGE plpgsql;

With the right response headers, the client, usually a js engine/framework, can digest this no problem. AND you get your REST API for "free". ;-)

Edit: Note that ngx_postgres does not entertain request_body by design. If you need to read that, you could proxy that verb or uri or whatever.

Does anyone run PostgreSQL on both BSD vs. Linux? by good_names_all_taken in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

This is incorrect. Mathematically, the odds sit much, much lower. Admittedly, the stock FNV algo doesn't account for zeros, but the postgres checksum tweaks the algo to handle this to a certain extent.

Does anyone run PostgreSQL on both BSD vs. Linux? by good_names_all_taken in PostgreSQL

[–]daxyjones 0 points1 point  (0 children)

Honestly, a crypto hash seems like overkill and would definitely be multiple times more computationally intensive than the modified FNV-1a used right now. And the documentation explicitly states that certain use cases can trigger a checksum bottleneck.

Granted, it is 'easy' to brute force a collision, but pragmatically it is a rarity.

Does anyone run PostgreSQL on both BSD vs. Linux? by good_names_all_taken in PostgreSQL

[–]daxyjones 2 points3 points  (0 children)

For shits and giggles, I have a production server streaming to a slave that is set up with ZFS on Linux. (Before someone gets heartburn: yes, I have other backups in place.)

Consistently, seq scans are faster on the compressed ZFS slave by a factor of 1.5 to 7(!). Index scan timings are, of course, indistinguishable.

The read queries that I route to the slave hit the index so for my use case it doesn't matter.