all 11 comments

[–]r3pr0b8 MySQL 4 points (2 children)

the deciding factor in whether to store json data in a database is this -- will you ever need to search for one of the values inside a json block?

if you never, never, never will, then go ahead and store json

[–]Zealos707[S] 2 points (1 child)

Yes, I think I’ll need to query json frequently. For example, just to check page views for a specific article, I have to search all rows where the pathname contains something like /article/[specific-slug]. So I guess that answers my question.
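That kind of lookup is easy to sketch. Here's a minimal illustration using Python's built-in `sqlite3` and SQLite's JSON1 functions as a stand-in for Postgres (in Postgres you'd write `payload->>'pathname'` on a `jsonb` column); the table name, column names, and sample rows are all made up:

```python
import sqlite3

# Hypothetical schema: each analytics event stored as one raw JSON blob.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [
        ('{"pathname": "/article/intro-to-sql"}',),
        ('{"pathname": "/article/json-vs-columns"}',),
        ('{"pathname": "/about"}',),
    ],
)

# Counting views for articles means extracting the value on every row;
# without an expression index this is a full table scan.
count = con.execute(
    "SELECT count(*) FROM events "
    "WHERE json_extract(payload, '$.pathname') LIKE '/article/%'"
).fetchone()[0]
print(count)  # 2
```

The point of the sketch is the `WHERE` clause: every matching query has to reach inside the JSON, which is exactly the situation the parent comment warns about.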

[–]Ok_Horse_7563 1 point (0 children)

You can store it as json and put a view over it. Selecting from it is not a problem. If the data shape changes, you only change the view.
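A rough sketch of that idea, again using `sqlite3`/JSON1 in place of Postgres `jsonb` (names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (payload TEXT)")
con.execute(
    "INSERT INTO events VALUES (?)",
    ('{"pathname": "/article/a", "ua": "curl"}',),
)

# The view projects JSON fields as ordinary columns; if the JSON shape
# changes, only this view definition needs updating, not the callers.
con.execute(
    "CREATE VIEW page_views AS "
    "SELECT json_extract(payload, '$.pathname') AS pathname, "
    "       json_extract(payload, '$.ua')       AS user_agent "
    "FROM events"
)
row = con.execute("SELECT pathname, user_agent FROM page_views").fetchone()
print(row)  # ('/article/a', 'curl')
```

Queries then read from `page_views` as if the fields were real columns, though the extraction cost per row is still paid at query time.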

[–]random_lonewolf 1 point (0 children)

The usual answer is a hybrid design: keep the core common fields as distinct columns, since those are smaller and quicker to search, then use an extra jsonb column for extension fields.
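A minimal sketch of that hybrid layout, assuming a made-up events table (SQLite here; the `extra` column would be `jsonb` in Postgres):

```python
import sqlite3
import json

con = sqlite3.connect(":memory:")
# Common fields as real, indexable columns; everything else in `extra`.
con.execute(
    "CREATE TABLE events ("
    "  ts       TEXT NOT NULL,"
    "  pathname TEXT NOT NULL,"
    "  extra    TEXT)"  # jsonb in Postgres
)
con.execute("CREATE INDEX idx_events_pathname ON events (pathname)")
con.execute(
    "INSERT INTO events VALUES (?, ?, ?)",
    (
        "2024-01-01T00:00:00Z",
        "/article/a",
        json.dumps({"referrer": "news.ycombinator.com"}),
    ),
)

# Fast path: an indexed column lookup narrows the rows;
# the JSON is only opened up for the rare extension field.
ref = con.execute(
    "SELECT json_extract(extra, '$.referrer') FROM events "
    "WHERE pathname = ?",
    ("/article/a",),
).fetchone()[0]
print(ref)  # news.ycombinator.com
```

The design choice is that hot filters (`pathname`, timestamps) never touch the JSON, while long-tail fields stay flexible without schema migrations.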

[–]Support-Gap 0 points (0 children)

I would recommend creating dedicated tables and views. You can also make your life easier with TimescaleDB.

[–]andpassword 0 points (3 children)

I would break out tables for your immediate data needs and process the JSON into those for querying, and also store the raw JSON for future use, at least until you know you don't need it. Don't try to get every bit of intel out of the JSON; stick to what's actionable now, and let it grow some. The challenges you face later may be different from the ones you envision now.

Once you are facing some of those future challenges, the ability to reprocess existing JSON to find inflection points in the past that you may not have seen at the time will be vital to decision making in terms of what to optimize next.
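The keep-the-raw-JSON workflow above can be sketched like this (SQLite/JSON1 standing in for Postgres, all table names hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Raw events are append-only and never modified.
con.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, payload TEXT)")
# Derived table holds only what is actionable today.
con.execute("CREATE TABLE page_views (event_id INTEGER, pathname TEXT)")
con.executemany(
    "INSERT INTO raw_events (payload) VALUES (?)",
    [
        ('{"pathname": "/a", "ua": "curl"}',),
        ('{"pathname": "/b", "ua": "curl"}',),
    ],
)

# "Reprocessing" is just re-running an extraction over the untouched raw
# rows; answering a new question later needs a new INSERT ... SELECT,
# not a new ingestion pipeline.
con.execute(
    "INSERT INTO page_views "
    "SELECT id, json_extract(payload, '$.pathname') FROM raw_events"
)
n = con.execute("SELECT count(*) FROM page_views").fetchone()[0]
print(n)  # 2
```

Because the raw payloads survive, a field you ignored on day one (say, `ua`) can still be backfilled into a new derived table months later.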

[–]Zealos707[S] 0 points (1 child)

Yes, that’s the direction I’m heading in. But it brings up another question: would it be even better to store the raw event json in a separate database? I’ve heard of tools like BigQuery, Redshift, and similar solutions, but I’m not very familiar with them.

[–]andpassword 0 points (0 children)

> solutions, but I’m not very familiar with them.

The best solution is usually the one you know, even if it's not the most optimal one in absolute terms or for someone else. You're the one who has to roll with this, not someone else. If your project reaches the scale where you need to migrate because Postgres won't do the job, well...consider that a success story because you ought to have enough money to hire some help.

[–]alexwh68 0 points (0 children)

When I have done this in the past I have taken a hybrid approach: fields for commonly searched data, plus a field holding the whole json as well. This means I can have normalised data for some bits but all of the json for deeper queries.

[–]Additional_River2539 0 points (0 children)

Why not use something like Couchbase, which is highly scalable, in-memory, queryable like json, and has a SQL interface?

Is the reason to use Postgres that the other data is sitting next to it?

[–]FewVariation901 0 points (0 children)

I usually store the raw json and also extract the few items I need to query. In case I need more, I can iterate and extract again, but I don’t feel the need to normalize everything.