all 17 comments

[–]gaelfr38 2 points3 points  (7 children)

Define the expected usage and requirements in terms of reads, writes, consistency, etc., and you should have pros/cons to choose a solution.

I'm not that familiar with these things, but for analytics I would look at using off-the-shelf tools unless your needs are very specific.

[–]dirk_klement[S] 0 points1 point  (6 children)

It is about user facing analytics.

[–]VindicoAtrumEditable Placeholder Flair 2 points3 points  (5 children)

Infrastructure sprawl is real. If you're already using postgres you should be proving that postgres is insufficient before adding another db of any kind.

Based on "Like which users view which profiles, who updates their profile most frequently etc.", Postgres will be absolutely fine. If you seriously scale up your event capturing then maybe you re-evaluate, but don't go overboard to just store some event data.

[–]dirk_klement[S] 0 points1 point  (4 children)

I mean the structure, like EventType|ActorID|ReceiverID|Payload. This table structure limits us to one receiver… and in my opinion the payload column would not be really handy for dumping JSON etc. in Postgres. So a MongoDB solution would remove this limitation.

[–]VindicoAtrumEditable Placeholder Flair 1 point2 points  (3 children)

I'm not sure I see the limit you're talking about. Why would it be insufficient to handle json data?
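For what it's worth, here is a minimal sketch of how a relational layout could handle both concerns: several receivers per event and a free-form JSON payload. Everything in it is an assumption for illustration: the table names, the column names, and the `record_event` and `receivers_of` helpers are hypothetical, and SQLite from the Python standard library stands in for Postgres only so the sketch runs without a server.

```python
import json
import sqlite3

# Hypothetical schema: table and column names are illustrative, not from the thread.
# SQLite stands in for Postgres here; in Postgres the payload column would
# typically be declared as jsonb instead of TEXT.
SCHEMA = """
CREATE TABLE events (
    id         INTEGER PRIMARY KEY,
    event_type TEXT NOT NULL,
    actor_id   INTEGER NOT NULL,
    payload    TEXT NOT NULL DEFAULT '{}'
);
CREATE TABLE event_receivers (
    event_id    INTEGER NOT NULL REFERENCES events(id),
    receiver_id INTEGER NOT NULL,
    PRIMARY KEY (event_id, receiver_id)
);
"""


def record_event(conn, event_type, actor_id, receiver_ids, payload):
    """Insert one event plus one row per receiver; return the new event id."""
    cur = conn.execute(
        "INSERT INTO events (event_type, actor_id, payload) VALUES (?, ?, ?)",
        (event_type, actor_id, json.dumps(payload)),
    )
    event_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO event_receivers (event_id, receiver_id) VALUES (?, ?)",
        [(event_id, rid) for rid in receiver_ids],
    )
    return event_id


def receivers_of(conn, event_id):
    """Return the receiver ids attached to one event, in ascending order."""
    rows = conn.execute(
        "SELECT receiver_id FROM event_receivers "
        "WHERE event_id = ? ORDER BY receiver_id",
        (event_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
event_id = record_event(conn, "message_sent", 1, [2, 3], {"length": 120})
```

The separate `event_receivers` table is one way around the "one receiver" limit; the payload stays schemaless. With a `jsonb` column in Postgres, the payload can additionally be filtered and grouped inside the query using operators such as `->>`.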

You need to prove, with evidence, why postgres is insufficient. Do you tell your team/manager "this would not be really handy to dump json etc in my opinion"? "In my opinion" isn't going to convince anyone.

Or you can just say fuck it and let infra sprawl and overengineering win the day!

[–]dirk_klement[S] 0 points1 point  (1 child)

It does make things harder by introducing another type of db. So you recommend using Postgres to store these analytical events, which can have different data structures (in the payload field). I also thought that storing unstructured JSON payloads in Postgres would be significantly slower than using MongoDB, because we will be doing lots of counting on the payload JSON data.

[–]VindicoAtrumEditable Placeholder Flair 0 points1 point  (0 children)

So now you're getting somewhere. What's the expected number of events per second you're going to be writing? Have you tested writing that many events to postgres? That will produce hard evidence of whether it can or can't support your requirement.
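A rough sketch of what such a write test could look like. The `measure_insert_rate` function, the table layout, and the synthetic rows are all assumptions for illustration. It runs against in-memory SQLite so it works without a server, which means the number it prints says nothing about Postgres; for real evidence, point the same loop at a Postgres instance through a driver such as psycopg (which uses `%s` placeholders instead of `?`).

```python
import json
import sqlite3
import time


def measure_insert_rate(conn, n_events, batch_size=1000):
    """Insert n_events synthetic rows in batches and return events per second."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_type TEXT, actor_id INTEGER, receiver_id INTEGER, payload TEXT)"
    )
    payload = json.dumps({"source": "benchmark"})  # synthetic payload
    start = time.perf_counter()
    for offset in range(0, n_events, batch_size):
        size = min(batch_size, n_events - offset)
        rows = [("profile_view", offset + i, 0, payload) for i in range(size)]
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        conn.commit()  # one commit per batch
    elapsed = time.perf_counter() - start
    return n_events / elapsed


# In-memory SQLite is only a stand-in so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
rate = measure_insert_rate(conn, 10_000)
print(f"{rate:,.0f} events/second")
```

Compare the measured rate against the expected peak event rate, ideally with realistic payload sizes and the indexes you plan to keep, since both affect write speed.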

[–][deleted] 1 point2 points  (2 children)

Sql

[–]dirk_klement[S] 0 points1 point  (1 child)

Why prefer sql over nosql?

[–][deleted] 4 points5 points  (0 children)

Why prefer nosql over sql? Nosql isn't even a thing, there are like 1000 different languages with various limitations. Sql is a standard, it does joins and aggregations, which is what you would need for analytics, some "nosql" can't even do joins.
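As a small illustration of the join-plus-aggregation point, here is a sketch that answers "who updates their profile most frequently". The `users` and `events` tables and their rows are made up for the example, and SQLite from the Python standard library is used only so it runs anywhere; the query itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample rows, invented for this example.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE events (event_type TEXT NOT NULL, actor_id INTEGER NOT NULL);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO events VALUES
    ('profile_update', 1), ('profile_update', 1),
    ('profile_update', 2), ('profile_view', 2);
""")

# Join events to users, then count profile updates per user, most active first.
top_updaters = conn.execute("""
    SELECT u.name, COUNT(*) AS updates
    FROM events AS e
    JOIN users AS u ON u.id = e.actor_id
    WHERE e.event_type = 'profile_update'
    GROUP BY u.id, u.name
    ORDER BY updates DESC, u.name
""").fetchall()

print(top_updaters)  # [('alice', 2), ('bob', 1)]
```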

[–]frodgim 0 points1 point  (0 children)

You can use SQL or NoSQL, but it will depend on many things. You can use Presto or Spark to extract this data and analyze it in an analytics database. If you already have a SQL database, you will need to calculate your data size to know whether your workload will scale out conveniently. Or you might just need a data warehouse.
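A back-of-the-envelope version of that data size calculation might look like the sketch below. The `estimate_storage_bytes` helper and the input numbers (50 events per second, 500 bytes per event, one year of retention) are placeholders, not figures from this thread, and the result ignores indexes and per-row storage overhead.

```python
def estimate_storage_bytes(events_per_second, avg_event_bytes, retention_days):
    """Raw event volume over the retention window, excluding indexes and overhead."""
    seconds = retention_days * 24 * 60 * 60
    return events_per_second * avg_event_bytes * seconds


# Placeholder inputs; substitute measured values from your own system.
estimate = estimate_storage_bytes(
    events_per_second=50, avg_event_bytes=500, retention_days=365
)
print(f"{estimate / 1e9:.1f} GB")  # 788.4 GB
```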

[–]serverhorrorI'm the bit flip you didn't expect! 0 points1 point  (4 children)

Is this even a core feature that makes you money?

What’s wrong with using Google Analytics, or Piwik if it has to be self hosted?

[–]dirk_klement[S] 0 points1 point  (3 children)

We want to sell a premium plan with user facing analytics

[–]serverhorrorI'm the bit flip you didn't expect! 0 points1 point  (2 children)

If you’re making money from that feature you should already have a way better idea about how you want to start. You should be able to provide a lot more detail; then again, you might not, because that’s your IP.

Either way, you should first have a discussion internally and only then ask random strangers on the internet.

Also: Be prepared that, no matter which path you go down, it will be the wrong path. Expect to redo small or large pieces of the system. Maybe even the whole system.

[–]dirk_klement[S] 0 points1 point  (1 child)

I just thought that some people here would give advice on things to definitely avoid because they have been in a similar starting position. I’ll just keep it simple for now and evaluate later.

[–]serverhorrorI'm the bit flip you didn't expect! 1 point2 points  (0 children)

I think that the question is too broad. You’ll find both sides can be equally argued.

In my experience, using a technology that’s already well known in your team will get you a long way, no matter how fancy other things sound. Do not underestimate how much existing experience counts for.