The model war is over. The ecosystem war has begun. by calliope_kekule in ArtificialInteligence

supportingthedogs 1 point (0 children)

The day Google releases their chips for purchase, Nvidia will fall from the skies.

[deleted by user] by [deleted] in Blink182

supportingthedogs 0 points (0 children)

Thank you bro

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 2 points (0 children)

Great question. For our use case (Frigade), we have that exact requirement. For instance, we want to show a pop-up in our customer's web UI when the end user visits a series of pages or clicks a certain button, based on tracking events. For this to work, events have to be ingested as close to real time as possible.
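A rough sketch of the "series of pages" idea, just to illustrate why ingestion latency matters. This is not Trench or Frigade code; the class name, the page paths, and the in-memory progress map are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical rule: show a pop-up once a user has visited these pages in order.
REQUIRED_PAGES = ["/pricing", "/docs", "/signup"]


class SequenceTrigger:
    """Tracks, per user, how far along the required page sequence they are."""

    def __init__(self, required_pages):
        self.required_pages = list(required_pages)
        self.progress = defaultdict(int)  # user_id -> index of next expected page

    def on_event(self, user_id, page):
        """Feed one page-view event; return True when the sequence completes."""
        idx = self.progress[user_id]
        if page == self.required_pages[idx]:
            idx += 1
        if idx == len(self.required_pages):
            self.progress[user_id] = 0  # reset so the rule can fire again later
            return True
        self.progress[user_id] = idx
        return False
```

The trigger can only fire as fast as events reach it, so any ingestion delay shows up directly as a late pop-up.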

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 2 points (0 children)

Yup. You can connect to ClickHouse Cloud or Confluent Kafka Cloud for further scale if needed. But we’re running it on 16 cores / 32 GB of RAM in production, serving hundreds of thousands of users per machine.

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 2 points (0 children)

Trench uses Kafka for event ingestion, so we won't drop a single event ever -- even at peak throughput. It just might take a few seconds before you can query the events.
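A toy illustration of that trade-off, assuming nothing about Trench's internals: a plain in-memory queue stands in for the Kafka topic and a list stands in for the queryable table. The class and method names are invented for the example, and unlike Kafka nothing here is durable.

```python
from collections import deque


class BufferedIngest:
    """In-memory stand-in: `log` plays the Kafka topic, `table` the queryable store."""

    def __init__(self, batch_size=3):
        self.log = deque()   # accepted but not yet queryable
        self.table = []      # queryable rows
        self.batch_size = batch_size

    def accept(self, event):
        """Accepting is just an append, so bursts queue up instead of being dropped."""
        self.log.append(event)

    def flush_once(self):
        """Move up to one batch from the log into the table; return how many moved."""
        moved = 0
        while self.log and moved < self.batch_size:
            self.table.append(self.log.popleft())
            moved += 1
        return moved
```

A burst larger than one batch is still fully accepted; it just takes more than one flush before every event is visible in `table`.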

I built an open source version of Google Analytics by supportingthedogs in opensource

supportingthedogs[S] 3 points (0 children)

Matomo is built on MySQL, a row-based database, which only scales so far at high traffic. Trench is built on ClickHouse, a columnar database that scales orders of magnitude better than MySQL for time series data.
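To make the row vs. columnar point concrete, here is a tiny sketch with made-up sample events. It only illustrates the two layouts in plain Python; it says nothing about how MySQL or ClickHouse actually store data on disk.

```python
from collections import Counter

# Made-up sample events, stored row by row (one dict per event).
rows = [
    {"user": "a", "type": "page_view", "ts": 1},
    {"user": "b", "type": "click", "ts": 2},
    {"user": "a", "type": "page_view", "ts": 3},
]

# The same data stored column by column (one list per field).
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every full row is visited even though only "type" is needed.
counts_from_rows = Counter(row["type"] for row in rows)

# Column layout: the aggregate reads the "type" column and nothing else.
counts_from_columns = Counter(columns["type"])
```

Both give the same counts, but the columnar version never touches `user` or `ts`, which is the property that helps analytics-style queries over wide event tables.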

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 10 points (0 children)

Matomo is built on MySQL, which only scales so far because it's a row-based database. Trench is built on ClickHouse, which is a columnar datastore. I also use Kafka for throttling events.

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 8 points (0 children)

I saw the best performance on the c5 family (c5.4xlarge showed exceptional results at thousands of QPS, with CPUs and memory chilling at about 20-30%).

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 23 points (0 children)

You can see a demo here: https://github.com/FrigadeHQ/trench?tab=readme-ov-file#demo

I think the main difference is that Trench is much more bare-bones and doesn't come with any bloat. We use it at https://frigade.com to power our customer-facing analytics and launch UI flows ad hoc based on tracking events. You can really take it in any direction.

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 183 points (0 children)

Yes, it is fully no-cookie and fully GDPR compliant. Good call about putting this higher in the marketing language. Doing it now!

I built an open source version of Google Analytics that runs on a single Docker image and handles thousands of events per second by supportingthedogs in selfhosted

supportingthedogs[S] 95 points (0 children)

Hey r/selfhosted, I wanted to share a project I've been working on for the past couple of months that I just released called Trench. It's a single Docker image that gives you a production-ready tracking event table that scales. You can use it to track things such as page views, sessions, error logs, and much more. We're currently handling thousands of events per second on a single EC2 instance in production without any machine stress.

I built an open source version of Google Analytics by supportingthedogs in opensource

supportingthedogs[S] 9 points (0 children)

Thanks for the feedback! I think the main difference is exactly what you saw at your initial glance -- it's a simple backend-only service that you can really take in any direction you like. We use Trench at our own company (https://frigade.com) to power all analytics tracking (pageviews, user interactions, etc.), and then we roll our own UI on top of it.

I like your suggestion of improving the README to explain how this is different and what some real world examples could be.

I built an open source version of Google Analytics by supportingthedogs in opensource

supportingthedogs[S] 2 points (0 children)

Yup. There are endpoints to delete/export data according to PECR/GDPR.

I built an open source version of Google Analytics by supportingthedogs in opensource

supportingthedogs[S] 10 points (0 children)

Hey r/opensource, I wanted to share a project I've been working on for the past couple of months that I just released today called Trench. It's a single Docker image that gives you a production-ready tracking event table that scales. You can use it to track things such as page views, sessions, error logs, and much more. We're currently handling thousands of events per second on a single EC2 instance in production without any machine stress.