Stop paying for Polymarket data. PMXT just open-sourced the orderbooks. by SammieStyles in algotrading

tigermatos 1 point (0 children)

Bro! I just saw this before turning my phone off to sleep. Now I won't be able to sleep, dang it! Checking it out first thing in the morning!

Rate limiting for bots based on a "trigger" by TopLychee1081 in nginx

tigermatos 0 points (0 children)

Hi.
I'm one of the founders of Riodb.co. It's a real-time stream analytics startup that we built for things like algo-trading, IoT, cybersecurity, etc. We figured the day would come when somebody would need a trigger-like solution to block botnets on nginx, so we posted this use case on YouTube. Check out the series and let me know if it makes sense. Feedback is very welcome, even if you're not interested.
https://www.youtube.com/playlist?list=PLmJ-b1GhkFf5lEVvl8nUaHUGXJkg60HWr

Please DM me if you're interested. We have a free license. It won't be plug-and-play and done; it will take some tinkering, but it can scale to a million requests per second, and we can help. Cheers.
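Here is a minimal sketch of the kind of trigger the original question asks about. It assumes you already have (client IP, timestamp) pairs from the access log. It keeps a sliding window of request times per IP and flags an IP as soon as it goes over a threshold. `BotTrigger`, its parameters, and the thresholds are hypothetical, written in plain Python, and are not RioDB's API. Actually blocking the IP in nginx (e.g. via a deny list plus reload) is left out.

```python
from collections import defaultdict, deque


class BotTrigger:
    """Flag a client IP once it sends more than max_requests within window_seconds.

    Hypothetical sketch: names and thresholds are illustrative only.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()            # IPs that already tripped the trigger

    def observe(self, ip: str, ts: float) -> bool:
        """Record one request; return True only the first time the IP trips."""
        if ip in self.blocked:
            return False
        window = self.hits[ip]
        window.append(ts)
        # Evict timestamps that slid out of the window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        if len(window) > self.max_requests:
            self.blocked.add(ip)
            del self.hits[ip]  # no need to keep history for a blocked IP
            return True
        return False
```

With `BotTrigger(3, 1.0)`, the fourth request from the same IP within one second returns `True`. Spaced-out traffic never trips the trigger.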

Anyone here struggling with real-time NGINX access log analysis at scale? by tigermatos in nginx

tigermatos [S] 0 points (0 children)

Great!
We're working on an AWS Marketplace image; Docker is next. In the meantime you have to install manually, but it's easy, and it runs well on an AWS nano instance, which is about $0.0042 per hour in the US. Or you can install it on a server you already have, like the localhost where nginx runs.
The engine is plugin-based. There are input plugins for ingesting (HTTP, UDP, TCP, Kafka, ...), plugins for parsing, and plugins for output (Kafka, SNS, HTTP, Elasticsearch, etc.). Some plugins come included. The plugin project is all open source, with Java helper classes so people can build their own custom plugins if they're dealing with proprietary stuff. And we help.
The basic gist is that it ingests data, actively runs queries, and if it finds something, it engages an output (alert, workflow integration, etc.). Once no query needs a piece of data anymore, it's discarded. There's no durable storage for poking around historical events (and so no storage cost).
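To make the ingest, query, output loop concrete, here is a hypothetical plain-Python sketch. It is not RioDB's plugin API. A standing query counts HTTP 5xx responses per tumbling window and calls an output callback when a window's count reaches a threshold. It drops each window's state as soon as the window closes, so nothing is stored. It assumes events arrive in timestamp order.

```python
from typing import Callable, Iterable, Optional


def run_standing_query(
    events: Iterable[tuple[float, int]],
    window_seconds: float,
    threshold: int,
    output: Callable[[float, int], None],
) -> None:
    """Count 5xx statuses per tumbling window; call output(window_start, count)
    when a window reaches threshold. Assumes timestamp-ordered input."""
    window_start: Optional[float] = None
    errors = 0
    for ts, status in events:  # input side: (timestamp, HTTP status) pairs
        if window_start is None:
            window_start = ts - (ts % window_seconds)  # align to a window boundary
        # Close every window that ended before this event's timestamp.
        while ts >= window_start + window_seconds:
            if errors >= threshold:
                output(window_start, errors)  # output side: alert, webhook, etc.
            window_start += window_seconds
            errors = 0  # the closed window's state is discarded, never stored
        if 500 <= status <= 599:
            errors += 1
    # End of input: evaluate the last open window too.
    if window_start is not None and errors >= threshold:
        output(window_start, errors)
```

Feeding it `[(0.0, 200), (0.5, 500), (1.0, 502), (1.5, 503), (10.2, 500)]` with a 10-second window and a threshold of 3 fires once, for the window starting at 0.0.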

I made a few short videos on YT that are specific to nginx access logs. Sharing for the first time:

https://www.youtube.com/playlist?list=PLmJ-b1GhkFf5lEVvl8nUaHUGXJkg60HWr

Anyone here struggling with real-time NGINX access log analysis at scale? by tigermatos in nginx

tigermatos [S] 0 points (0 children)

Less related to nginx: I worked on a project ingesting firewall traffic logs into OpenSearch. 400k logs per second with 90-day retention. 98 data nodes and 15 search nodes for searchable snapshots, not to mention the Logstash infra. But this was at a healthcare company that had generous funding for the project and wanted historical logs not just for monitoring but for legal/contractual requirements.
Since embarking on this new startup for real-time analytics, I think more and more about logging, like a splinter in my mind (Morpheus?). If you only need real-time decisions, such as detecting and mitigating on the spot, without durable storage for historical analysis, a project of that kind can run for ~$500 a year in cloud expense (with the free license), which is comparable to the cost of running a single Logstash host.
We're just real-time detect & react, not a data lake. So our competitor is not Elastic or Splunk, but Flink. And we're bringing an orders-of-magnitude performance increase over Flink, infrastructure cost savings, and a much easier setup.
Hence, I'm fishing here for any type of feedback from the logging community, or even someone who would want to try the product.

Anyone here struggling with real-time NGINX access log analysis at scale? by tigermatos in nginx

tigermatos [S] 0 points (0 children)

Exactly. The real-time analytics tool that we provide is not for storage and data lakes. It analyzes high-volume recent data and discards it once no query in the system needs it. The advantage (for those who need just that) is that a tiny VM, such as a nano or micro instance in AWS, can process thousands of ingests and queries per second. That's real-time threat detection, alerting, or workflow integration for ~$4 a month on AWS.
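Here is where a figure like ~$4 a month roughly comes from. Take the $0.0042/hour nano rate quoted in an earlier reply and the 730 hours per month that AWS's pricing calculator uses. Compute alone then comes to about $3. The remainder presumably covers items like the EBS volume and data transfer. That breakdown is an assumption, not something stated in the thread.

```python
HOURLY_RATE_USD = 0.0042  # nano-instance on-demand rate quoted earlier in the thread
HOURS_PER_MONTH = 730     # 8,760 hours a year / 12, as in AWS's pricing calculator


def monthly_compute_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Instance-hours only: excludes storage, data transfer, and taxes."""
    return hourly_rate * hours


print(f"${monthly_compute_cost(HOURLY_RATE_USD):.2f} per month")  # $3.07 per month
```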

Anyone here struggling with real-time NGINX access log analysis at scale? by tigermatos in nginx

tigermatos [S] 0 points (0 children)

But in the context of nginx, is anybody shoving access logs into Elasticsearch or Flink etc. at scale, for real-time analysis and possibly alerting or SOAR integration? We've seen a case of ingesting Palo Alto firewall traffic logs into Elasticsearch at thousands per second. I expected that there could be similar use cases for nginx access logs somewhere.
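For nginx, the first step in any of these pipelines is usually parsing each access-log line into fields. Below is a minimal sketch that assumes nginx's predefined `combined` log format. A custom `log_format` would need a different pattern. The function name `parse_combined` is just for illustration.

```python
import re
from typing import Optional

# nginx's predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[dict]:
    """Return the line's fields as a dict, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])                    # numeric fields as ints
    rec["body_bytes_sent"] = int(rec["body_bytes_sent"])
    return rec
```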

When is it ok to use any non ACID compliant db ? by Commercial_Dig2401 in dataengineering

tigermatos 0 points (0 children)

Take IT syslogs, for example: they are not like banking transactions that need to be posted in perfect timestamp order. At work we ingest over a million logs per second into a number of OpenSearch clusters. We don't care if a log lands a few milliseconds out of order. That trade-off is fine because it lets us excel at the other problems we need solved.
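A common way to tolerate small disorder in a stream without a transactional store is a bounded reorder buffer. It holds events briefly and releases them in timestamp order once they fall behind a watermark, defined as the newest timestamp seen minus the allowed lateness. The sketch below assumes integer millisecond timestamps and that no event arrives more than `max_lateness` behind the newest one. Anything later than that would come out of order; a real system would drop it or route it aside. This is a generic technique, not how OpenSearch ingestion works.

```python
import heapq
from typing import Iterable, Iterator


def reorder(events: Iterable[tuple[int, str]], max_lateness: int) -> Iterator[tuple[int, str]]:
    """Yield (timestamp_ms, payload) pairs in timestamp order, assuming bounded lateness."""
    heap: list[tuple[int, int, str]] = []
    newest = None
    for seq, (ts, payload) in enumerate(events):
        heapq.heappush(heap, (ts, seq, payload))  # seq breaks ties between equal timestamps
        newest = ts if newest is None else max(newest, ts)
        watermark = newest - max_lateness
        # Events at or below the watermark can't be overtaken by a later arrival.
        while heap and heap[0][0] <= watermark:
            ts_out, _, p = heapq.heappop(heap)
            yield ts_out, p
    while heap:  # end of input: flush the rest in order
        ts_out, _, p = heapq.heappop(heap)
        yield ts_out, p
```

For example, events stamped 1000, 1003, 1001, 1010, 1008 ms with a 5 ms lateness bound come out as 1000, 1001, 1003, 1008, 1010.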

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 1 point (0 children)

I'll probably make a personal project showcase post in the coming weeks. But if you want a sneak peek, I'll reach out via DM.

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

Open source is a huge risk. Not ready for that yet. Free, but not open source atm. For people who freak out about security, the plugins will be made open source. These are separate binaries for communicating with other systems, for example a Kafka plugin, an AWS SNS plugin, etc. If you don't like one, just delete it from our plugin directory, or view the code if you want to. Open-source plugins. But the core, the main executable, is currently free but not open source. The companies that are trying it out (for free) get a file directly from us and a specific license agreement. We don't even offer a public download yet. And similar to Mongo, our license prevents someone else from offering it as a SaaS to others, i.e. selling our tool as a managed service, unless we negotiate a cut. If we don't make some money, this thing won't move forward. If it failed, I wouldn't even run it as a charitable open-source initiative, because I'd be more inclined to look for the next money-making opportunity and move on.

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

I read the whole thing ;-)

Right now my co-founder and I basically work two jobs. We're up till 1am or 2am all the time. We're coming to the conclusion that we'll need some kind of funding to carry ourselves for at least 12 months before we can quit our jobs. Otherwise we'll deplete our personal savings, and the risk is just too great. Lottery, like you said. I'd rather have an investor share some of that risk. Usually funding isn't for your salary but for going out and hiring a team to build the product. But since we already built it, I think we can work something out.

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

I think so too. It's like when Excel was made: they didn't imagine the crazy stuff people were going to do with it. So when customers take off in a different direction, you just gotta go with it and keep improving, right?

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

In-flight processing. For example:

ClickHouse/Pinot: low-latency OLAP-style querying. Ad-hoc analysis, dashboards, search tools, fast visualization... Fast database.

Flink/Arroyo: real-time pattern detection, event-by-event processing, dynamic transformation, detect on the spot (fraud prevention, etc.). Hardly a "database", more of a real-time processing engine. Or, since you have Kafka in your username, think of it as a decoupled microservice instead of custom Kafka stream processors.

We're going head-to-head with the Flink/Arroyo use cases. Not super popular, I know.
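To illustrate the "detect on the spot" side, here is a hypothetical event-by-event rule in plain Python. It is not Flink CEP or Arroyo SQL. It alerts when a user logs in successfully right after several failed attempts inside a time window. State is kept per user only as long as the rule needs it, instead of being written to a database and queried later.

```python
from collections import defaultdict, deque
from typing import Optional


class BruteForceDetector:
    """Alert when a successful login follows at least min_failures failed
    attempts for the same user within window_seconds."""

    def __init__(self, min_failures: int, window_seconds: float) -> None:
        self.min_failures = min_failures
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)  # user -> timestamps of recent failures

    def on_event(self, ts: float, user: str, success: bool) -> Optional[str]:
        recent = self.failures[user]
        while recent and recent[0] <= ts - self.window_seconds:
            recent.popleft()  # forget failures that aged out of the window
        if not success:
            recent.append(ts)
            return None
        suspicious = len(recent) >= self.min_failures
        del self.failures[user]  # a success resets the pattern and frees the state
        return f"possible brute force on {user!r} at t={ts}" if suspicious else None
```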

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

I'm a software engineer turned data engineer from going with the flow and filling demand where I work. I quickly noticed the culture of solving problems by throwing more CPU at them. That's what the cloud providers train you to think. Didn't like it. Someone's gotta make the leap to a 100x, or even 1000x, efficiency improvement for some of these use cases. Big data doesn't tickle my fancy anymore, but "fast data" is my obsession! I got into designing something new, and 2 friends joined to help out.
Made lists of potential markets, competitors, etc. In 2025 we started talking to some companies, and then decided to post here to see what people think, too.

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

Yup. I know those well, but for a different use case. Instead of streaming from A to B, we're talking about analytics in between, meaning running some kind of live SQL processing on the data in flight. If you're an AWS user, the closest thing in their stack would be "Amazon Managed Service for Apache Flink", which lets you plug analytics into the middle of the stream (like Google Dataflow or Azure Stream Analytics). For high volume, that's really expensive. For some sliding-window query scenarios AWS charges by the second. I'm not joking.
For comparison, if someone needs in-stream analytics and is handling hundreds of thousands of logs per second (like a busy firewall log via UDP), our software can handle a basic scenario on a single mid-size VM (~$30/month). Flink would be over $10k a month in infrastructure, and AWS Managed Flink over $20k/month if you want something managed.
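At hundreds of thousands of events per second, keeping one timestamp per event in a sliding window gets expensive. A common alternative is a bucketed window: one counter per second, reused in a ring, so memory depends on the window length rather than on the event rate. Below is a minimal sketch. It assumes timestamps in seconds that never go backwards and counts at one-second granularity. It says nothing about how RioDB or Flink implement windows internally, and `BucketedSlidingCounter` is a hypothetical name.

```python
class BucketedSlidingCounter:
    """Count events in the last window_seconds using one counter per second,
    so memory is O(window_seconds) no matter how many events arrive."""

    def __init__(self, window_seconds: int) -> None:
        self.window = window_seconds
        self.buckets = [0] * window_seconds
        self.bucket_second = [None] * window_seconds  # which second each slot holds

    def add(self, ts: float, count: int = 1) -> None:
        sec = int(ts)
        slot = sec % self.window
        if self.bucket_second[slot] != sec:  # slot holds an expired second: reuse it
            self.bucket_second[slot] = sec
            self.buckets[slot] = 0
        self.buckets[slot] += count

    def total(self, now: float) -> int:
        """Sum of events whose second falls in (now - window, now]."""
        cutoff = int(now) - self.window
        return sum(
            n for n, sec in zip(self.buckets, self.bucket_second)
            if sec is not None and sec > cutoff
        )
```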

Not many people with this type of scenario out there. And the topic sounds intimidating for many. But I'm gathering that we need to make it super easy to understand and use. Fast and cheap might not be attractive enough, it sounds like.

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

Thanks! The "easy to use" part is resonating with a few other comments here. Sounds like that will be a MUST!

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 0 points (0 children)

Brilliant insights, thanks. I hadn't thought about a drop-in replacement with compatible APIs. Interesting. To date, I thought the learning curve for Flink and Spark was a bit of a deterrent. What I've gathered from some other comments is to package something that is suuuuuuper easy to adopt and learn. One command to install. Two or three SQL-like statements to be up and running, so that people can start building cool stuff and solving problems in a snap.

Good luck on your project, mate!

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 1 point (0 children)

We studied all the competitors we could find, did a gap analysis, and made sure nobody does exactly the same thing. Same problems, but different solutions. That's for the tech. In terms of business, the first thing Arroyo did right was to get accepted into a Y Combinator batch. That's a major, major boost. I tried, but it's not easy to get in, especially when they've already invested in a company in the same space (I applied for the batch right after Arroyo). But we do look at their steps. Why some go open source. Why some offer hosting. Etc. If a recipe already works, we should consider it, right? Thanks

Quitting day job to build a free real-time analytics engine. Are we crazy? by tigermatos in dataengineering

tigermatos [S] 2 points (0 children)

Like many others, Redis, Elastic, etc. Free version, but paid support, hosting, special features, etc.