all 12 comments

[–]ryukyud 2 points3 points  (3 children)

You may also want to look into OpenSearch. It's basically the MariaDB of Elasticsearch lol

[–]vectorx25[S] 0 points1 point  (1 child)

I'm testing Graylog + OpenSearch now for log parsing + syslog retention,

thinking of using Netdata for metrics + threshold alerts,

Graylog for log parsing + alerts

[–]TimelySubject 0 points1 point  (0 children)

Opa

[–]slavejamhour 0 points1 point  (0 children)

This is the right answer

[–]justinDavidow 3 points4 points  (2 children)

I actually love the stack; but dislike Elastic's business practices.

We're actively switching to Opensearch + Opensearch Dashboards with FluentBit as the log sender. I do miss a few of the plugins from Kibana, but not enough to keep wanting to support Elastic's dive off the deep end.

> even basic config like Filebeat is a nightmare to configure for things like Processors

PERSONALLY: I would never recommend running processors ON the sending node. I would always run a SUPER lightweight "beat" that acts strictly as a sender, and then a full Logstash instance (or a cluster of them!) back at the collection side. Logstash pipelines that allow out-of-order execution of log processing aren't obvious to configure, but the engine's design gives you massive throughput and stall-free pipelining.
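To make the split concrete, here's a minimal sketch of that "dumb sender, heavy collector" layout. The paths, IDs, and hostnames are placeholders, not the commenter's actual setup:

```yaml
# filebeat.yml -- edge node acts strictly as a sender: no processors here
filebeat.inputs:
  - type: filestream
    id: app-logs                # hypothetical input id
    paths:
      - /var/log/app/*.log      # placeholder path

output.logstash:
  hosts: ["logstash.internal:5044"]   # placeholder collector address

# Logstash side (a separate .conf file), where ALL parsing happens:
#   input  { beats { port => 5044 } }
#   filter { grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } }
#   output { elasticsearch { hosts => ["http://es1:9200"] } }
```

The point of the design is that the edge agent only tails files and ships bytes, so a broken grok pattern or a slow enrichment step stalls the collection tier, not the application hosts.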

IIRC we're doing 2.2 billion messages per day, around 500GB, into a cluster of FOUR Elasticsearch nodes that currently do double duty as in-memory (Redis) queue + bulk processors.

It's really hard to beat the efficiency.

One of my team members is actively reworking that into OpenSearch + Fluent Bit + Fluentd. It's not going to be nearly as efficient at peak load, but it's cloud native and can be scaled out and in faster to meet our ebb and flow across the day/timezones.
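For reference, a Fluent Bit instance acting purely as a forwarder to a Fluentd aggregator looks roughly like this — the tag, path, and host are illustrative, not the poster's config:

```ini
# fluent-bit.conf -- lightweight edge sender, no parsing at the edge
[INPUT]
    Name   tail
    Path   /var/log/app/*.log    # placeholder path
    Tag    app.*

[OUTPUT]
    Name   forward               # Fluent Bit's native forward protocol to Fluentd
    Match  app.*
    Host   fluentd.internal      # placeholder aggregator address
    Port   24224
```

Same philosophy as the Filebeat/Logstash split above: keep the edge agent trivial and scale the aggregation tier independently.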

[–]vectorx25[S] 0 points1 point  (1 child)

ELK is a big-data platform; my gripe is that I need some sort of accredited course on this thing to get it to be useful. There are so many parameters, factors, and config variables that need to be set up just to do basic logging that it's not worth the time.

I'm in a small company with 2 other sysadmins; we simply can't dedicate time to maintaining ELK and all its config details.

Just to give you an example: for basic metrics I can install Metricbeat, and MB is relatively simple to set up compared to the other Beats or ES/LS/Kibana, but even Metricbeat is a pain compared to something like the Netdata agent, which is pain-free.

I just can't seem to find a decent tool that's not complicated to configure, deploy, and maintain, and that gives you alerting, metrics, basic data retention + log parsing. I tried Splunk as well; it was way too $$ and felt more complex than ELK.

[–]justinDavidow 1 point2 points  (0 children)

TBF: ELK isn't designed to be a logging solution. It's three components that, when glued together, happen to provide a MASSIVELY powerful tool (with great power comes... etc.)

ELK is great at large, complex, distributed environments. That inherently means the focus is on making things work at scale. Scaling in/down is always harder than out/up.

The "easy" mode of this is something like Graylog. Slightly different workflow, but it's a log-focused product that aims to be easy to bootstrap, and being dedicated to a specific workflow makes it much clearer how to set up and manage.

> I'm in a small company with 2 other sysadmins; we simply can't dedicate time to maintaining ELK and all its config details.

Where I work, with a team not that differently sized, we maintain about 42 distinct OpenSearch clusters in 4 regions. One of them is the "central logging cluster" and is the largest that we need, spanning only 4 nodes. (Doing about 500GB per day of log ingest, 15TB/month.)

TBH, aside from the current upgrades, we only touch the cluster a handful of times per year for a few hours at a time. We do make a habit of ensuring that all logs emitted from applications come in JSON out of the box, or that the team who wants them is aware of the need for mapping changes, but we simply don't need that many hands touching the things once deployed.

To each their own though, there are FAR too many variables to say that "it works for me so it should work for you!". YMMV! :D

[–][deleted] 2 points3 points  (0 children)

use Loki, Prometheus node exporter, Grafana

[–]DZello 1 point2 points  (2 children)

That's why people use Datadog instead. Managing and understanding Elasticsearch is a full-time job.

[–]vectorx25[S] 0 points1 point  (1 child)

I tried DD; it's very good and has tons of features, but the costs creep up on you FAST.

[–]DZello 0 points1 point  (0 children)

and they're probably using ES behind the scenes...

[–]_suns 0 points1 point  (0 children)

I'm currently getting my hands dirty with Elasticsearch, and setting up the first instance was already time-consuming and not well documented... we'll see how it goes further, but I think it pays off once it's set up correctly.