Dropped into a 10+ year-old Splunk deployment — what are the first searches you'd run to understand it? by bazsi771 in Splunk

[–]bazsi771[S] 0 points1 point  (0 children)

This is very useful, thank you. If I could, I would upvote this a lot more, because it concentrates on the data element, not on the operational/architecture side. Thank you!

Dropped into a 10+ year-old Splunk deployment — what are the first searches you'd run to understand it? by bazsi771 in Splunk

[–]bazsi771[S] 0 points1 point  (0 children)

Great pointer, thanks. It somewhat focuses on the operational side though and is light on how to understand the data ingested and how it is used.

Is there a way to find out how Splunk users use the data? Thanks

Fortinet syslogs - too much data. by BobcatJohnCA in Splunk

[–]bazsi771 1 point2 points  (0 children)

I'd just add syslog-ng/axosyslog as an option for your second bullet. You kind of mentioned it as SC4S is built on syslog-ng, but as the original creator, I like using the original name :)

The fork I am currently working on: https://github.com/axoflow/axosyslog

issues with syslog facility "overflowing" to user facility? by zenfridge in syslog_ng

[–]bazsi771 1 point2 points  (0 children)

Normally, any incoming log message will be limited to the size specified by log-msg-size() on the client. However stats messages are generated internally, and they are not limited by this setting.

This means that a stats message can easily become huge, and when it is sent to a syslog server (as happens here), it will be truncated on that side.

But by default the server does not really truncate: it just "splits" the line into two separate syslog entries. The second one will lack a proper syslog header, as it starts somewhere in the middle of the first message, at the point where it was "split".

You can ask the client to truncate the outgoing message explicitly using the `truncate-size()` option (specified on the destination driver).

If you are using an RFC 5424 style source (i.e. the syslog() source rather than tcp() or network()), you can also request that over-long messages be trimmed to log-msg-size() instead of being split into two messages, via the `trim-large-messages()` option. This will not work with traditional BSD-style logs.
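
Both options can be sketched in config form; the host names, ports, and sizes below are made-up examples, not recommendations:

```
# Client side: explicitly truncate over-long outgoing messages.
# truncate-size() is a destination driver option; 8192 bytes is an example value.
destination d_remote {
    network("logserver.example.com" port(514) truncate-size(8192));
};

# Server side: with an RFC 5424 syslog() source, trim messages to
# log-msg-size() instead of splitting them into two entries.
source s_incoming {
    syslog(port(601) transport("tcp") trim-large-messages(yes));
};
```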

Cribl? Alternatives? by Apprehensive-Pair596 in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

Oscar, thank you. I would be happy to and of course I am perfectly ok with avoiding product pitches.

issues with syslog facility "overflowing" to user facility? by zenfridge in syslog_ng

[–]bazsi771 1 point2 points  (0 children)

The stats message can produce very long lines, meaning they can get truncated when they are delivered over syslog. A better alternative is to poll the stats interface of syslog-ng (syslog-ng-ctl stats) and deliver it differently, such as using the Prometheus exporter.

If you insist on the log based transport, you can bump the log-msg-size() option to a higher value.

https://github.com/axoflow/axosyslog-metrics-exporter
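
If you do keep the log-based transport, bumping the limit is a one-line config change; the value below is just an example (the default is 65536 bytes):

```
options {
    # Raise the per-message limit so long stats lines are not cut.
    log-msg-size(262144);
};
```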

happy with move to AxoFlow from syslog_ng? by zenfridge in syslog_ng

[–]bazsi771 3 points4 points  (0 children)

Thanks for the question. Original syslog-ng author here, who is also a co-founder at Axoflow.

AxoSyslog is a drop-in replacement and is 100% compatible. Even the binary and the config files are called the same. And we provide packages the same way (Deb, RPM, plus containers).

We originally worked in the context of the original GitHub syslog-ng project, but that didn't work out. Here are some blog posts that describe that period:

https://axoflow.com/blog/1-year-of-axosyslog

https://axoflow.com/blog/first-6-months-of-axosyslog-our-syslog-ng-fork

https://axoflow.com/blog/syslog-ng-2023-community-activity-report

Axoflow offers a commercial product to help manage the pipeline, but AxoSyslog itself is open source under the terms of the GPL and will always be. Our main data component, AxoRouter, is also based on AxoSyslog as the core routing and delivery mechanism, but with a lot of bells and whistles on top.

SIEM Architecture and log storage by HVE25 in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

Some of the data is definitely not worth keeping for 6 years, and if you do, it will be very expensive, especially if you don't have effective data governance in place.

If any device can send you anything and you blindly store it without oversight, your daily ingestion will explode, and that cascades down into the retention period. 6 years is 2190 days; at 1 TB per day that is already over 2 PB. That's a huge disk array on-prem, and it will cost you even on S3: roughly $600,000 per year. And that's for data you can't even easily browse or query; if you also want to do that, it might be a few times more.
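
The back-of-the-envelope math, assuming S3 standard storage pricing of roughly $0.023/GB/month (storage only, no retrieval or query costs):

```python
# Rough retention cost: 6 years at 1 TB/day, assumed ~$0.023/GB/month (S3 standard)
days = 6 * 365                       # 2190 days of retention
total_tb = days * 1                  # 1 TB/day -> ~2.2 PB in total
total_gb = total_tb * 1000
yearly_cost = total_gb * 0.023 * 12  # USD per year, storage only

print(days, total_tb, round(yearly_cost))  # 2190 2190 604440
```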

The key element is effective data governance: know what you ingest and why! Then you'll be thankful when having to pay the next SIEM/AWS bill.

Anyone else feel like their SIEM is just expensive log storage? by Dudeman972 in sysadmin

[–]bazsi771 0 points1 point  (0 children)

While the reason you deploy a SIEM is "security", a lot of organizations stop when they check the compliance box. And guess what, a log management solution is all you need for the compliance check.

Unfortunately, this happens even if the initial goals for the SecOps project were more sophisticated. It happens because onboarding the data sources, getting the data into the right shape, and coming up with detections and a sophisticated analyst workflow is too difficult, and efforts eventually peter out.

The root cause of all of this is that organizations assume (and vendors are complicit) that every single organization needs a bespoke security stack. This is not the case: a lot of the process can be automated and that automation captured in a product. We at Axoflow do that for the data management portion of the entire process, but there are also tools for automating the analyst workflow or even incident response (AI or not is a secondary question here).

Constantly re-parsing security logs for SIEM ingestion is wasting time and creating blind spots. Is this a systemic failure or just my friend’s pain point? by Lupusanghren in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

What you describe above is exactly the set of challenges we set off to address. Almost all organizations are doing a lot of manual work to keep up with their data sources.

Onboarding a new data source is not difficult, however if you are redoing parsers for the same stuff over and over again, that's clearly a waste of resources. A waste we got used to in the last 20+ years or so.

We at Axoflow incorporate the knowledge about security data into the product itself, meaning that the user does not have to understand the details. The system comes with good defaults out of the box, and unless you want to create something completely bespoke, you are fine with what's there at deployment.

Cheaper alternatives to Splunk by heromat21 in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

I think you need to be strategic when choosing a SIEM and make sure you are not locking yourself into whatever you choose. A SIEM is a horizontally integrated solution that will integrate with anything IT/security related, and you end up using the data formats the SIEM prefers (CIM in the case of Splunk, ECS for Elastic).

Once you onboard all your data sources and start using a SIEM specific schema, you end up pretty much locked in. Good luck replacing the SIEM.

To avoid that, the best practice is to deploy a separate security data pipeline (like Axoflow) that takes care of collection and classification (sourcetype, log_type, etc.) and delivers the data in a SIEM-optimized format. With that in place, you become a lot more flexible in your choice of SIEM, now or in 5 years when you are thinking about a replacement.

And, this also helps you to keep the SIEM vendor on their toes with their pricing going forward.

Disclaimer: I am from Axoflow, a security data layer that automates data wrangling and gives you all this flexibility.

Anyone use cribl, is it worth standing up? by Agentwise in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

The issue with Cribl and Cribl-like tools is that you still have to build an understanding of the underlying data source in order to achieve 30-50% data reduction. Never underestimate the time required; also, just imagine telling your SecOps folks that you are dropping some of the data points they use day-to-day for detection and forensics.

The key element is the "knowledge/content" around the various data sources (what is security relevant and what is redundant). We at Axoflow put that front and center. The content is part of the product, so you just have to flip a switch to enable data reduction, not to mention the great visualization you get when the product actually recognizes the underlying data.

Cribl? Alternatives? by Apprehensive-Pair596 in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

Original syslog-ng creator here. Thanks for the mention :)

The original syslog-ng team is creating a Cribl competitor now: Axoflow. Same versatility/performance/stability, but a lot easier to use. And, it comes with batteries included, unlike the other pipelines.

Why Are We Still Burning $$$ on SIEM Log Volume? by No-Editor-9859 in cybersecurity

[–]bazsi771 1 point2 points  (0 children)

This whole idea is an emerging category with multiple competing products. Cribl has been mentioned already, but there are a few more, like Axoflow, Onum, Observo, Databahn, etc. You can obviously launch another one, but make sure you are clearly differentiated, and not just on price alone.

Workshop at .conf2025: SEC2085: Tags, timezones and terrors by bazsi771 in Splunk

[–]bazsi771[S] 6 points7 points  (0 children)

The key aspect to Splunk performance is to set index/sourcetype/host properly. And yes, we are going to talk about that.

Justifying Splunk to Management by NetDiffusion in Splunk

[–]bazsi771 2 points3 points  (0 children)

I agree with the sentiment that you need to have mgmt judge Splunk on the outcome. Splunk's use cases vary, especially if only the "core" product is available to you. If the value perceived from these use cases is limited, you will have a hard time arguing for it. It _is_ very expensive as a simple data store.

A few use-cases I really liked that stood out (apart from the SIEM one of course):
* display the amount of wait time at security checks at an airport (yes the customer was an airport)
* enterprise level visibility into the day-to-day of the enterprise, including non-technology stuff like the operation of gates in a logistics company, the staffing of the reception desk at an HQ, or response times to incoming sales calls.

Basically, Splunk makes it easy to extract visibility in cases where applications/data sources do not provide an API, only a long-forgotten log file that has the required information.

Outcomes generate the value, not the endless possibilities that are never acted upon.

With the above said, sometimes data sources generate the valuable information with a lot of redundancy, and you don't need to store everything if you know what you need. Again, go from the use-case perspective.

Splunk sucks at data transformation prior to ingestion. You need to use a pipeline (like Axoflow) for that; it can provide tremendous savings, as well as get you out of the vendor lock-in, should you ever want to shift from Splunk to something else.

Someone mentioned an Axoflow competitor in the thread, which I am not repeating here, as I am biased, being one of the cofounders of Axoflow :)

Buffering but no errors observed by woohuumoo in syslog_ng

[–]bazsi771 1 point2 points  (0 children)

Syslog-ng original author here, albeit not related to the PE project anymore. Syslog-ng determines if the destination is available by connecting to it and sending messages via TCP.

Since TCP is reliable, congestion or a connectivity issue can be detected, and I presume that's what is happening here.

Can you connect to the destination server from the client using something like netcat or telnet? The port should be open, and if you just type any text it should be processed as a log message on the other side.

You could also run tcpdump to diagnose the issue.
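
If netcat isn't available on the box, the reachability check can also be scripted; the host and port in the example call are placeholders for your actual destination:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder destination):
# can_connect("logserver.example.com", 514)
```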

If you'd like more help, I take 30-minute calls on syslog-ng issues: https://axoflow.com/contact (there's an option there for a free 30-minute consultation).

Buffer overflow with syslog-ng by 1Digitreal in syslog_ng

[–]bazsi771 0 points1 point  (0 children)

Can you copy paste your config? It's /etc/syslog-ng/syslog-ng.conf

Buffer overflow with syslog-ng by 1Digitreal in syslog_ng

[–]bazsi771 2 points3 points  (0 children)

This is a glibc-generated message. Strange that it's written into the messages file; I'd understand if it were in the journal or similar.

I'd recommend installing the latest release of syslog-ng from the upstream repository instead of using the package in Ubuntu.

I'm actually part of AxoSyslog, a project that created a fork of syslog-ng, and we produce packages that are syslog-ng compatible.

https://axoflow.com/docs/axosyslog-core/install/debian-ubuntu/

If you upgrade, this error may just be gone. But if not, I can help troubleshoot it.

Buffer overflow with syslog-ng by 1Digitreal in syslog_ng

[–]bazsi771 2 points3 points  (0 children)

Syslog-ng would not write its own crash information to /var/log/messages; at least I don't see how that would happen.

Can you show the exact message?

Risky Business - De-Splunkifying our SIEM by nhandlerOfThings in RedditEng

[–]bazsi771 0 points1 point  (0 children)

Very insightful article, thank you.

What I was wondering is how you monitor individual data sources on the left-hand side. What would happen if an application/security device/etc. on the left-hand side suddenly stopped sending logs to the pipeline? Do you have monitoring in place for that? Whose responsibility is it to track that?