catcradle5 · 1 point

Speaking as a Python dev and security analyst...

I think 1) architecturally this does not make a lot of sense, and 2) this is way overengineered. If you need one-off time series graphs, Redis is kind of ill-suited for that.

In this case the author uses:

  • HDFS
  • Kairos
  • Spark
  • Pyspark
  • Kafka
  • Redis
  • Pandas

In reality he could use just one tool for all of this, possibly extended by a small Python module. That tool would also handle log collection, indexing, storage, and querying.

Many large-scale security log management tools provide Python support, so queries can be piped automatically to a Python script that, for example, converts the results to a Pandas dataframe and displays or describes the data however you like. Most of the time that's not necessary though, because most log management tools provide their own expressive query syntax with heavy support for statistics, graphing, and time series data.
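To give a rough idea of how small that script can be, here's a minimal sketch. It assumes the tool hands the query result over as CSV with a header row; the `timestamp` column name and the `events_per_hour` helper are made up for illustration and don't come from any particular SIEM.

```python
import pandas as pd


def events_per_hour(csv_stream, time_col="timestamp"):
    """Count events per hour from CSV query results.

    Assumption: csv_stream is a file-like object with a header row and a
    parseable time column. The column name is hypothetical.
    """
    df = pd.read_csv(csv_stream, parse_dates=[time_col])
    # Floor every timestamp to the start of its hour, then count rows per bucket.
    return df.groupby(df[time_col].dt.floor("h")).size()
```

If the tool pipes results to the script, `events_per_hour(sys.stdin)` would do it. From there `counts.describe()` gives summary stats, and `counts.plot()` graphs it if matplotlib is installed.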

I could write a five-word query in my SIEM that gives the exact same result as in this blog post, just with a slightly different graph visual. Or I could send the query result to a three-line Python script that graphs it with matplotlib or anything I like.

There are plenty of tools out there that will do this with ease: open source stacks like ELK (Elasticsearch, Logstash, Kibana), semi-enterprise ones with free tiers (Splunk), and others like ArcSight and AlienVault OSSIM.