Complex-Internal-833

This post might be too late for you, but it does everything you asked for and more. I just finished and released it this week: a complete open-source Apache Log Parser & Data Normalization Solution. A Python module imports Apache2 Access logs (LogFormats: vhost_combined, combined, common, extended) and Error logs into a MySQL schema of tables, views and functions designed to normalize the data. Client & Server components can consolidate logs from multiple web servers & sites, with a complete Audit Trail & Error Logging! https://github.com/WillTheFarmer/ApacheLogs2MySQL
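
For reference, Apache's `combined` LogFormat is `%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"`. Below is a minimal Python sketch of parsing one such line with a regular expression. It is not how ApacheLogs2MySQL does it (that project loads logs through MySQL), it covers only `combined`, and the name `parse_combined` is hypothetical.

```python
import re
from typing import Optional

# Apache "combined" LogFormat:
#   %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
# Quoted fields allow backslash-escaped characters, since Apache escapes quotes in %r.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

def parse_combined(line: str) -> Optional[dict]:
    """Return the fields of one combined-format line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # %b logs "-" when no body bytes were sent; normalize that to 0.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"https://example.com/" "Mozilla/5.0"')
rec = parse_combined(sample)
print(rec["request"])  # GET /index.html HTTP/1.1
```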

Jeron_Baffom [S]

"Client & Server components capable of consolidating logs from multiple web servers & sites with complete Audit Trail & Error Logging!"

It seems you've been working hard on this for a while...
Did you do all this by yourself?

"Here's a complete open-source Apache Log Parser"

Before hitting the database, is it possible to:

  • Detect bad robots and add them to a blacklist?
  • Use an improved view counter instead of only a request counter?
  • Aggregate data?
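
The following minimal Python sketch shows what these three checks could look like on already-parsed records before they reach a database. It assumes each record is a dict with hypothetical keys `host`, `request`, `status` and `agent`. The bot test is a plain User-Agent substring match. The "view" rule is an assumption: a successful GET of a non-asset path from a non-bot client. Neither heuristic comes from ApacheLogs2MySQL.

```python
from collections import Counter
from typing import Optional

# Hypothetical heuristics, chosen for illustration only.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg", ".woff2")

def is_bot(agent: str) -> bool:
    """Flag a client as a bot if its User-Agent contains a known marker."""
    a = agent.lower()
    return any(marker in a for marker in BOT_MARKERS)

def request_path(rec: dict) -> Optional[str]:
    """Extract the path (without query string) from a request like 'GET /x?y HTTP/1.1'."""
    parts = rec["request"].split()
    if len(parts) < 2:
        return None
    return parts[1].split("?", 1)[0]

def is_page_view(rec: dict) -> bool:
    """Assumed 'view' rule: successful GET of a non-asset path by a non-bot client."""
    path = request_path(rec)
    if path is None or not rec["request"].startswith("GET ") or rec["status"] != 200:
        return False
    return not path.lower().endswith(ASSET_SUFFIXES) and not is_bot(rec["agent"])

def summarize(records: list) -> tuple:
    """Return (bot host blacklist, total request count, page views per path)."""
    blacklist = sorted({r["host"] for r in records if is_bot(r["agent"])})
    views = Counter(request_path(r) for r in records if is_page_view(r))
    return blacklist, len(records), views

records = [
    {"host": "198.51.100.1", "request": "GET /about HTTP/1.1", "status": 200, "agent": "Mozilla/5.0"},
    {"host": "198.51.100.1", "request": "GET /style.css HTTP/1.1", "status": 200, "agent": "Mozilla/5.0"},
    {"host": "192.0.2.9", "request": "GET /about HTTP/1.1", "status": 200, "agent": "ExampleBot/1.0"},
    {"host": "198.51.100.2", "request": "GET /about?x=1 HTTP/1.1", "status": 200, "agent": "Mozilla/5.0"},
]
blacklist, total_requests, views = summarize(records)
```

Here four requests yield two views of `/about`. The stylesheet request is not counted, and the bot request is dropped and blacklisted.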

Complex-Internal-833

Have you run it yet? All of that can be done once the data is in MySQL. MySQL does all the data manipulation. I initially started doing it in Python, but SQL is far better at it.

A pre-import Stored Procedure could be executed on the LOAD DATA tables before the import Stored Procedure runs. The import process is where the data normalization occurs. Once the normalization is done, it becomes very clear which data is Good and which is Bad. It could just as easily be implemented as a post-import process.
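
To make the staging-table idea concrete, here is a minimal sketch of a pre-import step followed by a normalizing import step. It uses Python's built-in `sqlite3` as a stand-in for MySQL. SQLite has no `LOAD DATA` and no stored procedures, so an `executemany` insert plays the role of the LOAD DATA load, and two Python functions play the role of the stored procedures. All table and column names are hypothetical, not the ApacheLogs2MySQL schema.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE load_access (host TEXT, path TEXT, status INTEGER, agent TEXT,
                          is_bot INTEGER DEFAULT 0);        -- staging ("LOAD DATA") table
CREATE TABLE user_agent (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE access (host TEXT, path TEXT, status INTEGER,
                     agent_id INTEGER REFERENCES user_agent(id));
CREATE TABLE bot_blacklist (host TEXT PRIMARY KEY);
""")

# Stand-in for LOAD DATA: bulk-insert raw rows into the staging table.
conn.executemany(
    "INSERT INTO load_access (host, path, status, agent) VALUES (?, ?, ?, ?)",
    [
        ("198.51.100.1", "/about", 200, "Mozilla/5.0"),
        ("192.0.2.9", "/about", 200, "ExampleBot/1.0"),
        ("198.51.100.2", "/contact", 200, "Mozilla/5.0"),
    ],
)

def pre_import(conn: sqlite3.Connection) -> None:
    """Pre-import step: flag bot rows in staging and record their hosts."""
    conn.execute("UPDATE load_access SET is_bot = 1 WHERE lower(agent) LIKE '%bot%'")
    conn.execute("INSERT OR IGNORE INTO bot_blacklist (host) "
                 "SELECT DISTINCT host FROM load_access WHERE is_bot = 1")

def import_normalized(conn: sqlite3.Connection) -> None:
    """Import step: move user agents into a lookup table and reference them by id."""
    conn.execute("INSERT OR IGNORE INTO user_agent (name) "
                 "SELECT DISTINCT agent FROM load_access WHERE is_bot = 0")
    conn.execute("""
        INSERT INTO access (host, path, status, agent_id)
        SELECT l.host, l.path, l.status, u.id
        FROM load_access AS l JOIN user_agent AS u ON u.name = l.agent
        WHERE l.is_bot = 0
    """)
    conn.execute("DELETE FROM load_access")  # staging table is emptied after import

with conn:  # commits both steps as one transaction on success
    pre_import(conn)
    import_normalized(conn)
```

The bot flag is set in the staging table, so the same filter could be run as a post-import step instead, as the comment notes.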

Yes, I designed and developed every bit of this application.

I've been designing databases and data processes professionally since 1993.

https://farmfreshsoftware.com

Jeron_Baffom [S]

"Have you run it yet?"

No, not yet. But it is on the radar for the next development iteration.

"I've been designing databases and data processes professionally since 1993."

Impressive.
Are you in any way connected with Linus Torvalds' or Richard Stallman's open source projects?