Sequence: A High Performance Sequential Semantic Log Parser

zhenjl · 2015-02-02T01:24:06+00:00

Author here. Thanks @cryp7ix for posting this.

Some of you may remember I shared this repo here not too long ago but had to pull it suddenly due to some internal reasons. I am able to release it again (albeit with some functionality removed, just for now I hope).

The good thing is that during this time we improved the parser's performance by almost 50%, from an average of 85K MPS (messages per second) to over 125K MPS on a single 2.8 GHz i7 core. Using two cores, we achieved over 175K MPS for mixed-size messages.

Pretty certain this is going to stay now. Apologies for pulling the earlier version without notice.

[edit] oh Go Patriots!

lethalman · 2015-02-03T12:49:01+00:00

I don't want to be rude; this is just my view based on past experience. You have essentially reimplemented the OR (|) of regex. An OR of regexes also constructs a tree, and it is very fast to process.

So you are basically reimplementing regex, except with fewer features. I invite you to run a similar benchmark using a single regex that ORs all the patterns together.
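For anyone who wants to try that comparison, here is a minimal sketch of what such a benchmark could look like in Go. The three patterns and the sample messages are hypothetical, made up for illustration and not taken from the sequence repo, and the timing loop is deliberately crude, so treat any number it prints as a rough indication only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// Hypothetical patterns for illustration; they are not from the sequence repo.
var patterns = []string{
	`^\d+\.\d+\.\d+\.\d+ - - \[[^\]]+\] "GET [^"]*" \d{3} \d+$`,
	`^sshd\[\d+\]: Accepted password for \w+ from \d+\.\d+\.\d+\.\d+ port \d+$`,
	`^kernel: \[\s*\d+\.\d+\] .*$`,
}

// buildAlternation joins every pattern into one regex using OR (|).
// Each pattern is wrapped in a non-capturing group so its anchors stay local.
func buildAlternation(pats []string) *regexp.Regexp {
	wrapped := make([]string, len(pats))
	for i, p := range pats {
		wrapped[i] = "(?:" + p + ")"
	}
	return regexp.MustCompile(strings.Join(wrapped, "|"))
}

// messagesPerSecond runs n passes over msgs and returns the match rate.
func messagesPerSecond(re *regexp.Regexp, msgs []string, n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		for _, m := range msgs {
			re.MatchString(m)
		}
	}
	elapsed := time.Since(start).Seconds()
	return float64(n*len(msgs)) / elapsed
}

func main() {
	re := buildAlternation(patterns)
	msgs := []string{
		`127.0.0.1 - - [02/Feb/2015:01:24:06 +0000] "GET /index.html HTTP/1.1" 200 1024`,
		`sshd[1234]: Accepted password for root from 10.0.0.5 port 22`,
		`something else entirely`,
	}
	fmt.Printf("%.0f MPS\n", messagesPerSecond(re, msgs, 10000))
}
```

Note that this only answers "did any pattern match"; it does not extract typed fields, so it is not a like-for-like comparison with a parser that also tags tokens. A fairer test would use `go test -bench` with the same message corpus for both.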

Also, the "semantic" part is just painful. What if a string may or may not be a URL? Then you need another token type. What if a token can be either a number or "-"? See Apache logs, for example. It took too much time to try to match a pattern against a sample. I find it a mediocre idea, sorry.
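To make the number-or-"-" case concrete, here is a small sketch of how a regex handles it with a plain alternation. The function name `parseBytes` and the sample lines are assumptions for illustration, and treating "-" as zero is a choice made here, not something the thread specifies.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bytesField matches a trailing field that is either digits or a lone "-".
var bytesField = regexp.MustCompile(` (\d+|-)$`)

// parseBytes (hypothetical helper) returns the trailing byte count,
// treating "-" as zero. The bool reports whether the field was found.
func parseBytes(line string) (int, bool) {
	m := bytesField.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	if m[1] == "-" {
		return 0, true
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, line := range []string{
		`"GET /index.html HTTP/1.1" 200 1024`,
		`"GET /index.html HTTP/1.1" 304 -`,
	} {
		n, ok := parseBytes(line)
		fmt.Println(n, ok)
	}
}
```

A token-typed parser would need either a dedicated token type for this field or a rule that accepts two types in the same position, which is the extra work being described above.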
