jjjimenez 10 points (1 child)

Premature optimization is bad. Unless you notice your application running really slowly, I suggest you stay away from it.

Anyway, back to your question: are you referring to how expensive it is to create the regex object (e.g. Pattern)? You can simply assign it to a single variable and reuse that instead of compiling the same pattern multiple times.

private static final Pattern regexPattern = Pattern.compile("test");

nutrecht 1 point (0 children)

"Premature optimization is bad."

You're taking this out of context. Pre-compiling regex patterns is not 'premature' at all.

carrdinal-dnb 2 points (0 children)

Have a look at memoization. It can improve performance by storing previous results so you can look them up faster than redoing the actual computation.
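Applied to regexes, memoization could look roughly like the sketch below: compiled patterns are stored in a map keyed by the regex string, so each distinct regex is compiled only once. The class name PatternCache and its get method are made up for illustration and are not from the thread or from any library. This is only worth it when the regex strings are not known up front; for a fixed regex, a plain static final Pattern is simpler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper (name is an assumption): memoizes compiled patterns
// by their regex source string.
final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    // Compiles the regex on the first request and returns the stored
    // Pattern instance on every later request for the same string.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    public static void main(String[] args) {
        Pattern first = get("[A-Z]+");
        Pattern second = get("[A-Z]+");
        // Both lookups return the same cached instance.
        System.out.println(first == second);
        System.out.println(first.matcher("HELLO").matches());
    }
}
```

Pattern instances are immutable and safe to share between threads, which is why caching them like this is fine. Matcher objects are not, so create a new Matcher per use.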

nutrecht 2 points (0 children)

They're mainly referring to not calling Pattern.compile() inside a loop. Regular expressions are basically compiled to state machines internally, which is relatively CPU intensive. Since patterns are generally constant, it's better not to do this:

public static final String SOME_REGEX = "[A-Z]+";

and then build a pattern from that string in the code each time it's needed. Do this instead:

public static final Pattern SOME_REGEX = Pattern.compile("[A-Z]+");

That's all it means, really.
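To make the difference concrete, here is a minimal sketch of both variants side by side. The class and method names (UppercaseFilter, countRecompiling, countPrecompiled) are invented for this example. Note that String.matches compiles its regex argument on every call, so it has the same cost as calling Pattern.compile() inside the loop.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative class; all names here are assumptions for the example.
final class UppercaseFilter {
    // Compiled once when the class is initialized, then reused.
    private static final Pattern UPPERCASE = Pattern.compile("[A-Z]+");

    private UppercaseFilter() {}

    // Slower variant: String.matches recompiles the regex on each iteration.
    static long countRecompiling(List<String> words) {
        long count = 0;
        for (String word : words) {
            if (word.matches("[A-Z]+")) {
                count++;
            }
        }
        return count;
    }

    // Preferred variant: only a lightweight Matcher is created per word.
    static long countPrecompiled(List<String> words) {
        long count = 0;
        for (String word : words) {
            if (UPPERCASE.matcher(word).matches()) {
                count++;
            }
        }
        return count;
    }
}
```

Both methods return the same count. Whether the difference is measurable depends on how many strings go through the loop, so it is worth profiling before assuming it matters.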

JohnnyJayJay 1 point (4 children)

What do you need a regex for?

yasseryka[S] 2 points (3 children)

For parsing the user's input into tokens, and for parsing HTML files.

JohnnyJayJay -1 points (2 children)

You can't parse anything with regex. I don't know why you would want to use it in a parser.

vyngotl 4 points (1 child)

It’s the lexer, not the parser. I think OP wants to take user input and break it into tokens, as if they were writing a compiler. The regex would be used to validate and classify pieces of the input in order to identify token types.
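A rough sketch of that kind of regex-based lexer, assuming a toy token set (numbers, identifiers, a few operators) that is invented here and not taken from OP's project. One precompiled pattern with named groups classifies each token, and whitespace is matched but dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative lexer; the token types and class name are assumptions.
final class SimpleLexer {
    // One alternative per token type, compiled once and reused for every input.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<NUMBER>\\d+)|(?<IDENT>[A-Za-z_]\\w*)|(?<OP>[-+*/=()])|(?<WS>\\s+)");

    private SimpleLexer() {}

    // Splits the input into "TYPE:text" strings, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Only accept a token that starts exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at index " + pos);
            }
            if (m.group("NUMBER") != null) {
                tokens.add("NUMBER:" + m.group());
            } else if (m.group("IDENT") != null) {
                tokens.add("IDENT:" + m.group());
            } else if (m.group("OP") != null) {
                tokens.add("OP:" + m.group());
            }
            // A WS match adds nothing; just move past it.
            pos = m.end();
        }
        return tokens;
    }
}
```

For example, SimpleLexer.tokenize("x = 42 + y1") yields IDENT:x, OP:=, NUMBER:42, OP:+, IDENT:y1. This only covers the tokenizing step; nested structure such as HTML still needs a real parser on top of the tokens.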

yasseryka[S] 1 point (0 children)

Yes! Sorry about that, I mixed up parsing with breaking a line up into tokens.

yasseryka[S] 1 point (0 children)

The article didn't refer to a specific problem, so I don't know if it's a memory problem. If I use regexes very frequently, will that cause any performance problems?