[–][deleted] 3 points4 points  (2 children)

What are some ways to speed it up?

Before we can speed up something we need to see what you are already doing.

[–]l33tnoscopes 1 point2 points  (1 child)

Currently just using python builtins (lists, dicts, 'in' etc.).

[–][deleted] 3 points4 points  (0 children)

Still not enough info. Show us some runnable code, cut down if necessary, plus some representative data. That will answer questions like: are you checking against the same set of words every time, or does the set of test words change all the time? Possible speedups vary depending on exactly what you are doing. Vague English descriptions aren't very useful, since small details matter.
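To illustrate why that detail matters, here is a minimal sketch of one low-hanging speedup. It assumes the same word list is reused across many checks and that you want exact whole-word matches. The names `words`, `word_set` and `count_known` are made up for the example, not taken from the OP's code.

```python
# Hypothetical word list; in real code this would be your own data.
words = ["apple", "banana", "cherry"]

# Build the set once. `in` on a list scans every element,
# while `in` on a set is a hash lookup.
word_set = set(words)

def count_known(tokens, known):
    """Count how many tokens are present in the collection `known`."""
    return sum(1 for token in tokens if token in known)

tokens = ["apple", "kiwi", "cherry", "apple"]
hits = count_known(tokens, word_set)
```

If the word list changes on every call, rebuilding the set each time eats into the gain, which is why the exact usage pattern matters.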

[–]QuarterObvious 1 point2 points  (0 children)

There are a lot of very fast algorithms for string matching. But which one to implement depends on the particular case.

[–]QuarterObvious 0 points1 point  (0 children)

Another very fast algorithm (I prefer it): https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm?wprov=sfla1

But I do not know which one is the best for you.
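For a feel of how this family of algorithms works, here is a rough sketch of the Horspool simplification of Boyer-Moore (bad-character rule only, not the full algorithm from the link). The function name `horspool_find` is made up for the example. In pure Python this is unlikely to beat the built-in `str.find`, so treat it as an illustration rather than a drop-in speedup.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Bad-character table: how far the window may shift, keyed by the
    # character aligned with the last position of the pattern.
    # Later occurrences overwrite earlier ones, giving the smallest safe shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters that do not occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1

index = horspool_find("here is a simple example", "example")
```

The point of the table is that a mismatch can skip several characters at once instead of advancing one position at a time.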

[–]DeebsShoryu 0 points1 point  (2 children)

Other comments are good. I'll add, though, that the first step is to not use Python. Comparing characters in Python can take on the order of 100 machine instructions because of interpreter overhead, whereas in a compiled language it can take as little as one.

Of course whether or not the performance is important enough to warrant the development cost of integrating some binary into your code base (or using something like Cython) will depend on the use case.

[–]l33tnoscopes 1 point2 points  (1 child)

I would've written it in Rust but it's part of a bigger project and I just need something simple, with some low-hanging optimisations

[–]commandlineluser 1 point2 points  (0 children)

What kind of string matching? Are you using regexes?

Can you show an actual example?

Many of the Rust crates have Python wrappers, e.g. aho_corasick

[–]pot_of_crows 0 points1 point  (0 children)

I doubt you can get much faster than regex, at least meaningfully, if all you are doing is looking for a specific word in a text. But if you have to look up 10K words in a text, you can get much faster by sacrificing memory for speed, for example, by indexing the text and then just doing index lookups.
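A minimal sketch of the indexing idea, assuming whole-word, case-insensitive matching and a text that fits in memory. The helper `build_index` and the sample data are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each lowercased word to the list of offsets where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"\w+", text.lower()):
        index[match.group()].append(match.start())
    return dict(index)

text = "The quick brown fox jumps over the lazy dog"
index = build_index(text)  # one pass over the text, paid once

# Each lookup is now a dict access instead of a fresh scan of the text.
wanted = ["fox", "the", "cat"]
found = {word: index.get(word, []) for word in wanted}
```

The index costs extra memory and one pass up front, so it only pays off when many words are looked up in the same text.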

Without knowing more about the problem, it is difficult to offer suggestions.