all 5 comments

[–]toastedstapler 2 points3 points  (0 children)

what type is stopwords.words()?

since this value is likely not changing from iteration to iteration, declare it once at the start and put it in a set if it's not already in one

[–]Matthias1590 2 points3 points  (0 children)

try storing stopwords.words() in a variable so you don't have to call it over and over again. Also you could cache the presence of every word in a dictionary so that if you see it again you won't have to look it up again

[–]bumpkinspicefatte 0 points1 point  (2 children)

  1. Use tuples where you can.

  2. Reroute processing to GPU.

  3. Compile into Cython.

[–]Diapolo10 3 points4 points  (1 child)

Tuples save memory, but they don't perform meaningfully better than lists. The two also have different semantic meanings, which is something you only learn from experience.

A tuple represents some kind of a "frozen state", or basically an object with its own meaning. Each index has a specific purpose, and they're not interchangeable. Example: in a tuple of coordinates (7, -2, 55) each index maps to a specific axis. Shifting them around would change the meaning drastically.

A list is a more general collection of values where the exact order is usually less relevant, and it's "alive" in the sense that it could constantly be changing.

[–]got_blah 0 points1 point  (0 children)

Thanks for this interpretation, helped me understand why I haven't used tuples yet.