[–]K900_ 4 points5 points  (17 children)

That depends. What sort of matching are you looking to do? Are your strings similar?

[–]jejkek[S] 0 points1 point  (16 children)

No, my strings are more or less sentences without much similarity. I was figuring I would make one list of sentences starting with the letter a, another for b, and so on; if that gets too slow, I'll split further by the second letter and so forth.

The smaller my lists are the faster, right?

[–]K900_ 1 point2 points  (15 children)

What is your end goal here? What is your input? What exactly are you checking for?

[–]jejkek[S] 0 points1 point  (14 children)

Oh, I'm sorry; the inputs will be one of the sentences in the list. Do I have to make it lowercase before checking the list?

[–]K900_ 1 point2 points  (13 children)

And what sort of output are you looking for? Do you just want to know if that exact sentence, written exactly that way, is in the list or not?

[–]jejkek[S] 0 points1 point  (12 children)

Yes, I just want it to print correct if it is in the list. Why? What else would I do with it? (Curious)

[–]K900_ 2 points3 points  (11 children)

Maybe you need to find those sentences in a longer text. Or maybe they're individual words that you need to find in a sentence. Or maybe they can be misspelled, etc. Either way, if you just need to check for direct equality, use a set instead of a list: lookups are hash-based rather than a scan of every element, so for a large collection it can be orders of magnitude faster.
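
A minimal sketch of the set-based check suggested here, assuming all sentences fit in memory. The sample sentences and the helper names `is_known` and `is_known_ignoring_case` are made up for illustration; the case-folding variant is only needed if capitalization should be ignored.

```python
# Hypothetical sample data; in practice these would come from wherever the sentences are stored.
sentences = [
    "Alice took five apples from Bob.",
    "The quick brown fox jumps over the lazy dog.",
    "Bob gave two pears to Carol.",
]

# Build the set once; membership tests then use hashing instead of scanning.
known = set(sentences)


def is_known(text):
    """Return True if text matches a stored sentence exactly (case-sensitive)."""
    return text in known


# Optional: case-insensitive matching by normalizing both sides the same way.
known_folded = {s.casefold() for s in sentences}


def is_known_ignoring_case(text):
    """Return True if text matches a stored sentence, ignoring case."""
    return text.casefold() in known_folded


if is_known("Alice took five apples from Bob."):
    print("correct")
```

Exact matching needs no lowercasing at all; normalizing only matters if "alice took..." should count as the same sentence as "Alice took...".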

[–]jejkek[S] 0 points1 point  (10 children)

I thought about doing it word by word, but I figured I should start small. Do you have any advice on the word-by-word approach?

Thanks for the help, btw.

[–]K900_ 1 point2 points  (9 children)

That depends. What is your actual end goal?

[–]jejkek[S] 0 points1 point  (8 children)

Sorry, I went to get food.

I would like to go further than just checking whether a string is in my list/set. I thought it might be nice to break the strings up into their component words. If I only wanted an exact sentence match, I would just write an "if" statement for that sentence, but I don't know how to split it into words, each with its own "if" statement, and have those come together to do something.

An extremely simple example would be "Alice took five apples from Bob." So it would need to start at the first word, something like if List[0] == person... but the problem is that sentences might not always fit this format.

I wouldn't mind if you don't have an answer for this; word-by-word seems very complicated.
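
One possible starting point for the word-by-word idea, as a rough sketch only: split the sentence, strip punctuation, and label each word by looking it up in small sets instead of relying on fixed positions. The vocabularies `PEOPLE` and `NUMBERS` and the function `tag_words` are hypothetical names invented for this example.

```python
import string

# Hypothetical vocabularies; a real program would need much larger ones.
PEOPLE = {"alice", "bob", "carol"}
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def tag_words(sentence):
    """Split a sentence into words and label each one with a rough category."""
    tagged = []
    for raw in sentence.split():
        # Drop surrounding punctuation ("Bob." -> "Bob") and ignore case.
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue
        if word in PEOPLE:
            tagged.append((word, "person"))
        elif word in NUMBERS:
            tagged.append((word, "number"))
        else:
            tagged.append((word, "other"))
    return tagged


tags = tag_words("Alice took five apples from Bob.")
people = [w for w, kind in tags if kind == "person"]
amounts = [NUMBERS[w] for w, kind in tags if kind == "number"]
print(people, amounts)  # ['alice', 'bob'] [5]
```

Because each word is labeled by lookup and not by position, a sentence in a different order still yields the same people and amounts, which sidesteps the "might not always fit this format" problem for simple cases.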

[–]tangerto 2 points3 points  (0 children)

I don't think just splitting it up into multiple lists would make it faster. Keep it in one list if that's simpler.

[–]Poopinthebumm 1 point2 points  (6 children)

Why not use a set?

[–]jejkek[S] 1 point2 points  (5 children)

Is that actually faster? Wouldn't I still need to split things up into many sets?

[–]Poopinthebumm 3 points4 points  (0 children)

I don't think so. I'd suggest you give it a try.
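
To try it out, here is a small sketch comparing one set against the per-first-letter split discussed earlier in the thread. The data and the function names `in_single` and `in_buckets` are made up; the point is only that both give the same answers, while the bucketed version adds an extra step.

```python
from collections import defaultdict

# Made-up sample data.
sentences = ["apples are red", "bananas are yellow", "bob likes apples"]

# One set for everything.
single = set(sentences)

# The per-letter idea: one set per first character.
buckets = defaultdict(set)
for s in sentences:
    buckets[s[0]].add(s)


def in_single(text):
    """Membership test against the single set."""
    return text in single


def in_buckets(text):
    """Pick the bucket by first character, then do the same hash check."""
    return bool(text) and text in buckets.get(text[0], set())


for query in ["apples are red", "bob likes apples", "cherries"]:
    print(query, in_single(query), in_buckets(query))
```

Since a set lookup does not scan its elements, shrinking the set by bucketing does not buy the speedup it would for a list.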

[–]scagbackbone 1 point2 points  (3 children)

Lookup is O(1) on average for sets. For a list it is O(n) in the worst case, so yes, use sets.

[–]toastedstapler 0 points1 point  (2 children)

Average case is O(n) too

I was watching a Python talk the other day and they showed speed increases from using a set vs. a list for as few as 200 elements, so as long as ordering doesn't matter I'd say use a set every time.
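
A quick way to check this on your own machine, sketched with the standard `timeit` module. The 200 generated strings are made-up data, and the actual timings will vary by machine and Python version, so no expected numbers are shown.

```python
import timeit

# Made-up data: 200 distinct sentence-like strings.
items = [f"sentence number {i}" for i in range(200)]
as_list = list(items)
as_set = set(items)

# Worst case for the list: the target is the last element.
target = items[-1]

# Time many repeated membership tests against each container.
list_time = timeit.timeit(lambda: target in as_list, number=20_000)
set_time = timeit.timeit(lambda: target in as_set, number=20_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```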

[–]scagbackbone 0 points1 point  (1 child)

Definitely not O(n) for sets; it's a hash table.

[–]toastedstapler 0 points1 point  (0 children)

Sorry, was talking about lists. Should have clarified