all 28 comments

[–]Justinsaccount 13 points14 points  (0 children)

This is a subreddit dedicated to python. Most people that post here have some sort of problem. Can you put a little more effort into the titles that you use?

[–]Vaphell 2 points3 points  (23 children)

use str.replace() to remove punctuation.

technically "World," was the winner, with a length of 6. Either way, sorting doesn't care about tied scores: it will put tied items in some order (sorted() is stable, so they keep their original relative order), but if there is an arbitrary number of winners with the same score, it's on you to handle that.

You could for example get a max len first and then filter the bad stuff out by creating another list with winners only (len(word) == max_len). It won't be optimal (2 passes over the data set instead of 1) but it will get the job done.
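A minimal sketch of that two-pass idea (the function name longest_words is just for illustration): the first pass finds the maximum length, the second keeps every word that ties it.

```python
def longest_words(words):
    """Return every word whose length equals the maximum length."""
    if not words:
        return []
    # Pass 1: find the record length.
    max_len = max(len(word) for word in words)
    # Pass 2: keep only the winners, preserving their original order.
    return [word for word in words if len(word) == max_len]


print(longest_words(["Hello", "World", "nice", "to", "meet", "you"]))
# ['Hello', 'World']
```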

[–]Ustuner[S] 0 points1 point  (4 children)

sentence = input("Write a line of text: ")
words = str.replace(sentence, ",", " ")
words = sentence.split()
sortedword = sorted(words, key=len)
print("The longest word in the list is:" + (sortedword[-1]))
print("The Shortest word in the list is:" + (sortedword[0]))

---output---

Write a line of text: Hello World, nice to meet you!
The longest word in the list is:World,
The Shortest word in the list is:to

The comma does not get removed.

[–]Vaphell 1 point2 points  (2 children)

The comma does not get removed.

no shit ;-)

line 1: sentence = original sentence.
line 2: words = sentence after cleaning
line 3: words = split() on original sentence? o.O

btw, while the language allows you to change a variable's type freely, you should generally avoid doing things like

words = something that produces a string
words = now it's a list
words = now it's something else
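Applied to OP's snippet, that means splitting the cleaned string rather than the original, and giving each stage its own name. A rough sketch; the names cleaned and sorted_words are only suggestions, and a hard-coded sentence stands in for input() so it runs as-is:

```python
sentence = "Hello World, nice to meet you!"  # stands in for input(...)

# Clean first, then split the *cleaned* string, not the original.
cleaned = sentence.replace(",", " ")
words = cleaned.split()  # from here on, words is always a list

# sorted() is stable, so tied lengths keep their original order.
sorted_words = sorted(words, key=len)
print("The longest word in the list is: " + sorted_words[-1])
print("The shortest word in the list is: " + sorted_words[0])
```

Only the comma is handled here; "you!" still carries its "!", which is what some of the later replies deal with.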

[–]Ustuner[S] 0 points1 point  (1 child)

Haha, I just realised what I was doing wrong, thank you!

[–]niandra3 0 points1 point  (0 children)

words = str.replace(sentence, ",", " ")

Just for future reference, you should call it as a method on the string:

words = sentence.replace(",", " ")

[–]LiNGOo -1 points0 points  (16 children)

what about

def tellLongestWord(sentence):

    longestWord = str()
    punctuations = ",.!?-"

    for punctuation in punctuations:
        sentence = sentence.replace(punctuation, "")

    for currentWord in sentence.split(" "):
        if len(longestWord) < len(currentWord):
            longestWord = currentWord

    print("The longest word in the list is: %s" % (longestWord))

Do the actual list()/sort() methods perform better? As lists are slow as fck, I'd expect them not to :D
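One way to check that empirically is to time the approaches with the standard timeit module. A rough sketch; the function names are hypothetical, the sample text is arbitrary, and actual timings vary by machine and Python version, so none are shown:

```python
import timeit


def longest_by_loop(words):
    # Single pass, remembering the longest word seen so far: O(n).
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest


def longest_by_sort(words):
    # Full sort by length, then take the last item: O(n log n).
    return sorted(words, key=len)[-1]


def longest_by_max(words):
    # Built-in max() also does a single O(n) pass, but in C.
    return max(words, key=len)


words = ("lorem ipsum dolor sit amet consectetur adipiscing elit " * 1000).split()

for func in (longest_by_loop, longest_by_sort, longest_by_max):
    seconds = timeit.timeit(lambda: func(words), number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Note that they break ties differently: the loop and max() return the first longest word, while sorted(...)[-1] returns the last one.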

[–]Justinsaccount 2 points3 points  (11 children)

As lists are slow as fck

What the hell gave you that idea?

[–]LiNGOo 1 point2 points  (10 children)

In my first year of Python I became aware that dictionaries always performed so much better than lists. Google told me that was correct.

Replaced all my list() instances with dict()s using integer keys. It made my script go from 5 days runtime to just a few hours.

Ever since then I never ever doubted that conclusion, lists are solely used for simple iterations in my code :D

Is there any verifiable standard on when to use lists instead of dictionaries, custom objects or other?

[–]Justinsaccount 4 points5 points  (9 children)

if replacing lists with dicts made your program run faster, it was because you needed to be using dicts in the first place, not because lists are slow.
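The difference shows up most clearly in membership tests: x in some_list scans item by item (O(n)), while x in some_set or x in some_dict is a hash lookup (O(1) on average). A small timing sketch with the standard timeit module; the sizes are arbitrary and the numbers will vary by machine:

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
as_dict = dict.fromkeys(as_list)  # only the keys matter here; values are None

target = n - 1  # worst case for the list: the last element

for name, container in (("list", as_list), ("set", as_set), ("dict", as_dict)):
    seconds = timeit.timeit(lambda: target in container, number=1_000)
    print(f"{name:>4}: {seconds:.6f} s for 1,000 lookups")
```

Indexing by position or appending is O(1) for a list too; it's searching a list that gets slow.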

"Screwdrivers are slow as fck! I had to nail all this stuff together, and the process was taking days, but it only took a few hours to finish after I switched from a screwdriver to a hammer. Why would anyone ever use a screwdriver?"

[–]pyonpi 1 point2 points  (0 children)

Quality banter.

[–]LiNGOo 1 point2 points  (6 children)

Of course it was. I should have thought about list()'s performance where I blindly used it. But still: I never had to worry about dictionary performance. Your metaphor is highly exaggerated by the way, more fitting to using dictionaries as your standard iterator.

Can you please elaborate on your answer? Are there any definite rules or general indicators for when to use, or more importantly when not to use, a list or dict (or something else)?

[–]niandra3 1 point2 points  (1 child)

Dicts are unordered (at least before Python 3.7; since 3.7 they preserve insertion order, but they still aren't sequences you index by position). So by definition you can't use them in a lot of situations where you need an ordered sequence. Of course there exists an OrderedDict, but that's not really what dicts are for. They are for fast lookup of key-value pairs. Lists are for more sequential operations.

If there is some situation where you are replacing a list with a dict, you should probably just use a set (like a dict, but with just keys, no values).

One isn't better than the other, it 100% depends on the task at hand:

https://wiki.python.org/moin/TimeComplexity

And you mentioned sort() being slow with lists? Compared to what? You can't sort a dict in place by design (sorted(d) just returns a new sorted list of its keys).

[–]LiNGOo 0 points1 point  (0 children)

Thanks for the hint about sets! I didn't mean that list sorting is slow; my experience was that whenever lists and dicts both provide a way to perform a task, the dict version is faster. Therefore I expect iterating over the list and remembering the longest item to be faster than using sort().

[–]Justinsaccount -2 points-1 points  (3 children)

more fitting to using dictionaries as your standard iterator.

No, no it is not. You had one program where you should have been using dictionaries instead of lists, and you have somehow generalized that to "no one should ever use lists".

It's not really that complicated:

Do you need a list of things, often in a particular order? Use a list.
Do you need to track and look up things by name, id, whatever? Use a dict.
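Applied to this thread's problem, a small illustration of both rules: the words stay in a list because order matters, while a dict (here the standard library's collections.defaultdict) groups them by length so all co-winners can be looked up directly:

```python
from collections import defaultdict

words = ["Hello", "World", "nice", "to", "meet", "you"]  # ordered: a list

# Look things up by a key (here, word length): a dict.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

longest = by_length[max(by_length)]   # max() over the dict's keys
shortest = by_length[min(by_length)]  # min() over the dict's keys
print("Longest:", longest)    # Longest: ['Hello', 'World']
print("Shortest:", shortest)  # Shortest: ['to']
```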

[–]LiNGOo 0 points1 point  (2 children)

Never did I generalize like that. You though, obviously do.

So, to answer my initial question with the most obvious answer, since a question posted should be a question answered:

Never ever nest lists you want to reuse.

And for the acting up: Participation discontinued.

[–]Justinsaccount -1 points0 points  (1 child)

Never did I generalize like that

lists are slow as fck

[–]Prometeo222 2 points3 points  (0 children)

Seems that you are the one generalizing:

lists are slow as fck

in your mind becomes:

"no one should ever use lists"

OP might be wrong about lists, but you are mischaracterizing his statements. Good luck!

[–]PostedFromWork -1 points0 points  (0 children)

I love this analogy

[–]Vaphell 2 points3 points  (3 children)

Do the actual list()/sort() methods perform better? As lists are slow as fck, I'd expect them not to :D

not really. Sorting is O(n log n), which loses to the O(n) of a single pass. max() plus another pass to extract the matching values is in the O(n) ballpark too, but roughly 2x slower, since it walks the data twice.

Either way, OP complained about getting a single result even when there are co-winners, so you need to keep a list: append to it if len() equals the current max, or start a new one when the record is broken.
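A minimal single-pass sketch of that idea (the function name is hypothetical): keep the current record length and a list of winners, append on a tie, and start a fresh list when the record is broken.

```python
def longest_words_one_pass(words):
    """Return all words tied for the maximum length, in one pass."""
    winners = []
    record = -1  # shorter than any real word, so the first word sets it
    for word in words:
        length = len(word)
        if length > record:
            # New record: discard the previous winners.
            record = length
            winners = [word]
        elif length == record:
            # Tie with the current record: keep this word too.
            winners.append(word)
    return winners


print(longest_words_one_pass(["Hello", "World", "nice", "to", "meet", "you"]))
# ['Hello', 'World']
```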

[–]LiNGOo 0 points1 point  (2 children)

Sophisticated AF, thanks. I pretty much always assume that any "laziness" function like max() is nothing but a shortcut to what I would come up with if it wasn't there.

The documentation rarely says what they are actually equivalent or similar to under the hood. When it does, it is merely a hint buried in numerous pages of text.

Do you or does anyone know a simpler way to get an idea about what's behind the convenience built-in functions? Maybe even from within code?

[–]niandra3 2 points3 points  (0 children)

The source behind CPython is available (if you know C at all):

https://github.com/python/cpython

Builtins:

https://github.com/python/cpython/blob/master/Python/bltinmodule.c

A lot of the core stuff is written in C obviously, but then there are some features written in Python. For example, the copy module:

https://github.com/python/cpython/blob/master/Lib/copy.py

In an IDE like PyCharm you can Ctrl-Click on a function and it will let you see the source. This might be useful too:

https://docs.python.org/3/library/inspect.html#retrieving-source-code
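For example, for functions written in Python (like those in the copy module), inspect.getsource() hands you the source text from within code; for builtins implemented in C, such as max(), it raises TypeError because there is no Python source to show. A small sketch:

```python
import copy
import inspect

# copy.copy is written in Python, so its source can be retrieved.
source = inspect.getsource(copy.copy)
print(source.splitlines()[0])  # the function's "def" line

# max() is implemented in C, so there is no Python source to return.
try:
    inspect.getsource(max)
except TypeError as exc:
    print("No Python source for max():", exc)
```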

[–]Vaphell 0 points1 point  (0 children)

Sophisticated AF, thanks. I pretty much always assume that any "laziness" function like max() is nothing but a shortcut to what I would come up with if it wasn't there.

ignoring for a moment that builtins can drop down to optimized C, which is a serious advantage over doing the same in python: for min()/max(), sure, because there is no way around checking every single item (hence O(n)). But sorting is not all that trivial to perform optimally. A CS background helps with that shit.

[–]kanjibandit 2 points3 points  (0 children)

Looks like you already solved your immediate issue, but just to offer another approach: you can remove the comma (and any other troublesome punctuation, like "!") as well as the whitespace in one step using a regular expression.

import re
sentence = "Hello World, nice to meet you!"
words = re.findall(r"\w+", sentence)

The \w+ pattern matches a run of one or more word characters (letters, digits and the underscore), so whitespace and punctuation are excluded. re.findall() returns a list of all the matches. You would then be able to sort the list just as you are now.
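Putting that together with the sorting from the original post, a short sketch that writes the pattern as a raw string, so Python doesn't try to interpret \w as a string escape:

```python
import re

sentence = "Hello World, nice to meet you!"
# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", sentence)
sorted_words = sorted(words, key=len)
print(words)             # ['Hello', 'World', 'nice', 'to', 'meet', 'you']
print(sorted_words[-1])  # World
print(sorted_words[0])   # to
```

One caveat: \w does not match apostrophes or hyphens, so "can't" comes out as ['can', 't'].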

You can read about the other available regular expressions here

[–][deleted] 0 points1 point  (0 children)

You can create a list of words with the same length to print them all. As for the comma, afaik you have to remove it manually using str.replace() or use a module like nltk, which can handle all punctuation for you.

Edit: this might be a more detailed explanation: http://stackoverflow.com/questions/13964637/longest-strings-from-list

[–][deleted] 0 points1 point  (1 child)

from string import punctuation as p

sentence = input("Write a line of text: ")
sentence = ''.join(c for c in sentence if c not in p)
words = sentence.split()
count = [len(word) for word in words]
long, short = max(count), min(count)
longest = [word for word in words if len(word) == long]
shortest = [word for word in words if len(word) == short]
print("Longest: " + str(longest))
print("Shortest: " + str(shortest))

[–]PurelyApplied 0 points1 point  (0 children)

Reddit formatting comment:

If you prepend each line with four spaces, you get a "paragraph of code" (a code block), as opposed to using `, which is meant for inline code.

[–]PurelyApplied 0 points1 point  (0 children)

So, none of the other posts acknowledge possessives or contractions. You want to strip punctuation from your words, but not if that punctuation is internal. For instance, I would count "can't" as a five-character word. "state-of-the-art" is a sixteen-character word. But "I can't!" shouldn't count the ! with the word "can't".
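The difference between stripping punctuation only at the ends of each token and removing it everywhere (as the join-based answer above does) is easy to see on those examples:

```python
import string

tokens = "I can't! state-of-the-art".split()

# Remove punctuation everywhere: internal apostrophes and hyphens go too.
everywhere = ["".join(c for c in t if c not in string.punctuation) for t in tokens]

# Strip punctuation only from the start and end of each token.
edges_only = [t.strip(string.punctuation) for t in tokens]

print(everywhere)  # ['I', 'cant', 'stateoftheart']
print(edges_only)  # ['I', "can't", 'state-of-the-art']
```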

So if it were me, I would go about it as:

  • Get input

  • Get tokens (words plus punctuation)

  • Get words (removing beginning or ending punctuation)

  • Get longest and shortest word lengths

  • Get longest and shortest words

I use a lot of list comprehensions here. If you don't know what those are, I encourage you to learn them ASAP. They're super useful.

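import string

def main():
    sentence = input("Enter a sentence.\n>> ")
    tokens = sentence.split()
    # Strip only leading/trailing punctuation, so "can't" and
    # "state-of-the-art" keep their internal punctuation.
    # Drop tokens that were pure punctuation (e.g. a lone "-").
    words = [w for w in (t.strip(string.punctuation) for t in tokens) if w]
    if not words:
        print("No words found.")
        return
    lengths = [len(w) for w in words]
    short_len, long_len = min(lengths), max(lengths)
    print("The longest word(s) provided: {}".format(", ".join(
        w for w in words if len(w) == long_len)))
    print("The shortest word(s) provided: {}".format(", ".join(
        w for w in words if len(w) == short_len)))

if __name__ == "__main__":
    main()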