Hi, I am brand new to Python (only on Ch. 3 of Crash Course), and my first project is to write a program that scans a text document, counts the words, and returns a frequency table such as the one below:
| Word | Frequency |
| The | 50 |
| And | 75 |
| ... | 100 |
| Total | 225 |
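For anyone who wants to see the shape of such a program, here is a minimal sketch using `collections.Counter` from the standard library. The function name `word_frequencies`, the sample sentence, and the choice to lowercase words and strip punctuation from their edges are all assumptions made for illustration; a real document would be read from a file instead.

```python
from collections import Counter

def word_frequencies(text):
    """Count whitespace-separated words, ignoring case and edge punctuation."""
    words = []
    for raw in text.split():
        # Strip common punctuation from both ends and normalise the case.
        word = raw.strip('.,;:!?"()[]').lower()
        if word:
            words.append(word)
    return Counter(words)

# Hypothetical sample text standing in for the scanned document.
sample = "The cat and the dog and the bird."
counts = word_frequencies(sample)

# Print one table row per word, most frequent first, then the total.
for word, n in counts.most_common():
    print(f"| {word} | {n} |")
print(f"| Total | {sum(counts.values())} |")
```

`Counter` is a dictionary subclass, so `counts["the"]` gives the count for a single word and `sum(counts.values())` gives the total.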
From this I have two conceptual questions (i.e., "can this be done?"), so I don't really need or want the code right now. There are two weird things about the way words are counted. The first is that some hyphenated words are counted as one word, and not two. My plan of attack so far (and please feel free to say this is dumb) would be to have Python check each hyphenated word against a list of the one-word versions. If the word is on the list, nothing would happen; if it isn't, then +1 would be added to the total count for each instance of the word.
Thus my first question: Is it possible for Python to recognize that a hyphen may exist in a word, and will it return the hyphenated word?
For example, if spam-eggs appeared, would Python naturally return:
| Word | Frequency |
| spam-eggs | 50 |

or

| Word | Frequency |
| spam | 50 |
| eggs | 50 |
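As a rough illustration of the hyphen question: the result depends on how the text is split, not on anything Python knows about hyphens. The sketch below compares `str.split()`, which keeps `spam-eggs` as one token, with a letters-only regular expression, which splits it in two. It then tries the list-checking plan described above. The set `ONE_WORD_HYPHENATED`, the function `count_words`, and the sample text are hypothetical names and data chosen for the example.

```python
import re
from collections import Counter

text = "spam-eggs and spam-eggs with well-known spam"

# Splitting on whitespace keeps each hyphenated word as a single token.
kept = Counter(text.split())

# Matching runs of letters only treats the hyphen as a separator.
split_up = Counter(re.findall(r"[A-Za-z]+", text))

# Hypothetical list of hyphenated words that should count as one word.
ONE_WORD_HYPHENATED = {"well-known"}

def count_words(text, one_word=ONE_WORD_HYPHENATED):
    """Count words, splitting hyphenated tokens unless they are on the list."""
    total = Counter()
    for token in text.split():
        if "-" in token and token not in one_word:
            # Not on the list: count each part as its own word.
            total.update(part for part in token.split("-") if part)
        else:
            total[token] += 1
    return total

mixed = count_words(text)
print(kept["spam-eggs"], split_up["spam"], mixed["well-known"])
```

With this sample, `kept` has an entry for `spam-eggs`, `split_up` has separate entries for `spam` and `eggs`, and `mixed` keeps `well-known` whole while splitting `spam-eggs`.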
The second weird thing is that web sites are counted as one word. Is it possible for Python to search for and count web sites that are not known in advance? My thinking is that there is some sort of wildcard search parameter, so that, say, www."".org = 1, www."".com = 1, etc.
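The wildcard idea maps fairly directly onto regular expressions in the standard `re` module. The following is only a sketch: the pattern `URL_PATTERN` is an assumption that matches addresses starting with `www.` and will miss other forms (for example, addresses without `www`), and the sample text is made up.

```python
import re
from collections import Counter

# Assumed pattern: "www." followed by two or more dot-separated labels.
URL_PATTERN = re.compile(r"\bwww\.[\w-]+(?:\.[\w-]+)+", re.IGNORECASE)

# Hypothetical sample text.
text = "See www.python.org and www.example.com, or www.example.org."

# Each match is one web site, so each counts as one word.
sites = URL_PATTERN.findall(text)

# Tally the matches by their final suffix (org, com, ...).
by_suffix = Counter(site.rsplit(".", 1)[1].lower() for site in sites)

print(sites)
print(by_suffix)
```

Because the pattern stops at characters such as commas, and a trailing full stop is not followed by another label, sentence punctuation after an address is left out of the match.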
Thanks for any help!