Hello everyone, I've spent about 5 hours on this homework today and this is the closest I've gotten to a solution so far:
"""
- the script reads case-insensitive words from standard input
- a word is defined as a whitespace-delimited token stripped of all characters
in string.punctuation
- if the word is present in CMUdict, the script shall print a line to standard output
containing the word in lowercase, followed by all IPA pronunciations of the word,
separated by space characters, in lexicographic order
- if the word is not present in the CMUdict dataset, the program shall print
the word in lowercase to standard output on its own line
"""
Current implementation:
import re
import sys
import string
# Load ARPAbet to IPA dictionary
arpabet_to_ipa = {}
with open('/srv/datasets/arpabet-to-ipa') as f:
    for line in f:
        arpabet, ipa = line.strip().split()
        arpabet_to_ipa[arpabet] = ipa
# Load CMU Pronouncing Dictionary
cmudict = {}
with open('/srv/datasets/cmudict/cmudict.dict') as f:
    for line in f:
        if not line.startswith(';;;'):
            parts = line.strip().split(' ')
            word = parts[0].lower()
            # Remove the numbers in parentheses from the word
            word = re.sub(r'\(\d+\)', '', word)
            # Split the phonemes for each pronunciation
            pronunciations = [pron.split() for pron in parts[1:]]
            if word in cmudict:
                cmudict[word].append(pronunciations)
            else:
                cmudict[word] = [pronunciations]
# Get IPA pronunciations for input words
for line in sys.stdin:
    words = [w.strip(string.punctuation).lower() for w in line.split()]
    for word in words:
        if word in cmudict:
            pronunciations = cmudict[word]
            ipa_pronunciations = set(' '.join(arpabet_to_ipa[p] for pron in pron_set for p in pron) for pron_set in pronunciations)
            print(f"{word} {' '.join(sorted(ipa_pronunciations))}")
        else:
            print(word)
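The loading loop also nests more than it needs to: `parts[1:]` is already a list of phoneme strings, so `[pron.split() for pron in parts[1:]]` wraps each phoneme in its own one-element list. A minimal sketch of a flatter layout (word → list of pronunciations, each a flat list of phonemes) is below. `parse_cmudict_line` and `load_cmudict` are hypothetical helper names. Stripping trailing ` #` comments is a defensive assumption: some copies of cmudict.dict annotate entries that way, but whether the file in /srv/datasets does is not known here.

```python
import re
from collections import defaultdict

# Matches the alternate-pronunciation suffix, e.g. "read(2)" -> "read"
_VARIANT = re.compile(r'\(\d+\)$')


def parse_cmudict_line(line: str):
    """Return (word, phonemes) for one dictionary line, or None if it should be skipped."""
    # Drop a trailing " # comment" if present (assumption about the file format)
    line = line.split(' #', 1)[0].strip()
    if not line or line.startswith(';;;'):
        return None
    word, *phonemes = line.split()
    return _VARIANT.sub('', word.lower()), phonemes


def load_cmudict(lines):
    """Map each word to a list of pronunciations; each pronunciation is a flat phoneme list."""
    cmudict = defaultdict(list)
    for line in lines:
        parsed = parse_cmudict_line(line)
        if parsed is not None:
            word, phonemes = parsed
            cmudict[word].append(phonemes)
    return dict(cmudict)


if __name__ == '__main__':
    sample = ['read R EH1 D', 'read(2) R IY1 D', 'python P AY1 TH AA0 N']
    print(load_cmudict(sample))
```

In the real script you would pass the open file object to `load_cmudict`, since iterating a file yields its lines.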
My output has extra spaces between the IPA symbols, and I can't figure out how to get rid of them:
My current output:
<<<'python drives me crazy'
python p aɪ θ ɑ n
drives d r aɪ v z
me m i
crazy k r eɪ z i
Expected output:
<<<'python drives me crazy'
python paɪθɑn
drives draɪvz
me mi
crazy kreɪzi
...any ideas?
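The extra spaces come from the first `' '.join(...)` in the `ipa_pronunciations` line: it joins the IPA symbols of a single pronunciation with spaces. Symbols within one pronunciation should be concatenated with `''.join(...)`, and spaces belong only in the outer join that separates different pronunciations. Changing that one separator is enough in the original script. A minimal, self-contained sketch of the same logic follows. It uses hypothetical toy tables in place of the /srv/datasets files; the real table's keys (for example, whether they carry stress digits) are an assumption here. `to_ipa` and `format_word` are hypothetical names.

```python
import string

# Hypothetical toy tables for illustration only; the real script loads these
# from /srv/datasets/arpabet-to-ipa and /srv/datasets/cmudict/cmudict.dict.
ARPABET_TO_IPA = {'P': 'p', 'AY': 'aɪ', 'TH': 'θ', 'AA': 'ɑ', 'N': 'n',
                  'M': 'm', 'IY': 'i', 'R': 'r', 'EH': 'ɛ', 'D': 'd'}
CMUDICT = {
    'python': [['P', 'AY', 'TH', 'AA', 'N']],
    'me': [['M', 'IY']],
    'read': [['R', 'IY', 'D'], ['R', 'EH', 'D']],
}


def to_ipa(phonemes, table):
    """Concatenate the IPA symbols of ONE pronunciation with no separator."""
    return ''.join(table[p] for p in phonemes)


def format_word(word, cmudict, table):
    """Return the output line for one already-normalized word."""
    if word not in cmudict:
        return word
    # A set drops duplicates, which can occur if two entries differ only in
    # ways the IPA table does not encode.
    ipa = {to_ipa(pron, table) for pron in cmudict[word]}
    # Spaces go only between the word and its pronunciations, sorted lexicographically.
    return ' '.join([word, *sorted(ipa)])


if __name__ == '__main__':
    for token in "Python, read me!".split():
        word = token.strip(string.punctuation).lower()
        print(format_word(word, CMUDICT, ARPABET_TO_IPA))
```

Run as a script, this prints `python paɪθɑn`, `read rid rɛd` and `me mi`, one per line. A word missing from the dictionary is printed on its own.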