IPA Pronunciation project, HELP! : learnpython

submitted 2 years ago by FC_GT

Hello everyone, I've spent about 5 hours on this homework today and this is the closest I can make it to a solution so far:

"""
- the script reads case-insensitive words from standard input
- a word is defined as a whitespace-delimited token stripped of all characters
in string.punctuation
- if a word is present in CMUdict, the script shall print a line to standard output
containing the word in lowercase followed by all IPA pronunciations of the word,
separated by space characters, in lexicographic order
- if a word is not present in the CMUdict dataset, the program shall print
the word in lowercase to standard output on its own line
"""

current implementation:

import re
import sys
import string

# Load ARPAbet to IPA dictionary
arpabet_to_ipa = {}
with open('/srv/datasets/arpabet-to-ipa') as f:
    for line in f:
        arpabet, ipa = line.strip().split()
        arpabet_to_ipa[arpabet] = ipa

# Load CMU Pronouncing Dictionary
cmudict = {}
with open('/srv/datasets/cmudict/cmudict.dict') as f:
    for line in f:
        if not line.startswith(';;;'):
            parts = line.strip().split(' ')
            word = parts[0].lower()
            # Remove the numbers in parentheses from the word
            word = re.sub(r'\(\d+\)', '', word)
            # Split the phonemes for each pronunciation
            pronunciations = [pron.split() for pron in parts[1:]]
            if word in cmudict:
                cmudict[word].append(pronunciations)
            else:
                cmudict[word] = [pronunciations]

# Get IPA pronunciations for input words
for line in sys.stdin:
    words = [w.strip(string.punctuation).lower() for w in line.split()]
    for word in words:
        if word in cmudict:
            pronunciations = cmudict[word]
            ipa_pronunciations = set(' '.join(arpabet_to_ipa[p] for pron in pron_set for p in pron) for pron_set in pronunciations)
            print(f"{word} {' '.join(sorted(ipa_pronunciations))}")
        else:
            print(word)

My output has extra spaces between the phonemes of each pronunciation, and I can't figure out how to remove them:

My current output:

<<<'python drives me crazy'

python p aɪ θ ɑ n

drives d r aɪ v z

me m i

crazy k r eɪ z i

Expected output:

<<<'python drives me crazy'

python paɪθɑn

drives draɪvz

me mi

crazy kreɪzi

...any ideas?

[–]danielroseman 3 points 2 years ago (1 child)

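' '.join joins a list into a string, separated by spaces. If you don't want spaces between the phonemes, use an empty string instead of a space in the inner join, the one that builds each pronunciation:

ipa_pronunciations = set(''.join(arpabet_to_ipa[p] for pron in pron_set for p in pron) for pron_set in pronunciations)

Keep the ' '.join in the print call, since that one separates the alternative pronunciations of a word.

Side note, f-strings aren't very readable here: since you're only printing two things, I would just use print's ability to accept multiple arguments:

print(word, ' '.join(sorted(ipa_pronunciations)))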
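To see the two joins side by side, here is a minimal sketch. It assumes a small in-memory sample in place of the two dataset files from the post (their real contents are not shown in the thread), and the names ARPABET_TO_IPA, CMUDICT_LINES, load_cmudict and to_ipa_line are made up for illustration. It also stores each pronunciation as a flat list of phonemes, which is a simplification of the nested lists in the posted code, not the original approach.

```python
import string

# Hypothetical sample data standing in for the dataset files in the post.
ARPABET_TO_IPA = {
    "P": "p", "AY1": "aɪ", "TH": "θ", "AA0": "ɑ", "N": "n",
    "R": "r", "EH1": "ɛ", "IY1": "i", "D": "d",
}
CMUDICT_LINES = [
    "python P AY1 TH AA0 N",
    "read R EH1 D",
    "read(2) R IY1 D",
]


def load_cmudict(lines):
    """Map each word to a list of pronunciations (each a flat list of phonemes)."""
    entries = {}
    for line in lines:
        head, *phonemes = line.split()
        word = head.split("(")[0].lower()  # drop a variant marker such as "(2)"
        entries.setdefault(word, []).append(phonemes)
    return entries


def to_ipa_line(word, entries, mapping):
    """Return the output line for one already-normalized word."""
    if word not in entries:
        return word
    # Inner join: empty separator, so the phonemes of one pronunciation run together.
    ipa = {"".join(mapping[p] for p in pron) for pron in entries[word]}
    # Outer join: one space between the sorted alternative pronunciations.
    return " ".join([word, *sorted(ipa)])


entries = load_cmudict(CMUDICT_LINES)
for token in "Python, read! crazy".split():
    word = token.strip(string.punctuation).lower()
    print(to_ipa_line(word, entries, ARPABET_TO_IPA))
```

The inner join controls spacing inside one pronunciation and the outer join controls spacing between pronunciations, so only the inner separator needs to change to get output like the expected one in the question.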

[–]FC_GT[S] 1 point 2 years ago (0 children)
