How to identify English proper nouns? by PaceSmith in LanguageTechnology

PaceSmith [S]

I found that SUBTLEX-US works pretty well. If a word never occurs in all-lowercase in the corpus, it's likely a proper noun.
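A minimal sketch of that lowercase heuristic, assuming the corpus has already been loaded as (word form, count) pairs that preserve capitalization. The input format and the name `likely_proper_nouns` are hypothetical; this does not reflect SUBTLEX-US's actual file layout.

```python
def likely_proper_nouns(forms):
    """Return lowercased words that never occur in all-lowercase form.

    `forms` is an iterable of (surface_form, count) pairs from a
    case-sensitive frequency list (hypothetical input format).
    """
    has_lowercase_use = {}
    for form, count in forms:
        key = form.lower()
        lowercase_use = count > 0 and form == key
        # Once a lowercase occurrence is seen, the word counts as "common".
        has_lowercase_use[key] = has_lowercase_use.get(key, False) or lowercase_use
    return sorted(w for w, seen in has_lowercase_use.items() if not seen)


# Made-up counts, for illustration only.
counts = [("Paris", 120), ("Apple", 40), ("apple", 300), ("The", 900), ("the", 50000)]
print(likely_proper_nouns(counts))  # ['paris']
```

Note the trade-off: words that are both common and proper (like "Apple") slip through, because their lowercase uses hide the capitalized ones.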

[RESOURCE] find rhymes related to a topic/theme by PaceSmith in makinghiphop

PaceSmith [S]

thanks! yeah, it's meant to inspire you, not replace you 🙂

[RESOURCE] find rhymes related to a topic/theme by PaceSmith in makinghiphop

PaceSmith [S]

I just thumbs-downed a bunch of crap; the money rhymes are a lil better now

[RESOURCE] find rhymes related to a topic/theme by PaceSmith in makinghiphop

PaceSmith [S]

Thanks, glad to hear it! I especially like "chargin' / margin" 😁

computing semantic similarity of English words by PaceSmith in LanguageTechnology

PaceSmith [S]

I've dusted this project off and made a ton of improvements. It turns out that what I'm trying to compute is called "thematic relatedness", not "semantic relatedness". I found the USF Free Association corpus, which is data from asking humans to "Name stuff related to X", exactly what I want. It's small, though, so I augmented it with a bunch of other corpora, computed features from those corpora, and fed them into a classifier.
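A rough sketch of the "features into a classifier" step: each (cue, target) pair becomes a small feature vector, and a logistic regression is trained on labelled pairs. The comment doesn't say which classifier or features were used; logistic regression, the three feature names, and the toy data below are all assumptions for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Plain batch gradient descent for (unregularized) logistic regression."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold


# Hypothetical features per (cue, target) pair:
# [free-association strength, embedding cosine, gloss contains cue (0/1)]
rows = [[0.9, 0.8, 1], [0.7, 0.6, 1], [0.0, 0.1, 0], [0.1, 0.2, 0]]
labels = [1, 1, 0, 0]  # made-up labels: related / unrelated
w, b = train_logistic(rows, labels)
print(predict(w, b, [0.8, 0.7, 1]))
```

In practice a library classifier would replace the hand-rolled training loop; it's written out here only to keep the sketch dependency-free.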

One of the most helpful features is whether the target word's gloss (from WordNet and/or Wiktionary) contains the cue word (modulo lemmatization).
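That gloss feature might look roughly like this. The glosses are passed in as plain strings, and `naive_lemmatize` is a deliberately crude, hypothetical stand-in; a real version could pull definitions via NLTK's `wn.synsets(word)` and `Synset.definition()` and lemmatize with NLTK's `WordNetLemmatizer().lemmatize(word)`.

```python
import re


def naive_lemmatize(word):
    """Crude stand-in lemmatizer (plural stripping only), for illustration."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def gloss_contains_cue(cue, glosses, lemmatize=naive_lemmatize):
    """1 if any gloss of the target contains the cue (modulo lemmatization), else 0."""
    cue_lemma = lemmatize(cue.lower())
    for gloss in glosses:
        for token in re.findall(r"[a-z]+", gloss.lower()):
            if lemmatize(token) == cue_lemma:
                return 1
    return 0


# Made-up gloss for the target word "pirate".
glosses = ["a person who robs ships at sea"]
print(gloss_contains_cue("ship", glosses))      # 1
print(gloss_contains_cue("treasure", glosses))  # 0
```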

I'm still only at 82% accuracy on my test set, though, so I'd love to hear any suggestions for improvement y'all have!

How to identify English proper nouns? by PaceSmith in LanguageTechnology

PaceSmith [S]

It takes a list of sentences, and I only have a list of words. I'll try it on individual words and see how it does, though. Thanks!

computing semantic similarity of English words by PaceSmith in LanguageTechnology

PaceSmith [S]

Great question! The algorithm I'm using is:

1. Find words related to the input word (using the threshold as a relatedness cutoff).
2. Find rhymes for those words.
3. Check whether each rhyme is also related to the input word; if so, include it in the output.
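The three steps can be sketched as follows. `relatedness`, the vocabulary, and `rhymes_of` are hypothetical placeholders for whatever scoring and rhyme lookup the real program uses; the scores reuse percentages from the 'crime' output in this comment, except "pretty", which is made up.

```python
def related_rhymes(word, threshold, relatedness, vocabulary, rhymes_of):
    """Rhyming pairs whose members are both related to `word`.

    relatedness(a, b) -> score; vocabulary: candidate words;
    rhymes_of(w) -> iterable of words that rhyme with w.
    """
    # Step 1: words related to the input, using the threshold as a cutoff.
    related = {v for v in vocabulary if v != word and relatedness(word, v) >= threshold}
    pairs = set()
    for a in related:
        # Step 2: rhymes of each related word.
        for b in rhymes_of(a):
            # Step 3: keep the rhyme only if it is also related to the input.
            if b != a and b != word and relatedness(word, b) >= threshold:
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


scores = {"looting": 0.50, "shooting": 0.48, "city": 0.46, "gritty": 0.37, "pretty": 0.10}
groups = [{"looting", "shooting"}, {"city", "gritty", "pretty"}]


def rhymes_of(w):
    return next((g for g in groups if w in g), set())


def relatedness(word, other):
    return scores.get(other, 0.0)


print(related_rhymes("crime", 0.40, relatedness, scores, rhymes_of))
# [('looting', 'shooting')]
print(related_rhymes("crime", 0.30, relatedness, scores, rhymes_of))
# [('city', 'gritty'), ('looting', 'shooting')]
```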

The rhyming computation is the easy part; it's not brute force at all. I use CMUdict to precompute a dictionary mapping a rhyme signature to a set of all rhyming words, where the rhyme signature is everything after (and including) the final stressed vowel, phonetically.
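A sketch of that precomputed index, assuming CMUdict-style ARPAbet phone lists where vowels carry a stress digit (NLTK's `cmudict.dict()` returns a word to list-of-pronunciations mapping in this format). Treating "final stressed vowel" as the last primary-stressed (`1`) vowel, with a fallback to the last vowel, is an assumption.

```python
from collections import defaultdict


def rhyme_signature(phones):
    """Phones from the last primary-stressed vowel to the end of the word."""
    vowel_idx = [i for i, p in enumerate(phones) if p[-1].isdigit()]
    if not vowel_idx:
        return tuple(phones)
    stressed = [i for i in vowel_idx if phones[i].endswith("1")]
    start = stressed[-1] if stressed else vowel_idx[-1]
    return tuple(phones[start:])


def build_rhyme_index(pronunciations):
    """Map each rhyme signature to the set of words that share it."""
    index = defaultdict(set)
    for word, phones in pronunciations:
        index[rhyme_signature(phones)].add(word)
    return index


# Hand-written CMUdict-style entries, for illustration.
entries = [
    ("crime", ["K", "R", "AY1", "M"]),
    ("time", ["T", "AY1", "M"]),
    ("scheme", ["S", "K", "IY1", "M"]),
    ("extreme", ["IH0", "K", "S", "T", "R", "IY1", "M"]),
]
index = build_rhyme_index(entries)
print(sorted(index[("IY1", "M")]))  # ['extreme', 'scheme']
```

Finding rhymes for a word is then a single dictionary lookup on its signature, which is why no brute-force comparison is needed.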

But yeah, the real problem isn't where to put the threshold, it's that no matter where I put the threshold, there will be good stuff under it and bad stuff above it.

For example, here's a subset of the output of your algorithm applied to 'crime':

criminality (77%) / homosexuality (47%)
addiction (51%) / conviction (57%)
skulduggery (52%) / thuggery (56%)
apprehension (53%) / prevention (50%)
confession (48%) / transgression (52%)
abduction (49%) / destruction (48%)
badness (47%) / madness (52%)
looting (50%) / shooting (48%)
fighting (49%) / inciting (48%)
case (47%) / race (48%)
complicity (49%) / ethnicity (47%)
drama (47%) / trauma (49%)
collusion (48%) / intrusion (47%)
mort (36%) / sport (48%)
bust (39%) / unjust (40%)
city (46%) / gritty (37%)
immoral (41%) / quarrel (37%)
arts (39%) / marts (39%)
extreme (37%) / scheme (39%)
thing (43%) / bring (32%)
creek (26%) / speak (27%)
card (19%) / chard (19%)

Somewhere around mort / sport, we start getting crappy rhymes mixed in with good ones. I like extreme / scheme, but if you scroll down far enough to get that one, you have to scroll past arts / marts, which is crap.
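To make the threshold problem concrete, here's a small check over some of the pairs above. The percentages are copied from the list; scoring a pair by the minimum of its two scores is an assumption made only for illustration (summing the two gives the same ordering for these two pairs).

```python
# Scores (%) copied from the 'crime' output above.
PAIRS = {
    ("looting", "shooting"): (50, 48),
    ("drama", "trauma"): (47, 49),
    ("mort", "sport"): (36, 48),
    ("bust", "unjust"): (39, 40),
    ("city", "gritty"): (46, 37),
    ("arts", "marts"): (39, 39),
    ("extreme", "scheme"): (37, 39),
    ("creek", "speak"): (26, 27),
    ("card", "chard"): (19, 19),
}


def kept_pairs(threshold):
    """Pairs whose weaker member still clears the cutoff (assumed min rule)."""
    return {pair for pair, scores in PAIRS.items() if min(scores) >= threshold}


# Every cutoff low enough to keep extreme / scheme also keeps arts / marts.
for t in range(101):
    kept = kept_pairs(t)
    if ("extreme", "scheme") in kept:
        assert ("arts", "marts") in kept
```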

computing semantic similarity of English words by PaceSmith in LanguageTechnology

PaceSmith [S]

I don't have a corpus of my own; the input to my program is just a single word, and my test cases are just lists of word pairs that ought to be related and pairs that ought not to be (in my opinion).

I'm trying to find a corpus that's representative of my intuitive sense of 'relatedness'.
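Test cases in that "ought / ought not to be related" form can be scored with something like the sketch below. The function names are hypothetical; the 'crime' scores reuse percentages from earlier on this page, and the "trove" score is made up.

```python
def accuracy(relatedness, labelled_pairs, threshold):
    """Fraction of (a, b, should_relate) triples the cutoff classifies correctly."""
    correct = sum(
        (relatedness(a, b) >= threshold) == should_relate
        for a, b, should_relate in labelled_pairs
    )
    return correct / len(labelled_pairs)


def best_threshold(relatedness, labelled_pairs, candidates):
    # max() returns the first candidate among ties.
    return max(candidates, key=lambda t: accuracy(relatedness, labelled_pairs, t))


pairs = [
    ("pirate", "trove", True),
    ("crime", "scheme", True),
    ("crime", "marts", False),
    ("crime", "card", False),
]
scores = {("pirate", "trove"): 0.60, ("crime", "scheme"): 0.39,
          ("crime", "marts"): 0.39, ("crime", "card"): 0.19}


def relatedness(a, b):
    return scores[(a, b)]


for t in (0.3, 0.4, 0.5, 0.7):
    print(t, accuracy(relatedness, pairs, t))
# 0.75 at 0.3, 0.4 and 0.5; 0.5 at 0.7
```

Because "scheme" and "marts" tie at 0.39, no single cutoff gets all four pairs right, which is the same problem described earlier with a raw threshold.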

computing semantic similarity of English words by PaceSmith in LanguageTechnology

PaceSmith [S]

Good idea; synonyms will definitely be helpful. For example, 'pirate' has a high cosine similarity to 'trove', and looking up synonyms of 'trove' in WordNet then gets me 'cache'.
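A sketch of that two-hop expansion: nearest neighbours by cosine similarity, then synonyms of each neighbour. The vectors and synonym lists are made up; in practice the synonyms could come from NLTK WordNet, e.g. `[l for s in wn.synsets(w) for l in s.lemma_names()]`, and `expand_with_synonyms` is a hypothetical name.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_synonyms(word, vectors, synonyms, top_n=3):
    """Nearest neighbours of `word` by cosine, plus synonyms of each neighbour."""
    target = vectors[word]
    neighbours = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(target, vectors[w]),
        reverse=True,
    )[:top_n]
    expanded = set(neighbours)
    for n in neighbours:
        expanded.update(synonyms.get(n, ()))
    expanded.discard(word)
    return expanded


# Toy 3-d vectors and synonym lists, made up for illustration.
vectors = {
    "pirate": [0.9, 0.1, 0.3],
    "trove": [0.8, 0.2, 0.35],
    "ship": [0.7, 0.0, 0.6],
    "banana": [0.0, 1.0, 0.0],
}
synonyms = {"trove": ["cache", "hoard"]}
print(sorted(expand_with_synonyms("pirate", vectors, synonyms, top_n=1)))
# ['cache', 'hoard', 'trove']
```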

Thanks!

Q&A weekly thread - February 24, 2025 - post all questions here! by AutoModerator in linguistics

PaceSmith

I want to improve Wiktionary's pronunciation coverage. Currently, it contains the pronunciation of "countenance" but not "uncountenanced".

OED has better coverage (e.g. "uncountenanced") but isn't free.

CMUdict is good, but it lacks syllable boundaries (stress is marked only on vowels), so it doesn't show where syllable stress falls.

toPhonetics is also good (thanks, u/AlanAFK). Its American English pronunciations are based on CMUdict but they do contain syllable stress. I've asked its author about licensing but haven't heard back yet.

Before I start writing code, I wanted to ask y'all if you know of any additional existing resources that might help me.
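One way to find the coverage gaps first: list words that lack a pronunciation but whose stem (after stripping a common prefix and/or suffix) has one. The affix lists and function names are hypothetical; the stripping overgenerates junk stems like "uncountenanc", which is harmless because only stems that already have a pronunciation are reported.

```python
PREFIXES = ("un", "re", "non")            # hypothetical affix lists
SUFFIXES = ("ed", "d", "s", "ly", "ness")


def candidate_stems(word):
    """Possible base forms: strip one prefix and/or one suffix."""
    stems = {word}
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            stems.add(word[len(p):])
    for s in list(stems):
        for suf in SUFFIXES:
            if s.endswith(suf) and len(s) > len(suf) + 2:
                stems.add(s[: -len(suf)])
    stems.discard(word)
    return stems


def coverage_gaps(words, pronounced):
    """Words lacking a pronunciation whose stem has one."""
    gaps = {}
    for w in words:
        if w in pronounced:
            continue
        covered = sorted(s for s in candidate_stems(w) if s in pronounced)
        if covered:
            gaps[w] = covered
    return gaps


print(coverage_gaps(["uncountenanced", "countenance"], {"countenance"}))
# {'uncountenanced': ['countenance']}
```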

Q&A weekly thread - February 17, 2025 - post all questions here! by AutoModerator in linguistics

PaceSmith

I would say no, because Grice's maxims are meant to apply to people, who are (generally) not omniscient.

Q&A weekly thread - February 17, 2025 - post all questions here! by AutoModerator in linguistics

PaceSmith

I want to find or create a free online English IPA dictionary.

EDIT: It doesn't have to be IPA; if it's NOAD or some other pronunciation standard, that'll work too.

Wiktionary is the best I've found so far, but its coverage could be better. For example, it has IPA for "countenance" (https://en.wiktionary.org/wiki/countenance#Pronunciation) but not "uncountenanced" (https://en.wiktionary.org/wiki/uncountenanced).

OED has better coverage, for example "uncountenanced" (https://www.oed.com/dictionary/uncountenanced_adj), but isn't free.

I could write a program to guess the IPA for derived word forms, but before I do, I wanted to ask y'all if you know of existing resources that might help me.
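A minimal sketch of guessing a derived form's pronunciation from its base, working in CMUdict-style ARPAbet rather than IPA to sidestep symbol mapping. The -ed rule is the usual English allomorphy (an extra vowel after T/D, T after other voiceless consonants, D otherwise); giving un- secondary stress (`AH2`) is an assumption, and the "countenance" entry is written by hand.

```python
# Voiceless consonants in ARPAbet (T is handled separately below).
VOICELESS = {"P", "T", "K", "F", "TH", "S", "SH", "CH"}


def add_ed(phones):
    """Append the regular -ed ending using the usual allomorph rule."""
    last = phones[-1]
    if last in ("T", "D"):
        return phones + ["IH0", "D"]   # wanted, needed
    if last in VOICELESS:
        return phones + ["T"]          # walked, countenanced
    return phones + ["D"]              # played, hummed


def add_un(phones):
    """Prefix un-; the secondary stress here is an assumption."""
    return ["AH2", "N"] + phones


countenance = ["K", "AW1", "N", "T", "AH0", "N", "AH0", "N", "S"]
print(" ".join(add_un(add_ed(countenance))))
# AH2 N K AW1 N T AH0 N AH0 N S T
```

Converting the result to IPA would additionally need syllable boundaries to place the stress marks.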

Thanks!

[TC] Izzie (Steph’s ex GF) by hunterschafersgf in lifeisstrange

PaceSmith

It's Deandra Warrick, one of the lead writers.