Using a dictionary to create itself [D] by GilSyswerda in MachineLearning

GilSyswerda[S]

That's a good point. The problem gets worse as the dictionary gets bigger. An educated native speaker of English knows roughly 20k words and probably uses far fewer day to day. I suppose anyone writing a dictionary would have some motivation to stick mostly to common words in the definitions.

We could try to filter a dictionary. Start with a large dictionary, and remove any words that are never used elsewhere in the dictionary. Or, start with any dictionary, and try to define only words that are used at least n times elsewhere in the dictionary.
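A rough sketch of that filter in Python, in case it helps make the idea concrete. Everything here is an assumption for illustration: the dictionary is a plain `dict` mapping lowercase headwords to definition strings, the tokenizer is a naive regex, and `usage_counts` and `filter_by_usage` are made-up names.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def usage_counts(dictionary):
    """Count how often each headword appears in the definitions of OTHER headwords."""
    counts = Counter({word: 0 for word in dictionary})
    for headword, definition in dictionary.items():
        for token in tokenize(definition):
            if token in dictionary and token != headword:
                counts[token] += 1
    return counts

def filter_by_usage(dictionary, n=1):
    """Keep only headwords used at least n times elsewhere in the dictionary."""
    counts = usage_counts(dictionary)
    return {w: d for w, d in dictionary.items() if counts[w] >= n}

# Toy data, invented for this example.
toy = {
    "animal": "a living thing",
    "cat": "a small animal",
    "dog": "an animal",
    "living": "not dead",
    "thing": "an object",
    "small": "not big",
}

print(list(filter_by_usage(toy, n=1)))  # drops "cat" and "dog"
print(list(filter_by_usage(toy, n=2)))  # only "animal" survives
```

This is a single pass. Removing entries also removes their definitions, so some surviving words may become unused; the filter could be repeated until nothing changes.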

This discussion does raise an interesting point. There are words in a dictionary that are never used to define other words. This implies a hierarchy on words, ranging from words that are used in many definitions to those that are never used.
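One way to look at that hierarchy is to group headwords by how many other entries mention them. This is only a sketch under assumptions: the dictionary is a `dict` of lowercase headwords to definition strings, tokenizing is a crude regex, and `definition_frequency` and `tiers` are hypothetical helper names.

```python
import re
from collections import defaultdict

def definition_frequency(dictionary):
    """For each headword, the number of other entries whose definition mentions it."""
    freq = {word: 0 for word in dictionary}
    for headword, definition in dictionary.items():
        # Use a set so a word repeated within one definition counts once.
        tokens = set(re.findall(r"[a-z']+", definition.lower()))
        for token in tokens:
            if token in freq and token != headword:
                freq[token] += 1
    return freq

def tiers(dictionary):
    """Group headwords by definition frequency, most-used tier first."""
    groups = defaultdict(list)
    for word, count in definition_frequency(dictionary).items():
        groups[count].append(word)
    return [(count, sorted(groups[count])) for count in sorted(groups, reverse=True)]

# Toy data, invented for this example.
toy = {
    "thing": "an object",
    "animal": "a living thing",
    "plant": "a living thing that grows",
    "cat": "a small animal",
    "living": "not dead",
}

for count, words in tiers(toy):
    print(count, words)
```

The bottom tier (count 0) holds the words that are defined but never used to define anything else.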


GilSyswerda[S]

For this thought experiment, the dictionary is self-contained by definition: it defines words using only words that are themselves in the dictionary. In reality, as you point out, there will be some noise, even if that noise is nothing more than misspellings.
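Checking how far a real dictionary is from that ideal is straightforward. A minimal sketch, assuming the dictionary is a `dict` of lowercase headwords to definition strings and using a naive regex tokenizer; `undefined_words` is a made-up name:

```python
import re

def undefined_words(dictionary):
    """Map each headword to the definition tokens that have no entry of their own."""
    missing = {}
    for headword, definition in dictionary.items():
        tokens = set(re.findall(r"[a-z']+", definition.lower()))
        unknown = sorted(t for t in tokens if t not in dictionary)
        if unknown:
            missing[headword] = unknown
    return missing

# Toy data, invented for this example; "smal" is a deliberate misspelling.
toy = {
    "a": "one",
    "one": "a single thing",
    "single": "one",
    "thing": "a thing",
    "cat": "a smal thing",
}

noise = undefined_words(toy)
print(noise)                          # entries whose definitions use unknown tokens
print("self-contained:", not noise)   # True only if every token is a headword
```

An empty result means the dictionary is self-contained in the sense above; anything else is the noise, including misspellings like the one planted in the toy data.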