Re: WG: average word length in VMS
Claus Anders wrote:
> > The next step will be looking at the roots of all these words to
> > produce a kind of vocabulary.
> > Any hints for extracting a root-word out of this (like Jorge's
> > mantle/crust/core)?
> > Claus
Sounds interesting. I don't even have time to look at it right
now; I'm bogged down in learning Linux so I can get in and play,
too. For extracting roots, I considered several different
mathematical ways to do it, but what I finally decided to do
first is just sort the list in alphabetical order, then scroll
through it and see what sticks out. I'd also reverse the order
of characters in every word and sort that alphabetically too, so
that words sharing an ending land next to each other. If you
could make both of those lists into spreadsheets with one
character per cell, you could pretty easily follow up all kinds
of hunches. I'd say all endings should be easy to find, except
that, as I remember it, formal writing tends to be confined to
one tense, third person, and active voice. That makes the
'tion' at the end of words show up more often than some regular
verb endings. Whatever you use, you might try it on a known
language first, to see what you get. Try it with an
ending-heavy yet somewhat irregular language like Russian, a
pretty regular language with endings like German, and something
not too ending-intensive and irregular like English.
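
For what it's worth, the sort / reverse-sort idea above could be
sketched roughly like this in Python. This is only a minimal
sketch: the function names are made up for illustration, the
sample words are ordinary English rather than anything from a
transcription, and the fixed-length ending count is an extra
convenience, not one of the mathematical methods mentioned above.

```python
import csv
import io
from collections import Counter


def sorted_forward(words):
    """Unique words in alphabetical order (groups shared beginnings)."""
    return sorted(set(words))


def sorted_by_ending(words):
    """Unique words ordered by reversed spelling (groups shared endings)."""
    return sorted(set(words), key=lambda w: w[::-1])


def to_cells(words, reverse_chars=False):
    """One row per word, one character per cell, for a spreadsheet."""
    return [list(w[::-1] if reverse_chars else w) for w in words]


def ending_counts(words, length):
    """Count each final `length` characters over the unique words."""
    return Counter(w[-length:] for w in set(words) if len(w) > length)


def rows_to_csv(rows):
    """Render rows as CSV text that a spreadsheet program can import."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical sample list (plain English, used only as a known language).
sample = ["walking", "talking", "walked", "talked", "nation", "station"]

if __name__ == "__main__":
    print(rows_to_csv(to_cells(sorted_forward(sample))))
    print(rows_to_csv(to_cells(sorted_by_ending(sample), reverse_chars=True)))
    print(ending_counts(sample, 4).most_common())
```

Writing the reversed list with reverse_chars=True puts the last
letter of every word in the first column, so shared endings line
up down the left side of the sheet.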
Also, don't solve this thing before I get up to speed on my
BASHing and GAWKing!
Regards,
Brian