
Re: VMs: word database and binomial distribution



Thanks, looks like a good place to play.

We must be doing something differently. Compare my plot with yours.

http://home.earthlink.net/~knoxmix/id21.html

I have not tried to determine why this list gives a less extended right leg than most of the meaningful documents I have looked at. The Towneley Plays and Liber Salomonis are exceptional, so perhaps the VMS is, too. However, I am not able to compare the curves mathematically. A true binomial distribution of unique words could mean something entirely different from a distribution that only looks binomial.
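One possible way to put a number on "binomial or only apparently binomial" is sketched below in Python. It is only an illustration under stated assumptions: word lengths are shifted so that length 1 corresponds to 0 successes, n is taken as the longest observed length minus 1, and p is estimated from the mean. None of those choices comes from Stolfi's page, the function names are made up, and the counts are invented. The score is the total variation distance between the observed and fitted curves (0 means identical, 1 means no overlap).

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n successes for a Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def fit_and_compare(counts):
    """counts maps word length (>= 1) to the number of distinct words.

    Returns (p, tvd): the fitted success probability and the total
    variation distance between observed and fitted distributions.
    """
    total = sum(counts.values())
    n = max(counts) - 1              # assumption: longest word fixes n
    if n == 0:
        raise ValueError("need at least two different word lengths")
    # Method-of-moments estimate: mean of (length - 1) equals n * p.
    mean = sum((length - 1) * c for length, c in counts.items()) / total
    p = mean / n
    model = binomial_pmf(n, p)
    observed = [counts.get(k + 1, 0) / total for k in range(n + 1)]
    tvd = 0.5 * sum(abs(o - m) for o, m in zip(observed, model))
    return p, tvd

# Invented counts, chosen to be exactly binomial; not from any real text.
example = {1: 1, 2: 4, 3: 6, 4: 4, 5: 1}
p, tvd = fit_and_compare(example)
print(p, tvd)
```

Running the same function on two word lists would give two distances that can be compared directly, which is cruder than a proper goodness-of-fit test but needs nothing beyond the length counts.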

Ciao ...... Knox


Eric wrote:


First, for all of you text analysis people out there,
I found this database of English words (in case not
already known):

http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm

There is an online interface to the database and you
can download the whole database and search
application. It is very extensive and includes useful
items like # of syllables, # of phonemes and phonemic
syllabic transcription.

I found it while playing around with ideas of
investigating syllabic and phonemic properties of
English text, especially with regard to what Stolfi had
reported years ago about the VMS word length:

http://www.dcc.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/

I was wondering whether that was syllabic or
phonemic in structure (all VMS phonemes written as two
characters, or all syllables being two characters, for
instance). While plotting those, I also plotted word
length and, to my surprise, got a binomial plot for it.
I used the summary data from this page:

http://www.psy.uwa.edu.au/mrcdatabase/mrc2.html#NLET

There are some anomalies in the database
(search for all one-letter words, for instance), but they
don't look systematic. The plot is much sharper than
what Stolfi shows for "English" and "Latin", maybe
because the database has a much larger sample of
words. My first reaction is to take that to mean the
binomial nature of the VMS pointed out by Stolfi is
unique only in that it occurs in a small data sample.
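For anyone wanting to repeat the word-length tabulation on a running text rather than on the database summary, a minimal Python sketch follows. It assumes words are runs of the letters a-z after lowercasing, and that each distinct word is counted once however often it occurs. The function name and the sample sentence are made up for illustration.

```python
from collections import Counter
import re

def unique_word_lengths(text):
    """Count distinct words by length (each word counted once)."""
    # Assumption: a "word" is a run of a-z letters after lowercasing.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return Counter(len(w) for w in words)

sample = "the cat and the dog and a bird"
hist = unique_word_lengths(sample)
for length in sorted(hist):
    print(length, hist[length])
```

Counting distinct words instead of every occurrence matters here, since very common short words would otherwise dominate the left side of the plot.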

Thanks,
Eric

