Re: VMs: Re: Word distribution
I have written code that will scan text, extract individual words and count
occurrences. However, I do not have a full transcription of the VMS, only the
first 19 pages. Does anyone have a full transcription handy?
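
For anyone who wants to try the same thing, the counting step can be sketched
roughly as below. This is only a minimal illustration, not my actual code: the
function name count_words and the sample sentence are made up for the example,
and it assumes that a "word" is any run of letters.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lower-cased word to its occurrence count."""
    # Assumption: a word is any run of letters (accented letters included);
    # digits, underscores, punctuation and whitespace act as separators.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


# Made-up sample text, used only to show the call.
sample = "Arma virumque cano, Troiae qui primus ab oris; arma, arma!"
counts = count_words(sample)

# List the most frequent words first.
for word, n in counts.most_common(3):
    print(word, n)
```

A transcription that uses its own word separators would need a different
splitting rule, but the counting itself stays the same.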
I am also working on a grammar analysis algorithm, so a full transcription
would be useful for that too. I currently have Italian, Latin and Old English
texts to compare, but any others would be useful. Modern texts are far
easier to obtain, and I would be willing to run the same tests on those as well.
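
On the single-instance question raised in the quoted messages below, a
comparison could look something like this. It is a sketch under my own
assumptions: every text is cut to the length of the shortest one, since text
length affects the result, and the names hapax_stats and compare and the toy
word lists are invented for the example.

```python
from collections import Counter


def hapax_stats(tokens, limit=None):
    """Return (distinct words, single-instance words, their ratio)."""
    if limit is not None:
        # Only look at the first `limit` words so texts of different
        # lengths are compared on an equal footing.
        tokens = tokens[:limit]
    counts = Counter(tokens)
    types = len(counts)
    hapaxes = sum(1 for n in counts.values() if n == 1)
    ratio = hapaxes / types if types else 0.0
    return types, hapaxes, ratio


def compare(texts):
    """texts maps a label to a list of words; all are cut to the shortest."""
    limit = min(len(words) for words in texts.values())
    return {name: hapax_stats(words, limit) for name, words in texts.items()}


# Toy word lists, not real data.
toy = {
    "a": "the cat sat on the mat and the dog sat too".split(),
    "b": "one two three four five six seven".split(),
}
results = compare(toy)
for name, (types, hapaxes, ratio) in results.items():
    print(name, types, hapaxes, round(ratio, 3))
```

Cutting to a common length is only one way to allow for text length; comparing
several equal-sized samples from each text would be another.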
I would post any data and conclusions on the George Boole site.
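
The effect of uncertain spaces, mentioned in the quoted message below, could be
checked by counting twice: once treating them as real spaces and once ignoring
them. The sketch assumes a transcription in which '.' marks a definite word
space and ',' an uncertain one; that convention is my assumption and should be
checked against whichever file is used. The function names and the sample
strings are made up, not real transcription.

```python
from collections import Counter


def tokenize(line, uncertain_as_space):
    """Split one transcribed line into words.

    Assumption: '.' is a definite word space and ',' an uncertain one.
    """
    if uncertain_as_space:
        line = line.replace(",", ".")  # count uncertain spaces as real spaces
    else:
        line = line.replace(",", "")   # join the parts into one word
    return [word for word in line.split(".") if word]


def single_instance_count(lines, uncertain_as_space):
    """Number of words that occur exactly once across all lines."""
    counts = Counter(
        word for line in lines for word in tokenize(line, uncertain_as_space)
    )
    return sum(1 for n in counts.values() if n == 1)


# Made-up strings for illustration only.
lines = ["ab.cd,ef.ab", "cd.ef.cdef"]
as_spaces = single_instance_count(lines, uncertain_as_space=True)
joined = single_instance_count(lines, uncertain_as_space=False)
print(as_spaces, joined)
```

If the two counts differ a lot on the real text, the single-instance figures
depend heavily on how the spaces were read.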
Jeff
----- Original Message -----
From: "Rene Zandbergen" <r_zandbergen@xxxxxxxxx>
To: <vms-list@xxxxxxxxxxx>
Sent: 07 March 2004 07:15
Subject: Re: VMs: Re: Word distribution
>
> --- Nick Pelling <incoming@xxxxxxxxxxxxxxxxx> wrote:
>
> > > > I stand by my assertion (though it chimes with
> > > > my own experience, I don't
> > > > believe I originated it?) that the instance
> > > > count of Voynichese words seems
> > > > generally low compared with natural languages:
> > > > and I also don't believe
> > > > that Zipf's Laws are the right way to test this
> > > > assertion.
>
> [...]
>
> > However, what I suspect happens differently in
> > Voynichese is that which
> > your graphs don't capture for natural language (you
> > only plot the most
> > important 4000 words, right?) - that there are very
> > many more
> > single-instance words in Voynichese than in English
> > texts.
>
> Well, here we are in the unusual situation that
> we can actually test this and settle it once and
> for all.
> I remember having had the same feeling (too many
> words with single instance) and I remember having
> tested it against a plain text, and I remember
> coming to the conclusion that the statistics of
> the PT were similar to that of the VMs.
> I don't have the numbers anymore, and I am not
> sure what the PT language was. I am guessing
> Vulgate Latin. This is obviously significant.
> As Gabriel points out: the length of the PT
> is also relevant.
>
> The impact of uncertain spaces is a bit harder
> to test, but it can be done if there is a
> transcription available (in the interlinear) which
> identifies them.
>
> The point is: someone with a bit of time could
> look at this, against various PT sources, and
> come to a significant conclusion.
>
> Cheers, Rene
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list