
Re: VMs: Re: Word distribution



Hi Nick,

On Saturday 06 March 2004 6:10 pm, Nick Pelling wrote:
> I don't recall even a half-way plausible model for VMs grammatical
> constructs being proposed: and wouldn't really know where to start the
> search for one. Any suggestions or half-ideas?

Jorge Stolfi did a concordance analysis (still on his website, I think), but he
used equivalent characters so that similar concordances would still be detected.
I do not remember any candidates for insightful grammatical structures, but it
is a long time since I read it.
If words are really "words", then I think that we should expect at least a few
repeated groups of words here and there. Having seen Askham's herbal I thought 
that it was very repetitious: "Tis herbe is called...", but I do not remember 
seeing very common phrases in the vms.
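A minimal sketch of how one might look for such repeated groups of words,
assuming the text is already split into a list of word tokens. The function
name repeated_ngrams and the toy herbal-style sample are hypothetical and only
for illustration; the sample is invented, not a quotation from any herbal.

```python
from collections import Counter

def repeated_ngrams(words, n=3, min_count=2):
    """Return word n-grams seen at least min_count times, most frequent first."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(g, c) for g, c in grams.most_common() if c >= min_count]

# Invented sample in the style of a repetitious herbal (an assumption).
sample = (
    "this herbe is called mynte . "
    "this herbe is called rue . "
    "this herbe is hote and drye"
).split()

for gram, count in repeated_ngrams(sample, n=3):
    print(count, " ".join(gram))
```

Running the same function over a tokenised transcription with n = 2, 3, 4 would
show whether any phrase recurs often enough to stand out.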

> However, what I suspect happens differently in Voynichese is that which
> your graphs don't capture for natural language (you only plot the most
> important 4000 words, right?) - that there are very many more
> single-instance words in Voynichese than in English texts.

You are referring to the Zipf webpage with the vms in the FSG and Currier
alphabets (the only reason for the first 4000 data points was that an early
version of Excel would not allow enough data to be plotted!), but in the
Cryptologia preprint you get the full plots in EVA.
The low-frequency words follow Zipf's second law (the number-frequency law,
figs 5 to 8 on the webpage). Still, the vms seems to show the same slope as the
few other texts I looked into (exponent -0.5).
Really, I do not see anything unusual.

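For readers who want to reproduce this kind of plot, here is a rough sketch of
the two data sets involved: rank against frequency, and frequency against the
number of distinct words having that frequency. All function names are
hypothetical. Which axes and fitting range the webpage used are not stated
here, so a slope from this sketch is not necessarily comparable to the -0.5
quoted above.

```python
import math
from collections import Counter

def rank_frequency(words):
    """(rank, frequency) pairs, rank 1 being the most frequent word."""
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def number_frequency(words):
    """(f, number of distinct words occurring exactly f times) pairs."""
    spectrum = Counter(Counter(words).values())
    return sorted(spectrum.items())

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x).

    Needs at least two points with different x values.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling loglog_slope(number_frequency(words)) on each text gives one number per
text to compare, though a plain least-squares fit on log-log data is only a
crude estimate.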
> Furthermore, I also suspect that a large number of half-spaces (which
> perhaps should be internal to words) have been transcribed as full-spaces,
> which would have the effect of skewing the stats towards common (but non-)
> words, like <or> (which I believe could well be verbose letter-pairs).

I have no answer to that. I can say that the webpage data is from the FSG and
Currier-D'Imperio transcriptions in the FSG and Currier alphabets respectively,
while the Cryptologia preprint is from the first draft of the evmt
transcription in EVA, with all the possible spaces considered as true spaces.
I think that, in the sense of how sensitive the analysis is to alphabets and
space certainty, the Zipf analysis is robust.
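One way to test that sensitivity directly is to tokenise the same line under
both space policies and compare the resulting word counts. The sketch below
assumes a transcription in which "." marks a definite word break and ","
an uncertain one; that convention, the function name tokenize, and the sample
line are all assumptions made for illustration, not taken from the files
mentioned above.

```python
import re

def tokenize(line, uncertain_as_space=True):
    """Split a transcribed line into words.

    Assumption: '.' is a definite word break and ',' an uncertain one.
    """
    if uncertain_as_space:
        parts = re.split(r"[.,]", line)            # every possible space counts
    else:
        parts = line.replace(",", "").split(".")   # uncertain spaces are joined
    return [p for p in parts if p]

line = "daiin.ol,or.chedy,daiin"  # invented line, not from a real folio

print(tokenize(line, uncertain_as_space=True))
print(tokenize(line, uncertain_as_space=False))
```

Feeding both token lists into the same frequency count would show how much the
short, common words depend on the choice.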

> Again, I understand and accept your point here: perhaps the assertion I'm
> trying to reach towards is that single-instance Voynichese words seem to
> take up a much larger proportion of the dictionary than in natural
> languages.

That may well be. I am not sure whether it is significant, or what would be an
appropriate statistical test. Fig 8 on the webpage should help to clarify
this; the problem is that the texts have different lengths.
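A simple way around the length problem, offered only as a sketch and not as a
proper statistical test, is to cut every text to the same number of tokens
before measuring the share of single-instance words in its dictionary. The
function names and the dictionary-of-token-lists input are assumptions.

```python
from collections import Counter

def hapax_share(words):
    """Fraction of distinct words that occur exactly once (needs non-empty input)."""
    counts = Counter(words)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

def hapax_share_at_length(texts, length=None):
    """Compare texts on their first `length` tokens (default: the shortest text)."""
    if length is None:
        length = min(len(t) for t in texts.values())
    return {name: hapax_share(t[:length]) for name, t in texts.items()}
```

Taking the first tokens is only one choice; averaging over several equal-sized
samples from each text would be less dependent on where the text starts.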

Cheers

Gabriel



