
VMs: Voynich and bee-dance



Gabriel Landini wrote:
On Sunday 28 November 2004 01:13, Jacques Guy wrote:

"Later other researchers such as Wentian Li [10]
have proven that Zipf's law also holds for less
interesting phenomena, such as randomly generated
sequences of characters."

I have the feeling that some think that, because Z law appears in random sequences, Z law in languages is either already understood or irrelevant.


I think it is neither of those.

Z law in randomly generated sequences is due to a *completely different* (and understood) reason compared with its presence in natural languages.
For random sequences it has to do with the probability of the word-separator character and the appearance of increasingly long strings. "Word" and "token" length distributions are therefore very much different from those in natural languages.
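A rough sketch of the random-sequence case, for anyone who wants to try it. This is not taken from Li's paper; the function names, the four-letter alphabet and the separator probability are all assumptions chosen for illustration. With equiprobable letters, every "word" of a given length is equally likely, and each extra letter divides the probability by a fixed factor, which is where the power-law-like rank plot comes from.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Type n_chars characters at random; a space is hit with probability p_space."""
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


def rank_frequency(text):
    """Word counts sorted from most to least frequent (the Zipf plot's y values)."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def mean_count_by_length(text):
    """Average number of occurrences per distinct word, grouped by word length."""
    counts = Counter(text.split())
    totals, kinds = Counter(), Counter()
    for word, c in counts.items():
        totals[len(word)] += c
        kinds[len(word)] += 1
    return {n: totals[n] / kinds[n] for n in sorted(totals)}


def expected_exponent(alphabet_size, p_space):
    """Slope of the rank-frequency plot, assuming equiprobable letters.

    A given word of length n has probability proportional to
    ((1 - p_space) / alphabet_size) ** n and sits near rank alphabet_size ** n,
    so frequency ~ rank ** (-alpha) with the alpha returned here.
    """
    return 1.0 - math.log(1.0 - p_space) / math.log(alphabet_size)


if __name__ == "__main__":
    text = random_text(200_000)
    print(rank_frequency(text)[:10])
    print(mean_count_by_length(text))
    print(expected_exponent(4, 0.2))
```

The output of mean_count_by_length shows the staircase: frequency is fixed by length alone, which is not how word lengths and frequencies relate in natural languages.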

At any rate, some of the examples in that paper are clearly false, such as, for
instance, the populations of cities. Try that in
Germany!

Doesn't it hold there? (Am I missing something?).

Explaining the tendency: Google "Die zentralen Orte in Sueddeutschland" by Walter Christaller or, for English-only readers (like me), "Central Places in Southern Germany". Here is one site: http://csiss.ncgia.ucsb.edu/classics/content/67

Let's say that the VMS did *not* follow Z law. Would that be useful to know? I think so.
Knowing that Z law holds, polyalphabetic (poly > 2 or 3) substitutions become less likely, because each plaintext word would have multiple ciphertext forms (depending on its alignment with the key). This pushes down the frequencies of all words and the plot becomes flattened (no power law).
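The flattening can be seen with a toy experiment. The sketch below is only an illustration under my own assumptions: the plaintext is an invented vocabulary of random letter strings drawn with 1/rank weights, and the cipher is a plain Vigenere-style shift whose key position advances with every letter and ignores word breaks. None of this is meant as a model of the VMS.

```python
import random
from collections import Counter
from string import ascii_lowercase as ALPHA


def zipfian_tokens(n_tokens, vocab_size=50, seed=0):
    """Draw tokens from a made-up vocabulary with weights proportional to 1/rank."""
    rng = random.Random(seed)
    vocab, seen = [], set()
    while len(vocab) < vocab_size:
        word = "".join(rng.choice(ALPHA) for _ in range(rng.randint(3, 6)))
        if word not in seen:
            seen.add(word)
            vocab.append(word)
    weights = [1.0 / r for r in range(1, vocab_size + 1)]
    return rng.choices(vocab, weights=weights, k=n_tokens)


def vigenere_words(tokens, key):
    """Encipher word by word; the key position runs on across word boundaries."""
    shifts = [ALPHA.index(c) for c in key]
    pos = 0
    out = []
    for word in tokens:
        enc = []
        for ch in word:
            shift = shifts[pos % len(shifts)]
            enc.append(ALPHA[(ALPHA.index(ch) + shift) % 26])
            pos += 1
        out.append("".join(enc))
    return out


def summary(tokens):
    """Return (number of distinct words, count of the most frequent word)."""
    counts = Counter(tokens)
    return len(counts), counts.most_common(1)[0][1]


if __name__ == "__main__":
    plain = zipfian_tokens(5000, seed=1)
    cipher = vigenere_words(plain, "lemon")  # key length 5, all shifts distinct
    print("plain :", summary(plain))
    print("cipher:", summary(cipher))
```

With a five-letter key each plaintext word can turn up in as many as five ciphertext forms, so the vocabulary grows and the top counts drop, which is the flattening described above.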


Surely Z law alone does not *prove* the existence of a language, but in combination with other pieces of information it becomes (at least to me) more plausible to think that there is one disguised in some way.

Cheers,

Gabriel

A test. Take a meaningless series of letters made up almost entirely of pronounceable strings and divide it into tokens according to your predilection, isolating the unpronounceable strings as tokens too. Provided the original series conforms to your chosen language, will the vocabulary word lengths and token lengths distribute themselves the same as in the natural language? With suitable source series, will the rhythm of individual languages show in the result? Some may be familiar enough with Voynichese to remove the blanks and use that. What I am getting at is that the VMS, if artificially generated, could have been divided into tokens by such a method. The question will remain whether the Eastern (isn't it?) rhythm of the VMS is natural or artificial.
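For anyone wanting to try the test mechanically, here is a minimal sketch. The division rule is purely a stand-in of my own (any consonants followed by a run of vowels count as "pronounceable", and a leftover run of consonants is isolated as its own token); substitute whatever rule suits your predilection. It then tabulates lengths over the vocabulary (distinct words) and over the tokens (all occurrences).

```python
import re
from collections import Counter

# Hypothetical rule: consonants then vowels form a token; stray consonants stand alone.
_TOKEN = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+")


def resegment(stream):
    """Divide a blank-free lowercase letter stream into tokens by the toy rule above."""
    return _TOKEN.findall(stream)


def length_distributions(tokens):
    """Return (vocabulary length counts, token length counts)."""
    vocabulary_lengths = Counter(len(t) for t in set(tokens))
    token_lengths = Counter(len(t) for t in tokens)
    return vocabulary_lengths, token_lengths


if __name__ == "__main__":
    stream = "bananastrength"  # stand-in for a text with the blanks removed
    tokens = resegment(stream)
    print(tokens)
    print(length_distributions(tokens))
```

Running the same tabulation on a real text in the chosen language, split at its original blanks, gives the distributions to compare against.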

Regards ........ Knox


