Re: WG: average word length in VMS

Jacques Guy wrote:
> Brian Eric Farnell wrote:
> >   There has
> > to be a more decisive way to fingerprint a language than entropy
> > vs avg. word length.

	To belabor one of my favorite points: I think the
thing to do is to assume a language as a hypothesis and
then try to solve on that basis.  

	My current plan.  I assume that the underlying
language is medieval French and that the word divisions
are in fact syllable breaks, because:

1)  Spoken medieval and modern French do not have
distinct word divisions,

2)  If the "words" of Voynichese were in fact
syllables, that would explain the average short "word"
length; and

3)  Medieval French is historically plausible.  By 1480
it was the language of communication of the European
upper classes.  Marco Polo wrote (or his amanuensis
wrote) his travelogue in medieval French.

	I assume that the Voynich system is homophonic; i.e.,
it has several substitutes for the common syllables. 
The pieces of the Voynich word paradigm may well be
short words for mnemonic value.  

	(As I think about this, I wonder whether the Voynich
system is necessarily homophonic.  That would explain
the different statistics for A and B: a homophonic
system used by two different operators.  But we now
think that the difference between A and B may be
different subject matter, different authors' style, and

	Assume for now that the system is homophonic.  H. F.
Gaines tells how to break homophonic systems.  While
there will be several choices for frequent syllables,
there will be only one for each *infrequent* syllable. 
So one looks for pattern words that include the
infrequent tokens.  This will be no easy matter.  
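
	A rough sketch of that first step, in Python.  This is
my own illustration, not Gaines's procedure verbatim.  It
assumes the text is already transcribed into
space-separated tokens; the function name, the threshold
and the sample tokens are all invented for the example.
It counts token frequencies, picks out the rare tokens,
and shows each one with its neighbours, so one can look
for patterns around it.

```python
from collections import Counter

def rare_token_contexts(tokens, max_count=1, window=2):
    """Find rare tokens and the tokens surrounding them.

    tokens    -- list of transcribed tokens (assumed input format)
    max_count -- a token is "rare" if it occurs at most this often
    window    -- number of neighbours to show on each side
    """
    counts = Counter(tokens)
    rare = {tok for tok, c in counts.items() if c <= max_count}
    contexts = []
    for i, tok in enumerate(tokens):
        if tok in rare:
            start = max(0, i - window)
            contexts.append((tok, tokens[start:i + window + 1]))
    return counts, contexts

# Invented sample, only to show the shape of the input.
sample = ("qokedy daiin chedy qokedy daiin otol "
          "chedy daiin qokedy chedy daiin").split()

counts, contexts = rare_token_contexts(sample)
for tok, ctx in contexts:
    print(tok, "appears in:", " ".join(ctx))
```

	On a real transcription the threshold would need tuning,
and the hard part (guessing the plaintext behind each
context) is still done by eye.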

	There's a precedent for all this.  Louis XIV's Great
Cipher worked just like this, and the late-19th-century
crippie Étienne Bazeries solved it as I just described. 

> Letter frequency comes immediately to the mind.
> (Why am I posting the obvious? My brain must have
> gone soft -- next I am going to mention digraph
> frequency and triumphantly add "and trigraphs  too!")

	Just been there with Hamptonese, and it didn't work. 
That's how you solve a simple substitution cipher, but
both Hamptonese and Voynichese are more complex.
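
	For reference, the counting in question is only a few
lines of Python.  This is a minimal sketch: the function
name is mine, and it assumes a transcription in plain
alphabetic characters.  Spaces and punctuation are
dropped, so digraphs and trigraphs run across word
boundaries.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count letter n-grams (n=1 letters, 2 digraphs, 3 trigraphs).

    Non-alphabetic characters, including spaces, are discarded first,
    so n-grams span word boundaries.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(
        "".join(letters[i:i + n]) for i in range(len(letters) - n + 1)
    )

# Usage: the ten most common digraphs in some transcribed text.
# for gram, count in ngram_counts(some_text, 2).most_common(10):
#     print(gram, count)
```

	Against a simple substitution these tables line up with
those of the plaintext language; against anything
homophonic they flatten out, which is why they are of
little help here.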

> Er... phonosyntactic oddity? You mean the way in which
> the letters or groups of letters presumably representing
> sounds combine together? Jorge Stolfi has done that and
> he has come up with something which looks very much like
> Chinese -- the infamous "Chinese hypothesis". It sure
> does look the spit and image of Chinese to me. 

	I increasingly agree.  Once again, make this a
hypothesis and try it.  Assume Cantonese or something
and see what happens.

>   If you
>    are after secrecy, it is a much better "cipher" than
>    anything available at the time. A "Navaho code", as it were.

	I can't resist.  I watched an episode of "The X-Files"
where Mulder and Scully were looking for secret
government files about contact with aliens.   These had
been leaked, much in the manner of the recent nuclear
secrets hard disk case in the USA.  The government
conspiracy wanted the UFO files back.  However, they
had originally been enciphered by the Navajo
code-talkers.  Their jargon for modern items, of
course, had never been put to paper, so the
code-talkers were the only ones who could ever decipher
the files. What's more, the secret files had made their
way to present-day Navajos, several of whom had
committed them to memory!  So secrecy was blown good
and properly.  The government definitely lost that