Re: WG: average word length in VMS
Brian Eric Farnell wrote:
> There has
> to be a more decisive way to fingerprint a language than entropy
> vs avg. word length.
Letter frequency comes immediately to mind.
(Why am I posting the obvious? My brain must have
gone soft -- next I am going to mention digraph
frequency and triumphantly add "and trigraphs too!")
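For what it is worth, letter, digraph and trigraph frequencies are all the same count with a different window size. Below is a minimal sketch, assuming the text is already split into tokens and that sequences are counted within tokens only, never across a word break. The function name and the toy tokens are made up for illustration; they are not transcription data.

```python
from collections import Counter

def ngram_frequencies(tokens, n):
    """Relative frequency of each n-letter sequence, counted inside tokens only."""
    counts = Counter()
    for token in tokens:
        # Slide a window of width n along the token.
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}

# Toy tokens, purely illustrative.
sample = ["abab", "ba"]
letters = ngram_frequencies(sample, 1)    # single letters
digraphs = ngram_frequencies(sample, 2)   # letter pairs
trigraphs = ngram_frequencies(sample, 3)  # letter triples
```

Comparing such tables between two texts (for instance by ranking the most frequent sequences) is the usual starting point; whether sequences should also be counted across word breaks is a separate choice this sketch does not make.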
> Again, in line with my idea to translate
> the text into a different format, what if a relative value of
> 'phonosyntactic oddity' were assigned to each VMS token,
> wouldn't we be able to see a pattern that reflected the
> language's outside influences that would help ID it?
Er... phonosyntactic oddity? You mean the way in which
the letters or groups of letters presumably representing
sounds combine together? Jorge Stolfi has done that and
he has come up with something which looks very much like
Chinese -- the infamous "Chinese hypothesis". It sure
does look the spit and image of Chinese to me. The trouble
is:
1. Assuming that it is Chinese, which variety of Chinese?
There are dozens of varieties of Chinese, all really
different languages, mutually unintelligible. Plus,
four hundred years ago they were certainly rather
different from what they are today.
2. It is not necessarily Chinese. My pet "serious" theory
(no tongue in cheek for once) is an extinct language
isolate, just like Basque, or Etruscan, but of course
totally unrelated to either, and which happened to
have a phonological structure reminiscent of Chinese.
I am persuaded that there were hundreds of such languages
in Europe alone once. In other words, that the linguistic
picture was very much like Papua New Guinea now. If you
are after secrecy, it is a much better "cipher" than
anything available at the time. A "Navaho code", as it were.
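Coming back to the "phonosyntactic oddity" value quoted above: one possible reading, and it is only my assumption about what was meant, is to score each token by how unusual its letter pairs are relative to the rest of the text. The sketch below uses the mean surprisal of a token's digraphs with add-one smoothing. The function names and the `^` and `$` boundary markers are hypothetical choices, and the result is a rough score, not a calibrated probability.

```python
import math
from collections import Counter

def digraph_counts(tokens):
    """Count letter pairs in each token, with ^ and $ marking the token edges."""
    counts = Counter()
    for token in tokens:
        padded = "^" + token + "$"
        for i in range(len(padded) - 1):
            counts[padded[i:i + 2]] += 1
    return counts

def oddity(token, counts):
    """Mean surprisal in bits of the token's digraphs; higher means odder."""
    total = sum(counts.values())
    buckets = len(counts) + 1  # one extra bucket for unseen digraphs
    padded = "^" + token + "$"
    pairs = [padded[i:i + 2] for i in range(len(padded) - 1)]
    bits = [-math.log2((counts[p] + 1) / (total + buckets)) for p in pairs]
    return sum(bits) / len(bits)

# Toy corpus, purely illustrative.
model = digraph_counts(["abab", "abab", "ba"])
common = oddity("abab", model)
unusual = oddity("bbaa", model)
```

A token built from pairs that are frequent in the text scores low, and one built from rare or unseen pairs scores high. Whether the high scorers cluster in a way that points to an outside influence is exactly the open question.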