Re: WG: average word length in VMS



Brian Eric Farnell wrote:
>   There has
> to be a more decisive way to fingerprint a language than entropy
> vs avg. word length.

Letter frequency comes immediately to mind.
(Why am I posting the obvious? My brain must have
gone soft -- next I am going to mention digraph
frequency and triumphantly add "and trigraphs too!")
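
For what it is worth, counting letters, digraphs and trigraphs takes only a
few lines. The following is a rough sketch, not anybody's established method:
it assumes the text is already transcribed into ordinary letters with words
separated by spaces, it counts n-grams inside words only, and the names
ngram_frequencies and top_ngrams, as well as the sample string, are made up
for the illustration.

```python
from collections import Counter


def ngram_frequencies(text, n):
    """Relative frequency of each letter n-gram, counted within words only."""
    counts = Counter()
    for word in text.lower().split():
        # Assumption: anything that is not a letter is transcription noise.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def top_ngrams(text, n, k=5):
    """The k most frequent n-grams; ties are broken alphabetically."""
    freqs = ngram_frequencies(text, n)
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sample = "the cat sat on the mat"  # placeholder text, not VMS data
    for n in (1, 2, 3):
        print(n, top_ngrams(sample, n))
```

Two texts could then be compared by looking at how far apart their frequency
tables are, though how much that says about the underlying language depends
heavily on the transcription alphabet chosen.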

>  Again, in line with my idea to translate
> the text into a different format, what if a relative value of
> 'phonosyntactic oddity' were assigned to each VMS token,
> wouldn't we be able to see a pattern that reflected the
> language's outside influences that would help ID it?

Er... phonosyntactic oddity? You mean the way in which
the letters or groups of letters presumably representing
sounds combine together? Jorge Stolfi has done that and
he has come up with something which looks very much like
Chinese -- the infamous "Chinese hypothesis". It sure 
does look the spit and image of Chinese to me. The trouble
is: 

1. Assuming that it is Chinese, which variety of Chinese?
   There are dozens of varieties of Chinese, all really
   different languages, mutually unintelligible. Plus,
   four hundred years ago they were certainly rather
   different from what they are today.

2. It is not necessarily Chinese. My pet "serious" theory
   (no tongue in cheek for once) is an extinct language
   isolate, just like Basque, or Etruscan, but of course
   totally unrelated to either, and which happened to
   have a phonological structure reminiscent of Chinese.
   I am persuaded that there were hundreds of such languages
   in Europe alone once. In other words, that the linguistic
   picture was very much like that of Papua New Guinea now. If you
   are after secrecy, it is a much better "cipher" than
   anything available at the time. A "Navaho code", as it were.