VMs: Re: various things

(Brief intrusion after a month of forced VMS abstinence...)

> [Rene:] We have proposed solutions by Newbold, Child, Feely, Strong,
> Levitov and Stojko. We also have a non-solution from Gordon Rugg.

Please don't forget Banasik's Manchu solution: it is fairly
well-specified, and it is at least as plausible as those above.

> [Bruce:] P.S. It's hard to believe there isn't a Hmong Bible kicking
> around online somewhere - perhaps some other religious book, such as
> the Koran or the writings of Mary Baker Eddy might be something to
> try.

I once searched hard for an online Bible in Burmese, which should be a
much easier target, and found none (sigh). There is a printed Bible,
which is (partly) available as a bunch of scanned images.

Beware that a translation of the Bible makes a very poor sample text
for linguistic analysis. That's because Bible translators usually take
pains to be consistent, that is, to use the same vernacular word for
the same source word (Hebrew or Latin for Catholic versions, often
English for Protestant ones). They also generally try to preserve the
sentence structure of the original. The resulting text has very skewed
word frequencies, a severely limited vocabulary, and peculiar sentence
structure.

For instance, the Bible has many sentences that start with the word
"And". This feature sounds awkward in most European languages, and
presumably in Latin too; in English it is downright ungrammatical. The
feature seems to be the result of literal translation from the Hebrew
text, where (as in Arabic and Ethiopian) starting successive sentences
with "and" seems to be normal and does not sound that bad -- the word
"and" is only a short prefix attached to the next word.

I can see this translation effect when comparing the statistics of
Chinese Bibles with those of a native Chinese novel: the latter has a
larger vocabulary and more doubled words, and the word frequencies
are quite different.
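
For anyone who wants to repeat that kind of comparison, here is a
minimal Python sketch of the three measures mentioned above
(vocabulary size, doubled words, word frequencies). It assumes the text
has already been split into words, which is itself a non-trivial step
for Chinese; the function name text_stats and the toy sample are
purely illustrative, not the tools I actually used.

```python
from collections import Counter

def text_stats(tokens):
    """Summarize a token list: size, vocabulary, doubled words, top words."""
    counts = Counter(tokens)
    # A "doubled word" here means a token immediately followed by itself.
    doubled = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    total = len(tokens)
    # Relative frequencies of the five most common tokens.
    top = [(w, c / total) for w, c in counts.most_common(5)] if total else []
    return {
        "tokens": total,
        "vocabulary": len(counts),
        "doubled": doubled,
        "top": top,
    }

# Toy sample only; a real comparison needs two texts of similar length.
sample = "and he said and he went and and he saw".split()
stats = text_stats(sample)
print(stats)
```

Vocabulary size grows with text length, so the two texts should be cut
to the same number of tokens before comparing.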

> [Pierre Abbat:] I tried [vtt] and it doesn't work.

The vtt version at http://www.geocities.com/ctesibos/hampton/tools/vtt.zip
is very old (1.3). I don't know which is the latest one, but I got
a version 1.7 from somewhere at some point.

I have fixed vtt 1.7 to accept my interlinear file, which, as Rene
noted, deviates somewhat from the official EVMT file format. The main
differences are (1) it has two periods in the line locators, e.g.
"<f17v.L3.10;V>" instead of "<f17v.L10;V>"; and (2) it has "#"s inside
some {}-comments. The former difference breaks the "+@"/"-@" options;
the latter may generate spurious error messages. I have just mailed the
vtt patches to Rene; watch this space for further announcements.
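
To make difference (1) concrete, here is a small Python sketch that
accepts both locator shapes. It is not the vtt patch (vtt is a separate
program), and the pattern is inferred only from the two examples above,
so treat the field layout and the single-character transcriber code as
assumptions. The name parse_locator is hypothetical.

```python
import re

# Assumed shape: <page.field[.field...];T>, with T a one-character
# transcriber code. Inferred from the two examples only.
LOCATOR = re.compile(r"<([^.;<>]+)((?:\.[^.;<>]+)+);([^<>])>")

def parse_locator(text):
    """Split a line locator into (page, fields, transcriber)."""
    m = LOCATOR.fullmatch(text)
    if m is None:
        raise ValueError(f"not a line locator: {text!r}")
    page, rest, transcriber = m.groups()
    # rest looks like ".L3.10" or ".L10"; drop the leading period.
    fields = rest[1:].split(".")
    return page, fields, transcriber

for loc in ("<f17v.L3.10;V>", "<f17v.L10;V>"):
    page, fields, who = parse_locator(loc)
    print(loc, "->", page, fields, who, "periods:", len(fields))
```

A reader written this way does not care how many periods the locator
has; it simply reports one or two fields after the page name.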

> There is no majority vote version.

Indeed, the majority versions are not in the interlinear 1.6e6;
they were created later, and will be included in 1.6e7 if and when
I find the time to put that together.  Meanwhile, you can get

which is 1.6e6 with the majority (transcriber code 'A') and consensus
('Y') versions inserted. Beware that the initial blurb of this file
lies when it says "version 1.6e6" (it should be "1.6e6+"), and it does
not explain what those two additional transcriber codes are.

> [Rene:] Take any text in a known language (works best if known to
> yourself :-) ). Do a single substitution but make sure that vowels
> stay vowels and consonants stay consonants. You can be more
> restrictive about the consonant permutations, by (e.g.) keeping the
> liquids as liquids or anything you care to try out.

  A charg chid a zhiss rofon loo
  I beom sefosy il i dnoo,

  I dnoo whelo hurkny meuch al bnold
  Ikiarld cho oinch'l lwood vsewark pnoild;

  I dnoo chid seegl id Ket iss tiy
  Int savdl hon soivy inml de bniy,

  I dnoo chad miy ar lummon woin
  I rold ev neparl ar hon hian;

  Uber whelo pelem lrew hil siar;
  Whe ardamidosy safol wach niar.

  Beoml ino mito py veesl sago mo,
  Pud ersy Ket cir migo i dnoo.

         (Xeaco Gasmon, 1914)
(Hand-transposed, watch out for mistakes...)
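
For those who would rather not transpose by hand, here is a minimal
Python sketch of the kind of substitution Rene describes: a random
one-to-one letter mapping in which vowels stay vowels and consonants
stay consonants. It is not how the verse above was produced. The names
make_key and encode are illustrative, and treating "y" as a consonant
is my assumption.

```python
import random
import string

VOWELS = "aeiou"  # assumption: "y" counts as a consonant
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)

def make_key(seed=None):
    """Build a random substitution that permutes vowels and consonants separately."""
    rng = random.Random(seed)
    key = {}
    for group in (VOWELS, CONSONANTS):
        shuffled = list(group)
        rng.shuffle(shuffled)
        key.update(zip(group, shuffled))
    return key

def encode(text, key):
    """Apply the substitution, keeping case, spaces and punctuation."""
    out = []
    for ch in text:
        sub = key.get(ch.lower(), ch)  # non-letters pass through unchanged
        out.append(sub.upper() if ch.isupper() else sub)
    return "".join(out)

key = make_key(seed=1)
print(encode("Take any text in a known language.", key))
```

Restricting the consonant permutation further (liquids to liquids, and
so on) only requires splitting CONSONANTS into more groups before
shuffling.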

All the best,

