Re: John Chadwick (Linear B) of corpus size. Comments invited.

    > Yes, my point was that Chadwick's formula is dead wrong.
    > However, I would like other opinions. I know, I know, over the
    > years, we have thrashed this matter to the death. .... I would
    > hate to see nonsense like "Chadwick's formula" fed to a wide
    > readership. IF it is nonsense. I think it is, but I prefer not
    > to trust my judgment. Comments, everybody?

I had never heard of "Chadwick's formula", and I can't imagine how it
could be derived. Your binary reencoding argument is a good point---
at best, the formula needs some special assumptions.

One can define 6 "limiting" types of undeciphered languages, depending
on whether (1) the script, (2) the language, and (3) the meaning of
the corpus texts are known or unknown. Thus Rongorongo, which is
almost surely in the local language, would be of type NYN (Y=known,
N=not known). Phaistos and Voynich are NNN, Etruscan would lie
somewhere between types YNY and YNN, etc.

Clearly, the amount of text one needs for successful decipherment
strongly depends on the language's type.  Roughly, in
order of increasing difficulty:

  scr lng mng  example                   decipherment needs:
  --- --- ---  ------------------------  -----------------------------------
   N   Y   Y   Egyptian after Rosetta,   A fairly small corpus, basically 
               almost.                   large enough for each glyph to 
                                         occur at least once.
   N   Y   N   Linear B after Ventris's  A somewhat larger corpus, basically
               breakthrough, almost.     large enough for a few dozen function
                                         words and inflections to occur
                                         and be recognized.
   Y   N   Y   Etruscan after Pyrgi,     A fairly large corpus, large
               perhaps?                  enough to pinpoint the meaning
                                         of individual words (rather than
                                         whole sentences) and extract a 
                                         basic vocabulary.
   N   N   Y   (no idea)                 Basically the same as the previous
                                         entry.
   Y   N   N   Elamite, perhaps?         A very large corpus, large enough
                                         to spot syntactic structures
                                         and reliably guess their meaning.
   N   N   N   Voynich, Phaistos         Ditto, only harder.

(The type YYY means there is no problem to solve, and YYN is nonsensical.)
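
As a quick check of that count, here is a minimal Python sketch that
enumerates the three Y/N flags in the table's order (script, language,
meaning) and drops the two excluded combinations. The names EXCLUDED
and limiting_types are just illustrative, not anything standard.

```python
from itertools import product

# Flag order follows the table: script, language, meaning.
# Y = known, N = not known.
# Illustrative names only; the reasons are paraphrased from the text.
EXCLUDED = {
    "YYY": "no problem left to solve",
    "YYN": "nonsensical combination",
}


def limiting_types():
    """Return the type codes that remain after dropping the excluded ones."""
    all_codes = ["".join(flags) for flags in product("YN", repeat=3)]
    return [code for code in all_codes if code not in EXCLUDED]


types = limiting_types()
print(len(types), types)
```

Of the 2^3 = 8 possible codes, the six that survive are exactly the
rows of the table above.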

The last two entries of the table include cases where the language is
actually known but is still unidentified. Thus Linear B and Egyptian
used to be NNN, but once the language was identified they became NYN
or NYY, and decipherment soon followed. (Hopefully the same will
happen to Voynichese.)

Chadwick's formula does not make sense if it ignores the "little
details" of language and text meaning. Your binary encoding trick may
be said to simplify the script, but make the language much more

All the best,