
Diringer and 's "imprecision" and copy(-daiin) was: intercultural artefact



26/01/02 01:11:20, "Rafal T. Prinke" <rafalp@xxxxxxxxxx> wrote:

>The interesting thing about it is that Diringer says it is
>"imprecise", ie. there are the same letters for different 
>[sounds], so that different words are spelt the same. This would
>probably affect the copy(-x) stats so interestingly explained
>by Jacques?

No, not at all. Imagine a writing system in which the same letter
is used for all the consonants, and another letter for all
the vowels. Thus:

Cv, cvc vc vcc. Vcvcvcv v ... etc., etc.

You will observe far fewer different words, but the
probability of finding the same word exactly P positions
apart is still given by the same formula: (n-1)/(N-1), where
n is the number of occurrences of this word in the text, and
N the number of words in the text.
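
The formula can be tried out on a toy sample. What follows is only a
minimal sketch, not a tested procedure: the helper names (cv_pattern,
chance_of_match, observed_copy_rate) and the sample sentence are made
up for illustration, and treating only a, e, i, o, u as vowels is an
assumption.

```python
def cv_pattern(word):
    """Collapse a word to a two-letter alphabet: 'v' for a/e/i/o/u, 'c' otherwise."""
    return "".join("v" if ch in "aeiou" else "c"
                   for ch in word.lower() if ch.isalpha())


def chance_of_match(word, words):
    """(n-1)/(N-1): chance that some other given position also holds `word`,
    assuming word order is random."""
    n = words.count(word)   # occurrences of this word
    N = len(words)          # words in the text
    return (n - 1) / (N - 1)


def observed_copy_rate(words, P):
    """Fraction of positions whose word recurs exactly P positions later."""
    pairs = len(words) - P
    return sum(words[i] == words[i + P] for i in range(pairs)) / pairs


# Hypothetical sample text, chosen only to keep the example small.
text = "the cat sat on the mat and the dog sat on the log".split()
collapsed = [cv_pattern(w) for w in text]
```

In this sample the eight distinct words collapse to four patterns, so
n grows for each surviving "word" while N stays at 13; the formula is
unchanged, only the counts fed into it differ.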

I even think that, in the case of English (and most European
languages), we will still see that the frequency of copy(-1)
that is, the same word occurring twice in a row, will be
quite low: not as low as in properly spelt English, but
still significantly lower than chance would predict. I don't
feel like testing it
right now, but here is a thought experiment.

Imagine English written in the *ultimate* deficient alphabet:
only one letter!

The above Cv, cvc vc vcc. Vcvcvcv v ... etc., etc., becomes:

xx xxx xx xxx xxxxxxx x


The original question was: what is the probability of
finding the same word exactly P positions apart?

The new question is: what is the probability of 
finding the _same-length_ word exactly P positions
apart?

If the language forbids the same word occurring twice 
in a row, I think it will still show in the statistics.
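
One way such a test might be set up, offered only as a sketch under
stated assumptions: reduce every word to its length, count how often
equal lengths sit exactly P positions apart, and compare that with the
average over random reshufflings of the same words. The function names,
the number of trials and the fixed seed are all hypothetical choices.

```python
import random


def same_length_rate(words, P):
    """Fraction of positions whose word has the same length as the word P later."""
    lengths = [len(w) for w in words]
    pairs = len(lengths) - P
    return sum(lengths[i] == lengths[i + P] for i in range(pairs)) / pairs


def shuffled_baseline(words, P, trials=200, seed=0):
    """Average same-length rate over random reorderings of the same words.

    This stands in for "what chance would give"; trials and seed are
    arbitrary illustrative values.
    """
    rng = random.Random(seed)
    pool = list(words)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(pool)
        total += same_length_rate(pool, P)
    return total / trials


# The one-letter rendering of the example sentence above.
sample = "xx xxx xx xxx xxxxxxx x".split()
observed = same_length_rate(sample, 1)
baseline = shuffled_baseline(sample, 1)
```

If the hypothesis holds, a real text of useful size should give an
observed rate at P = 1 that stays below the shuffled baseline. Six
words prove nothing, of course; the sample is only there to show the
mechanics.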

Does anyone care to comment on this hypothesis before I test
it?