
Re: AW: Counting the Gallow Bits




Gabriel Landini wrote:
> If one is interested in the correlation between symbols, then
> assigning numerical values to the symbols of a symbolic sequence
> introduces bias in the new sequence. This has been done in the
> past with DNA sequences and it has been heavily criticised in the
> literature.
> The characteristics of the newly made numerical sequence will vary
> with the selection of the numbers you assign to each symbol.

You are exactly right; that's where I am stuck.  If the
values were assigned in a meaningful way, it might not be so. 
Alternatively, if they were assigned values according to some
formula based on frequency for example, they would have meaning
when compared with other texts prepared the same way.  The way
I'd like to see it would be that characters with the least
correlation had the largest amplitude difference and the
characters with the most had the smallest.  This might help in
developing a more thorough mathematical fingerprint of a
language's written system.  I'm becoming increasingly aware of
the difference between a natural language and its written
expression.  As a language geek, I'm very intrigued with the
work the group has done and I think that a lot of understanding
about written systems and languages will come out of this work. 
I agree there is a danger of introducing bogons into the data,
but if we could do it without bogons we might be intrigued by
what we find.  Unfortunately, we may not be able to understand
what we find when we see it without tons of research on other
languages.  One thing I think might come out of the method I
proposed would be a way to identify foreign words, or perhaps a
mathematical measure of foreign influences on the language. More
later.
Regards,
Brian
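
The frequency-based assignment suggested above could be sketched
roughly as follows.  This is only an illustration under stated
assumptions: the function names (frequency_rank_values,
to_numeric_sequence) and the choice of "frequency rank" as the formula
are hypothetical, and the sample string is made up, not taken from any
transcription.  Because the mapping is derived from each text's own
symbol counts, two texts prepared this way get values by the same rule,
although Gabriel's caveat still applies: the resulting numbers depend
on the formula chosen.

```python
from collections import Counter


def frequency_rank_values(text):
    """Map each symbol to its frequency rank (1 = most frequent).

    Hypothetical formula: any other frequency-based rule could be
    substituted here, and the numeric sequence would change with it.
    """
    counts = Counter(text)
    # Sort by descending count, then by symbol so that ties are
    # broken the same way every time.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {symbol: rank for rank, symbol in enumerate(ordered, start=1)}


def to_numeric_sequence(text, values):
    """Replace every symbol in the text by its assigned value."""
    return [values[symbol] for symbol in text]


# Made-up sample text, used only to show the mechanics.
sample = "abracadabra"
values = frequency_rank_values(sample)
sequence = to_numeric_sequence(sample, values)
print(values)
print(sequence)
```

The correlation-based variant (least-correlated characters getting the
largest amplitude difference) would need a different rule for choosing
the values; it is not attempted in this sketch.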