
Re: VMs: Cyrilic

On Fri, 2005-07-15 at 10:38 -0500, Nancy Noell Burk wrote:
> Stefan Urbanek wrote:
> > Hm, is there any table showing the occurrences of each character at the
> > beginning of a word, at the end, and in the middle? Like: Character |
> > Beginning count | Middle count | End count.
> > 
> If that is all you want, it's easy enough to generate with a couple Perl 
> scripts:

Yes, that is all I wanted to know.

> Middle Letter Counts:


> First Letter Counts:
>     8780 o
>     7015 c
>     5392 q
>     4616 s
>     3862 d
>     2187 a
>     2086 y


I would stop here for a moment. Now IF the script is a kind of
handwritten variant/predecessor of the Latin or Cyrillic alphabet, AND
IF the language is from around central/eastern Europe (for example
Slavic), then the statistics for the top two first letters are
interesting. 'o' and 'c' definitely do not occur that often as the first
letter of words in the languages of that region. Given these
assumptions, it looks as if 'o' and 'c' are parts of a composed
character. As mentioned in one of the previous emails, c + l or o + l
can be 'd'. Something similar can be applied to combinations of 'i'
being 'u', or iii can be 'tch' (English approximation of c + caron) or
'stch', 'u', 'n' or 'm'...

The same may apply to other characters. Or not?
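
For anyone who wants to regenerate the kind of table asked about above
(Character | Beginning count | Middle count | End count), here is a
minimal sketch in Python. It is not the Perl script that produced the
quoted counts. The function names and the sample words are hypothetical,
and it assumes the transcription is already split into words and that a
one-character word counts only as a "beginning".

```python
from collections import Counter


def position_counts(words):
    """Count characters by position: first, middle, and last in each word.

    Assumption: a one-character word is counted only as a first letter.
    """
    first, middle, last = Counter(), Counter(), Counter()
    for word in words:
        if not word:
            continue  # skip empty tokens
        first[word[0]] += 1
        if len(word) > 1:
            last[word[-1]] += 1
        # Everything strictly between the first and last character.
        middle.update(word[1:-1])
    return first, middle, last


def format_table(first, middle, last):
    """Render the counts as a plain-text table, sorted by first-letter count."""
    chars = sorted(set(first) | set(middle) | set(last),
                   key=lambda ch: (-first[ch], ch))
    lines = ["Character | Beginning | Middle | End"]
    for ch in chars:
        lines.append(f"{ch:9} | {first[ch]:9d} | {middle[ch]:6d} | {last[ch]:3d}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder words for illustration only, not real corpus counts.
    sample = "qokeedy okal chedy daiin ol".split()
    print(format_table(*position_counts(sample)))
```

Feeding it the word list of a full transcription, or of a comparison
text in a Slavic language, would give the two tables side by side.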

