
Re: VMs: Cyrilic

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Cyrilic
From: Stefan Urbanek <stefan@xxxxxxxxxxxxxx>
Date: Fri, 15 Jul 2005 18:26:08 +0200
In-reply-to: <42D7D867.2020107@iw.net>
References: <1121417841.42d77a7193712@mail.atlantis.sk> <19828.1121426162@www42.gmx.net> <1121428244.42d7a31419701@mail.atlantis.sk> <42D7D867.2020107@iw.net>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

On Fri, 2005-07-15 at 10:38 -0500, Nancy Noell Burk wrote:
> Stefan Urbanek wrote:
> 
> > Hm, is there any table showing the occurrence of each character at the beginning of a
> > word, at the end, and in the middle? Like: Character | Beginning count | Middle
> > count | End count.
> > 
> 
> If that is all you want, it's easy enough to generate with a couple of Perl 
> scripts:
> 

Yes, that is all I wanted to know.
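Nancy's Perl scripts are not shown, so here is a minimal Python sketch of the same kind of table. It is not her method, only a reconstruction under assumptions: words are separated by whitespace or '.', each transcription character stands for one glyph, and a one-letter word counts toward both "first" and "last". The function names and the sample words are made up for illustration.

```python
import re
from collections import Counter


def position_counts(text):
    """Count each character by its position within a word.

    Assumption: words are separated by whitespace or '.', and each
    character is one glyph. A one-letter word counts as both first and last.
    """
    first, middle, last = Counter(), Counter(), Counter()
    for word in re.split(r"[\s.]+", text):
        if not word:
            continue
        first[word[0]] += 1
        last[word[-1]] += 1
        # Characters strictly between the first and the last position.
        middle.update(word[1:-1])
    return first, middle, last


def print_table(first, middle, last):
    # Character | Beginning count | Middle count | End count,
    # sorted by beginning count (highest first), then alphabetically.
    chars = sorted(set(first) | set(middle) | set(last),
                   key=lambda ch: (-first[ch], ch))
    print(f"{'char':>4} {'first':>7} {'middle':>7} {'last':>7}")
    for ch in chars:
        print(f"{ch:>4} {first[ch]:>7} {middle[ch]:>7} {last[ch]:>7}")


if __name__ == "__main__":
    # Made-up EVA-style sample, not real counts from the manuscript.
    sample = "qokeedy.qokedy.chedy okaiin.dal"
    print_table(*position_counts(sample))
```

Run over a full transcription file, this gives the three columns asked for above.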

> Middle Letter Counts:

<snip>

> First Letter Counts:
>     8780 o
>     7015 c
>     5392 q
>     4616 s
>     3862 d
>     2187 a
>     2086 y

<snip>

Let me stop here for a moment. IF the script is a handwritten variant
or predecessor of the Latin or Cyrillic alphabet, AND IF the language
comes from central/eastern Europe (for example a Slavic language), then
the first two letters in these statistics are interesting. 'o' and 'c'
are definitely not that frequent as word-initial letters in the
languages of that region. Given these assumptions, it looks as though
'o' and 'c' are parts of composed characters. As mentioned in one of the
previous emails, c + l or o + l could be 'd'. The same can be applied to
combinations of 'i', which could be 'u', or iii could be 'tch' (the
English approximation of c with a caron, č), 'stch', 'u', 'n' or 'm'...

The same could apply to other characters. Or not?
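One way to test the composed-character idea is to merge the suspected glyph sequences into single placeholder symbols and then recount the word-initial letters. The Python sketch below does that. The merge table is purely hypothetical: "cl"/"ol" follow the c + l / o + l guess above, while "iii" and "ii" are just example readings of the 'i' combinations. Upper-case placeholders keep the merged glyphs distinct from the original transcription characters.

```python
import re
from collections import Counter

# Hypothetical merge rules; extend this table to try other characters.
MERGES = {"cl": "D", "ol": "D", "iii": "T", "ii": "U"}


def merge_glyphs(word, merges):
    # Try longer sequences first so "iii" is not consumed as "ii" + "i".
    keys = sorted(merges, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: merges[m.group(0)], word)


def first_letters(words):
    # Count the first character of every non-empty word.
    return Counter(w[0] for w in words if w)


def compare(text, merges):
    # Assumption: words are separated by whitespace or '.'.
    words = [w for w in re.split(r"[\s.]+", text) if w]
    before = first_letters(words)
    after = first_letters(merge_glyphs(w, merges) for w in words)
    return before, after


if __name__ == "__main__":
    # Made-up EVA-style sample words.
    before, after = compare("olkeedy.qokeedy.cheol.okaiiin", MERGES)
    print("before:", before.most_common())
    print("after: ", after.most_common())
```

If 'o' and 'c' really are parts of composed characters, their share of word-initial positions should drop noticeably after merging on the full text.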

Stefan


References:
- VMs: Cyrilic
  - From: Stefan Urbanek
- Re: VMs: Cyrilic
  - From: Elmar Vogt
- Re: VMs: Cyrilic
  - From: Stefan Urbanek
- Re: VMs: Cyrilic
  - From: Nancy Noell Burk
