
Re: determining the word-break character in VMS



"Anders, Claus" wrote:

> IMHO the word-break char "." is really the word-break because
>         - it's the most common character at all
>         - the average word length is statistically with min and max
> Comments ?

Off the top of my head, without calculating any statistics, I
would say that in Hungarian the letter e is more common than
the word break (e.g. egyeségedre!). And in Arabic script, breaks
between letters do not correspond to word breaks: anhar
"rivers" is written a-space-nha-space-r because a (alif)
cannot connect to the next letter. Likewise dar "house" is
written d-space-a-space-r, since neither d (dal) nor a (alif)
connects to the next letter.
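
To make the objection concrete, here is a minimal Python sketch of
the "most frequent character" heuristic. The function name
guess_word_break and both sample strings are my own illustrations
(assumptions, not anything taken from VMS or from the original
proposal), and the second sample is deliberately contrived.

```python
from collections import Counter


def guess_word_break(text: str) -> str:
    """Return the most frequent character in text.

    This is only the frequency heuristic under discussion, not a
    real word-break detector (hypothetical helper).
    """
    if not text:
        raise ValueError("cannot guess from empty text")
    return Counter(text).most_common(1)[0][0]


# Short English words: the space really is the most frequent character.
english = "the cat sat on the mat and the dog sat on the log"

# Contrived sample of long, e-heavy words: the letter e outnumbers the break.
sample = "egyeségedre! egyeségedre!"

print(repr(guess_word_break(english)))  # ' '
print(repr(guess_word_break(sample)))   # 'e'
```

On the first string the heuristic picks the space; on the second it
picks the letter e, so frequency alone does not identify the break.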

No, we cannot be sure at all.