Re: determining the word-break character in VMS
"Anders, Claus" wrote:
> IMHO the word-break char "." is really the word-break because
> - it's the most common character at all
> - the average word length is statistically with min and max
> Comments ?
Off the top of my head, without calculating any statistics, I
would say that in Hungarian the letter e is more common than
word-breaks (e.g. egyeségedre!). Likewise, in Arabic the breaks
between letters do not correspond to word breaks: anhar
"rivers" is written a-space-nha-space-r because a (alif)
cannot connect to the next letter. Similarly, dar "house" is
written d-space-a-space-r, since d (dal) does not connect
to the next letter either.
No, we cannot be sure at all.
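For anyone who wants to test the "most common character" criterion on a known language before trusting it for the VMS, here is a minimal Python sketch. The function name break_is_most_common and the sample strings are my own illustrations, not taken from any transcription, and the check looks only at raw character counts, nothing more.

```python
from collections import Counter

def break_is_most_common(text, break_char="."):
    """Return True if break_char is strictly more frequent than
    every other character in text (hypothetical helper)."""
    counts = Counter(text)
    # Highest count among all characters other than the candidate break.
    others = [n for ch, n in counts.items() if ch != break_char]
    return counts[break_char] > max(others, default=0)

# Made-up samples: in the first the break wins, in the second
# a frequent letter outnumbers it.
print(break_is_most_common("ab.cd.ef"))
print(break_is_most_common("eee.eee"))
```

Run on a text in a language where one letter outnumbers the spaces, the test fails even though the break character is known, which is the objection above.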