
AW: determining the word-break character in VMS

To: "'Jacques Guy'" <jguy@xxxxxxxxxxxxxxxx>, voynich@xxxxxxxx
Subject: AW: determining the word-break character in VMS
From: "Anders, Claus" <canders@xxxxxxxxx>
Date: Wed, 30 Aug 2000 09:10:56 +0200
Delivered-to: reeds@research.att.com
Sender: jim@xxxxxxxxxxxxx



> Jacques Guy wrote:
>
> Off the top of my head, without calculating any statistics, I
> would say that in Hungarian the letter e is more common than
> word-breaks (e.g. egyeségedre!). And again, in Arabic breaks
> between letters do not correspond to word breaks, thus anhar
> "rivers" is written a-space-nha-space-r because a (alif)
> cannot connect to the next letter. Likewise dar "house" is
> written d-space-a-space-r.
>
> No, we cannot be sure at all.
What I wanted to show with my calculations is that with "." as the
word/token break, the min/average/max word/token length is consistently
within range. If "e" or "o" (or any other character) were the break
character, the max word/token length would be far too big.
("o" as the break character would produce a max length of 59, whereas "."
gives words of at most 13 characters.)

Claus
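
The kind of comparison described above could be sketched roughly as follows. This is only an illustration, not the calculation actually used for the figures quoted: the function name token_length_stats and the short sample string are assumptions, and the sketch assumes that when a character other than "." is tried as the break, "." is simply kept as an ordinary character.

```python
def token_length_stats(text, break_char):
    """Return (min, mean, max) token length when text is split on break_char.

    Empty tokens (from doubled or leading/trailing break characters)
    are ignored.
    """
    tokens = [t for t in text.split(break_char) if t]
    if not tokens:
        raise ValueError("no tokens found")
    lengths = [len(t) for t in tokens]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)


# Short illustrative string in a dot-separated transcription style.
# It is NOT the data behind the 59 vs. 13 figures quoted in the post.
sample = "fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy"

# Try each candidate break character and compare the length ranges.
for candidate in (".", "o", "e"):
    shortest, mean, longest = token_length_stats(sample, candidate)
    print(f"break={candidate!r}: min={shortest} avg={mean:.1f} max={longest}")
```

On a full transcription, a candidate whose maximum token length is far larger than that of the others would be an unlikely word break, which is the argument made above.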
