
WG: average word length in VMS

To: <voynich@xxxxxxxx>
Subject: WG: average word length in VMS
From: Claus_Anders@xxxxxxxxxxx (Claus Anders)
Date: Sat, 23 Sep 2000 19:30:02 +0200
Delivered-to: reeds@research.att.com
Importance: Normal
Sender: jim@xxxxxxxxxxxxx

> Dear all,
> to compute the "real" word length (as opposed to token length), I wrote a
> small awk script that compiles all character combinations of the VMS, ignoring
> the token break character. The output was reduced to "words" with frequency > 4,
> and covers only up to folio 101, because my computer became quite slow due to
> the memory needed (more than 600 MB). The processing time for one line rose
> to 1 hour and grew exponentially with every further line.
> The result:
> Number of different words with frequency > 4:  20603
> Average word length of these words:            6.88861
> Most of the words differ only in their endings (maybe declensions or
> conjugations).
> It seems these numbers are closer to those of known languages than the number
> computed for tokens.
> The next step will be to look at the roots of all these words to produce a
> kind of vocabulary.
> Any hints for extracting a root word out of these (like Jorge's
> mantle/crust/core)?
> Claus
> PS: "words" in this context are clusters of characters within one VMS line,
> ignoring token/line/paragraph breaks.
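
The procedure described in the message can be sketched roughly as follows.
This is not Claus's awk script, only a minimal Python illustration under
stated assumptions: the token break characters are taken to be "." and ","
(as in EVA-style transcriptions), the average is taken unweighted over the
distinct "words", and the names count_substrings, frequent_words,
average_length and the max_len cap are all hypothetical.

```python
from collections import Counter


def count_substrings(lines, breaks=".,", max_len=None):
    """Count every contiguous character cluster within each line.

    Characters in `breaks` (assumed token separators) and whitespace are
    dropped first, so clusters may span token boundaries but never lines.
    `max_len` is an optional cap on cluster length to limit memory use.
    """
    counts = Counter()
    for line in lines:
        chars = "".join(
            ch for ch in line if ch not in breaks and not ch.isspace()
        )
        n = len(chars)
        for i in range(n):
            stop = n if max_len is None else min(n, i + max_len)
            for j in range(i + 1, stop + 1):
                counts[chars[i:j]] += 1
    return counts


def frequent_words(counts, min_freq=5):
    """Keep only clusters seen at least `min_freq` times (frequency > 4)."""
    return {word: c for word, c in counts.items() if c >= min_freq}


def average_length(words):
    """Unweighted mean length over the distinct clusters given."""
    words = list(words)
    return sum(len(w) for w in words) / len(words) if words else 0.0
```

A line of n characters has n(n+1)/2 contiguous clusters, so the table of
counts grows quickly with line length; capping max_len is one way to keep
memory use bounded, at the cost of missing longer clusters.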


Follow-Ups:
- Re: WG: average word length in VMS
  - From: Brian Eric Farnell
- Re: WG: average word length in VMS
  - From: Jorge Stolfi
- Re: WG: average word length in VMS
  - From: Bruce Grant
