
RE: VMs: algorithm to generate VMS like text



Hi Rene,
 
Here are the results for a file created using stats taken from the complete Currier transcription in interlinear 1.7. Again, the file is 1000 lines, 9956 words, 52843 chars, split into 2 files:
 
1. approx 5000 words 27875 chars
h0 = 4.52356
h1 = 3.79244
h2 = 2.27052
h3 = 2.34924
 
2. approx 4000 words 24968 chars
h0 = 4.52356
h1 = 3.78540
h2 = 2.25737
h3 = 2.22393
 
I'll leave off the short posts for now, put together a more complete list of stats (for what they are worth!) and post it early next week: average word length, number of words, number of unique words, etc.
 
Regards
Brett

Rene Zandbergen <r_zandbergen@xxxxxxxxx> wrote:

--- Brett Cotton wrote:

> 2. 26k file approx 5000 words
> h0 = 4.45943
> h1 = 3.76424
> h2 = 2.23032
> h3 = 2.20558
>
> 3. another 26k file approx 5000 words
> h0 = 4.45943
> h1 = 3.77193
> h2 = 2.22793
> h3 = 2.24149

Your original file (the one used to build up the
transition probabilities) would have had the same
h1 and h2, but a lower h3. In fact, in the last
example the fact that h3>h2 is either a bug or
a rounding error in the MONKEY program.
Theoretically, the output of a second-order
MONKEY should give h3 = h2 (= h4 = h5 ...).
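
For anyone who wants to cross-check the h-values independently,
here is a minimal sketch. It assumes the conventional definitions:
h0 = log2 of the number of distinct characters (the posted 4.52356
and 4.45943 match log2(23) and log2(22)), h1 = single-character
entropy, and hn = entropy of a character given the n-1 characters
before it. Whether MONKEY computes them exactly this way is an
assumption, and the function names are made up for illustration.

```python
import math
from collections import Counter


def h0(text):
    """Zero-order entropy: log2 of the number of distinct characters."""
    return math.log2(len(set(text)))


def conditional_entropy(text, n):
    """Estimate h_n in bits: entropy of a character given the n-1 before it.

    n = 1 gives the plain single-character entropy h1.
    Context counts are summed from the n-gram counts themselves, so every
    conditional probability is <= 1 and the estimate is never negative.
    """
    # Count every overlapping n-gram in the text.
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))

    # Count each (n-1)-character context by summing over its n-grams.
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count

    total = sum(ngrams.values())
    return -sum(
        (count / total) * math.log2(count / contexts[gram[:-1]])
        for gram, count in ngrams.items()
    )
```

Calling conditional_entropy(text, 2) and conditional_entropy(text, 3)
on the same file gives h2 and h3 for comparison. What counts as a
character (spaces, line breaks) is left to the caller here, and that
choice will shift all the values.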

Cheers, Rene
