
Re: VMs: RE: Character Frequency Analysis



On Sunday 24 Aug 2003 5:21 am, Larry Roux wrote:
> In this case "c" is usually the "c" in the "ch", "cPh", etc glyphs and most
> other appearances of a "c" like character are listed as "e". 

That's correct. The top "arm" of <c> is always long and connects to something else.

> But I think
> it would be constructive to try distinguishing between "ii" (which in
> several cases I believe is 'u') and "iin" and some others to see what the
> structure looks like then.

If <in> is "n" and <ii> is "u", then what is <iin>?
Is it "in", or "m", or "u+something"?

And <iiin>?
i+m
u+n
i+u+something?

Same with <ee>. Maybe this is another character, but then what is <eee>?
It could be "e"+"ee" or "ee"+"e" and so on.
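
To make the ambiguity concrete, here is a minimal Python sketch that lists
every way a run of strokes can be split into candidate units. The unit
inventories (I_UNITS, E_UNITS) and the function name are my own assumptions,
chosen only for illustration; they are not a proposed reading of the script.

```python
def segmentations(word, units):
    """Return every way to split `word` into a sequence of items from `units`."""
    if not word:
        return [[]]  # one way to split the empty string: no units at all
    results = []
    for unit in units:
        if word.startswith(unit):
            # Keep this unit and recurse on whatever is left of the word.
            for rest in segmentations(word[len(unit):], units):
                results.append([unit] + rest)
    return results


# Hypothetical unit inventories, assumed only to illustrate the problem.
I_UNITS = ["i", "ii", "in", "n"]
E_UNITS = ["e", "ee"]

for word, units in [("iin", I_UNITS), ("iiin", I_UNITS), ("eee", E_UNITS)]:
    for parse in segmentations(word, units):
        print(word, "=", "+".join(parse))
```

The number of possible parses grows with every extra stroke, so treating
<ii> or <ee> as a separate character needs a rule for which parse to prefer.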

> The best I can say about the current stats is
> that they are consistent within themselves.  The main idea I was working
> from doing this was to see if the pages had a similar set of statistics,
> and it is apparent that this is not always so.  Hopefully that might clue
> something to someone who could take it and run.

One has to keep in mind that the text on many pages is extremely short, so
the comparisons may not be completely reliable. Perhaps you could add to the
(nice) table how many characters each set of frequencies is drawn from,
because the % value does not reflect the size of the data set.
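
As a rough sketch of what I mean (not a description of how your table was
produced), the Python below reports the raw count and the total n next to
each percentage, plus the usual binomial standard error sqrt(p(1-p)/n) as a
hint of how far a percentage can wobble on a short page. The function name
and the sample string are made up for the example.

```python
from collections import Counter
import math


def frequency_table(text):
    """Return (n, rows); each row is (char, count, percent, std_err_percent)."""
    chars = [c for c in text if not c.isspace()]  # ignore spaces and newlines
    n = len(chars)
    rows = []
    for ch, count in Counter(chars).most_common():
        p = count / n
        # Binomial standard error of a proportion; it shrinks as n grows.
        se = math.sqrt(p * (1 - p) / n)
        rows.append((ch, count, 100 * p, 100 * se))
    return n, rows


# Made-up sample text, used only to show the output layout.
n, rows = frequency_table("daiin chedy qokeedy daiin")
print(f"n = {n}")
for ch, count, pct, se in rows:
    print(f"{ch}  {count:3d}  {pct:5.1f}%  +/- {se:4.1f}")
```

With n printed alongside, a 10% figure from a 40-character page is no longer
mistaken for one drawn from a 2000-character page.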

Cheers,

Gabriel

