Re: VMs: Entropy again
The hardest part of doing letter frequency counts or entropy analysis is determining what an individual glyph is. I have not looked at the fonts other than EVA but at least that particular style is better built for re-creating the "look" of the language than for character analysis.
For instance, is EVA "t" really "q-something" plus "l"? Is "ch" 2 letters? Is "cPh" one, two, three or more letters? If more than one, how many and in what order? Is that particular character "e" or "i"? "d" or "m"? Do you write "ch" or "eh" or "ei"? "r" or "s" (a lotta those look alike)?
I guess for analysis you at least need to be sure to use "cph" instead of "cPh", or you would get a p vs P thing going on.
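To make the glyph-boundary problem concrete, here is a minimal Python sketch (not taken from any existing tool) that counts glyphs and computes single-glyph entropy under a chosen glyph inventory. The function names `tokenize` and `entropy`, the greedy longest-match rule, and the test strings are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(word, glyphs):
    """Split a transcribed word into glyphs, greedy longest match first.

    `glyphs` is the set of multi-character sequences to treat as ONE glyph
    (e.g. {"ch", "cph"}); anything else falls back to single characters.
    """
    ordered = sorted(glyphs, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                out.append(g)
                i += len(g)
                break
        else:
            out.append(word[i])  # no multi-character glyph matched here
            i += 1
    return out

def entropy(tokens):
    """Shannon entropy, in bits per glyph, of the glyph frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Folding case first avoids counting "p" and "P" as different letters.
word = "cPhol".lower()
print(tokenize(word, {"ch", "cph"}))  # "cph" treated as one glyph
print(tokenize(word, set()))          # every character its own glyph
```

Running the same text through `entropy` with different `glyphs` sets shows how much the result depends on the inventory, which is the point made above.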
The more I look at the pages the more I think there are only about 12 basic Voy characters:
a o e i b q d l n r s y
The rest are combinations of these. ch is ei run together, t is ql run together, etc. Not sure what p is yet - ql and ?.... x, in places, looks a lot like a version of Sh.
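One rough way to try this hypothesis is to expand the composite glyphs before counting. The Python sketch below encodes only the two decompositions stated above (ch = ei, t = ql); the names `COMPOSITES`, `decompose` and `leftovers` are made up for illustration, and any other glyph is deliberately left unmapped.

```python
import re

# The 12 proposed basic characters.
BASIC = set("aoeibqdlnrsy")

# Only the two decompositions given in the text; everything else is left alone.
COMPOSITES = {"ch": "ei", "t": "ql"}

# One alternation, longest key first, so the text is rewritten in a single pass.
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(COMPOSITES, key=len, reverse=True))
)

def decompose(text):
    """Replace each composite glyph with its proposed basic components."""
    return _pattern.sub(lambda m: COMPOSITES[m.group(0)], text)

def leftovers(text):
    """Characters still outside the basic set after decomposition."""
    return sorted(set(decompose(text)) - BASIC - {" "})

print(decompose("chot"))    # composites expanded
print(leftovers("cphy t"))  # glyphs the table does not yet explain
```

Whatever `leftovers` reports is the list of glyphs that still need a decision, such as p.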
Just my 2 cents - and that won't even get you a cup of coffee so you know what that's worth.
******************************
Larry Roux
Syracuse University
lroux@xxxxxxx
*******************************
>>> jguy@xxxxxxxxxxxxxxxx 03/01/03 19:55 PM >>>
I am, slowly, considering rewriting Monkey, to allow, among
other things, for the extended (8-bit) ASCII set. At least,
now that I have installed Armenian and Cyrillic
keyboard drivers, and the fonts, I know better what I am
up against (a mild nightmare).
I plan to have a different alphabet-definition screen, with
3 "letter types":
1. active letters
2. completely ignored letters
3. letters to be treated as spaces (word breaks)
and to allow for saving the alphabet definitions onto
disk.
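The three letter types and the save-to-disk idea could be sketched roughly as follows. This is Python rather than Euphoria and is not Monkey's actual design: the type names, the JSON file format, and the choice to ignore letters missing from the definition are all assumptions.

```python
import json

ACTIVE, IGNORED, SPACE = "active", "ignored", "space"

def apply_alphabet(text, alphabet, default=IGNORED):
    """Split text into words according to an alphabet definition.

    `alphabet` maps each character to ACTIVE, IGNORED or SPACE.
    Characters missing from the definition get `default` (an assumption).
    """
    words, current = [], []
    for ch in text:
        kind = alphabet.get(ch, default)
        if kind == ACTIVE:
            current.append(ch)
        elif kind == SPACE and current:
            words.append("".join(current))  # word break
            current = []
        # IGNORED letters are dropped without breaking the word
    if current:
        words.append("".join(current))
    return words

def save_alphabet(alphabet, path):
    """Write the alphabet definition to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(alphabet, f, ensure_ascii=False, indent=1)

def load_alphabet(path):
    """Read an alphabet definition previously written by save_alphabet."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

alphabet = {"a": ACTIVE, "b": ACTIVE, ".": SPACE, "!": IGNORED}
print(apply_alphabet("ab.b!a", alphabet))
```

Keeping the definition as plain data means several alternative alphabets for the same text can sit side by side as separate files.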
Likewise, I plan to allow for saving the matrices
used for computing the entropy and generating text,
since building them is quite expensive in computer
time. This means that you could have N configuration
files for, say, the VMS, one a character-monkey with
such and such glyphs ignored or treated as spaces,
another a character-monkey again but with a different
"alphabet", another a word-monkey with a particular
definition of the alphabet, another a word-monkey again
but with another definition, and so on, and so on.
And other configurations for Russian, English, French
or whatever languages. I would have to stop short of
16-bit encoding, though, because I have no idea how
to handle displaying Egyptian hieroglyphs or Chinese
characters, short of going into graphics mode, and
how portable or, rather, UNportable that would become
(I am writing this in Euphoria, an interpreted language
available for DOS, Linux, and BSD at the moment. For
Windows too, but if I go into that, the code will no
longer be portable).
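As an illustration of saving the matrices, here is a small Python sketch of a first-order transition matrix that is built once, written to disk, and reloaded. It is not how Monkey stores its data: the nested-dictionary layout, the JSON format, and the function names are assumptions. The tokens can be characters (a character-monkey) or words (a word-monkey).

```python
import json
import math
from collections import defaultdict

def build_matrix(tokens):
    """Count how often each token is followed by each other token."""
    matrix = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        matrix[prev][nxt] += 1
    return {p: dict(row) for p, row in matrix.items()}

def conditional_entropy(matrix):
    """Entropy in bits of the next token given the previous one."""
    total = sum(sum(row.values()) for row in matrix.values())
    h = 0.0
    for row in matrix.values():
        row_total = sum(row.values())
        for count in row.values():
            h -= (count / total) * math.log2(count / row_total)
    return h

def save_matrix(matrix, path):
    """Store the counts so they need not be rebuilt on the next run."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(matrix, f, ensure_ascii=False)

def load_matrix(path):
    """Reload counts written by save_matrix."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

m = build_matrix(list("aab"))
print(m, conditional_entropy(m))
```

One saved file per combination of text, alphabet definition and monkey type would give the N configurations described above.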
Of course, I expect I have missed a few things.
So, your turn.
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list