RE: VMs: split words
> I think you may have misinterpreted the character "*" - where
> this occurs in the transcription, it means "an unreadable or
> ambiguous character"
>
> Can you check this, please?
No problem. I have been treating '*' as a wildcard that represents
either an illegible instance of a 'known' character, or maybe
something else.
So in my statistics I do not count items such as k**ch:1 and k**.ch:1,
because the intended text in each case may differ.
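A minimal Python sketch of that filtering step follows. It assumes the statistics are stored as token:count items, as in k**ch:1. It drops any token containing '*' and sums the counts of the rest. The function name and the sample counts are made up for illustration.

```python
def parse_counts(items):
    """Parse 'token:count' items, skipping tokens that contain '*'.

    Assumption: each item looks like token:count (e.g. k**ch:1).
    '*' marks an unreadable or ambiguous glyph, so any token that
    contains it is excluded, because its intended text is unknown.
    """
    counts = {}
    for item in items:
        token, _, count = item.rpartition(":")
        if "*" in token:
            continue  # illegible glyph: leave out of the statistics
        counts[token] = counts.get(token, 0) + int(count)
    return counts


# Hypothetical sample; the counts are invented for the example.
sample = ["qokeedy:3", "k**ch:1", "k**.ch:1", "chol:2", "qokeedy:1"]
print(parse_counts(sample))  # {'qokeedy': 4, 'chol': 2}
```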
Although it is impossible to be scientific about it without
repeating the experiment on a whole host of other languages, my
feeling is that the ending of one word and the beginning of the
next form, suspiciously often, a sequence that also occurs within
VMs words.
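One way to make that test concrete is sketched below in Python, under stated assumptions. The text is a list of words, and only n-character windows that start in one word and end in the next are counted. The function reports how many of those windows also occur inside some single word. The name boundary_hits and the window length n are hypothetical choices. The same count on texts in other languages would serve as the control that the experiment needs.

```python
def boundary_hits(words, n=3):
    """Count boundary-spanning n-grams that also occur inside a word.

    For each adjacent pair (a, b), every n-character window of a + b
    that starts in a and ends in b is checked against the set of
    n-grams found entirely within single words.
    Returns (hits, total).
    """
    # All n-grams that lie entirely inside one word.
    internal = set()
    for w in words:
        for i in range(len(w) - n + 1):
            internal.add(w[i:i + n])

    hits = total = 0
    for a, b in zip(words, words[1:]):
        joined = a + b
        # A window spans the boundary iff it starts before len(a)
        # and ends after len(a).
        for i in range(max(0, len(a) - n + 1), len(a)):
            window = joined[i:i + n]
            if len(window) < n:
                continue  # b too short to complete this window
            total += 1
            if window in internal:
                hits += 1
    return hits, total


# Toy example: "ab" + "cd" spans to "bc", which is also a word.
print(boundary_hits(["ab", "cd", "bc"], n=2))  # (1, 2)
```

The ratio hits/total is the quantity to compare across languages; a high ratio for the VMs alone would support the impression described above.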
If this is true, three theories that would benefit from it are:
(1) Spaces are (sometimes) misdirectional, i.e. they do not always
mark true word boundaries.
(2) Words are constructed by picking (perhaps random) sections from
an underlying text, so the same sequences appear multiple times
but with spaces in different positions.
(3) VMs text is entirely numeric. With many numbering systems, you
can fuse the end of one number and the beginning of another and
still produce a valid number.
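Theory (2) can be illustrated with a small generative sketch in Python. It is a hypothetical model, not a claim about how the VMs was actually produced. Each output word is a random contiguous slice of an underlying text with its spaces removed, so the same letter runs recur with word breaks in different places. The slice lengths (2 to 6) and the seed are arbitrary assumptions.

```python
import random


def words_from_slices(source, n_words, min_len=2, max_len=6, seed=0):
    """Build 'words' by copying random slices of an underlying text.

    Hypothetical model of theory (2): spaces in `source` are removed
    and each word is a contiguous slice of the result, so recurring
    sequences end up split across words in different positions.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    text = source.replace(" ", "")
    if len(text) < max_len:
        raise ValueError("source text shorter than max_len")
    words = []
    for _ in range(n_words):
        length = rng.randint(min_len, max_len)  # inclusive bounds
        start = rng.randrange(0, len(text) - length + 1)
        words.append(text[start:start + length])
    return words


src = "the quick brown fox jumps over the lazy dog"
print(" ".join(words_from_slices(src, 8)))
```

Running boundary-spanning statistics on such output would show how far this mechanism alone can account for the effect.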
Marke
P.S. If anyone wants me to repeat the experiment with a particular
language, please send me a sample.
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list