
RE: VMs: split words

> I think you may have misinterpreted the character "*" - where 
> this occurs in the transcription, it means "an unreadable or 
> ambiguous character"
> Can you check this, please?

No problem.  I have been treating '*' as a wildcard that represents 
either an illegible instance of a 'known' character, or maybe 
something else.  

So in my statistics I am not counting items like k**ch:1 and k**.ch:1,
because the intended text in each case may be different.
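
A minimal sketch of that filtering step, assuming the transcription has
already been split into a list of items. The function name and the sample
items other than k**ch and k**.ch are made up for illustration; this is
not the script behind the statistics above.

```python
from collections import Counter

def count_items(items, wildcard="*"):
    """Count items, skipping any that contain the wildcard character.

    An item containing '*' could stand for several different intended
    texts, so it is left out of the tally rather than guessed at.
    """
    return Counter(item for item in items if wildcard not in item)

# Hypothetical sample; only k**ch and k**.ch come from the discussion.
sample = ["daiin", "k**ch", "chedy", "k**.ch", "daiin"]
counts = count_items(sample)
```

Here both wildcard items are dropped, and the remaining items are counted
as usual.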

Although it is impossible to be scientific about it without 
repeating the experiment on a whole host of other languages, my 
feeling is that, too often, the ending of one word and the beginning 
of the next can together form a sequence that also occurs within 
VMs words.
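
The kind of check described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function names,
the choice of two characters on each side of the space, and the sample
words are all hypothetical, and a real run would first drop any word
containing '*'.

```python
def boundary_sequences(words, left=2, right=2):
    """Return the sequences that straddle each space in the text.

    Each sequence is the last `left` characters of a word joined to the
    first `right` characters of the following word. Pairs where either
    word is too short are skipped.
    """
    result = []
    for a, b in zip(words, words[1:]):
        if len(a) >= left and len(b) >= right:
            result.append(a[-left:] + b[:right])
    return result

def interior_fraction(words, left=2, right=2):
    """Fraction of boundary sequences that also occur inside some word."""
    seqs = boundary_sequences(words, left, right)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if any(s in w for w in words))
    return hits / len(seqs)

# Made-up sample text, already split on spaces.
words = ["qokeedy", "chedy", "okaiin", "dychedy"]
seqs = boundary_sequences(words)
fraction = interior_fraction(words)
```

Running the same two functions on samples of other languages would give
the baseline figures needed for a comparison.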

If this is true, three theories that benefit from this fact are:
(1) Spaces are (sometimes) a misdirection.
(2) Words are constructed by picking (perhaps random) sections from
    an underlying text, leading to the same sequences appearing 
    multiple times but with spaces in different positions.
(3) VMs text is entirely numeric.  With many numbering systems you
    can fuse the end of one number and the beginning of another and 
    produce a valid number.
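
To illustrate point (3), here is a small sketch using Roman numerals.
Roman numerals are chosen purely as a familiar example of a numbering
system; nothing here implies the VMs uses them. The helper names are
hypothetical, and the validity test accepts only standard-form numerals
from 1 to 3999.

```python
import re

# Standard-form Roman numerals: thousands, hundreds, tens, ones.
_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

def is_roman(s):
    """True if s is a non-empty, standard-form Roman numeral."""
    return bool(s) and _ROMAN.fullmatch(s) is not None

def fuse(a, b, left, right):
    """Join the last `left` characters of a to the first `right` of b."""
    return a[-left:] + b[:right]

# "XXVI" ends in "VI" and "III" begins with "II"; fused they give "VIII".
fused_ok = fuse("XXVI", "III", 2, 2)
# Not every fusion works: "XIV" + "XX" gives "IVX", which is not valid.
fused_bad = fuse("XIV", "XX", 2, 1)
```

The first fusion yields a valid numeral and the second does not, so a
numeric text of this kind would show the effect often but not always.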


P.S.  If anyone wants me to repeat the experiment with a particular
language, then please send me a sample.