To: Voynich mailing list <vms-list@xxxxxxxxxxx>
Subject: Re: VMs: Number crunching the Fincher window
From: Gabriel Landini <G.Landini@xxxxxxxxxx>
Date: Wed, 15 Sep 2004 09:49:11 +0100
Organization: The University of Birmingham, UK.
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx
User-agent: KMail/1.7

I sent this yesterday, but I haven't received it or seen it in the archive. Sorry if you received this twice.
G.

Subject: Re: VMs: Number crunching the Fincher window
Date: Tuesday 14 September 2004 15:43
From: Gabriel Landini <G.Landini@xxxxxxxxxx>
To: vms-list@xxxxxxxxxxx

On Tuesday 14 September 2004 14:31, elvogt@xxxxxxxxxxx wrote:
> If Marke is right (and I understand him correctly), the VM is a hoax.

I still do not fully understand why one should be interested in these sorted substrings (or substring families), but I find it curious that they should imply a hoax. Why? Mostly because the effect of this substring distribution in other languages, or in word-scrambled versions of any text, seems to be unknown. It may be more common than one thinks, or it may be meaningless.

I suspect that this effect may also be related to a measure of string complexity in terms of Lempel-Ziv entropy. Consider that the comma-counting algorithm does something (remotely) similar, but in the opposite direction: it parses the stream into segments that each contain a new string, and one ends up with a dictionary of substrings.

So, for a moment, think about framing the string search in these terms. For a string s (window size n), calculate the probability of s in the corpus by counting the hits as one slides the window through the entire stream. Repeat for all existing strings of size n. Then repeat for strings of n-1 characters, and so on down to n=1. From the distribution of s at size n one can then calculate the entropy of the n-plets, and plot this entropy as a function of n. That graph could serve as a descriptor of the repeatability of the substrings, and it takes into account all strings at all sizes. One could do this for several texts and languages.
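A minimal Python sketch of the procedure described above, assuming the transcription is available as one plain string (for example an EVA transliteration with words separated by spaces). `ngram_entropy` and `entropy_profile` compute the entropy H(n) of the n-plets from sliding-window counts; `lz78_phrase_count` is an LZ78-style parse standing in for the comma-counting algorithm (the exact variant meant in the post may differ); `scramble_words` builds a word-scrambled baseline. All function names are hypothetical, chosen only for illustration.

```python
import math
import random
from collections import Counter


def ngram_entropy(text: str, n: int) -> float:
    """Shannon entropy (bits) of the n-plets seen by a window of size n
    slid one character at a time across the whole stream."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:  # stream shorter than the window
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(text: str, max_n: int) -> list[float]:
    """H(n) for n = 1..max_n; plotting this against n gives the descriptor."""
    return [ngram_entropy(text, n) for n in range(1, max_n + 1)]


def lz78_phrase_count(text: str) -> int:
    """Number of phrases in an LZ78-style parse: each new phrase is the
    shortest prefix of the remaining stream not yet in the dictionary."""
    dictionary = set()
    phrase = ""
    count = 0
    for ch in text:
        phrase += ch
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # trailing phrase that was already in the dictionary
        count += 1
    return count


def scramble_words(text: str, seed: int = 0) -> str:
    """Baseline: the same words in a random order."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)
```

For a comparison along the lines suggested, one could compute `entropy_profile(text, 6)` and `lz78_phrase_count(text)` for the VMs transcription, for its `scramble_words(text)` version, and for texts in other languages, then plot the profiles side by side. The window size of 6 is an arbitrary choice.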
Now (if I got it right), the 'Fincher' strings are the subset of all strings, starting from length n=1, that can be mapped into super-strings of size n+1 (=2), plus those that can also be mapped into n=3, and so on until one reaches some arbitrarily high n. Considering that VMs word structure is quite rigid (and has low entropy), I would expect this mapping up (or down, if one starts from the longest strings) to be more common in the VMs than in sequences with higher entropy. I therefore suspect that the distribution of 'Fincher' strings is an indirect measure of string complexity and entropy. If that is the case, I am not sure that it will tell us anything new.

Cheers,
Gabriel
