Re: VMs: Best-fit 2-state PFSMs
29/03/2005 9:46:18 PM, "JH" <haleyj@xxxxxxxxxx> wrote:
>Hi Gabriel
>I did all that ages ago. You eventually end up with a very manageable
>alphabet size.
>The VMS fairy
And what was your alphabet, Miss Fairy?
I had missed this:
>----- Original Message -----
>From: "Gabriel Landini" <G.Landini@xxxxxxxxxx>
>To: <vms-list@xxxxxxxxxxx>
>Sent: 09 March 2005 13:11
>Subject: Re: VMs: RE: RE: Best-fit 2-state PFSMs
>
>
>> On Wednesday 09 March 2005 09:10, Nick Pelling wrote:
>> > Perhaps the best approach might simply be to compare the curves of (best
>> > fit PFSM's information content) vs (number of states in the PFSM) for
>> > different transcriptions? I'd predict that the best transcription should
>> > show a sharp drop in information content once a critical number of states
>> > is included... just a thought! :-o
>>
>> There may be something in that. I have been thinking for some time on a
>> similar idea: whether it is possible to investigate the optimum size of the
>> alphabet by graphing the number of unexplained occurrences at the expense of
>> "contiguous character" amalgamation. Perhaps this can be done using the
>> digraph or trigraph frequencies of frogguy, jsa or eva (or whatever): accept
>> as a "new character" the n-graphs which have the highest frequency and count
>> the number of other n-graphs (and their frequencies) that become unaccounted
>> for.
>> I wonder if there is some transition point that indicates a sudden increase of
>> "unexplainability" (what a word!) and if so, whether this is near the true
>> alphabet size...
>> Jacques, any comments?
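One possible reading of Gabriel's proposal, sketched in Python under assumptions of our own: repeatedly merge the most frequent adjacent pair of symbols into a new "character" (essentially byte-pair encoding), and after each merge count how many occurrences of the merged pair's component symbols are still left standing on their own, treating those as the "unexplained" occurrences. The function names and this definition of "unexplained" are hypothetical. Pairs never cross word boundaries, and only digraphs are merged at each step, although merged symbols can be merged again, so longer n-graphs build up over several steps.

```python
from collections import Counter

def pair_counts(corpus):
    """Count adjacent symbol pairs (digraphs) within every word of the corpus."""
    counts = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts

def merge_pair(corpus, pair):
    """Replace each non-overlapping occurrence of `pair`, scanning left to right,
    with one new symbol spelled as the two parts joined together."""
    a, b = pair
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

def amalgamation_curve(words, steps):
    """Greedily merge the most frequent digraph up to `steps` times. After each
    merge, record (alphabet size, leftover count), where the leftover count
    (our assumed stand-in for "unexplained") is how many occurrences of the
    merged pair's component symbols still stand alone."""
    corpus = [list(w) for w in words]
    curve = []
    for _ in range(steps):
        counts = pair_counts(corpus)
        if not counts:
            break  # every word is already a single symbol
        pair = counts.most_common(1)[0][0]  # ties go to the pair seen first
        corpus = merge_pair(corpus, pair)
        symbols = Counter(s for word in corpus for s in word)
        leftover = sum(symbols[s] for s in set(pair))
        curve.append((len(symbols), leftover))
    return curve

# Toy word list, made up for illustration (not from any transcription)
print(amalgamation_curve(["chedy", "shedy", "chol"], 2))  # [(8, 1), (7, 0)]
```

Run over many merges of a real transcription, plotting the leftover count against alphabet size would give the kind of curve in which Gabriel hopes to spot a transition point.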
I have toyed with various objective functions for a long, long time now.
It always comes back to the same principle, but I have not found one
that works (I have not looked very hard, though). Here it is: use the
sum of the squares of the frequencies of the individual letters. That is
not good enough on its own, because it has an obvious, trivial solution:
the one-letter alphabet! But... let me find it... found it! Here is the
computation, written in a somewhat eccentric programming language, but
quite easy to understand. The variable "dic" is a list, an array of
words, each with its absolute frequency, so dic[i][2] is the frequency
of the i-th word (indices start at 1). The language is case-sensitive.
There are only two basic types: atom and sequence. You can figure out
the rest easily.
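function Cohesion(sequence dic)
    -- dic: sequence of {word, frequency} pairs
    atom sx, sx2, fq, cohesion
    if not length(dic) then
        return 0               -- empty word list
    end if
    sx = 0                     -- running total of frequencies
    sx2 = 0                    -- running sum of squared frequencies
    for i=1 to length(dic) by 1 do
        fq = dic[i][2]         -- absolute frequency of the i-th word
        sx += fq
        sx2 += fq*fq
    end for
    if sx = 0 then
        return 0               -- all frequencies zero: avoid dividing by zero
    end if
    -- normalize: 1/N when all N words are equally frequent,
    -- 1 when a single word has every occurrence
    cohesion = sx2/(sx*sx)
    return cohesion
end function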
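The language above appears to be Euphoria. For readers without an interpreter, here is a minimal Python port of Cohesion(), assuming `dic` becomes a list of (word, frequency) pairs, so the frequency sits at index 1 rather than 2. The quantity is the sum of squared frequencies divided by the square of the total (sometimes called the repeat rate, or Simpson's index): it equals 1/N when all N words are equally frequent and reaches 1 when a single word has every occurrence.

```python
def cohesion(dic):
    """Port of Cohesion(): sum of squared frequencies over the squared total.

    `dic` is a list of (word, frequency) pairs; returns 0 for an empty list
    or one whose frequencies are all zero.
    """
    if not dic:
        return 0
    sx = 0   # running total of frequencies
    sx2 = 0  # running sum of squared frequencies
    for entry in dic:
        fq = entry[1]  # Python indexes from 0, so the frequency is entry[1]
        sx += fq
        sx2 += fq * fq
    if sx == 0:
        return 0
    return sx2 / (sx * sx)

print(cohesion([("chedy", 2), ("shedy", 2)]))  # 0.5
print(cohesion([("chedy", 7)]))                # 1.0
```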
I don't know what results it would give on various Voynich
alphabets: I have not tried it because there remains the
problem "how do we find the alphabet?" And that is where
I am stuck.
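To see the degeneracy of the letter-level objective, the same normalized sum of squares can be computed on the letters of one text split under different candidate alphabets. This is our own illustration, not Jacques's method: `tokenize` and `repeat_rate` are hypothetical names, a candidate alphabet is assumed to be a list of multi-character glyphs matched greedily, longest first, with any other character counted as a letter by itself, and the text is a made-up EVA-like string rather than real transcription data.

```python
from collections import Counter

def tokenize(text, glyphs):
    """Split text greedily into the candidate glyphs, longest match first;
    characters not covered by any glyph become single-letter tokens."""
    by_length = sorted((g for g in glyphs if g), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for glyph in by_length:
            if text.startswith(glyph, i):
                tokens.append(glyph)
                i += len(glyph)
                break
        else:  # no candidate glyph matched at position i
            tokens.append(text[i])
            i += 1
    return tokens

def repeat_rate(tokens):
    """Sum of squared letter frequencies over the squared number of letters."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return sum(c * c for c in counts.values()) / (total * total)

text = "chedyshedychol"  # made-up toy string, not transcription data
print(repeat_rate(tokenize(text, [])))            # every character a letter
print(repeat_rate(tokenize(text, ["ch", "sh"])))  # ch and sh as single glyphs
print(repeat_rate(["*"] * len(text)))             # one-letter alphabet: 1.0
```

Whatever the text, collapsing everything into a single letter gives the maximum score of 1.0, which is why this score alone cannot pick the alphabet.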