Hi Nick,
I applied your suggestion (mapping pairs of characters of the original to one or more pairs of a key; more frequent pairs get more mapped pairs, so a frequency count will show a very flat distribution). 372 distinct input pairs are mapped to 1196 possible coded pairs. If you have this coding table:
us 0f0e0d0c
um 95949392919z9y9x
ue 9m9l9k9j
tu 80898887
ti 8x8w8v8u
te 8r8q8p8o
st 8h8g8f8e
se 7u7t7s7r
ri 6z6y6x6w
re 6r6q6p6o
ra 6h6g6f6e
qu 6d6c6b6a5 5.5:59
or 5b5a4 4.
nt 3r3q3p3o
li zxzwzvzt
it y8y7y6y5y4y3y2y1
is yzyyyxyw
in yiyhygyfyeybyax
et vqvpvnvmvlvkvjvh
es vgvfvdvc
er vbu u.u:u0u9u8u7
en uyuwukuj
em t t.t0t9
di sjshr r.
de r5r4r3r2
at pcpbo o.
an o1oyowou
am okojn n.
you can see that 'um' is mapped to:
um 95949392919z9y9x
So, if you have to encode 'um', you have the choice of '95', '94', '93', '92', '91', '9z', '9y' or '9x'.
This mapping is unambiguous, as each coded pair is unique.
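In code, the scheme looks like this (a minimal Python sketch using four rows of the table above; the names are mine, not part of my actual program):

import random

# Excerpt of the coding table: each plaintext pair maps to a list of
# two-character code pairs (homophones). More frequent pairs get more
# homophones, which flattens the coded frequency distribution.
TABLE = {
    'us': ['0f', '0e', '0d', '0c'],
    'um': ['95', '94', '93', '92', '91', '9z', '9y', '9x'],
    'ue': ['9m', '9l', '9k', '9j'],
    'tu': ['80', '89', '88', '87'],
}

# Invert the table; this only works because every coded pair occurs in
# exactly one row, which is what makes the mapping unambiguous.
DECODE = {code: plain for plain, codes in TABLE.items() for code in codes}
assert len(DECODE) == sum(len(codes) for codes in TABLE.values())

def encode(text):
    # Encode two characters at a time (text length assumed even),
    # picking one of the available homophones at random.
    return ''.join(random.choice(TABLE[text[i:i + 2]])
                   for i in range(0, len(text) - 1, 2))

def decode(ciphertext):
    # Decoding needs no randomness: each code pair has one plaintext pair.
    return ''.join(DECODE[ciphertext[i:i + 2]]
                   for i in range(0, len(ciphertext), 2))

print(encode('umtu'))          # e.g. '9y88', varies from run to run
print(decode(encode('umtu')))  # always 'umtu'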
Monkey's results are: h0=4.75, h1=3.72 and h2=3.05.
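For reference, h0, h1 and h2 are the figures Monkey reports: the log of the alphabet size, the single-character entropy, and the entropy of a character given the one before it. A rough Python equivalent (a sketch, not Monkey itself) is:

from collections import Counter
from math import log2

def entropies(text):
    # h0: log of the alphabet size; h1: single-character entropy;
    # h2: conditional entropy of a character given its predecessor,
    # computed as (digraph entropy) - (single-character entropy).
    singles = Counter(text)
    digraphs = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n1, n2 = len(text), len(text) - 1

    h0 = log2(len(singles))
    h1 = -sum(c / n1 * log2(c / n1) for c in singles.values())
    H2 = -sum(c / n2 * log2(c / n2) for c in digraphs.values())
    return h0, h1, H2 - h1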
To Gabriel:
What I want to learn from these experiments is whether a pair-wise encoding can reduce the h2 entropy. It looks like that can be done with a sophisticated mapping.
Claus
-----Original Message-----
From: Nick Pelling [mailto:incoming@xxxxxxxxxxxxxxxxx]
Sent: Thursday, 20 March 2003 10:55
To: vms-list@xxxxxxxxxxx
Subject: Re: AW: VMs: context sensitive encoding
Hi Claus,
At 08:38 20/03/03 +0100, Claus Anders wrote:
>But I think now that if you encode a text with a cipher, no coding
>algorithm exists which can lower the h2 entropy of a given text to the
>VMS level without lengthening the text. My conclusion is that if the VMS
>is a cipher, the original language has to have the same low h2 entropy
>too. As the average token length is short (compared to most known
>languages), there can only be few 'nulls'. I don't believe (but it has
>to be proven) that a few nulls can lower the h2 drastically.
Here's an idea: how about a syllabic cipher (similar to Japanese), but
which is expressed as pairs of letters from a (fake) alphabet? Something
like...
pa <--> fc
pe <--> of
pi <--> ol
po <--> cc
pu <--> ee
(etc)
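In code this is just a fixed lookup, one code pair per syllable (a Python sketch of the table above; nothing here is taken from an actual chancery cipher):

# Syllable -> fake letter-pair. Unlike Claus's table, this mapping is
# one-to-one: each syllable always produces the same code pair.
SYLLABLES = {'pa': 'fc', 'pe': 'of', 'pi': 'ol', 'po': 'cc', 'pu': 'ee'}

def encipher(syllables):
    return ''.join(SYLLABLES[s] for s in syllables)

print(encipher(['pa', 'pe', 'pu']))  # -> 'fcofee'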
Many of the more complex ciphers in the Milanese chancery ledger (most
notably the very first, Tranchedino's own cipher for communicating with
Milan's envoy in Florence, which [I can only presume] evolved to its level
of complexity over a number of years) have a large number of cipher symbols
for common syllables (like quo, que, qua, etc.). So a syllabic cipher is
perfectly consistent with a 1450-1460 dating. In fact, a pure syllabic
cipher would be the logical extension of this.
We also have evidence even as early as 1440 of ciphers that use pairs of
letters in a misdirecting alphabet, so a cipher based on pairs of (fake)
letters would also be consistent with the same dating - the only "novelty"
(and I don't think it would be that great a novelty) here is the
combination of the two.
If the code-maker were to do a frequency analysis of pairs of letters of
even a single page of normal MS text, it would quickly become clear which
combinations came up most often, and he/she would then be able to allocate
fake letter-pairs on the basis of maximal misdirection and confusion.
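That allocation step is easy to mechanize. Here is a sketch (mine, not anything from the ledger): count the digraphs in a sample page, then hand out fake pairs from the pool roughly in proportion to frequency, which is exactly the shape of Claus's 372-to-1196 table:

from collections import Counter
from itertools import product

def build_table(sample_text, symbols):
    # Allocate fake letter-pairs to plaintext digraphs in proportion to
    # how often each digraph occurs in the sample text. Assumes
    # len(symbols)**2 >= number of distinct digraphs, so every digraph
    # gets at least one homophone.
    counts = Counter(sample_text[i:i + 2]
                     for i in range(len(sample_text) - 1))
    total = sum(counts.values())
    pool = iter(a + b for a, b in product(symbols, repeat=2))

    # One guaranteed homophone per digraph; distribute the rest by
    # frequency. int() floors here, so the pool can never run dry.
    spare = len(symbols) ** 2 - len(counts)
    table = {}
    for digraph, n in counts.most_common():
        extra = int(spare * n / total)
        table[digraph] = [next(pool) for _ in range(1 + extra)]
    return table

Frequent digraphs get many homophones and rare ones just one, so the coded pairs come out nearly equiprobable: the flat distribution Claus reports.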
And the punchline? This would decouple the fake alphabet's statistical
distribution from the real alphabet's statistical distribution - a
fifteenth century cryptographer's dream! Anyone performing analysis on the
apparent alphabet would then see only what the code-maker wanted them to see.
Comments?
Cheers, .....Nick Pelling.....
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list