# VMS words and Roman numerals

```
> [rene:] A base-60 system as used by the Babylonians and
> understood (to the best of my knowledge) by later cultures is
> one interesting possibility [to generate the factor `12'] that
> comes to mind immediately. The base numbers could be:
> 1,5,10,20,60,300,600,1200,3600. Hmmm, only 9, not 12.

Hm, these could be the nine yes/no slots in the binomial part of the code.
Indeed the Babylonians (and the Greeks, Romans, Chinese...) used a
digit-position code, with a different set of symbols for each position
and with the zeros omitted.  Unfortunately, all but the Romans had
several choices per slot, so the word length distribution for those
numerals is not symmetrical.

But your remark made me realize that the *Roman* system, unlike the
others, is actually quite similar to the binary bit-position code,
except that it allows multiple I/X/C letters. Here is the length
distribution d_k (the number of distinct "digits" with k letters) for
the Roman "digits" from 0 to 9, without the subtractive notation:

k  d_k  words
-  ---  -----------
0   1   ()
1   2   I V
2   2   II VI
3   2   III VII
4   2   IIII VIII
5   1   VIIII

The length of a Roman numeral between 0 and 999 will be the sum of
three variables, each with this distribution (one for each decimal
position). With a couple of unix hacks, I computed the number R_k of
distinct Roman numerals in 0-999 with each given length k:

k    R_k
--   ---
0      1  (empty)
1      6  (I, V, X, L, C, D)
2     18  (II, VI, XI, XV, XX, LI, ... DC)
3     38
4     66
5     99
6    128
7    144
8    144
9    128
10    99
11    66
12    38
13    18
14     6
15     1  (DCCCCLXXXXVIIII)

This distribution is not quite a binomial distribution but, being the
sum of three independent variables (the central limit theorem at work),
it is not very far from one. Specifically, it is close to
binomial(15,k), except for a constant factor:

k    R_k  binom  ratio
--   ---  -----  -----
0      1      1  1.000
1      6     15  0.400
2     18    105  0.171
3     38    455  0.084
4     66   1365  0.048
5     99   3003  0.033
6    128   5005  0.026
7    144   6435  0.022
8    144   6435  0.022
9    128   5005  0.026
10    99   3003  0.033
11    66   1365  0.048
12    38    455  0.084
13    18    105  0.171
14     6     15  0.400
15     1      1  1.000

The match between R_k and the binomial(15,k) distribution is not as
good as in the case of the VMS words (the ratio varies from 0.02 to
0.05 over the central range, k = 4 to 11), but it is close enough to
be suggestive.

So perhaps we do not need to assume nine independent X/empty slots in
the VMS words. Perhaps there are only (say) three slots, each of which
may be filled with a "digit" string of length between 0 and 3. Let d_k
be the number of distinct "digits" of each length k, in a given slot.
It is not necessary that these counts be in the ratio 1:3:3:1. As long
as they are symmetrical (i.e. d_0=d_3 and d_1=d_2), the word
length distribution will be symmetrical and approximately binomial.

All the best,

--stolfi

```
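
The counts in the post can be reproduced with a short script. The original "unix hacks" are not given, so what follows is only a minimal sketch in Python, not the author's method. The names `additive_roman`, `length_counts` and `convolve` are made up for this illustration, and the slot counts in `s` are a purely hypothetical example of a symmetric d_k. The script enumerates the Roman numerals for 0-999 without subtractive notation, counts them by length, checks the result against the three-fold convolution of d_k, and prints the ratio to binomial(15,k).

```python
from collections import Counter
from math import comb


def additive_roman(n):
    """Roman numeral for 0 <= n <= 999 without subtractive notation (4 = IIII)."""
    parts = []
    for one, five, place in (("C", "D", 100), ("X", "L", 10), ("I", "V", 1)):
        digit = (n // place) % 10
        parts.append(five * (digit // 5) + one * (digit % 5))
    return "".join(parts)


def length_counts(words):
    """List whose k-th entry is the number of words with exactly k letters."""
    counts = Counter(len(w) for w in words)
    return [counts[k] for k in range(max(counts) + 1)]


def convolve(a, b):
    """Length distribution of a concatenation of two independent slots."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# R_k by brute-force enumeration of all 1000 numerals.
R = length_counts(additive_roman(n) for n in range(1000))

# The same counts as the sum of three decimal positions, each with
# the per-digit distribution d_k from the first table.
d = [1, 2, 2, 2, 2, 1]
assert R == convolve(convolve(d, d), d)

# Ratio of R_k to binomial(15, k), as in the second table.
for k, r in enumerate(R):
    print(f"{k:<2d} {r:5d} {comb(15, k):6d}  {r / comb(15, k):.3f}")

# Hypothetical three-slot scheme with symmetric counts (d_0 = d_3,
# d_1 = d_2) that are not in the ratio 1:3:3:1; the resulting word
# length distribution is still symmetric.
s = [2, 5, 5, 2]
S = convolve(convolve(s, s), s)
assert S == S[::-1]
```

The convolution check reflects the remark that the numeral length is the sum of three variables with the same distribution. The last assertion illustrates the closing point: symmetric per-slot counts give a symmetric word length distribution, whatever their exact values.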