The phrases "bench bits" and "almost exactly half" make me think of
something. The gallows characters really have several dichotomous
features, don't they? That is,

- one loop vs. two loops
- two straight legs vs. a leg and a hook
- leg straddled by a bench or not

In addition, you have "bench + no hook + no gallows" and "bench + hook +
no gallows" characters. In other words, we see almost all the
combinations of:

(no loops, one loop, two loops) x (no hook, hook) x (no bench, bench)

That is 3 x 2 x 2 = 12 combinations, enough to code the ten decimal
digits, though I would expect a more even distribution if so. Perhaps it
would be a good idea to look at the joint distribution of these features.

Bruce

Jorge Stolfi wrote:
>
> > [Bruce Grant:] Speaking of encoded Roman numerals, a dead
> > giveaway ought to be the presence of seven different symbols,
> > four of which appear in multiples (I, X, C, M) and three of
> > which do not (V, L, D), and with certain forbidden digraph
> > patterns:
> >
> >   IV, VI, IX, XI, XV   => OK     VX => not OK
> >   XL, LX, XC, CX, CL   => OK     LC => not OK     and so on
>
> There are indeed rules of this sort that apply to the sequence
> of letters in the VMS words. However, I haven't been able to see
> any obvious match to the patterns of standard Roman numerals.
>
> One intriguing fact is that almost exactly half of the VMS tokens have
> exactly one gallows, while the other half have none. Also, almost
> exactly half of the tokens have "bench" letters (EVA ch, sh, ee); and
> this "bench bit" seems to be independent of the "gallows bit". It is
> therefore tempting to identify those letters with the 5's of Roman
> numerals, e.g. {gallows = V, benches = L}. But then what? And why are
> there 4 different gallows, and several different benches?
>
> Perhaps the 4 gallows represent the Roman "digits" V, VI, VII, VIII,
> while the benches stand for L, LX, etc. But then what are the EVA
> letters "a"/"o", and "e", which seem to be pre- and postfix modifiers
> for other letters?
>
> > [Robert:] A quick thought.
> > If the VMs is mostly encoded numbers,
> > then there is a fairly powerful test of this hypothesis.
> >
> > Just as Zipf's Law predicts word frequency, so Benford's
> > Law predicts the frequencies of the initial digits of a
> > sequence of numbers. In a nutshell, for leading digit n = 1..9,
> > P(n) = log10(n+1) - log10(n)
>
> That law may hold for "open" number sets, where the frequency
> of a number decreases with its magnitude in the appropriate way.
> It is unlikely to hold for "closed" number sets, such as
> telephone numbers or train times.
>
> Would it hold for a numerical code? I guess that it depends on how the
> numbers are assigned to the words.
>
> All the best,
>
> --stolfi
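Bruce's suggestion to look at the joint distribution of the glyph features can be made concrete with a contingency table. The same table checks Stolfi's observation that the "bench bit" seems independent of the "gallows bit". Below is a minimal Python sketch. It assumes each token has already been reduced to a tuple of feature values, and it does not show how to extract those features from an EVA transcription. The function names `joint_distribution` and `independence_table` are hypothetical. For two binary features, the sketch compares observed cell counts with the counts expected if the features were independent.

```python
from collections import Counter
from itertools import product


def joint_distribution(tokens):
    """Count how often each combination of feature values occurs.

    `tokens` is an iterable of equal-length tuples, one per token,
    e.g. (loops, has_hook, has_bench). Feature extraction is assumed
    to have been done elsewhere.
    """
    return Counter(tuple(t) for t in tokens)


def independence_table(pairs):
    """For two binary features, return {cell: (observed, expected)}.

    Expected counts assume independence: N * P(a) * P(b), with the
    marginal probabilities estimated from the same data.
    """
    pairs = [(bool(a), bool(b)) for a, b in pairs]
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one token")
    counts = Counter(pairs)
    p_a = sum(a for a, _ in pairs) / n  # fraction of tokens with feature A
    p_b = sum(b for _, b in pairs) / n  # fraction of tokens with feature B
    table = {}
    for a, b in product([False, True], repeat=2):
        pa = p_a if a else 1 - p_a
        pb = p_b if b else 1 - p_b
        table[(a, b)] = (counts[(a, b)], n * pa * pb)
    return table
```

If the gallows and bench bits really are independent, each observed count should sit close to its expected count. A formal test such as chi-square could be built on top of this table, but it is not shown here.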

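Bruce Grant's forbidden-digraph test for Roman numerals can be run mechanically. Below is a minimal sketch that assumes the standard modern conventions:

- a smaller symbol may precede a larger one only in IV, IX, XL, XC, CD and CM;
- V, L and D are never doubled.

The function names are hypothetical, and the input is assumed to contain only the seven Roman symbols. Applying the test to the VMS would first require some assumed mapping from EVA letters to those symbols.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Under the standard modern convention, these are the only pairs in which
# a smaller symbol may stand before a larger one.
SUBTRACTIVE_PAIRS = {"IV", "IX", "XL", "XC", "CD", "CM"}

# The "fives" never appear twice in a row.
NON_REPEATING = {"V", "L", "D"}


def digraph_ok(pair):
    """Return True if the two-letter string `pair` may occur in a Roman numeral."""
    left, right = pair
    if left == right:
        return left not in NON_REPEATING
    if ROMAN_VALUES[left] > ROMAN_VALUES[right]:
        return True  # descending pairs such as VI, LX, CL are always allowed
    return pair in SUBTRACTIVE_PAIRS


def forbidden_digraphs(word):
    """List the adjacent pairs in `word` that break the digraph rules.

    Only adjacent pairs are checked, so strings such as "IIV" or "IXI"
    pass even though they are not well-formed numerals.
    """
    return [word[i:i + 2] for i in range(len(word) - 1)
            if not digraph_ok(word[i:i + 2])]
```

For example, `forbidden_digraphs("LCV")` flags `"LC"`, while every pair Bruce lists as OK passes `digraph_ok`. Because the check is local, it can only rule a word out. A full numeral parser would be needed to rule one in.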

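Robert's Benford formula, read with base-10 logarithms, gives the expected frequency of each leading digit 1 through 9. Below is a minimal Python sketch with hypothetical function names. It computes those probabilities and the observed leading-digit frequencies of a list of positive integers, so the two can be compared. Applying it to the VMS would first require some assumed mapping from words to numbers. As Stolfi notes, that mapping decides whether the law could be expected to hold at all.

```python
import math
from collections import Counter


def benford_expected():
    """Benford probabilities P(d) = log10(d + 1) - log10(d) for d = 1..9."""
    return {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}


def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def observed_leading_digits(numbers):
    """Relative frequency of each leading digit among the positive integers given."""
    counts = Counter(leading_digit(n) for n in numbers if n > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no positive integers to count")
    return {d: counts[d] / total for d in range(1, 10)}
```

As a sanity check, the powers of 2 are a textbook Benford-distributed sequence. For `2 ** k` with k = 1..1000, the observed share of leading 1s comes out very close to `benford_expected()[1]`, which is log10(2), about 0.301. A "closed" set such as train times would not be expected to match.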