
Re: VMs: Reducing the VMS to a stream of grouped glyphs...?



If I understand correctly, what you're looking for seems to be:

Find a probable set of glyphs where 1 VMS glyph (consisting of 1 or more VMS characters) can be
equated with 1 cleartext glyph (again consisting of 1 or more cleartext characters).

iin = the
aiii = a
quo = in
eedy = on

So in fact this is a monoalphabetic substitution, but over BIG alphabets (like an alphabet consisting
of 26*26 = 676 bigrams or 26*26*26 = 17576 trigrams).
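
To make the idea concrete, here is a rough Python sketch. It assumes the stream has already been cut into glyphs (which is the hard part) and uses only the four example pairs above as its table. The names GLYPH_TABLE, substitute and alphabet_size are made up for illustration.

```python
# Hypothetical table: only the four example pairs from this message.
GLYPH_TABLE = {"iin": "the", "aiii": "a", "quo": "in", "eedy": "on"}

def substitute(glyphs, table):
    """Replace each VMS glyph by its cleartext glyph; unknown glyphs become '?'."""
    return [table.get(g, "?") for g in glyphs]

def alphabet_size(n, base=26):
    """Number of distinct n-grams over an alphabet of `base` letters."""
    return base ** n

# "xyz" is not in the table, so it comes out as '?'.
decoded = substitute(["quo", "iin", "xyz", "eedy"], GLYPH_TABLE)
```

The lookup itself is trivial. All the difficulty sits in choosing the glyph boundaries and in filling a table that could have hundreds or thousands of entries.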

This does not seem a bad idea - but how do you construct the set of VMS glyphs?

Maybe an initial guess would be to find a set of glyphs that can reproduce the whole content
of the VMS while keeping the number of distinct glyphs as small as possible.
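
One way to test a candidate set, sketched in Python: check whether a given glyph inventory can spell out a stretch of text at all, and if so, with how few pieces. This only evaluates one candidate set. The search over possible sets is not shown, and the function name segment is my own.

```python
def segment(text, glyphs):
    """Split text into the fewest pieces drawn from glyphs, or return None."""
    max_len = max(map(len, glyphs), default=0)
    # best[i] holds the fewest-piece split of text[:i], or None if there is none.
    best = [[]] + [None] * len(text)
    for i in range(1, len(text) + 1):
        for k in range(1, min(max_len, i) + 1):
            piece = text[i - k:i]
            prev = best[i - k]
            if prev is not None and piece in glyphs:
                if best[i] is None or len(prev) + 1 < len(best[i]):
                    best[i] = prev + [piece]
    return best[len(text)]

# A toy inventory built from the example glyphs in this message.
pieces = segment("quoiineedy", {"quo", "iin", "eedy"})
```

A set that returns None on any part of the transcription is ruled out. Among the sets that survive, the smaller ones would be the more interesting candidates.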

This sounds like an optimization problem that must have been researched in some mathematical
domain before. But I don't know in which one. What you're describing sounds like a sub-optimal
approximation of this ideal:

1) Choose a set of the most frequent + longest bigrams, trigrams, 4-grams etc.
2) Delete every occurrence of this set from the text.
3) See what's left.
4) See if you can find another set of N-grams that fits what's left.
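
The four steps above could be sketched in Python roughly as follows. The score (count times length), the minimum count of 2, and the use of a space to mark each deleted spot are all assumptions of mine, not something taken from the original proposal.

```python
from collections import Counter

def top_ngrams(text, sizes=(2, 3, 4), how_many=2, min_count=2):
    """Step 1: rank repeated n-grams by count * length and keep the best few."""
    counts = Counter(
        text[i:i + n]
        for n in sizes
        for i in range(len(text) - n + 1)
        if " " not in text[i:i + n]  # n-grams may not span a deleted spot
    )
    repeated = [g for g in counts if counts[g] >= min_count]
    ranked = sorted(repeated, key=lambda g: (-counts[g] * len(g), g))
    return ranked[:how_many]

def peel(text, rounds=2, **kwargs):
    """Steps 2-4: delete the chosen n-grams, then repeat on what is left."""
    chosen = []
    for _ in range(rounds):
        batch = top_ngrams(text, **kwargs)
        if not batch:
            break
        chosen.append(batch)
        for g in sorted(batch, key=len, reverse=True):  # longest first
            text = text.replace(g, " ")  # a space marks the hole
    return chosen, text

chosen, residue = peel("iiniinquo")
```

Being greedy, this can lock in a bad choice in the first round (a frequent bigram that is really part of a longer glyph), which is why it is only an approximation of the ideal.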

It reminds me of the heuristic I use to find hackers in log files. First remove all the obviously
innocent events, usually 90%. Then look through the rest. Most of it is innocent too, but a few of
the remaining events are hacking activity.

Now I'm sure there must be a direct road to what you're looking for. But I don't know where to look
for it.

Hasn't this been discussed before?
