VMs: New again to list...
Hi, after several years off the list and going back to school, I am
back again. I left because I was frustrated by ideas that I couldn't
pursue for lack of skills, and somewhat embarrassed at being an armchair
cryptographer. I just started a new job as a developer on the ADaM toolkit,
which is used for all sorts of data mining research and am eager to get back
in and try out some of my new skills. The first thing I am interested in
trying is an association rules analysis. The standard use is to analyze
sets (for instance, items in a shopping cart) and derive rules of the form
'P and Q imply R', along with lists of common subsets of arbitrary
cardinality. My linguistics training says that treating tokens as sets is
too broad an approach (the order of characters is ignored), but in the
back of my mind it occurs to me that it might pick up something that was
missed. Basically this boils down to another attempt at defining a language
'fingerprint' that might be somewhat independent of the underlying
code/cipher.
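
To make the idea concrete, here is a minimal sketch in Python of what I
mean by treating each token as a set of characters and mining rules from
those sets. It is a brute-force subset count, not the Apriori algorithm and
not ADaM's implementation; the function names, the thresholds and the
sample tokens are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force support counting: enumerate every subset of up to
    max_size items in each transaction and keep those whose relative
    frequency is at least min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def rules(supports, min_confidence):
    """Derive 'antecedent implies item' rules with a single-item consequent.
    Confidence = support(itemset) / support(antecedent)."""
    out = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        for item in sorted(itemset):
            antecedent = itemset - {item}
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is always present in `supports`.
            conf = supp / supports[antecedent]
            if conf >= min_confidence:
                out.append((antecedent, item, supp, conf))
    return out


# Made-up tokens, purely illustrative; each token becomes a set of characters,
# so character order and repetition are deliberately thrown away.
tokens = ["abc", "abd", "ab", "acd"]
transactions = [set(tok) for tok in tokens]

supports = frequent_itemsets(transactions, min_support=0.5)
for antecedent, item, supp, conf in rules(supports, min_confidence=0.9):
    print(sorted(antecedent), "=>", item, f"support={supp:.2f} conf={conf:.2f}")
```

Counting every subset grows quickly with token length, so max_size is kept
small here; a real run over a transcription would want a proper Apriori or
FP-growth implementation.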
If anyone is interested in the toolkit, it is freely available for Linux
and Windows, and we'll soon be adding OS X. I'm not sure if it's acceptable
to post links, but google 'uah itsc adam' and you'll find it. It has dozens
of command line tools that handle text and quite a few more for images.
Most take .arff files as arguments. ARFF is a data mining standard and is
little more than a CSV file with a header, although you can separate values
with white space as well as commas. The toolkit includes all kinds of
clustering classifiers, Bayes classifiers and so on. Running any of these
commands with no arguments prints a help file, and there are Python
wrappers as well.
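
For anyone who hasn't seen the format, the sketch below writes a small
ARFF file in the layout I know from the common specification (an
@relation line, one @attribute line per column, then @data followed by
comma-separated rows). The function name tokens_to_arff and the has_X
attribute naming are hypothetical, and I'm assuming plain alphanumeric
characters so that attribute names need no quoting. Whether a given ADaM
tool expects exactly these columns is something to check against its help
output.

```python
def tokens_to_arff(tokens, relation="tokens"):
    """Encode each token as one row of 0/1 flags, one flag per character
    of the alphabet, and return the whole ARFF document as a string."""
    alphabet = sorted(set("".join(tokens)))
    lines = [f"@relation {relation}", ""]
    for ch in alphabet:
        # Nominal attribute with the two allowed values 0 and 1.
        lines.append(f"@attribute has_{ch} {{0,1}}")
    lines += ["", "@data"]
    for tok in tokens:
        present = set(tok)
        lines.append(",".join("1" if ch in present else "0" for ch in alphabet))
    return "\n".join(lines) + "\n"


# Made-up tokens, purely illustrative.
print(tokens_to_arff(["ab", "b"]))
```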
Is there an index of statistical analyses that have been run so far?
Looking forward to playing again,
Brian
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list