
VMs: New again to list...

Hi, after several years of being off the list and going back to school, I am
back again.  I left because I was frustrated by ideas I couldn't pursue for
lack of skills, and I was somewhat embarrassed at being an armchair
cryptographer.  I just started a new job as a developer on the ADaM toolkit,
which is used for all sorts of data mining research, and I am eager to get back
in and try out some of my new skills.  The first thing I am interested in
trying is an association rules analysis.  The standard approach is to analyze
sets (for instance, the items in a shopping cart) and derive rules of the form
'P and Q imply R', as well as lists of frequently occurring subsets of
arbitrary cardinality.  My linguistics training says that treating tokens as
sets is too broad an approach (the order of characters is ignored), but in
the back of my mind it occurs to me that it might pick up something that was
missed.  Basically this boils down to another attempt at defining a language
'fingerprint' that might be somewhat independent of the underlying
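For anyone who hasn't run into the technique, here is a minimal pure-Python sketch of the idea. It assumes the textbook Apriori algorithm (not necessarily what ADaM implements; I haven't checked its tools) and treats each token as an unordered set of characters. The function names and the toy tokens are hypothetical, made up for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {itemset: count} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    sets = [frozenset(t) for t in transactions]  # a token becomes a set of characters
    n = len(sets)

    # Level 1: count each single item.
    counts = {}
    for s in sets:
        for item in s:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c for k, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if every (k-1)-subset is frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one pass over the data per candidate.
        current = {}
        for cand in candidates:
            c = sum(1 for s in sets if cand <= s)
            if c / n >= min_support:
                current[cand] = c
        result.update(current)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples, i.e. rules of the
    form 'P and Q imply R', with confidence = count(P u R) / count(P)."""
    rules = []
    for itemset, count in itemsets.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                # Every subset of a frequent itemset is itself frequent, so it is a key here.
                confidence = count / itemsets[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, confidence))
    return rules


if __name__ == "__main__":
    # Toy tokens (hypothetical), each treated as a set of characters.
    tokens = ["abc", "abd", "acd", "abcd", "bc"]
    freq = frequent_itemsets(tokens, min_support=0.6)
    for ante, cons, conf in association_rules(freq, min_confidence=0.7):
        print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

Because frozenset("abcd") and frozenset("dcba") are equal, any character-order information is thrown away before mining starts, which is exactly the objection raised above.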

If anyone is interested in the toolkit, it is freely available for Linux and
Windows, and we'll soon be adding OS X.  I'm not sure if it's acceptable to
post links, but google 'uah itsc adam' and you'll find it.  It has dozens of
command-line tools that handle text and quite a few more for images.  Most
take .arff files as input; ARFF is a data mining standard format that is
little more than a CSV file with a header, although fields can be separated by
whitespace as well as commas.  The toolkit includes all kinds of clustering
tools, classifiers such as Bayes classifiers, and so on.  Running any of these
commands with no arguments prints its help text, and there are Python wrappers
as well.
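To make the "CSV with a header" description concrete, here is a small sketch that turns tokens into ARFF text: one row per token and one 0/1 nominal attribute per distinct character, i.e. the set representation described earlier. It assumes the standard Weka-style ARFF layout (@relation, @attribute, @data) with comma separators; I have not checked ADaM's exact dialect. The function name, the relation name and the has_<char> attribute names are hypothetical.

```python
def tokens_to_arff(tokens, relation="tokens"):
    """Return ARFF text with one row per token and one 0/1 nominal
    attribute per distinct character (each token treated as a character set)."""
    alphabet = sorted(set("".join(tokens)))
    for ch in alphabet:
        if not ch.isalnum():
            # Keep attribute names simple; ARFF quoting rules are not handled here.
            raise ValueError(f"unsupported character {ch!r}")

    # Header: relation name, then one attribute declaration per character.
    lines = [f"@relation {relation}", ""]
    for ch in alphabet:
        lines.append(f"@attribute has_{ch} {{0,1}}")

    # Data section: comma-separated 0/1 flags, one line per token.
    lines += ["", "@data"]
    for tok in tokens:
        present = set(tok)
        lines.append(",".join("1" if ch in present else "0" for ch in alphabet))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(tokens_to_arff(["abc", "bc"]), end="")
```

For the two toy tokens "abc" and "bc", the output is a three-attribute header followed by the data rows 1,1,1 and 0,1,1.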

Is there an index of statistical analyses that have been run so far?

Looking forward to playing again,
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list