Re: VMs: VMS Word context similarities
First, congratulations on getting an algorithm to work
this well. My own efforts at a grouping algorithm
(actually, at many different grouping algorithms)
always seemed to be very sensitive to their initial
conditions and locked themselves into sub-optimal
groupings.
* It would be interesting to see how your algorithm
groups letters (into vowels, consonants, or mixed
groups). If its notion of context is based on
adjacency, it should give some result like that.
* It is interesting that our most common VMS "word",
daiin, does not appear in the groups you listed.
I was only investigating "hard" algorithms (each entity
must belong to exactly one group), whereas yours looks
soft, since "the" appears in more than one group.
* If so, then I would guess some of your large groups
are supersets of your smaller groups (e.g. "god" also
appears in one or more larger noun groups). If that is
the case, it leads to the interesting result that VMS
"words" apparently don't form supersets and subsets of
one another and don't have any mixed use (e.g. the same
"word" acting sometimes like a verb and sometimes like
a noun).
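Both checks in the last two points (a word shared between groups, and one group nested inside another) are easy to run mechanically. A minimal sketch, assuming each grouping is held as a list of Python sets; the group lists are copied from Marke's message below, and since his large groups were not posted, the subset test can only be run on the small ones here. The function names are hypothetical.

```python
from itertools import combinations

# Groups copied from Marke's message (his large groups were not posted).
BIBLE = [
    {"a", "the", "and", "of", "to", "in"},
    {"were", "be", "was", "is", "are"},
    {"had", "have", "has"},
    {"his", "their", "my", "your", "the"},
    {"he", "i", "they", "who", "you", "him", "them"},
    {"shall", "will"},
    {"yahweh", "god"},
]
VMS = [
    {"ar", "or"},
    {"kor", "sor", "okor"},
    {"otchol", "tchey"},
    {"ol", "chol", "chedy", "shedy", "qokeey", "qokeedy", "qokedy"},
    {"dar", "qokaiin", "okaiin", "qokai!n", "okal", "qokar", "saiin", "otar"},
    {"qokain", "otai!n"},
    {"r", "l", "sol"},
    {"tar", "ykar"},
    {"shecthy", "olchedy"},
    {"ched", "lkar"},
]


def shared_members(groups):
    """Map each word that appears in two or more groups to the indices
    of those groups (evidence of a 'soft' grouping)."""
    where = {}
    for i, group in enumerate(groups):
        for word in group:
            where.setdefault(word, []).append(i)
    return {w: idx for w, idx in where.items() if len(idx) > 1}


def subset_pairs(groups):
    """Return (i, j) for every group i that is a proper subset of group j."""
    pairs = []
    for i, j in combinations(range(len(groups)), 2):
        if groups[i] < groups[j]:
            pairs.append((i, j))
        elif groups[j] < groups[i]:
            pairs.append((j, i))
    return pairs


if __name__ == "__main__":
    print(shared_members(BIBLE))  # → {'the': [0, 3]}
    print(shared_members(VMS))    # → {}
    print(subset_pairs(VMS))      # → []
```

On the posted lists this finds only "the" shared between two Bible groups and no overlap at all among the VMS groups; a real superset test would need the full output, including the large groups.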
:Eric
--- Marke Fincher
<MarkeFincher@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I wrote a crude program which, given a large input
> text, tries to identify groups of words which occur
> in a similar context.
>
> When I ran it on a bible it suggested the following
> word groupings:
>
> (a,the,and,of,to,in)
> (were,be,was,is,are)
> (had,have,has)
> (his,their,my,your,the)
> (he,i,they,who,you,him,them)
> (shall,will)
> (yahweh,god)
>
> ...along with some very large groups, e.g. one group
> for nouns, one for verbs, another for adjectives,
> etc.
>
> I found this quite encouraging, so then, using the
> same parameters, I ran it on the VMS, and here is
> what I got:
>
> (ar,or)
> (kor,sor,okor)
> (otchol,tchey)
> (ol,chol,chedy,shedy,qokeey,qokeedy,qokedy)
> (dar,qokaiin,okaiin,qokai!n,okal,qokar,saiin,otar)
> (qokain,otai!n)
> (r,l,sol)
> (tar,ykar)
> (shecthy,olchedy)
> (ched,lkar)
>
> ...but no large groups.
>
> Marke
______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx
> with a body saying:
> unsubscribe vms-list
>
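Marke's message does not say how "context" was measured or how groups were formed. A minimal sketch of one common way to do this kind of grouping, assuming that context means the token immediately before and after each word, that words are compared by cosine similarity of their neighbour counts, and that a group is every word at least 0.5 similar to a seed word. All of these choices, and the function names, are hypothetical and not Marke's parameters.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens, min_count=2):
    """Map each token seen at least min_count times to a Counter of its
    neighbours. Assumption: context = the token immediately to the left
    (tagged "L:") and to the right (tagged "R:")."""
    freq = Counter(tokens)
    vecs = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if freq[tok] < min_count:
            continue  # too rare to have a reliable context profile
        if i > 0:
            vecs[tok]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vecs[tok]["R:" + tokens[i + 1]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(n * b[k] for k, n in a.items() if k in b)
    na = sqrt(sum(n * n for n in a.values()))
    nb = sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def soft_groups(tokens, threshold=0.5, min_count=2):
    """Each frequent word seeds a group of every word whose context vector
    is at least `threshold` similar to its own, so a word may land in
    several groups (a 'soft' grouping). Identical groups collapse because
    they are stored as frozensets in a set."""
    vecs = context_vectors(tokens, min_count)
    words = sorted(vecs)
    groups = set()
    for w in words:
        members = frozenset(v for v in words
                            if cosine(vecs[w], vecs[v]) >= threshold)
        if len(members) > 1:  # skip words that matched only themselves
            groups.add(members)
    return groups


if __name__ == "__main__":
    text = "the cat sat . the dog sat . a cat ran . a dog ran ."
    print(sorted(sorted(g) for g in soft_groups(text.split())))
    # → [['a', 'the'], ['cat', 'dog'], ['ran', 'sat']]
```

Passing `list(text)` instead of `text.split()` would be one way to try the letter-grouping experiment Eric suggests; whether that separates vowels from consonants on the VMS is untested here.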