Re: VMs: VMS Word context similarities
On Thursday 08 September 2005 09:58, Marke Fincher wrote:
> It is a valid question to ask, what happens if you run
> this program on a file in which we know there is little
> or no contextual relationship between tokens;
Yes, one could try scrambling the words in a known text and see what comes out.
One could also scramble the VMS words and see whether the thresholds needed to
find any groupings change (which is what one would expect if there is any
structure).
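A minimal sketch of that shuffle test. It assumes text already split into tokens. The "context" statistic here is the mutual information between each token and the one after it, which is my own illustrative choice and not the program discussed in this thread. Mutual information estimated from a finite text is biased upward, so a shuffled copy will not score exactly zero. That is why it makes sense to compare against an average over several shuffles rather than against zero.

```python
import math
import random
from collections import Counter

def adjacent_mutual_information(tokens):
    """Mutual information (bits) between a token and the token that follows it."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    left = Counter(a for a, _ in pairs)   # counts of first members of pairs
    right = Counter(b for _, b in pairs)  # counts of second members of pairs
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

def shuffled_baseline(tokens, trials=20, seed=0):
    """Average adjacent MI over several random shuffles of the same tokens.

    Shuffling keeps the word frequencies but destroys word order, so any
    drop relative to the unshuffled text reflects sequential structure.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        scores.append(adjacent_mutual_information(shuffled))
    return sum(scores) / len(scores)
```

Usage would be along the lines of `tokens = open("transcription.txt").read().split()` (hypothetical file name), then comparing `adjacent_mutual_information(tokens)` with `shuffled_baseline(tokens)`, once for a known-language text and once for the VMS words.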
> I have to admit though, the thing that bugs me about this
> approach is that I'm not sure I really believe that
> VMS-words are truly words.
Yes, that is a problem.
> The vocabulary size seems too small;
> the common words are too common and the others
> too rare,
Hm... Zipf's law tells us that the vocabulary is not too small for the size of
the VMS, and that the proportions of words are about what one would expect from
a natural language.
What size were you expecting?
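For anyone who wants to check the Zipf claim on their own transcription, here is a rough sketch. It fits a least-squares line to log(frequency) against log(rank), and Zipf's law predicts a slope near -1. The plain unweighted fit over every rank is an assumption on my part. With real data the long tail of once-only words dominates it, so treat the number as indicative only. A slope near -1 is consistent with natural language but does not prove it.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct token types; Zipf's law predicts about -1.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```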
Cheers,
G.
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list