
VMs: Re: Word distribution



Hi Elmar,

At 11:43 06/03/2004 +0100, Elmar Vogt wrote:
> At 12:20 05/03/2004 +0000, Nick Pelling wrote:
> >However, AIUI Voynichese words generally have a very low instance count,
> >which is hard to reconcile with their being part of a language (whether
> >real or artificial).

> Indeed?
>
> I have read Gabriel's excellent work on
>
> http://web.bham.ac.uk/G.Landini/evmt/zipf.htm
>
> lately, and I got the impression that word frequency distribution was
> essentially the same as for "regular" languages. Did I go wrong?

You must be extremely careful when interpreting rank-frequency law graphs: what they claim is that, if you rank all the words in a text by frequency, then the frequencies will generally tail off along a roughly straight line on a log-log plot. However, the same is also broadly true of random texts (as Gabriel mentions), and we know the VMs is, in many ways, more structured than random. It is therefore problematic to draw conclusions (especially as to "languageness") from such graphs. You must similarly be careful when interpreting number-frequency law graphs.
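To make the point about random texts concrete, here is a rough Python sketch (not taken from Gabriel's page). It generates a "monkey typing" text and fits a straight line to its rank-frequency data on log-log axes. The function names, the five-letter alphabet and the space probability are all assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def random_text(n_chars, alphabet="abcde", space_prob=0.2, seed=0):
    """Random letters with random spaces, split into 'words' (illustrative only)."""
    rng = random.Random(seed)
    chars = [" " if rng.random() < space_prob else rng.choice(alphabet)
             for _ in range(n_chars)]
    return "".join(chars).split()

tokens = random_text(200_000)
freqs = rank_frequency(tokens)
slope = loglog_slope(freqs)
print(f"{len(freqs)} distinct words, log-log slope {slope:.2f}")
```

A meaningless text like this still gives a falling line on log-log axes, which is why a Zipf-like plot on its own says little about "languageness".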


What are Zipf's Laws all about in natural language? FWIW, I believe they reflect three different kinds of mechanisms, which have different (overlapping) degrees of usefulness (and hence frequencies):
(1) syntactic infrastructure (words like "the", "and" etc);
(2) global relevance (signifiers reused globally to explain/describe different things); and
(3) local relevance (signifiers reused locally in a narrative to provide dramatic structure).


You might well imagine (for instance) that "dain"-words are (1), "qoteedy"-words are (2), and that the rest is (3). Maybe. But Zipf's Laws are blunt instruments for probing beneath this kind of skin.

The good thing about Zipf's Laws is that they allow a kind of comparison between radically different texts: but the bad thing about them is they don't tell you about actual instance count per se, because those kinds of things are (for the most part) abstracted out as part of the process.

I stand by my assertion (which chimes with my own experience, though I don't believe I originated it) that the instance count of Voynichese words seems generally low compared with natural languages: and I also don't believe that Zipf's Laws are the right way to test this assertion.
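One way to look at instance counts directly, rather than through a Zipf plot, is sketched below in Python. The function name and the choice of statistics (mean instances per distinct word, and the fraction of words that occur only once) are my own assumptions about what would be worth measuring, and the sample sentence is a toy, not Voynichese. Since the number of distinct words grows with text length, any comparison between texts should use samples of the same size, hence the sample_size argument.

```python
from collections import Counter

def instance_stats(tokens, sample_size=None):
    """Summarise how often each distinct word recurs in a token list."""
    if sample_size is not None:
        tokens = tokens[:sample_size]  # compare texts at equal length
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "mean_instances_per_type": n_tokens / n_types,
        "hapax_fraction": hapaxes / n_types,
    }

# Toy input only; a real test would use a transcription of the VMs
# and a natural-language text truncated to the same number of words.
sample = "the cat sat on the mat and the dog sat".split()
stats = instance_stats(sample)
print(stats)
```

A low mean instance count or a high hapax fraction, relative to a natural-language sample of the same length, would be the kind of evidence the assertion needs.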

Cheers, .....Nick Pelling.....

