
VMs: Rene: Stat questions

To: "VMS List" <vms-list@xxxxxxxxxxx>
Subject: VMs: Rene: Stat questions
From: "GC" <glenclaston@xxxxxxxxxxx>
Date: Mon, 21 Jul 2003 21:15:01 -0500
Importance: Normal
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx


Rene,

In regard to your page http://www.voynich.nu/extra/lang.html, a question.

Your page says:

The correlation between two pages was defined as the number of words common
to both pages. If any word occurred several times on one page, each
occurrence was counted. The following example may explain this more clearly:

Page 1:   Ape Ape Bear Cat Cat Cat
Page 2:   Ape Ape Ape Boar Cat

The number of common words is three: two times Ape and one Cat. Obviously,
the number of common words depends heavily on the number of words on each
page. Since the number of words per page is highly variable (and correlated
with the language used, B pages being much more verbose), a normalisation
factor had to be used. This factor was chosen as a constant divided by the
square root of the product of the number of words on the two pages being
compared. This is not a perfect method, and suggestions for finding a better
'rule' would be appreciated.

I have some interest in the last part especially:

"This factor was chosen as a constant divided by the square root of the
product of the number of words on the two pages being compared."
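
To make sure I am reading the quoted rule correctly, here is a minimal Python sketch of it as I understand it. The function names and the default constant of 1.0 are my own assumptions; your page does not say what constant was used or how the count was actually computed.

```python
from collections import Counter
from math import sqrt


def common_words(page_a, page_b):
    """Count words common to both pages, counting repeated occurrences.

    For each word, the number of shared occurrences is the smaller of
    its two per-page counts (a multiset intersection).
    """
    shared = Counter(page_a) & Counter(page_b)
    return sum(shared.values())


def normalised_correlation(page_a, page_b, constant=1.0):
    """Common-word count times constant / sqrt(len(page_a) * len(page_b)).

    The value of `constant` is an assumption; 1.0 is a placeholder.
    """
    if not page_a or not page_b:
        return 0.0  # avoid dividing by zero for an empty page
    factor = constant / sqrt(len(page_a) * len(page_b))
    return common_words(page_a, page_b) * factor


page1 = "Ape Ape Bear Cat Cat Cat".split()
page2 = "Ape Ape Ape Boar Cat".split()

print(common_words(page1, page2))            # 3, as in the quoted example
print(normalised_correlation(page1, page2))  # 3 / sqrt(6 * 5)
```

With the constant set to 1, the result always lies between 0 and 1, because the common-word count can never exceed the word count of the shorter page.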

Are you speaking of a chi-squared standardization factor?  And why use
something like this when you have a verifiable count of pages and words in
each "language"?  What is the downside of using b-total/a-total as the
standardization factor, and then moderating that against the page-variant
stats?  That is: "A-page is a percentage of all A pages/B-page is a
percentage of all B pages", standardized by the percentage of B/A pages.
Why wouldn't this be the most representative of any page statistic?  I ask
because I'm facing the identical statistical quandary and seeking to express
myself in the best light (light that hides the hunched back and mole on the
nose).

GC



