# Re: VMs: Proposed method

The weighted mean is calculated from the letter counts for each sample. Take
the English letter E. It may appear 1010 times in one sample, 980 in another
and 950 in a third. Finding the mean of these is easy; however, if we
calculate a best-fit straight line through them and take the reading halfway
along it, then we have weighted it. If we do this for each letter of the
English language, we will have a series of occurrence counts we can use to get
a universal set of occurrence percentages. This could produce a unique set for
each language that could be said to be independent of subject matter. It would
then give a curve for each language representing the average letter
distribution for that language. Comparing these curves should show the overall
differences between languages. As I said, the number of input samples is the
key point.
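A minimal Python sketch of the procedure above, under some assumptions the
message leaves open: samples are plain strings, every alphabetic character
counts as a letter (case-folded), and the samples sit at equally spaced
positions x = 0, 1, ..., n-1 in the order given, since the x-axis of the
best-fit line is not specified. The function names are hypothetical. One thing
the sketch makes visible: a least-squares line always passes through the point
(mean x, mean count), so with equally spaced x the reading halfway along is the
same as the plain arithmetic mean. For 1010, 980 and 950 both give 980.

```python
from collections import Counter


def letter_counts(text):
    """Count each alphabetic character in text, case-folded (assumption)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def midpoint_of_fit(values):
    """Least-squares straight line through (x, value) with x = 0, 1, ..., n-1
    (assumed equally spaced), read at the halfway point of the x range."""
    n = len(values)
    x_mid = (n - 1) / 2            # halfway point, also the mean of x
    y_mean = sum(values) / n
    sxx = sum((x - x_mid) ** 2 for x in range(n))
    sxy = sum((x - x_mid) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mid
    return intercept + slope * x_mid


def universal_percentages(samples):
    """Fitted count per letter across samples, turned into percentages."""
    per_sample = [letter_counts(s) for s in samples]
    letters = sorted(set().union(*per_sample))
    # A Counter returns 0 for a letter missing from a sample.
    fitted = {c: midpoint_of_fit([cnt[c] for cnt in per_sample]) for c in letters}
    total = sum(fitted.values())
    return {c: 100 * v / total for c, v in fitted.items()}


print(midpoint_of_fit([1010, 980, 950]))  # 980.0, same as the plain mean
```

If the weighting is meant to do more than the plain mean, the samples would
need some other x value (sample size, date, etc.) or unequal weights; that
choice is not stated here.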

If we did the same thing for sections of the VMS, then even though these
samples are smaller, the generated curves should show differences between
sections by content, each deviating in its own way from the universal curve.
This should reveal any differences in subject matter between sections.
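The message does not say how a section's deviation from the universal curve
would be measured. A minimal sketch, assuming signed per-letter differences in
percentage points plus the sum of their absolute values as one summary number
(an illustrative choice; a chi-squared statistic would be a common
alternative). It also assumes each alphabetic character of the section's
transcription counts as one letter, which for the VMS depends on the
transcription alphabet used. The function names and the two-letter "universal"
curve below are hypothetical.

```python
from collections import Counter


def letter_percentages(text):
    """Percentage of each alphabetic character in text, case-folded."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return {c: 100 * n / total for c, n in counts.items()} if total else {}


def deviation_profile(section_pct, universal_pct):
    """Signed per-letter difference, section minus universal (percentage points)."""
    letters = sorted(set(section_pct) | set(universal_pct))
    return {c: section_pct.get(c, 0.0) - universal_pct.get(c, 0.0) for c in letters}


def total_deviation(profile):
    """Single summary: sum of absolute per-letter differences."""
    return sum(abs(d) for d in profile.values())


universal = {"a": 50.0, "b": 50.0}  # made-up reference curve for illustration
profile = deviation_profile(letter_percentages("aab"), universal)
print(profile)
print(total_deviation(profile))
```

The signed profile shows which letters a section over- or under-uses, which is
closer to "deviating in different ways" than the single total.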

jeff

> Hello Jeff,
>
> it sounds kind of interesting but I can't quite
>
> --- Jeff <jeff@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > Firstly the universal percentages.
> >
> > Take a sample of input texts in various languages.
> > The larger the number of
> > samples the better. Calculate the percentage
> > occurrence for each letter.
> > Using these percentages calculate the weighted mean
> > for each percentage.
> > These will be the universal percentages for the
> > respective languages.
>
> I am lost already. What is the weighted mean
> percentage? Is it per letter or per language? How are
> the weights defined?
>
> Cheers, Rene
>
