RE: VMs: Overfitting the Data
Most of the time I don't think it is necessary to use sophisticated
mathematics to gauge overfitting. Given a generative system, just
examine what percentage of the words it generates are found in the
VMs. Some of the hypothetical wheel systems and their functional
equivalents will unavoidably generate 60,000 or more words!
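That percentage check can be sketched in a few lines of Python. This is a toy illustration only — the word lists and the `coverage_stats` helper are invented for the example, not a real EVA transcription or an established tool:

```python
# Toy sketch: gauge a generator by what share of its vocabulary is
# attested in the VMs, and by how much it overgenerates overall.
# The word lists below are invented examples, not real VMs data.

def coverage_stats(generated_words, vms_vocab):
    """Return (hit_rate, vocab_size) for a candidate generator:
    hit_rate   -- fraction of distinct generated words found in the VMs
    vocab_size -- number of distinct words the generator produced
    """
    gen_vocab = set(generated_words)
    hits = gen_vocab & set(vms_vocab)
    return len(hits) / len(gen_vocab), len(gen_vocab)

vms_vocab = {"daiin", "chedy", "shedy", "qokeedy"}
generated = ["daiin", "chedy", "xqzzy", "daiin"]
hit_rate, vocab_size = coverage_stats(generated, vms_vocab)
# A system whose hit_rate is high but whose vocab_size is 60,000+
# has bought its coverage with massive overgeneration.
```

The point of reporting both numbers together is exactly the one above: a high hit rate means little if the generated vocabulary is enormous.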
Of course, if the VMs text was generated by a process that involved
a random element, then even if that EXACT process were repeated you
would not get exactly the same set of words again. But given the
frequency distribution of VMs words that we see (which is far from
flat), you would expect the hard core of frequent words to reappear
in any subsequent rerun.
...and similarly, for any modern proposed generating system to be
considered successful, it should generate nearly all of the
frequent VMs words, _and in similar proportions_.
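One crude way to check that "similar proportions" requirement is to compare the relative frequencies of the most frequent VMs words against the frequencies a generator assigns those same words. A minimal sketch, on invented toy data — `proportion_mismatch` is a made-up helper, not an established VMs metric:

```python
from collections import Counter

def proportion_mismatch(vms_words, gen_words, top_n=3):
    """Half the L1 distance between the relative frequencies of the
    top_n most frequent VMs words and the frequencies the generator
    gives those same words (0 = identical proportions, 1 = disjoint)."""
    vms_freq = Counter(vms_words)
    gen_freq = Counter(gen_words)
    top = [w for w, _ in vms_freq.most_common(top_n)]
    v_tot = sum(vms_freq[w] for w in top) or 1
    g_tot = sum(gen_freq[w] for w in top) or 1
    return sum(abs(vms_freq[w] / v_tot - gen_freq[w] / g_tot)
               for w in top) / 2

# Invented toy corpora:
vms = ["daiin"] * 3 + ["chedy"] * 2 + ["shedy"]
same = proportion_mismatch(vms, vms)  # identical proportions -> 0.0
skew = proportion_mismatch(vms, ["daiin"] * 3 + ["chedy"] * 3)
```

A generator can then fail in two ways: by missing frequent words entirely, or by producing them at the wrong rates; the second failure is what a bare vocabulary-overlap count hides.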
At the moment wheels just don't do it for me, unless they are
highly controlled by some other system, from which the frequency,
word order and phrase patterns originate.
Marke
P.S. A related thought:
The much-debunked and forgotten superblock experiments were able to
generate 50-70% of the real VMs vocabulary, but within an overall
generated vocabulary of only 16,000 words.
But, for those who weren't there, it was also possible to create
an English superblock of about 8000 bytes which could recreate
a comparable proportion of the vocabulary of the KJV Bible!
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list