Re: VMs: Overfitting the Data
On Sat, 29 Jan 2005, Jacques Guy wrote:
> I won't define it. I'll give an example.
It's hard to beat an example, but if you want a reasonably concise
definition, see http://www.e-paranoids.com/o/ov/overfitting.html.
There seem to be two components to the idea of overfitting. First, using
a model with a number of parameters (the adjustable quantities that control
how input becomes output) that is large relative to the amount of data
underlying the model (the data to which the model is fit). Second,
excessive zeal in getting the model to reproduce the exact properties of a
dataset - not allowing for the fact that it contains random influences, or
at least influences that you cannot possibly account for.
I think these (always?) come down to the same thing in operational terms.
That is, using an excessive number of parameters - an increasingly
Byzantine procedure as Nick Pelling puts it - is the way to obtain an
excessively accurate fit to the data. This is the process that Guy's turf
example illustrates.
A standard example of overfitting in computation is fitting a set of data
with a polynomial of too high an order. The higher the order of the
polynomial, the more the curve flexes. If you let it flex enough it can be
made to pass precisely through each of the observed data points, but the
flexing of the curve between the nicely "predicted" (or at least nailed)
points probably has nothing to do with the natural basis of the phenomenon
being observed.
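For concreteness, here is a minimal sketch of that polynomial example in
Python (the data, the degree, and the underlying linear trend are all
invented for illustration, nothing Voynich-related). A degree-7 polynomial
has eight coefficients - as many parameters as there are data points - so
it can nail every observation exactly, yet it is free to wander in between:

    import numpy as np

    # Eight noisy samples of a simple (assumed) linear trend y = 2x.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 8)
    y = 2.0 * x + rng.normal(scale=0.1, size=x.size)

    # Degree 7: as many coefficients as data points -> exact fit.
    # Degree 1: two coefficients -> a modest fit that tolerates the noise.
    overfit = np.polyfit(x, y, deg=7)
    modest = np.polyfit(x, y, deg=1)

    # At the observed points the high-order fit looks perfect...
    print("max residual, degree 7:", np.abs(np.polyval(overfit, x) - y).max())
    print("max residual, degree 1:", np.abs(np.polyval(modest, x) - y).max())

    # ...but between the points it flexes away from the trend that
    # actually generated the data.
    xs = np.linspace(0.0, 1.0, 200)
    print("largest excursion of degree-7 curve from the true trend:",
          np.abs(np.polyval(overfit, xs) - 2.0 * xs).max())

The exact numbers are not the point; the comparison is. The degree-7 fit
reproduces the observations essentially perfectly, yet its behaviour
between them reflects the noise as much as the trend - the "flexing"
described above.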
A fairly simple process that reproduces any features of Voynichese-oid
text is always interesting, of course. It, or something functionally
equivalent to it, may well tell us something about the nature of the text.