
VMs: Overfitting the Data (WAS: Another method different from Cardano Grilles)



Jacques Guy wrote:

Overfitting the data happens when the quantity of information (in the mathematical sense of Shannon's information theory) in your predictive machine (your spinning wheels for instance) converges with the amount of information in the data set which the machine tries to reproduce.

Superb, Jacques! I intuitively understood this but you have stated it formally. Would you define "overfitting the data" more fully, for a half-baked (unseasoned) statistician? :-)


Especially - how is overfitting the data different from forming a hypothesis, aside from a hypothesis' ideally being formulated before the fact? It sounds like having too many degrees of freedom in the hypothesis to reproduce the data uniquely. How exactly do you decide whether this has occurred?

and, adding more circles (up to five) he says he could reconstruct the
whole 40,000 words in the VMs.

Voilà! Overfitting the data. But not good enough yet, by far. For how many words which DO NOT occur will those five wheels reconstruct?

Is that, then, the test for whether the data are overfitted?
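
To make the question concrete, here is a minimal sketch of that test in Python. Everything in it is an assumption made for illustration: the wheel contents and the "attested" set are a handful of placeholder strings rather than a real word list, and the names generate and fit_report are hypothetical. It only shows the bookkeeping: spin every combination of the wheels, then count how many of the resulting words are attested and how many are not.

```python
from itertools import product

def generate(wheels):
    """Every word the wheels can spell: one fragment per wheel, concatenated."""
    return {"".join(parts) for parts in product(*wheels)}

def fit_report(wheels, attested):
    """Compare what the wheels produce against the attested vocabulary."""
    generated = generate(wheels)
    hits = generated & attested
    return {
        # share of attested words the wheels can reproduce
        "coverage": len(hits) / len(attested),
        # share of the wheels' output that never occurs in the data
        "overgeneration": len(generated - attested) / len(generated),
        "generated": len(generated),
    }

# Placeholder data (hypothetical, not a real transcription sample).
attested = {"qokedy", "qokaiin", "chedy", "daiin", "okedy"}
wheels = [
    ["", "q", "ch", "d"],   # hypothetical first wheel: prefixes
    ["ok", ""],             # hypothetical second wheel: middles
    ["edy", "aiin"],        # hypothetical third wheel: endings
]

report = fit_report(wheels, attested)
print(report)
```

With these placeholder wheels every attested word is reproduced, yet 11 of the 16 words generated are unattested. Full coverage on its own says little; it is the second figure, the words which do not occur, that the question above is asking about.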


Dennis