
Re: VMs: Overfitting the Data



28/01/2005 5:19:47 PM, Koontz John E <John.Koontz@xxxxxxxxxxxx> wrote:

>A standard example of overfitting in computation is fitting a set of data
>with a polynomial of too high an order.  The higher the order of
>polynomial the more the curve flexes.  If you let it flex enough it can be
>made to pass precisely through each of the observed data points, but the
>flexing of the curve between the nicely "predicted" (or at least nailed)
>points probably has nothing to do with the natural basis of phenomenon
>being observed.



You often see it in those books that claim to teach you how to make 
money on the stock exchange by the "technical analysis" methods, 
that is, by just looking at the price fluctuations (and often also
volume fluctuations). The worst (or best?) examples I have seen
resort to Fourier transforms!
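The polynomial case in the quoted paragraph can be sketched in a few
lines. This is only an illustration under assumptions of my own: the
"natural" law is taken to be y = 2x + 1, the six noise values are made
up, and lagrange() and fit_line() are hypothetical helper names. The
degree-5 polynomial nails every observed point, yet one step outside
the data it is far worse than a plain straight-line fit.

```python
def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial passing through all (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Assumed "true" law y = 2x + 1, observed with small invented errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

slope, intercept = fit_line(xs, ys)

# Compare both models at a point that was never observed.
x_new = 6.0
truth = 2 * x_new + 1
poly_error = abs(lagrange(xs, ys, x_new) - truth)
line_error = abs(slope * x_new + intercept - truth)

print("polynomial error at x = 6:", poly_error)
print("straight-line error at x = 6:", line_error)
```

The polynomial has as many free coefficients as there are data points,
so it reproduces the noise exactly and then extrapolates it.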

>A fairly simple process that reproduced any features of Voynichese-oid
>text is always interesting of course.  It or something functionally
>equivalent to it may well tell us something about the nature of the text.

In fact such a process is what we linguists would consider to
be a purely _structural_ descriptive grammar of the language,
that is, a grammar that accounts for the language observed,
once the meaning and nature of the words (and morphemes) have
been abstracted.  Let me give you an example, taking English.

Imagine that you know no English at all.
This grammar will tell you that "the cat jumped on the table"
is structurally equivalent to "the snake crawled under the rock".
It will tell you that "table" and "rock" somehow belong to
the same category (but it won't tell you what that category is).
It will tell you that "crawled" and "jumped" also belong to
the same category, and that that category is quite distinct from that
of "table" and "rock". It will not tell you that "crawled" and
"jumped" are verbs, but it will tell you that "crawl" is to
"crawls", "crawled" and "crawling" as "jump" is to "jumps",
"jumped" and "jumping". 
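A very crude version of this idea can be sketched as follows. It is
not a real grammar-extraction method, only a toy under stated
assumptions: the four-sentence corpus is invented, induce_categories()
is a hypothetical name, and the rule used (two words fall in the same
category if they ever occur between the same pair of neighbours) is
the simplest distributional test I could think of. The program is
never told what a noun or a verb is.

```python
from collections import defaultdict


def induce_categories(sentences):
    """Group words that share at least one (previous, next) context."""
    contexts = defaultdict(set)  # word -> set of (previous, next) pairs
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

    # Invert the table: context -> words seen in that context.
    by_context = defaultdict(set)
    for word, ctxs in contexts.items():
        for ctx in ctxs:
            by_context[ctx].add(word)

    # Union-find: merge all words that were seen in the same context.
    parent = {word: word for word in contexts}

    def find(word):
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for words in by_context.values():
        ordered = sorted(words)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups = defaultdict(set)
    for word in contexts:
        groups[find(word)].add(word)
    return {word: frozenset(groups[find(word)]) for word in contexts}


# Invented toy corpus; a real text would be vastly larger and messier.
corpus = [
    "the cat jumped on the table",
    "the snake crawled under the rock",
    "the cat crawled on the rock",
    "the snake jumped under the table",
]

categories = induce_categories(corpus)
for category in sorted(set(categories.values()), key=sorted):
    print(sorted(category))
```

On this corpus "table" and "rock" end up together, "jumped" and
"crawled" end up together, and the two groups stay apart, all without
any label being attached to either. Note that the rule also keeps
"cat" and "snake" apart from "table" and "rock", since they never
share a context here; a serious method would need something less
naive than exact context matching.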

I have long held the opinion that solving the VMS involves,
first of all, extracting such a grammar from its text.

If I were still in AI and free to research what I pleased,
that is what I would do: elaborate the methods, algorithms
if you prefer, to extract such grammars from any corpus
of texts in any language. But no-one believes in this
approach. That is why I have given up on the VMS, for all
practical purposes. I really don't feel up to tackling the
problem on my own, while having to fight the stupidities
I heard when I _was_ in AI.

