
Re: Words, terms, tokens, etc.



On Mar 8, 10:45, Jorge Stolfi wrote:

 
> The recent exchange about the meaning of "word" and "term" in Antoine's
> messages highlights a common problem.  
> 
> Anyone who deals with text has the need to distinguish between those
> two concepts. Unfortunately each community (if not each author) will
> pick different names for them. For instance, I recall that Gabriel's
> Zipf law paper uses "word" for the instance, and "token" for the
> dictionary entry; whereas the compiler-writer community makes precisely
> the opposite choice.
> 
> So, a plea for all list members: when you post statistics to the list,
> please take the time to define your terms (or words, tokens, whatever 8-).

I agree.  I seem to remember a book titled "Type and Token" by
some lexicostatistician or numerolinguist (or whatever-you-call-em)
a long time ago, making this distinction. There might be 130
instances of the word "cat" in a text; to Gustav Herdan that makes
130 tokens of the "cat" type.  So when I first saw Gabriel's
terms I had a hard time adjusting.
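
To make the type/token distinction concrete, here is a minimal sketch in
Python. It is not taken from Herdan or from Gabriel's paper; the sample
text, the function name, and the rule that a token is any
whitespace-separated lowercased string are all assumptions made for
illustration.

```python
from collections import Counter

def type_token_counts(text):
    """Return (number of tokens, number of types, per-type counts)."""
    # Assumption: a token is any whitespace-separated string, lowercased.
    tokens = text.lower().split()
    # Each distinct string is one type; its count is its number of tokens.
    counts = Counter(tokens)
    return len(tokens), len(counts), counts

sample = "the cat saw the other cat"
n_tokens, n_types, counts = type_token_counts(sample)
print(n_tokens, n_types, counts["cat"])  # prints: 6 4 2
```

In the sample there are 6 tokens but only 4 types, and the "cat" type
has 2 tokens.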

-- 
Jim Reeds, AT&T Labs - Research
Shannon Laboratory, Room C229, Building 103
180 Park Avenue, Florham Park, NJ 07932-0971, USA

reeds@xxxxxxxxxxxxxxxx, phone: +1 973 360 8414, fax: +1 973 360 8178